Elatable : Bradley Horowitz: user-generated content


Sunday, December 2, 2007

Me v. Ze Frank (not so much…)

Gordon Luk has a really interesting post that I'll use as a launching pad to clarify a point I often make in public lectures... In the interest of saving you a click, see below.
This reminded me of Umair's article "Why Yahoo Didn't Build MySpace..." which basically suggests that the pyramid of participation I reference is a Yahoo "strategy." Nothing could be further from the truth. Destroying that pyramid is our strategy. The pyramid is more of a forensic, backward-looking empirical observation. The very next slide in the deck is also shown below.

Lesson: Of course, I take full responsibility for these misunderstandings. Gordon and Umair are brilliant guys. So as I'm dishing out soundbites, maybe I need to slow down and make sure that I'm clearer...

Gordon says:

Do you ever have posts sitting around in wordpress for months at a time, delayed for one reason or another? This is one of them, and after re-reading it, I think I’ll go ahead and post it, but remember that it’s kind of a warp back in time to October 2006.

Yahoo! Open Hack Day was a massive, massive success, and i’m glad to have been a part of it. Now that i’ve had a few days to rest and reflect upon my experiences, I want to discuss an observation of Bradley Horowitz’s that has stuck in my mind.

Bradley’s one of the foremost advocates for social search development here at Yahoo. He’s one of the brightest minds around, and always makes my head spin a little bit when I talk with him. You can check out his Keynote presentation here (warning, this was 4GB to download!). Around the end of minute five, Bradley says some really interesting stuff. First, he showed the famous grainy video clip of a monkey trained to perform martial arts kicks in the context of what the worst-case scenario behind user-filtered content could produce. Then he went on to show some beautiful photographs from Flickr’s Interestingness, as a way to demonstrate the better side of what can be efficiently extracted from collaborative participation. His point that these photos bubbled to the top because of implicit user activity is key; as he mentions, the aggregate human cost of photo moderation borne by the user community on Flickr dwarfs anything possible by simply paying employees to review and rate them.

Ze Frank, seen in this video speaking at TED, a design conference, seems to also think hard about the new culture of participation on the Internet. Ze often invites his viewership to participate with him on various flights of fancy, including making silly faces, creating short video clips, playing with flash toys and drawing tools, etc. During his TED presentation, and also at various times on The Show, Ze talked about the hold that various groups have on the perception of art, and how many people are able to participate and create in a new culture without being ostracized by an established hierarchy. He seems to hold that the “ugliness” which seems to permeate MySpace is, in fact, a manifestation of participation outside of the boundaries of hierarchical editorial control. Thus, his position seems to be that the silliness and ugliness of the huge amount of web “design” on myspace depends heavily on perspective. At the minimum, he seemed to believe that participation culture removes barriers to experimentation that could lead to an overthrow of traditional design aesthetics.

These perspectives seem to be at odds. On one side, Bradley appears to be advocating the harvesting of social participation to come to results that select traditionally valuable content. In other words, using New Media platforms to efficiently perform the job of the Old Media publishing empires (Kung Fu Monkeys should be buried!). On the other side is Ze, who seems to be advocating not only a disruption of Old Media distribution through mass publication, but also seems to be leading a charge to disrupt traditional aesthetic values (Kung Fu Monkeys are beautiful, and should be encouraged!).

I think it’s an interesting contrast, and I worry that i’m mischaracterizing the arguments of each.

My personal viewpoint is a bit more nuanced. I believe that one day, web platforms will also be able to efficiently cluster their users based upon interests or tastes, similar to how Flickr can cluster tags to disambiguate meaning. These clusters will probably be designed not around user surveys or self-reported demographics, but instead will most likely be extracted through efficient methods of recording implicit participation information over the long term. There may well be a cluster (which I would belong to!) of folks that do enjoy Kung Fu monkeys, and there is almost definitely a cluster that find it degrading and offensive. The difference here between traditional preference filtering and clustered audiences is similar - one requires a great deal of potentially inaccurate user feedback about their preferences, whereas the latter acts more on implicit activity, and is thus more likely to produce the desired effects.

Not only would such a model be able to try and target clusters of preferences among users, but it would also allow for users to participate in cultures in which they feel welcome from the beginning.

I responded:

My argument is not so much that Kung Fu monkeys = bad, or that they should be “buried.” But in a world where “anyone can say anything to everyone at once”, our most precious commodity becomes attention. I remember sitting at the Harvard Cyberposium Conference a few years ago when someone said… “It’s getting to the point where every moment of our life can now be digitally recorded and preserved for posterity…. [pregnant pause…] Unfortunately, one doesn’t get a second life with which to review the first one.”

Coming up with the right tools to help me get to what matters to me becomes essential. But I don’t want to get prescriptive - what matters to the fans of Kung Fu monkeys is… Kung Fu monkeys! And we should be providing tools that help that community as much as any other…

Another way of putting it… I’m disinclined to subscribe to a Flickr feed for the tag “baby”. Just not interested in seeing random babies, thank you very much. But my brother’s baby? My niece? Cutest baby ever! I want to see every picture of her that exists!

Death to the monoculture and long live the long tail! Long live low-brow humor, stupid pet tricks and mentos and diet coke! And Ze Frank…

My point is that tools like Flickr interestingness allow us to leverage aggregate attention for the benefit of each user. I love interestingness, and use it as a sort criterion for just about every search I do on Flickr… But Flickr also uses a social graph with varying coefficients (me, family, friends, contacts, public) to provide another dimension that helps direct my attention to the right babies. ;-)

I think my thesis is simply that in democratizing the creation of content, we’ve created a high-class problem… There’s too much “on”… 500 channels, maybe. 500M channels? Never. The flip side of this wonderful revolution in publishing, destroying the hierarchical pyramid of participation, is that we (our industry) have a burden to provide people the means of actually getting to the content they want to see… (Perhaps sometimes, even before they know they want to see it.) This ought to keep us busy for a lifetime or so…

I think you captured my view pretty much in your closing paragraph. I’d guess Ze Frank agrees with us mostly too.

Friday, November 17, 2006

Yahoo buys Bix

I said it all over here.

So one thing to note... It turns out that the voice in my head does sound a lot better than the one that is being recorded. Something must be defective with the microphone, the Bix system, or whatever that keeps knocking it out of tune. Until I debug that system, I won't be posting any karaoke.

Also - we need to get some cooler songs into the Bix karaoke system.

Sunday, September 10, 2006

Cool Flickr Geotagging Examples

Stewart recently showed me some very cool (and in some cases surprising) Flickr geotagging examples. Here are a few I loved.

Where is the neighborhood in Manhattan known as Tribeca?

Get your kicks, on Route 66

Food tour of Asia

What I love about the "tribeca" and "route 66" examples is that they show emergent knowledge in the system. Collectively, the efforts of many photographers map out a geographic element... Neat.

Monday, August 14, 2006

Interesting(ness) post from O’Reilly

Chad told me that Tim O'Reilly posted about interestingness today. I've been contemplating another post about interestingness for a while, and I was glad Tim beat me to the punch! Some of this I hope to discuss at the Adaptive Path UX conference Wednesday too.

I've been starting out most talks that I've given lately by showing two examples of "user-generated content" back-to-back. First I show the numa-numa kid:

Then I say something like, "As amusing as this is... does anyone else find this kinda depressing? If stupid human tricks, pratfalls, fratboy pranks and skateboarding dogs are the future of media... let me off the bus!"

Then I say, "But fear not. This is also 'user-generated content'":

[Photos shown: "pandaTwins" (originally uploaded by Sevenof9FL), a second photo (originally uploaded by Caleroalvero), and "dress" (originally uploaded by anetbat).]

And I fire up a slideshow of the 100 most interesting photos on Flickr. It's hard to describe the unfailing impact that these photos have... they are alternately moving, funny, disturbing, provocative... I go on, "What's cool about these is that they are not only user-generated... They are also implicitly 'user-discovered'... It's not as if I spent a couple hours finding the 'good stuff' myself. The Flickr interestingness metric percolated the 'cream' to the top of the pile. By 'implicitly' I mean that there's no explicit 'rating system'. [I talk more about the value of implicit v. explicit means of deriving value here...] To be clear, Flickr is filled with plenty of junk. In fact, we like it that way. There's not just a low barrier to entry, there's virtually no barrier to entry. Got a camera? Bam! You're a 'photographer!'"

"So Flickr is a system that accommodates taking a 'worthless' picture of a hangnail, or a breathtaking Ansel Adams-like landscape. The cool thing is that while creating a frictionless environment that serves both scenarios, we can also determine which of the two is likely more 'interesting' to the community at large."

The ability to separate wheat from chaff, or more accurately personally interesting from collectively interesting, is subtle but huge. And Flickr does so without the use of link flux (i.e. PageRank), relying instead on 'in system' heuristics.

Usually after invoking the Flickr example, I transition to Y! Answers. If there's a complaint I hear about Y! Answers, it's that there's a lot of noise in the system. Admittedly, "Umm.. my boyfriend caught me sleeping with one of his best friends?", or "Why is the sky blue?", or "What's up?" do not necessarily resonate with the "expand all human knowledge" meme. But what's cool is that we can create a system that accommodates everything from the ridiculous to the sublime... but knows the difference between the two! (Or perhaps more accurately is taught the difference by millions of users.) This is the power of interestingness!

At this point I usually drop in a dry remark, "At Yahoo we have spent a fair amount of time and energy focusing on systems that are noisy, where anyone can say anything at anytime, etc. One of the most popular datasets and testbeds for these kinds of conditions is popularly known as... [prepare for punchline] the web... and we've been working on it for about a decade..." ;-)

I'm not sure why this post took on the flavor of a running commentary on my own talk, but that's how it came out!

I want to also remind folks that my relationship to the products I often invoke in this blog is best characterized as awed bystander. All hail Serguei, Yumio, Stewart, Tomi, etc!

Tuesday, June 27, 2006

Searching for what doesn’t exist…

As an industry, we've made a ton of progress in search over the last several years. Yet there is a subtle but profound limitation to "web search" as currently realized: search engines can only return results that... well... you know... exist.

At a glance this doesn't seem to be much of a hindrance. It's obvious, expected, rational. I've heard (a most excellent and engaging) schpiel from Google (Craig Silverstein) that acknowledges that their current search index captures only a fraction of the information that's "out there." The punchline of Craig's talk was that they'd only indexed a tiny fraction of what's possible - hence the efforts to digitize, crawl the "dark web", extend to other media types, etc. The spirit of the talk was indeed inspirational, in the vein of "we're just getting started..."

But the very comment that we're only x% "done" implies that there is some finite body of knowledge out there, and if we could only digitize faster, crawl harder, buy more servers, etc. then we'd be able to improve that percentage and ultimately get "all" that information into the index (and presumably sleep well at night again.)

Noble as this goal may be, if you pause to think about it, it's obvious (to me anyway) that humankind's "potential knowledge" is greater than our "realized knowledge" to date. This is admittedly "cosmic" or metaphysical, but I mean this in a practical sense as well. Barring apocalyptic scenarios, there are more web pages yet to be written than have already been written. (For the sake of discussion, let's use "web page" as proxy for discrete knowledge element while confessing that we've already moved beyond the "page" as a paradigm.)

Where am I going with this? Perhaps not surprisingly, Yahoo! Answers.

Some of the magic of Yahoo! Answers is revealed through examining its provenance. The category of knowledge search sprang up in Korea. In Korea exists what is arguably the world's most sophisticated online population... but they are disadvantaged by the lack of Korean language documents (relative to English language.) Didn't matter how hard we crawled, how much attention we put on ranking and relevance, etc. If the document itself did not exist, then web search wasn't going to find it, rank it, present it, etc.

Y! Answers turns the current search paradigm on its head. Rather than the current industry search paradigm (connecting the average 2.4 keywords to some extant "web page" out there), Y! Answers attempts to distill knowledge out of the very ether... Actually, "ether" is a rather inappropriate term, as Y! Answers attempts to distill knowledge from a very real asset: Yahoo!'s pool of half a billion monthly users. It turns this audience into the world's most liquid knowledge marketplace.

(This also reminds me a bit of PubSub's schpiel about "prospective" vs. "retrospective" search. The premise here is that PubSub could "search the future." What's different about Y! Answers is that PubSub had a relatively passive relationship to the knowledge itself: "We'll tell you when..." Y! Answers actually has the reach, platform and mechanism to invoke the knowledge versus passively monitoring it. Moreover it evokes it in a "lazy migration", generating knowledge precisely in response to demand for that knowledge.)

It's fun and illuminating to think about all of the knowledge that doesn't yet exist on a web page. Trust me, there's lots. One obvious category is what might be referred to as "colloquial" knowledge, i.e. the shortcut to my house that the online mapping services always seem to get wrong. Or "Where's a good place to get authentic matzah ball soup in Times Sq. at noon where I won't have to wait in line?" The kind of stuff my mother and father know from a collective 142 years on the planet... but alas, they've never authored a web page (let alone written a book, made a movie, etc.) so the only beneficiaries of their wisdom to date have been their immediate friends and family. (Tom Coates will rap my knuckles for invoking the dreaded "parents as naive users" meme...)

Yahoo! Answers serves many, many more purposes than just colloquial knowledge however. It's fascinating to spend time in there... it's an incredibly revealing lens into the multitude of categories underserved by web search today. While the original motivation for knowledge search might be attributed to "lack of Korean language documents," the success of the product worldwide indicates that this was just the tip of the iceberg... there is something more substantial, subtle, and universal going on: knowledge yet to exist > knowledge that exists. I find something incredibly uplifting and optimistic about this.

And with a push of the "Publish" button, yet another web page springs into existence. This one unasked for, but hopefully useful all the same.

Ps.
Tempted to title this post, "I still haven't found what I'm looking for..." but reconsidered...

Thursday, March 2, 2006

Lowering Barriers to Participation

In a previous post, I mentioned our efforts around lowering barriers to entry for participation, i.e. empowering consumers with tools that transform them into creators. Tagging is perhaps the simplest and most direct example of how lowering a barrier to entry can drive and spur participation.

Tagging works, in part, because it's so simple. Rather than being forced to tag Rashi (the name of my puppy) in a hierarchical taxonomy: (Animal => Mammal => Canine => Rhodesian Ridgeback => Rashi) I can just type Rashi. The instructions for tagging on Flickr are vague; likely the less said the better. You learn by watching and doing, making mistakes and fixing them... sometimes tagging for oneself, sometimes for one's friends, sometimes for others. Tagging, while initially uncomfortably unstructured (staring into that blank field it's easy to freeze up with "tagger's block"), becomes painless and thought-free. Note that there is no spellcheck against submitted tags. People commonly invent tags that have no meaning outside of a shared or personal context, for instance specific tags for events.

In the great taxonomy/folksonomy debate, dewey-decimal fans generally invoke semantic ambiguity as a place where tagging will break down. Stewart invoked these illustrative examples in his blog post that introduced the Flickr clustering feature. For instance, the word "turkey" has several different senses - turkey the bird, turkey the food, and Turkey the country.

Forcing a user to resolve this ambiguity at data entry time would be a drag, and we'd likely see a huge dropoff in the amount of user metadata that we collect. (Moreover, we really couldn't. As pointed out before, tags must be allowed to take on personal meaning - "turkey" might be the name of my school's mascot, e.g. the Tarrytown Turkeys, or a pejorative term I apply to a bad snapshot...) What Flickr can and does do, is provide an ipso facto means of resolving this ambiguity and browsing the data: Flickr's clustery goodness.

So check out the turkey clusters. Flickr uses the co-occurrence of tags to cluster terms. In other words, photos with the tags "turkey" and "stuffing" tend to be about the food, "turkey" and "mosque" tend to be about the country, and "turkey" and "feather" about the bird.
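To make the co-occurrence idea concrete, here's a minimal sketch in Python. To be clear, this is not Flickr's actual clustering code (I haven't seen it described in that detail); the sample photos, the function names, and the greedy "share at least one other tag" grouping rule are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical sample data: each photo is represented by its set of tags.
photos = [
    {"turkey", "stuffing", "thanksgiving"},
    {"turkey", "stuffing", "dinner"},
    {"turkey", "mosque", "istanbul"},
    {"turkey", "istanbul", "sea"},
    {"turkey", "feather", "bird"},
    {"turkey"},  # a lone tag carries no co-occurrence signal
]


def co_occurring_tags(photos, target):
    """Count how often each other tag appears alongside `target`."""
    counts = Counter()
    for tags in photos:
        if target in tags:
            counts.update(tags - {target})
    return counts


def cluster_by_shared_tags(photos, target):
    """Greedily group photos tagged `target` that share another tag.

    Returns a list of (tag_set, photo_indices) pairs. Photos carrying
    only `target` are skipped, since they have nothing to co-occur with.
    """
    clusters = []
    for i, tags in enumerate(photos):
        others = tags - {target}
        if target not in tags or not others:
            continue
        merged_tags, merged_ids = set(others), [i]
        untouched = []
        for c_tags, c_ids in clusters:
            if c_tags & others:
                # This photo shares a tag with the cluster: fold it in.
                merged_tags |= c_tags
                merged_ids = c_ids + merged_ids
            else:
                untouched.append((c_tags, c_ids))
        clusters = untouched + [(merged_tags, sorted(merged_ids))]
    return clusters


for tag_set, ids in cluster_by_shared_tags(photos, "turkey"):
    print(ids, sorted(tag_set))
```

Note that the code never learns the words "food", "country" or "bird". It only sees which tags travel together; any human-readable label has to be supplied by whoever looks at the groups afterwards.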

There are limitations with this approach. Co-occurrence requires more than a single tag on a given photo. Something tagged with just "turkey" is shit outta luck, and doesn't get to come to the clustering party. Precision and recall tolerances within the Flickr system are very different than in a traditional information-retrieval-based system. A lot of what we're going for here is discovery as opposed to recall; the photos that don't come to the clustering party aren't really hurting anything. Moreover, the system doesn't really know about the semantic clusters I defined in the above paragraph: "food", "country" and "bird". In fact I just assigned those names by looking at the results of the clusters and reverse engineering what I intuit is going on.

In fact, in addition to these tidy clusters onto which I can slap a sensible label, there are also several other clusters which aren't immediately recognizable. One is the "sea" cluster; apparently lots of people take pictures of the sea in Turkey. The other, which is harder to divine, seems to contain a lot of words which appear to be in Turkish. (Reflections on multi-lingual tagging deserve their own post.) This reverse engineering can be fun, and I'm sure there is a game in there somewhere that someone has already built. (Lots of folks have come up with interesting Flickr games, e.g. "Guess the tag!")

Ambiguous words like "turkey" or "jaguar" (cat, car, operating system) are illustrative. Clusters against tags like "love" (again an example Stewart invokes) are downright fascinating. Here we have clusters corresponding to (again reverse engineering/inventing labels) symbols of love, romantic love, women (perhaps loved by men), familial love, and pets. Pretty cool.

Another thing that's cool is that these clusters are dynamic. The clustering shifts to accommodate words that take on new meanings. As Caterina pointed out to me, for months Katrina was a tag mostly applied to women and girls; one day it suddenly meant something else. The clusterbase shifts and adapts to accommodate this.

Per my first post - I'm just documenting my observations, celebrating Flickr and not breaking any new ground here. Hooray for Stewart and Serguei and team that actually create this stuff! Hooray for Tom and the other pundits (like Clay and Thomas) who have already figured out most everything there is to know about tags!

The reason I'm highlighting this feature is that a few folks misunderstood the pyramid in my first post to be Yahoo's strategy... on the contrary it's just an empirical observation that these ratios exist, and that social software can be successful in the face of them. We're flattening, dismantling, and disrupting this pyramid every day!

Flickr clustering speaks to our unofficial tag line, "Better search through people." What I love about it is that it's not "human or machine", or heaven forbid "human versus machine", but "human plus machine". We let people do what they're really good at (understanding images at a glance) and keep it nice and simple for them. We then let machines do what they're good at, and invoke algorithms and AI to squeeze out additional value. There's also a cool "wisdom of crowds" effect here, in that the clusters are the result of integrating a lot of data across many individuals.

Some of our folks at YRB in Berkeley will be prototyping some additional very cool "wisdom of crowds" or "collective intelligence" type stuff RSN (Real Soon Now.) More about their work in an upcoming post. In the meantime, get a taste of it in the ZoneTag application. It applies many of these principles to the task of associating coarse location with cell phone tower IDs - a cheap, simple way to squeeze location out of phones before we've all got GPS.
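For a rough feel of how a crowd could teach a system where a cell tower is, here's a small Python sketch. It is only my guess at the general shape of the idea, not a description of how ZoneTag works: the tower IDs, the place labels, and the simple majority-vote rule are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical reports: (cell tower ID seen by the phone, place label a user supplied).
reports = [
    ("tower-17", "Berkeley"),
    ("tower-17", "Berkeley"),
    ("tower-17", "Oakland"),
    ("tower-42", "San Francisco"),
]


def build_tower_map(reports):
    """Map each tower ID to the place label most users attached to it."""
    votes = defaultdict(Counter)
    for tower_id, label in reports:
        votes[tower_id][label] += 1
    # most_common(1) returns a one-item list of (label, count) pairs.
    return {tower: counts.most_common(1)[0][0] for tower, counts in votes.items()}


tower_map = build_tower_map(reports)
print(tower_map["tower-17"])
```

No single report has to be right. One person mislabeling a tower gets outvoted as more people pass through, which is the "wisdom of crowds" part.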

Thursday, February 16, 2006

Creators, Synthesizers, and Consumers

As Yahoo! has been gobbling up many social media sites over the past year (Flickr, upcoming, del.icio.us) I often get asked about how (or whether) we believe these communities will scale.

The question led me to draw the following pyramid on a nearby whiteboard:

The levels in the pyramid represent phases of value creation. As an example take Yahoo! Groups.

1% of the user population might start a group (or a thread within a group)

10% of the user population might participate actively, and actually author content whether starting a thread or responding to a thread-in-progress

100% of the user population benefits from the activities of the above groups (lurkers)
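Here's the back-of-the-envelope arithmetic as a short Python sketch. The 1% and 10% figures are the rough ratios from the pyramid; the function name and the one-million-user population are just illustrative assumptions.

```python
def participation_pyramid(users, creator_pct=1, synthesizer_pct=10):
    """Estimate how many users sit at each level of the pyramid.

    The levels are nested: creators are counted among the synthesizers,
    and everyone is counted among the consumers.
    """
    return {
        "creators": users * creator_pct // 100,
        "synthesizers": users * synthesizer_pct // 100,
        "consumers": users,
    }


# Illustrative population of one million users.
for level, count in participation_pyramid(1_000_000).items():
    print(f"{level}: {count:,}")
```

For a million users that works out to roughly 10,000 people starting things, 100,000 actively contributing, and the full million benefiting.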

There are a couple of interesting points worth noting. The first is that we don't need to convert 100% of the audience into "active" participants to have a thriving product that benefits tens of millions of users. In fact, there are many reasons why you wouldn't want to do this. The hurdles that users cross as they transition from lurkers to synthesizers to creators are also filters that can eliminate noise from signal. Another point is that the levels of the pyramid are nested: the creators are also consumers.

While not quite a "natural law" this order-of-magnitude relationship is found across many sites that solicit user contribution. Even for Wikipedia (the gold standard of the genre) half of all edits are made by just 2.5% of all users. And note that in this context user means "logged in user", not accounting for the millions of lurkers directed to Wikipedia via search engine traffic for instance.

Mostly this is just an observation, and a simple statement: social software sites don't require 100% active participation to generate great value.

That being said, I'm a huge believer in removing obstacles and barriers to entry that preclude participation. One of the reasons I think Flickr is so compelling is that both production and consumption are so damn easy. I can (and do) snap photos and upload them in about 15s on my Treo 650. And I can, literally in a moment, digest what my friends did this weekend on my Flickr "Photos from Your Contacts" page. Contrast this with the production/consumption ratio of something like video or audio or even text. There is something instantly gratifying about photos because the investment required for both production and consumption is so small and the return is so great.

One direction we (i.e. both Yahoo and the industry) are moving is implicit creation. A great example is Yahoo! Music's LaunchCast service, an internet radio station. I am selfishly motivated to rate artists, songs and music as they stream by... the more I do this, the better the service gets at predicting what I might like. What's interesting is that the self-same radio station can be published as a public artifact. The act of consumption was itself an act of creation, no additional effort expended... I am what I play - I am the DJ (with props to Bowie.) Very cool.

I spoke a lot more about this in the Wired article. In the new paradigm of "programming" where there are a million things on at any instant, we're going to need some new and different models of directing our attention. In the transition from atoms-to-bits, scarcity-to-plenty, etc. instead of some cigar-puffing fat-cat at a studio or label "stoking the star-maker machinery behind the popular songs" we're going to have the ability to create dynamic affinity based "channels". Instead of NBC, ABC, CBS, HBO, etc. which control scarce distribution across a throttled pipe... we're going to have WMFAWC, WMNAWC, TNYJLC and a whole lot more. (The what my friends are watching channel, The what my neighbors are watching channel, The New York Jewish Lesbian Channel, etc.) I expect we'll also have QTC (the Quentin Tarantino channel) but this won't be media he made (necessarily) but rather media he recommends or has watched / is watching. Everyone becomes a programmer without even trying, and that programming can be socialized, shared, distributed, etc.

Another example of implicit creation is Flickr interestingness. The obvious (and broken) way to determine the most interesting pictures on Flickr would have been to ask users to cast votes on the matter. This would have been an explicit means of determining what's interesting. It also would have required explicit investment from users, the "rating" of pictures. Knowing the Flickr community, this would have led to a lot of discussion about how/why/whether pictures should be rated, the meaning of ratings, etc. It also would have led to a lot of "gaming" and unnatural activity as people tried to boost the ratings of their pictures.

Instead, interestingness relies on the natural activity on and traversal through the Flickr site. Its implementation is subtle, and Stewart has hinted that a photo's interestingness score depends on putting a number of factors in a blender: the number of views, the number of times a photo has been favorited (and by whom), the number of comments on a photo, etc. I would guess that Flickr activity the day after interestingness launched didn't change much from the day before, i.e. the cryptic nature of the algorithm ("interestingness" is the perfect, albeit arcane term) didn't lead to a lot of deliberate gaming. But dammit, it works great.
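Just to illustrate what "putting a number of factors in a blender" might look like, here's a toy Python sketch. The real interestingness algorithm isn't public, so everything here is an assumption: the weights are invented, the linear blend is my own simplification, and it ignores the "by whom" part entirely.

```python
from dataclasses import dataclass


@dataclass
class PhotoActivity:
    title: str
    views: int
    favorites: int
    comments: int


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"views": 1, "comments": 20, "favorites": 50}


def implicit_score(photo, weights=WEIGHTS):
    """Blend implicit activity signals into a single score."""
    return (
        weights["views"] * photo.views
        + weights["comments"] * photo.comments
        + weights["favorites"] * photo.favorites
    )


def rank(photos):
    """Order photos from highest to lowest blended score."""
    return sorted(photos, key=implicit_score, reverse=True)


# Hypothetical activity counts for two photos.
photos = [
    PhotoActivity("hangnail", views=40, favorites=0, comments=1),
    PhotoActivity("landscape", views=900, favorites=30, comments=12),
]

for photo in rank(photos):
    print(photo.title, implicit_score(photo))
```

The point of the sketch is that nobody is asked to vote. Every input is a by-product of people simply using the site.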

Without anyone explicitly voting, and without disrupting the natural activity on the site, Flickr surfaces fantastic content in a way that constantly delights and astounds. In this case lurkers are gently and transparently nudged toward remixers, adding value to others' content.
