Thursday, March 2, 2006

Lowering Barriers to Participation

In a previous post, I mentioned our efforts around lowering barriers to entry for participation, i.e. empowering consumers with tools that transform them into creators.  Tagging is perhaps the simplest and most direct example of how lowering a barrier to entry can drive and spur participation.

Tagging works, in part, because it's so simple.  Rather than being forced to tag Rashi (the name of my puppy) in a hierarchical taxonomy (Animal => Mammal => Canine => Rhodesian Ridgeback => Rashi), I can just type Rashi.  The instructions for tagging on Flickr are vague; likely the less said the better.  You learn by watching and doing, making mistakes and fixing them...  sometimes tagging for oneself, sometimes for one's friends, sometimes for others.  Tagging, while initially uncomfortably unstructured (staring into that blank field it's easy to freeze up with "tagger's block"), becomes painless and thought-free.  Note that there is no spellcheck against submitted tags.  People commonly invent tags that have no meaning outside of a shared or personal context, for instance specific tags for events.

In the great taxonomy/folksonomy debate, Dewey Decimal fans generally invoke semantic ambiguity as a place where tagging will break down.  Stewart invoked these illustrative examples in his blog post that introduced the Flickr clustering feature.  For instance, the word "turkey" has several different senses - turkey the bird, turkey the food, and Turkey the country.

Forcing a user to resolve this ambiguity at data entry time would be a drag, and we'd likely see a huge dropoff in the amount of user metadata that we collect.  (Moreover, we really couldn't.  As pointed out before, tags must be allowed to take on personal meaning - "turkey" might be the name of my school's mascot, e.g. the Tarrytown Turkeys, or a pejorative term I apply to a bad snapshot...)  What Flickr can and does do is provide an ex post facto means of resolving this ambiguity and browsing the data:  Flickr's clustery goodness.

So check out the turkey clusters.  Flickr uses the co-occurrence of tags to cluster terms.  In other words, photos with the tags "turkey" and "stuffing" tend to be about the food, "turkey" and "mosque" tend to be about the country, and "turkey" and "feather" about the bird.
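
To make the idea concrete, here is a minimal sketch of co-occurrence clustering in Python.  This is not Flickr's actual algorithm (I don't know its details); the function name `cluster_cotags`, the `min_pair_count` threshold, and the sample photos are all assumptions invented for illustration.  It links two tags when they show up together on enough photos that also carry the target tag, then reads clusters off as connected groups of linked tags.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample data: each photo is just a set of tags.
SAMPLE_PHOTOS = [
    {"turkey", "stuffing", "thanksgiving"},
    {"turkey", "thanksgiving", "stuffing", "dinner"},
    {"turkey", "istanbul", "mosque"},
    {"turkey", "mosque", "istanbul", "sea"},
    {"turkey", "feather", "bird"},
    {"turkey", "bird", "feather"},
    {"turkey"},  # a lone tag has nothing to co-occur with
]


def cluster_cotags(photos, target, min_pair_count=2):
    """Group the tags that co-occur with `target` into clusters.

    Two tags are linked when they appear together on at least
    `min_pair_count` photos that also carry `target`.  Clusters are
    the connected components of that link graph.
    """
    pair_counts = Counter()
    cotags = set()
    for tags in photos:
        tags = set(tags)
        if target not in tags:
            continue
        others = sorted(tags - {target})
        cotags.update(others)
        pair_counts.update(combinations(others, 2))

    # Build an undirected graph of sufficiently frequent pairs.
    neighbors = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_pair_count:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Walk the graph to collect connected components.
    clusters, seen = [], set()
    for tag in sorted(cotags):
        if tag in seen or tag not in neighbors:
            continue
        stack, component = [tag], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


print(cluster_cotags(SAMPLE_PHOTOS, "turkey"))
```

Note that the sketch never names the clusters.  It only groups tags; any label like "food" or "country" is something a human reads into the result afterward.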

There are limitations with this approach.  Co-occurrence requires that a given photo have more than a single tag.  Something tagged with just "turkey" is shit outta luck, and doesn't get to come to the clustering party.  Precision and recall tolerances within the Flickr system are very different than in a traditional information retrieval based system.  A lot of what we're going for here is discovery as opposed to recall;  the photos that don't come to the clustering party aren't really hurting anything.  Moreover, the system doesn't really know about the semantic clusters I defined in the above paragraph: "food", "country" and "bird".  In fact I just assigned those names by looking at the results of the clusters and reverse engineering what I intuit is going on.

In fact, in addition to these tidy clusters onto which I can slap a sensible label, there are also several other clusters which aren't immediately recognizable.  One is the "sea" cluster; apparently lots of people take pictures of the sea in Turkey.  The other, which is harder to divine, seems to contain a lot of words which appear to be in Turkish.  (Reflections on multi-lingual tagging deserve their own post.)  This reverse engineering can be fun, and I'm sure there is a game in there somewhere that someone has already built.  (Lots of folks have come up with interesting Flickr games, e.g. "Guess the tag!")

Ambiguous words like "turkey" or "jaguar" (cat, car, operating system) are illustrative.  Clusters against tags like "love" (again an example Stewart invokes) are downright fascinating.  Here we have clusters corresponding to  (again reverse engineering/inventing labels) symbols of love, romantic love, women (perhaps loved by men), familial love, and pets.  Pretty cool.

Another thing that's cool is that these clusters are dynamic.  The clustering shifts to accommodate words that take on new meanings.  As Caterina pointed out to me, for months Katrina was a tag mostly applied to women and girls; one day it suddenly meant something else.  The clusterbase shifts and adapts to accommodate this.

Per my first post - I'm just documenting my observations, celebrating Flickr and not breaking any new ground here.  Hooray for Stewart and Serguei and team that actually create this stuff!  Hooray for Tom and the other pundits (like Clay and Thomas) who have already figured out most everything there is to know about tags!

The reason I'm highlighting this feature is that a few folks misunderstood the pyramid in my first post to be Yahoo's strategy...  on the contrary it's just an empirical observation that these ratios exist, and that social software can be successful in the face of them.  We're flattening, dismantling, and disrupting this pyramid every day!

Flickr clustering speaks to our unofficial tag line, "Better search through people."  What I love about it is that it's not "human or machine", or heaven forbid "human versus machine", but "human plus machine".  We let people do what they're really good at (understanding images at a glance) and keep it nice and simple for them.  We then let machines do what they're good at, and invoke algorithms and AI to squeeze out additional value.  There's also a cool "wisdom of crowds" effect here, in that the clusters are the result of integrating a lot of data across many individuals.

Some of our folks at YRB in Berkeley will be prototyping some additional very cool "wisdom of crowds" or "collective intelligence" type stuff RSN (Real Soon Now).  More about their work in an upcoming post.  In the meantime, get a taste of it in the ZoneTag application.  It applies many of these principles to the task of associating coarse location with cell phone tower IDs - a cheap, simple way to squeeze location out of phones before we've all got GPS.
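
For a rough feel of how crowd input could map tower IDs to coarse locations, here is a small Python sketch.  I'm not describing ZoneTag's real implementation here; the majority-vote rule, the `learn_tower_locations` name, the `min_reports` cutoff, and the tower IDs are all hypothetical.  The assumption is simply that many users each report (tower ID, place name) pairs and the most common place wins.

```python
from collections import Counter, defaultdict


def learn_tower_locations(reports, min_reports=2):
    """Map each tower ID to the place most users reported for it.

    `reports` is an iterable of (tower_id, place) pairs.  Towers with
    fewer than `min_reports` reports are left out, since a single
    report is not much of a crowd.
    """
    votes = defaultdict(Counter)
    for tower_id, place in reports:
        votes[tower_id][place] += 1

    locations = {}
    for tower_id, counter in votes.items():
        if sum(counter.values()) < min_reports:
            continue
        place, _count = counter.most_common(1)[0]
        locations[tower_id] = place
    return locations


# Hypothetical reports from different users' phones.
SAMPLE_REPORTS = [
    ("tower-17", "Berkeley"),
    ("tower-17", "Berkeley"),
    ("tower-17", "Oakland"),
    ("tower-42", "San Francisco"),
    ("tower-42", "San Francisco"),
    ("tower-99", "Sunnyvale"),  # only one report, so it is skipped
]

print(learn_tower_locations(SAMPLE_REPORTS))
```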

Tuesday, February 21, 2006

Compassionate Care

First off I want to thank everyone that read, linked to, blogged about, referenced, etc. my first post. Extremely gratifying and encouraging, and definitely left me feeling that blogging is going to be something I enjoy. Your comments and feedback are much appreciated, even as I need to learn how (and at what level) I'm going to be able to react and respond directly.

At the risk of losing my freshly-minted audience, I want to blog today about something that is neither technology nor business-related.

Krista (my girlfriend of nearly 5 years) serves as the Volunteer Coordinator at the Charlotte Maxwell Complementary Clinic. This clinic offers complementary alternative medicine treatments to low-income women with cancer. Their services seek to provide relief from the "terrible side-effects of cancer and its treatments". Today the San Francisco Chronicle featured the clinic in a very touching article.

Ways that you can help:
  • Ask your organization to become a corporate sponsor, and have them contact Linda.
  • Direct any practitioners (massage therapists, herbalists, acupuncturists, etc.) who might be interested in volunteering to the Clinic
  • Make a donation yourself
  • If you know someone who is a low-income woman with cancer, please share information about the Clinic and its services with her.
I've often said that I aspire toward balance in my life, but to observe that balance one would need to integrate over large chunks of time and experience... Krista and I are in very different lines of work, and again here the word "complementary" comes to mind. I feel privileged to have a partner whose work is so directly and obviously connected to alleviating suffering in the world. Krista has taught me so much about the politics of cancer, poverty, and how to truly and deeply care for people. I'm inspired by her example daily.

Thursday, February 16, 2006

Creators, Synthesizers, and Consumers

As Yahoo! has been gobbling up many social media sites over the past year (Flickr, upcoming, del.icio.us) I often get asked about how (or whether) we believe these communities will scale.

The question led me to draw the following pyramid on a nearby whiteboard:
[Figure: Content Production Pyramid]

The levels in the pyramid represent phases of value creation.  As an example take Yahoo! Groups.

  • 1% of the user population might start a group (or a thread within a group)

  • 10% of the user population might participate actively, and actually author content whether starting a thread or responding to a thread-in-progress

  • 100% of the user population benefits from the activities of the above groups (lurkers)
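
As a back-of-the-envelope illustration, the ratios above can be turned into head counts.  The function name and the 10 million population figure below are made up for the example; only the 1% / 10% / 100% ratios come from the pyramid.

```python
def pyramid_counts(population, creator_pct=1, synthesizer_pct=10):
    """Apply the pyramid's rough percentages to a user population.

    Integer percentages keep the arithmetic exact.  Everyone counts
    as a consumer, including the people who also create.
    """
    return {
        "creators": population * creator_pct // 100,
        "synthesizers": population * synthesizer_pct // 100,
        "consumers": population,
    }


# Hypothetical audience of 10 million users.
print(pyramid_counts(10_000_000))
```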


There are a couple of interesting points worth noting.  The first is that we don't need to convert 100% of the audience into "active" participants to have a thriving product that benefits tens of millions of users.  In fact, there are many reasons why you wouldn't want to do this.  The hurdles that users cross as they transition from lurkers to synthesizers to creators are also filters that can eliminate noise from signal.  Another point is that the levels of the pyramid are nested - the creators are also consumers.

While not quite a "natural law" this order-of-magnitude relationship is found across many sites that solicit user contribution.  Even for Wikipedia (the gold standard of the genre) half of all edits are made by just 2.5% of all users.  And note that in this context user means "logged in user", not accounting for the millions of lurkers directed to Wikipedia via search engine traffic for instance.

Mostly this is just an observation, and a simple statement:  social software sites don't require 100% active participation to generate great value.

That being said, I'm a huge believer in removing obstacles and barriers to entry that preclude participation.  One of the reasons I think Flickr is so compelling is that both the production and consumption are so damn easy.  I can (and do) snap photos and upload them in about 15 seconds on my Treo 650.  And I can, literally in a moment, digest what my friends did this weekend on my Flickr "Photos from Your Contacts" page.  Contrast this with the production/consumption ratio of something like video or audio or even text.  There is something instantly gratifying about photos because the investment required for both production and consumption is so small and the return is so great.

One direction we (i.e. both Yahoo and the industry) are moving is implicit creation. A great example is Yahoo! Music's LaunchCast service, an internet radio station.  I am selfishly motivated to rate artists, songs and music as they stream by...  the more I do this, the better the service gets at predicting what I might like.  What's interesting is that the self-same radio station can be published as a public artifact. The act of consumption was itself an act of creation, no additional effort expended...   I am what I play - I am the DJ (with props to Bowie.)  Very cool. 

I spoke a lot more about this in the Wired article.  In the new paradigm of "programming" where there are a million things on at any instant, we're going to need some new and different models of directing our attention.  In the transition from atoms-to-bits, scarcity-to-plenty, etc. instead of some cigar-puffing fat-cat at a studio or label "stoking the star-maker machinery behind the popular songs" we're going to have the ability to create dynamic affinity based "channels".  Instead of NBC, ABC, CBS, HBO, etc. which control scarce distribution across a throttled pipe... we're going to have WMFAWC, WMNAWC, TNYJLC and a whole lot more.  (The what my friends are watching channel, The what my neighbors are watching channel, The New York Jewish Lesbian Channel, etc.)  I expect we'll also have QTC (the Quentin Tarantino channel) but this won't be media he made (necessarily) but rather media he recommends or has watched / is watching.  Everyone becomes a programmer without even trying, and that programming can be socialized, shared, distributed, etc.

Another example of implicit creation is Flickr interestingness.  The obvious (and broken) way to determine the most interesting pictures on Flickr would have been to ask users to cast votes on the matter.  This would have been an explicit means of determining what's interesting.  It also would have required explicit investment from users, the "rating" of pictures.  Knowing the Flickr community, this would have led to a lot of discussion about how/why/whether pictures should be rated, the meaning of ratings, etc.  It also would have led to a lot of "gaming" and unnatural activity as people tried to boost the ratings of their pictures. 

Instead, interestingness relies on the natural activity on and traversal through the Flickr site.  Its implementation is subtle, and Stewart has hinted that a photo's interestingness score depends on putting a number of factors in a blender:  the number of views, the number of times a photo has been favorited (and by whom), the number of comments on a photo, etc.  I would guess that Flickr activity the day after interestingness launched didn't change much from the day before, i.e. the cryptic nature of the algorithm ("interestingness" is the perfect, albeit arcane term) didn't lead to a lot of deliberate gaming. But dammit, it works great.
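
Purely to illustrate the "blender" idea, here is a toy Python sketch of an implicit score.  The real interestingness algorithm isn't public, so everything here is an assumption: the `PhotoActivity` fields, the weights, the log damping, and the per-user favorite weights standing in for "and by whom".

```python
import math
from dataclasses import dataclass, field


@dataclass
class PhotoActivity:
    """Activity that accrues to a photo without anyone voting on it."""
    views: int = 0
    comments: int = 0
    # One hypothetical weight per user who favorited the photo,
    # so that who favorited it matters, not just how many did.
    favoriter_weights: list = field(default_factory=list)


def interestingness(photo, w_views=1.0, w_comments=2.0, w_favorites=3.0):
    """Blend implicit signals into one score (all weights are made up).

    Views and comments are log-damped so that raw traffic alone does
    not swamp the other signals.
    """
    return (
        w_views * math.log1p(photo.views)
        + w_comments * math.log1p(photo.comments)
        + w_favorites * sum(photo.favoriter_weights)
    )


# A modestly viewed photo that people engaged with...
engaged = PhotoActivity(views=100, comments=2, favoriter_weights=[1.0, 0.5])
# ...versus one with lots of traffic and no engagement.
viewed_only = PhotoActivity(views=1000)

print(interestingness(engaged) > interestingness(viewed_only))
```

The design point is that every input is a side effect of normal browsing, so nobody is ever asked to rate anything.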

Without anyone explicitly voting, and without disrupting the natural activity on the site, Flickr surfaces fantastic content in a way that constantly delights and astounds.  In this case lurkers are gently and transparently nudged toward becoming remixers, adding value to others' content.

A shout out to all the people who make me think

As I get into my first blog post, I want to state for the record I am, at best, a remixer.

I happen to be surrounded by talented people (many of whom are known to the blogosphere, most of whom are not), and I learn something new every day. I often can't remember (let alone credit) where a lot of the great ideas I repurpose came from.  So apologies in advance, and thanks in advance, to all the sung (see the about page for more on my heroes) and unsung contributors that have influenced my thinking and whose creativity I will unabashedly exploit on this blog.

I haven't invested much (any) energy into the look, feel, functionality of this blog.  Makes me feel kinda lame.  But everyone who knows this medium is telling me to get on the horse and ride, and worry about such stuff later (if at all.)  So...

On with it!