Thursday, March 2, 2006

Lowering Barriers to Participation

In a previous post, I mentioned our efforts to lower barriers to entry for participation, i.e. to empower consumers with tools that transform them into creators.  Tagging is perhaps the simplest and most direct example of how lowering a barrier to entry can spur participation.

Tagging works, in part, because it's so simple.  Rather than being forced to tag Rashi (the name of my puppy) in a hierarchical taxonomy (Animal => Mammal => Canine => Rhodesian Ridgeback => Rashi), I can just type Rashi.  The instructions for tagging on Flickr are vague; likely the less said the better.  You learn by watching and doing, making mistakes and fixing them... sometimes tagging for oneself, sometimes for one's friends, sometimes for others.  Tagging, while initially uncomfortably unstructured (staring into that blank field it's easy to freeze up with "tagger's block"), becomes painless and thought-free.  Note that there is no spellcheck against submitted tags.  People commonly invent tags that have no meaning outside of a shared or personal context, for instance specific tags for events.

In the great taxonomy/folksonomy debate, Dewey Decimal fans generally invoke semantic ambiguity as a place where tagging will break down.  Stewart invoked some illustrative examples in the blog post that introduced the Flickr clustering feature.  For instance, the word "turkey" has several different senses - turkey the bird, turkey the food, and Turkey the country.

Forcing a user to resolve this ambiguity at data entry time would be a drag, and we'd likely see a huge dropoff in the amount of user metadata that we collect.  (Moreover, we really couldn't.  As pointed out before, tags must be allowed to take on personal meaning - "turkey" might be the name of my school's mascot, e.g. the Tarrytown Turkeys, or a pejorative term I apply to a bad snapshot...)  What Flickr can and does do is provide an after-the-fact means of resolving this ambiguity and browsing the data:  Flickr's clustery goodness.

So check out the turkey clusters.  Flickr uses the co-occurrence of tags to cluster terms.  In other words, photos with the tags "turkey" and "stuffing" tend to be about the food, "turkey" and "mosque" tend to be about the country, and "turkey" and "feather" about the bird.
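To make the idea concrete, here is a minimal sketch of co-occurrence clustering.  It is not Flickr's actual algorithm (which isn't described here); the function name cluster_cotags, the min_count threshold and the sample photos are all made up for illustration.  The sketch links two tags whenever they show up together on a photo that also carries the target tag, then reads off the connected groups.

```python
from collections import defaultdict
from itertools import combinations


def cluster_cotags(photos, target, min_count=1):
    """Group the tags that co-occur with `target` into clusters.

    Hypothetical illustration only: two co-tags land in the same cluster
    if they appear together on at least `min_count` photos that also
    carry `target` (directly, or through a chain of such links).
    """
    pair_counts = defaultdict(int)
    cotags = set()
    for tags in photos:
        if target not in tags:
            continue
        others = sorted(set(tags) - {target})
        cotags.update(others)
        # Count every pair of co-tags seen on this photo.
        for a, b in combinations(others, 2):
            pair_counts[(a, b)] += 1

    # Union-find over the co-tags: each tag starts as its own cluster.
    parent = {t: t for t in cotags}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    # Merge clusters for every pair that co-occurs often enough.
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for t in cotags:
        clusters[find(t)].add(t)
    return sorted(sorted(c) for c in clusters.values())


# Made-up sample data: each photo is just a set of tags.
photos = [
    {"turkey", "stuffing", "thanksgiving"},
    {"turkey", "stuffing", "dinner"},
    {"turkey", "mosque", "istanbul"},
    {"turkey", "istanbul", "sea"},
    {"turkey", "feather", "bird"},
    {"turkey"},
]

for cluster in cluster_cotags(photos, "turkey"):
    print(cluster)
```

Note that the sketch never learns the words "food", "country" or "bird"; it only produces unlabeled groups of tags, and any names for those groups have to come from a person looking at them.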

There are limitations with this approach.  Co-occurrence requires that a photo have more than a single tag.  Something tagged with just "turkey" is shit outta luck, and doesn't get to come to the clustering party.  Precision and recall tolerances within the Flickr system are very different from those in a traditional information-retrieval-based system.  A lot of what we're going for here is discovery as opposed to recall;  the photos that don't come to the clustering party aren't really hurting anything.  Moreover, the system doesn't really know about the semantic clusters I defined in the above paragraph: "food", "country" and "bird".  In fact I just assigned those names by looking at the results of the clusters and reverse engineering what I intuit is going on.
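As a rough illustration of that limitation, the sketch below counts how many photos carrying a given tag could take part in co-occurrence clustering at all.  The function name clustering_coverage and the sample data are hypothetical, and nothing here reflects how Flickr actually measures this.

```python
def clustering_coverage(photos, target):
    """Return (eligible, left_out, coverage) for photos tagged `target`.

    A photo is eligible only if it has at least one tag besides `target`,
    since a lone tag has nothing to co-occur with.
    """
    tagged = [set(tags) for tags in photos if target in tags]
    eligible = [tags for tags in tagged if len(tags) > 1]
    left_out = len(tagged) - len(eligible)
    coverage = len(eligible) / len(tagged) if tagged else 0.0
    return len(eligible), left_out, coverage


# Made-up sample data: the last photo carries only the target tag.
photos = [
    {"turkey", "stuffing"},
    {"turkey", "mosque", "istanbul"},
    {"turkey", "feather"},
    {"turkey"},
]

eligible, left_out, coverage = clustering_coverage(photos, "turkey")
print(eligible, left_out, coverage)
```

In a recall-oriented system the left-out photos would count as misses; for discovery they simply don't show up in any cluster.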

In fact, in addition to these tidy clusters onto which I can slap a sensible label, there are also several other clusters which aren't immediately recognizable.  One is the "sea" cluster; apparently lots of people take pictures of the sea in Turkey.  The other, which is harder to divine, seems to contain a lot of words which appear to be in Turkish.  (Reflections on multi-lingual tagging deserve their own post.)  This reverse engineering can be fun, and I'm sure there is a game in there somewhere that someone has already built.  (Lots of folks have come up with interesting Flickr games, e.g. "Guess the tag!")

Ambiguous words like "turkey" or "jaguar" (cat, car, operating system) are illustrative.  Clusters against tags like "love" (again an example Stewart invokes) are downright fascinating.  Here we have clusters corresponding to (again reverse engineering/inventing labels) symbols of love, romantic love, women (perhaps loved by men), familial love, and pets.  Pretty cool.

Another thing that's cool is that these clusters are dynamic.  The clustering shifts to accommodate words that take on new meanings.  As Caterina pointed out to me, for months Katrina was a tag mostly applied to women and girls; one day it suddenly meant something else.  The clusterbase shifts and adapts to accommodate this.

Per my first post - I'm just documenting my observations, celebrating Flickr and not breaking any new ground here.  Hooray for Stewart and Serguei and team that actually create this stuff!  Hooray for Tom and the other pundits (like Clay and Thomas) who have already figured out most everything there is to know about tags!

The reason I'm highlighting this feature is that a few folks misunderstood the pyramid in my first post to be Yahoo's strategy...  on the contrary it's just an empirical observation that these ratios exist, and that social software can be successful in the face of them.  We're flattening, dismantling, and disrupting this pyramid every day!

Flickr clustering speaks to our unofficial tag line, "Better search through people."  What I love about it is that it's not "human or machine", or heaven forbid "human versus machine", but "human plus machine".  We let people do what they're really good at (understanding images at a glance) and keep it nice and simple for them.  We then let machines do what they're good at, and invoke algorithms and AI to squeeze out additional value.  There's also a cool "wisdom of crowds" effect here, in that the clusters are the result of integrating a lot of data across many individuals.

Some of our folks at YRB in Berkeley will be prototyping some additional very cool "wisdom of crowds" or "collective intelligence" type stuff RSN (Real Soon Now).  More about their work in an upcoming post.  In the meantime, get a taste of it in the ZoneTag application.  It applies many of these principles to the task of associating coarse location with cell phone tower IDs - a cheap, simple way to squeeze location out of phones before we've all got GPS.

7 comments:

nj said...

"Something tagged with just “turkey” is shit outta luck, and doesn’t get to come to the clustering party. "

How about letting random viewers tag ANY photo with any keyword (see 'viewer not user taxonomy' in this comment)?

Or why define by tags, why not define by image content, so you search by turkey, the results are clustered so that maps of the country turkey are together, pictures of turkeys in fields together, pictures of turkeys on a plate together and so on? It's not as difficult as it first seems.

Chris Fillius said...

How about having an ontologist/cataloger working in the background?

Not changing people's tags, just making sure the tags communicate with each other. BT, NT, RT, (broad, narrow, related terms) that kind of thing.

pansophia said...

While lowering barriers to participation expands the "grassroots" of the web, the other end of the problem is the way power concentrates at the top. Rank quickly gets parlayed into rankism. Will the size of the crowd really matter if they are all channeling HuffingtonPost?

Abu Hurayrah said...

Now, I'm not very familiar with how Flickr works, exactly, but short of creating a PageRank-ish style system, is there a way to link images together, giving additional semantic meaning to an image and its relationship to other images and their associated tags?

What I mean is that there is meaning to be gleaned from an image's own tags, and then there is meaning to be gleaned from other pages that might link to it. As I said before, I don't know if Flickr supports those.

Another route would be the method employed via MySQL's FULLTEXT search-indexing, and the way it calculated relevancy - in short, doing one search for the precise term searched, and then searching based on the words that were contained in the result of the first search (decipher that, and hopefully you'll understand what I meant). I think the simple tagging system can support this kind of meta-tagging - or am I way off?

renaissance chambara said...

I thought that you may want to have a look at Crystal Semantics. They created a lookup table around clusters of words and can then put subject meaning around them.

For instance, with normal web search it would be hard to find a match report on a baseball game because the report may not mention baseball but would mention related words like bat, pitcher etc.

Anyway their web address is here: http://www.crystalsemantics.com/

Rohit Aggarwal » Blog Archive » Tagging, Collective intelligence, Clustering said...

[...] Mr. Horowitz of Yahoo has an interesting post on Tagging, Collective Intelligence, and Clustering. [...]


 