Pages

Thursday, March 16, 2006

The Love Machine

Last week Prabhakar and I presented some of Yahoo's past and future strategies to a bunch of Benchmark Capital portfolio companies at their recent shindig in Half Moon Bay. Prabhakar presented his compelling vision for Yahoo Research (which I've seen umpteen times before but which excites me anew each time). He also touted some excellent recent hires (including an exciting one that I’m sorry I can’t talk about because it's not announced yet). And he covered the Tech Buzz Game, developed jointly by Yahoo and O’Reilly. This game is a “fantasy prediction market for high-tech products, concepts, and trends.” Very intriguing concept, worth checking out if you haven’t yet.



One of the highlights of the day was giving Philip Rosedale a ride home to San Francisco which gave us a solid 45 minutes to catch up. I’ve been friendly with Philip since he was CTO of RealNetworks (a long time ago) and have stayed in touch and watched as he and team have developed SecondLife. What’s happening in SecondLife is mind-blowing and almost too much to get my head around. I'll take every chance I can get to talk to Philip and glean what insight I might from someone who is literally a "pioneer in cyberspace." (I'm quite deliberately using this vintage '96 colloquialism cuz it fits so damn well. Forgive me.)

Once we were cruising up Highway 92 back toward civilization, I asked Philip what ground-breaking, unconventional management techniques he applied at Linden Lab (makers of SecondLife), certain this would be good fodder for the ride... I wasn't disappointed, and he told me about a few…

The first is “The Love Machine.” The Love Machine is a simple way for Linden employees to give and receive “love”… where “love” in this context is work-related appreciation. It’s a page on their intranet with three fields: “From”, “To”, and “Why” (an 80-character free-text field). That’s pretty much it. People can (and do) give “love” to each other. It’s a way of saying “attaboy” or “thanks” or “I noticed.” There’s visibility into all the love you’ve both given and received. What’s interesting is that “love” is not only a morale builder and a way of getting peer feedback, but is directly tied to money. (Philip mentioned that given Linden’s stage as a company right now, this variable bonus is relatively small… but it will grow as Linden grows.) Philip also talked about “Taskzilla”, a mod of Bugzilla that allows for transparency and collective prioritization around the company’s focus.
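What strikes me is how little machinery this takes. Here's a toy sketch in Python (entirely my own invention for illustration - the names, the tallying, and the 80-character cap are assumptions, not Linden's actual code):

```python
# Toy sketch of "The Love Machine": three fields per entry, plus
# simple tallies of love given and received. Hypothetical, not
# Linden Lab's implementation.
love_log = []  # each entry: (sender, recipient, why)

def give_love(sender, recipient, why):
    """Record one piece of 'love'; the Why field is capped at 80 chars."""
    love_log.append((sender, recipient, why[:80]))

def love_received(person):
    """Tally the love a person has received (basis for a small spiff)."""
    return sum(1 for _, recipient, _ in love_log if recipient == person)

give_love("philip", "chad", "Great demo at the all-hands!")
give_love("karon", "chad", "Thanks for the late-night bug fix.")
print(love_received("chad"))  # 2
```

The whole system really is just an append-only log plus a couple of aggregate views over it.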

Against the backdrop of Prabhakar’s Tech Buzz Game, we talked about a scenario where employees acquired “whuffie” (or cred) within the company not because of a title, or a degree from a good school, or an ability to schmooze with those who hold and confer the power, but rather from empirical demonstration that they can make strategic decisions that are net beneficial for the company. Imagine that upon entering the company, every employee is granted 1000 “shares” of decision currency. You can spend your currency by buying into (or out of) various corporate issues in an open marketplace (à la Taskzilla). Decisions are forensically judged to be good or bad by the employee community itself, and dividends are paid out to those that got it right. Imagine the hallway conversations:
  • “I went ‘all in’ for the broadcast.com acquisition, so I’m basically decision-bankrupt…” Or
  • “I made a killing by endorsing the Overture acquisition… I could basically single-handedly end the operations of Yahoo Germany if I wanted to...” the QA engineer said smugly.
There’s puh-lenty broken about the above scenario, and I’m not suggesting this scheme would work, promoting it as viable, or any such thing. (I’m feeling increasingly required to make these disclaimers on this blog as I continue to get misinterpreted and quoted out of context.) As an example of the many, many ways such systems can unravel, check out Business 2.0's piece on how Microsoft's attempts to establish a "meritocracy" have devolved into a popularity contest. (Though note that the Microsoft system is not democratic and is closed-door... The hope is that cronyism can be at least partially mitigated through large sample sizes and more transparency.)
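Broken or not, the mechanics of the thought experiment are easy to mock up. A toy sketch (all the names, stakes, and the dividend multiplier here are made up for illustration):

```python
# Toy "decision currency" market: everyone starts with 1000 shares,
# stakes them for or against corporate decisions, and is paid a
# dividend when the outcome vindicates their position.
balances = {"alice": 1000, "bob": 1000}
stakes = []  # (employee, decision, position, amount)

def stake(employee, decision, position, amount):
    """Buy into (position=True) or out of (position=False) a decision."""
    assert balances[employee] >= amount, "can't stake more than you hold"
    balances[employee] -= amount
    stakes.append((employee, decision, position, amount))

def settle(decision, outcome, dividend=2.0):
    """Forensically judge a decision; pay winners a multiple of their stake."""
    for employee, d, position, amount in stakes:
        if d == decision and position == outcome:
            balances[employee] += int(amount * dividend)

stake("alice", "overture-acquisition", True, 500)  # alice endorses the deal
stake("bob", "overture-acquisition", False, 100)   # bob bets against it
settle("overture-acquisition", True)               # the deal works out
print(balances)  # alice made a killing; bob is 100 shares poorer
```

The hard part, of course, isn't the ledger - it's the "forensically judged" step, which is exactly where the cronyism and popularity-contest failure modes creep in.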

I once had a manager who said, "Plan for the day when the salaries of all the company's employees are found sitting on the printer. It's only a matter of time before it happens." Ironically, plan as one might, I'd guess that list is sure to piss off nearly everyone irrespective of how it's designed. It's also not clear that "minimizing employee angst" is the right objective function for this optimization anyway.

So I’m just saying... fun stuff to think about. A fun thought experiment... And it's interesting to contemplate how the next generation of enterprise software might allow for more and better metrics by which to acquire subjective measures of an employee's contribution. Right now, so much of this is anecdotal, tedious, and perfunctory. "It's review time, people, so please fill out your self-assessment, your peer reviews, review your direct reports, etc., and submit by next Wednesday." Something like The Love Machine provides a perpetual feedback loop that is easy, fun, instantly gratifying... and meaningful (to a degree). Note Philip doesn't base an employee's entire salary on this data... just a small discretionary spiff. Love gets you icing, not cake. The Love Machine should be primarily a measurement tool and not have the quantum effect of changing the system it's measuring. Though you wouldn't want people gaming the system too much in order to acquire Love, if the Love Machine tipped the culture toward becoming more conscientious, more aware of and connected to how one's contributions affect others, etc. - that's probably not a bad thing.

Tacit is an example of a company that's doing extremely cool social engineering within the enterprise. Their product installs a proxy next to your mail server, passively monitors email traffic, and can autogenerate a "yellow pages" for your company that answers questions like "Who's our resident expert on sockets-based networking protocols?" Putting (for now) the huge privacy and policy issues aside, this is pretty friggin' cool. One of the interesting things about it is the implicit harvesting of this information (vs. requiring me to fill out a skills survey or profile). "Expertise mining." An aside: I think Tacit is one of the coolest names for a company I've heard, partly because it captures so well what they're about. They've got a bunch of A-list investors (including Esther), but the company has been around a while and has yet to realize its potential. I hope they can put the pieces together and make it work. Their CEO David Gilmour is a seriously bright (and nice) guy.

Cameron innovated around this idea recently (and is threatening to do more on Hack Day) but sadly I can't say any more publicly.

Tuesday, March 7, 2006

Zoom into the Room

Hot on the heels of ZoneTag, we're releasing another spiffy mobile prototype - a "mobile friend finder" we're calling CheckMates.

In addition to some very cool features (an incredibly intuitive mobile interface, leveraging Flickr for both the social network and a place to park my geopresence, etc.), I love the support for "private maps". This allows for pinpointing my location within a venue. It rocks, IMHO. Great job Chad, Karon, Sam, Ed, Jonathan, etc.

Chad does a great job describing how this work evolved.

And Edward gives yet more detail.

Sunday, March 5, 2006

Capture v. Derive

Universal Law: It is easier, cheaper, and more accurate to capture metadata upstream than to reverse-engineer it downstream.

Back at Virage, we worked on the problem of indexing rich media - deriving metadata from video. We would apply all kinds of fancy (and fuzzy) technology like speech recognition, automatic scene change detection, face recognition, etc. to commercial broadcast video so that you could later perform a query like, "Find me archival footage where George Bush utters the terms 'Iraq' and 'weapons of mass destruction.'"

What was fascinating (and frustrating) about this endeavor is that we were applying a lot of computationally expensive and error-prone techniques to reverse-engineer metadata that by all rights shoulda and coulda been easily married to the media further upstream. Partly this was because the analog television signal in the US is based on a standard that is more than 50 years old. There's no convenient place to put interesting metadata (although we did some very interesting projects stuffing metadata and even entire websites into the vertical blanking interval of the signal). Even as the industry migrates to digital formats (MPEG-2), the data in the stream is generally what is minimally needed to reconstitute the signal and nothing more. MPEG-4 and MPEG-7 at least pay homage to metadata by having representations built into the standard.

Applying speech recognition to derive a searchable transcript seems bass-ackwards since for much video of interest the protagonists are reading material that is already in digital form (whether from a teleprompter or a script.) So much metadata is needlessly thrown away in the production process.

In particular, cameras should populate the stream with all of the easy stuff, including:

  • roll
  • pitch
  • yaw
  • altitude
  • location
  • time
  • focal length
  • aperture setting
  • gain / white balance settings
  • temperature
  • barometric pressure
  • heartrate and galvanic skin response of the camera operator
  • etc.

Heartrate and galvanic skin response of the camera operator? OK, maybe not... I'm making a point. That point is that it is relatively easy and cheap to use sensors to capture these kinds of things in the moment... but difficult (and in the case of barometric pressure, impossible) to derive them post facto. Why would you want to know this stuff? I'll be the first to confess that I don't know... but that's not the point, IMHO. It's so easy and cheap to capture these, and so expensive and error-prone to derive them, that we should simply do the former when practical.
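The capture side really is trivial. A sketch of the idea (the sensor names and readings here are purely illustrative stand-ins, not any real camera API):

```python
import time

# Sketch of capture-at-source: at the moment of capture, sweep whatever
# sensors are cheap to read and stamp the frame with them. All sensor
# names and values are illustrative placeholders.
def read_sensors():
    # stand-ins for real sensor reads (GPS, IMU, barometer, ...)
    return {
        "time": time.time(),
        "location": (37.45, -122.44),  # lat/long
        "roll": 0.5, "pitch": -1.2, "yaw": 88.0,
        "altitude_m": 12.0,
        "focal_length_mm": 35,
        "barometric_pressure_hpa": 1013.2,
    }

def capture_frame(pixels):
    """Marry the metadata to the media upstream, at the moment of capture."""
    return {"pixels": pixels, "metadata": read_sensors()}

frame = capture_frame(pixels=b"...")
```

A few dict writes at capture time, versus a rack of servers running fuzzy recognition algorithms later. That's the asymmetry in a nutshell.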


An admittedly slightly off-point example... When the Monica Lewinsky story broke, the archival shot of her and Clinton hugging suddenly became newsworthy. Until that moment she was just one of tens of thousands of bystanders amongst thousands of hours of archival footage. Point being - you don't always know what's important at time of capture.

So segueing to today... Marc, Ellen, Mor and the rest of the team at Yahoo Research Berkeley have recently released ZoneTag. One of the things that ZoneTag does is take advantage of context. I carry around a Treo 650 with Good software installed for email, calendar, and contact sync'ing. When I snap a photo the device knows a lot of context automagically, such as: who I am, time (via the clock), where I am supposed to be (via the calendar), where I actually am (via the nearest cell phone tower's ID), who I am supposed to be with (via calendar), what people / devices might be around me (via bluetooth co-presence), etc. Generally most of this valuable context is lost when I upload an image to Flickr via the email gateway. I end up with a raw JPG (in the case of the Treo, even the EXIF fields are empty).

ZoneTag lays the foundation for fixing this and leveraging this information.

It also dabbles in the next level of transformation from signal to knowledge. Knowing the ID of the closest cell phone tower gives us coarse location, but it's not in a form that's particularly useful. Something like a ZIP code, a city name, or a lat/long would be a much more conventional and useful representation. So in order to make that transformation, ZoneTag relies on people to build up the necessary look-up tables.
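A sketch of how such a crowd-built lookup table might work (this is my own guess at the mechanics, not ZoneTag's actual implementation; the tower ID is invented):

```python
from collections import defaultdict

# People who know where they are contribute (tower_id -> location)
# observations; the consensus of those reports becomes the table entry.
observations = defaultdict(list)  # tower_id -> [(lat, lon), ...]

def report(tower_id, lat, lon):
    """A user volunteers their known location while near this tower."""
    observations[tower_id].append((lat, lon))

def resolve(tower_id):
    """Coarse location for a tower: centroid of user reports, or None."""
    points = observations.get(tower_id)
    if not points:
        return None
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return (lat, lon)

report("310-410-1234", 37.77, -122.42)
report("310-410-1234", 37.78, -122.41)
print(resolve("310-410-1234"))  # centroid near (37.775, -122.415)
```

Each individual report is noisy, but the centroid over many reports converges on something useful - which is the whole trick.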

This is subtle, but cool. Whereas I've been talking about capturing raw signal from sensors, once we add people (and especially many people) to the mix we can do more interesting things. To foreshadow the kinds of things coming...

  • If a large sample of photos coming from a particular location have the following tag sets [eiffel tower, emily], [eiffel tower, john, vacation], [eiffel tower, lisette], we can do tag-factoring across a large data set to tease out 'eiffel tower.'
  • Statistically, the tag 'sunset' tends to apply to photos taken at a particular time each day.
  • When we've got 1000s of Flickr users at an event like Live8 and we see an upload spike clustered around a specific place and time (e.g., Berlin at 7:57pm), that likely means something interesting happened at that moment (maybe Green Day took the stage).
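The tag-factoring idea in the first bullet is easy to prototype. A toy version (the frequency threshold and mechanics are my assumptions):

```python
from collections import Counter

# Across many photos from one location, tags that recur far more often
# than chance likely name the place itself. Tag sets from the example above.
photos_at_location = [
    ["eiffel tower", "emily"],
    ["eiffel tower", "john", "vacation"],
    ["eiffel tower", "lisette"],
]

def factor_tags(tag_sets, min_fraction=0.75):
    """Return tags appearing on at least min_fraction of the photos."""
    counts = Counter(tag for tags in tag_sets for tag in tags)
    n = len(tag_sets)
    return [tag for tag, c in counts.items() if c / n >= min_fraction]

print(factor_tags(photos_at_location))  # ['eiffel tower']
```

The personal tags ('emily', 'john') wash out as noise while the shared tag survives - the same intuition scales to millions of photos, with fancier statistics.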

All of the above examples lead to extrapolations that are "fuzzy." Just as my clustering example might have problems with people "eating turkey in Turkey", it's one thing to have the knowledge - it's another to know how to use it in ways that provide value back to users. This is an area where we need to tread lightly, and it's worthy of another post (and probably in fact a tome to be written by someone much cleverer than me).

Even as I remain optimistic that we'll eventually solve the generalized computer vision problem ("Computer - what's in this picture?"), I wonder how much value it will ultimately deliver. In addition to what's in the picture, I want to know if it's funny, ironic, or interesting. Much of the metadata people most care about is not likely to be algorithmically derived against the signal in isolation. Acoustic analysis of music (beats per minute, etc.) tends to be a poor predictor of taste, while collaborative filtering ("People who liked that, also liked this...") tends to work better.
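A bare-bones sketch of that collaborative-filtering intuition (the users and albums are invented for illustration):

```python
from collections import Counter

# Item-based "people who liked that, also liked this": rank other items
# by how often they co-occur with the seed item in users' like-sets.
likes = {
    "ann": {"ok computer", "kid a", "amnesiac"},
    "ben": {"ok computer", "kid a"},
    "cat": {"kid a", "in rainbows"},
}

def also_liked(item, likes):
    """Rank other items by co-occurrence with `item` across all users."""
    counts = Counter()
    for user_items in likes.values():
        if item in user_items:
            counts.update(user_items - {item})
    return [i for i, _ in counts.most_common()]

print(also_liked("ok computer", likes))  # 'kid a' ranks first
```

No acoustic analysis anywhere in sight - the "metadata" is entirely the behavior of people, which is precisely why it predicts taste better.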

Again - all of this resonates nicely with the "people plus machines" philosophy captured in the "Better Search through People" mantra. Smart sensors, cutting-edge technology, algorithms, etc. are interspersed throughout these systems, not just at one end or the other. There are plenty of worthwhile problems to spend our computrons on, without burdening the poor machines with the task of reinventing the metadata we left by the side of the road...