Digital Recycling: Turning “data exhaust” into information (and money)

One of Seattle’s best (and least-known) entrepreneurs isn’t in the software business. Dave Lahaie is the President and founder of Evergreen Recycling, a global company that matches industrial waste products from one industry with raw materials supply chains in another. By matching waste outputs with manufacturing inputs, Dave’s company saves millions of tons of material from entering the waste stream each year and makes money doing it. (He also has the coolest apartment in the city of Seattle.)

Dave’s business model – turning waste into money – is also being applied behind the scenes across the Web: dozens of companies are in the business of aggregating and analyzing the huge volumes of customer, transaction and traffic data that are thrown off by other people’s websites and reselling it to others.

The financial services and direct marketing industries are the pioneers of this approach, hoovering up vast amounts of consumer spending, demographic and census data to build profiles for both risk management and offer targeting purposes. Experian is the granddaddy of this sector, maintaining consumer credit profiles on 215 million U.S. consumers and powering targeting for 20 billion pieces of direct mail every year.

Unsurprisingly, web-based equivalents have sprung up to perform the same function for online consumer data. Rapleaf is probably the best-known name in this segment, binding together social media profile data from around the web to allow marketers to match names, email addresses and behavioral data. The company recently got in hot water for acquiring and reselling Facebook IDs to marketers, requiring a change in business practices, but remains a major player in the commercial market for online identity data.

The broad description for this theme – “Big Data” – has already produced some big exits for investors who leaned in early. A few weeks ago Teradata paid $263 million for Aster Data Systems, a First Round Capital investment that Josh Kopelman introduced with this description:

“As more companies seek to transform their data exhaust into data value (hey wait a minute — perhaps that’s the Web 2.0 version of “clean tech” — converting messy data into clean insight) — I think they will need tools like Aster Data to help them discover deep insights on massive data sets.”

Wall Street vet Roger Ehrenberg has based his entire venture investing firm on this thesis, describing IA Ventures (where IA = Information Arbitrage) as focused on “early-stage companies developing breakthrough tools and technologies for extracting value from big data”.

The more our work and personal lives are mediated by digital tools – email, social networking sites, mobile devices – the more granular and voluminous a digital crumbtrail we leave, and the more massive these aggregated digital data sets become. As investor and media analyst Paul Kedrosky pondered in a recent post,

“What are the consequences of an instrumented planet? In the financial markets we have long been used to the idea that data flows constantly, some of it spurious, some of it meaningful. But we are heading down a path toward a planet where pretty much everything throws off data, not all of it intentional.”

The massive scale of this opportunity can be both enticing and daunting to early-stage startups living the lean and agile life: ingesting and extracting intelligence from huge and high-velocity data sets requires significant investments in both software development and data management infrastructure. But the core insight – turning hard-to-grok data streams into more easily consumable business intelligence – can be applied at almost any scale. A few examples from my own experience illustrate the point:

Simply Measured (a Founders Co-op portfolio company)

  • So you have a million Twitter followers and 10,000 Gmail contacts. Who are these people, what do they care about and what can you do for them? Simply Measured is a lightweight data exporting and analytics platform that turns the firehose of social media data into easy-to-understand visualizations to help marketers and brand stewards make better decisions.

Colligent (I’m an angel investor)

  • Want to land a marquee brand advertiser for your new TV show? Need to know what celebrity endorser will make your product’s loyal customers swoon? Colligent continuously ingests public data from 211 million consumer profiles on Twitter and Facebook and extracts affinities (mentions, likes, hashtags, etc) for 4,000+ brands (e.g., Coke, Starbucks), 6,000+ media entities (TV/Radio stations + shows) and 23,000+ entertainment entities (actors, artists + bands). The result is a 3-dimensional data cube that allows any one of those brands, stations or shows to see how strong or weak their affinity is to any other based on the overlap in their consumer fan base, and how that affinity is changing in response to promotional activity.
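The overlap-based affinity described above can be sketched roughly as follows. This is a toy illustration, not Colligent's actual method: it assumes each entity's fans are available as a set of IDs and uses Jaccard similarity as one common way to score overlap. The sample data and the function names `affinity` and `affinity_table` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: entity name -> set of fan IDs.
# These values are made up for illustration only.
FANS = {
    "Coke": {1, 2, 3, 4},
    "Starbucks": {3, 4, 5},
    "SomeShow": {4, 5, 6, 7},
}


def affinity(a, b):
    """Jaccard similarity: shared fans divided by all fans of either entity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_table(fans_by_entity):
    """Score every unordered pair of entities by fan-base overlap."""
    return {
        (x, y): affinity(fans_by_entity[x], fans_by_entity[y])
        for x, y in combinations(sorted(fans_by_entity), 2)
    }


if __name__ == "__main__":
    for pair, score in affinity_table(FANS).items():
        print(pair, round(score, 3))
```

Recomputing the table on successive snapshots of the fan sets would show how a given pair's score moves over time, which is the kind of change the paragraph above attributes to promotional activity.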

The principle can even be applied within the boundaries of a single product, e.g.

Massively Fun (another Founders Co-op portfolio company), makers of…

  • Wordsquared, a massively multiplayer online casual game for word lovers. With over 45 million letter tiles representing 15 million words played by hundreds of thousands of players, Wordsquared emits data about words played and points earned at a furious rate. By making every played word clickable and adding a summary page for every word played, the company recently turned the exhaust of each player’s actions into a new game dimension, connecting every player and word to their peers across the gamespace.

Once you start looking at your offering through this lens, almost any web service (at least those operating at meaningful scale) creates “data exhaust” opportunities. Entrepreneurs should be on the lookout for smart ways to mine their own data and stitch it together with others’ to drive both revenue and enterprise value (just make sure your Terms of Use give you clear title to the data you’re using, and steer clear of any privacy-related issues by disclosing patterns in the aggregate, not at the individual level).
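One simple way to report patterns in the aggregate rather than per individual is to publish only group-level counts and to suppress any group too small to hide a single user. The sketch below assumes events arrive as (user_id, category) pairs; the function name `aggregate_counts` and the threshold of 3 are illustrative assumptions, not a recommended standard.

```python
# Assumed minimum number of distinct users a group needs before it is reported.
MIN_GROUP_SIZE = 3


def aggregate_counts(events, min_group_size=MIN_GROUP_SIZE):
    """Return category -> number of distinct users, omitting small groups.

    `events` is an iterable of (user_id, category) pairs. User IDs are used
    only to count distinct users and never appear in the output.
    """
    users_by_category = {}
    for user_id, category in events:
        users_by_category.setdefault(category, set()).add(user_id)
    return {
        category: len(users)
        for category, users in users_by_category.items()
        if len(users) >= min_group_size
    }


if __name__ == "__main__":
    sample = [
        ("u1", "books"), ("u2", "books"), ("u3", "books"),
        ("u1", "books"),  # repeat visit: same user is counted once
        ("u4", "shoes"),  # only one user: suppressed from the report
    ]
    print(aggregate_counts(sample))
```

Counting distinct users instead of raw events keeps one very active person from standing in for a crowd, and the size cutoff keeps a lone user's behavior out of the published numbers.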