Brad Feld wrote a post about the Defrag Conference over the weekend where he notes that the amount of information in the world is exploding and that there will be a wave of software innovation to help us make sense of all the data (a prediction he originally made in 2006).

Taking the underlying trend first – the growth in the volume of information is staggering.  I can’t find any easily digestible stats on the web right now, but numbers I’ve seen before show that the amount of data produced in the last few years exceeds that created in the entire rest of history.

For the doubters, Brad offers these explanations:

For the foreseeable future, there will be a continuous and rapid increase of information as more of the world gets digitized, more individuals become content creators, more systems open up and provide access to their data, and more infrastructure for creating, storing, and transmitting information (and data) gets built.

Implicit in Brad’s explanation, but worth bringing out, are the increase in the amount of sensor data, the fact that media consumption is becoming trackable for the first time, and the huge volumes of government data that are starting to be published (in the UK context I hear that Sir Tim Berners-Lee has secured a commitment from Gordon Brown which should see the UK become a leading nation in this regard).

Some months ago I tweeted something about the problems of information overload, and my friend Jof Arnold replied that he didn’t see any of his non-techie buddies thinking or worrying about this problem, and questioned whether there is much of an opportunity here (excluding web search as ‘already done’).

I have returned to that thought a number of times, driven by the conviction that a trend this big has to be creating some kind of opening for startups, and my current view is that there are two types of opportunity, and that they might form the basis for investment themes.

Firstly, there are the obvious tools and filters to help us manage all the data. Now that information is abundant, time is the new scarcity, and these tools and filters are really productivity aids. There is nothing new here; the spreadsheet is a good example of an innovation in this area and Google is another.  Going forward I think the startups will come in vertical markets, e.g. news, something about which I have written a lot on this blog and where we are seeing a lot of innovation at the moment.  Medical is another area where we will all soon need help interpreting the vast volumes of data that are becoming available (I have recently heard of two businesses that are taking the cost of sequencing individual human genomes down to mass market levels).

To Jof’s point – these tools will need to be *very* user friendly to get mass adoption, a hurdle at which many will fall, but which some will clear.  For example, an application which shows you the news stories most read by your friends could well get traction without the users ever thinking they were solving their information overload problem.

The following quote from the Foundry Group blog in 2008 gives further insight into how these tools and filters will work.  The eighteen months since it was written have maybe rendered the insights more obvious, but they are no less relevant, nor has the opportunity passed.

We think of the technologies that fall under the implicit web [which I have called ‘tools and filters’] theme as a next-generation set of applications, tools and infrastructure that stitch together a long list of interrelated and overlapping ideas: the academic and theoretical ideas behind the Semantic Web, the utility of social networks and social media, crowd sourcing/wisdom-of-crowds, folksonomy, user attention data, advanced search and content analysis tools, lifestream analysis and numerous others.

When combined, these technologies offer the promise of a more unified computing environment that spans the applications where a user consumes and creates information (email clients, web browsers, RSS readers, etc) and is aware of the user’s preferences, interests and interpersonal relationships without requiring a ton of heavy lifting on the user’s part to get useful work done.

The second area perhaps has more promise, and that is the idea of building on the newly available information to create products and services which simply weren’t possible before.  GPS/SatNav devices are a good example here – building on digital maps and GPS data to offer real time driving instructions.  Similarly, LastFM built a music service based on data that only became available after people started using digital music players.  Looking forward, the concept of VRM is built on the notion of using data generated by online purchase and surfing behaviour to turn the advertising model on its head.  Businesses like this are hard to describe in the abstract because they are solving problems we didn’t necessarily know we had, and in their early stages they can expect to encounter lots of naysayers arguing that it will never take off, that people don’t need it, that they will never pay, and so on.

Simply writing this post has clarified my thinking, but ideas like these really come to life when they are discussed, so I look forward to your comments.

