Everyone has data, and many of us are talking about it. But how can we monetize, or at least value, social media data? Here are a few ideas shared by companies and researchers during a brilliant conference hosted in Paris by the Social Media Club. On the panel: Franck Rebillard (Professor at Université Paris III, working on the IPRI project, which I suggest following), Adrien Schmidt (CEO and founder of Squid Solutions), Sébastien Lefebvre (co-founder of Mesagraph) and Valérie Peugeot (Future Studies Project Manager at Orange Labs).
Data needs a context
One big issue with data is its context. It is tempting to attach to data a set of adjectives that hides its reality: data is often assumed to be easily accessible, objective (different, in this sense, from an opinion), « fluid » and, ultimately, marketable/bankable. Most of the time, it is not. Data may lack structure, may be partial, may be non-transferable, and may not interest anyone (yet).
Mesagraph, for instance, works on tweets to sell apps for social TV. A given show will, in most cases, have a hashtag, which is not enough to capture the conversation about that show. Mesagraph takes time to « clean » the data: to include the names of the anchor or the (ever-changing) guests, and to take into account the different ways, including slang, of talking about the show or its attributes. The data stream can then be used, in Mesagraph's case, to give the producer or broadcaster a live Twitter stream about the show straight on the TV screen.
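To make the idea concrete, here is a minimal sketch of what this kind of « cleaning » could look like: expanding a show's official hashtag into a broader keyword set and filtering tweets against it. The show name, hashtag, host and guest names are all made up for illustration, and Mesagraph's actual pipeline is certainly far more sophisticated.

```python
# Hypothetical keyword set for one show: the official hashtag alone misses
# many relevant tweets, so it is expanded with host/guest names and slang.
SHOW_KEYWORDS = {
    "#legrandshow",     # official hashtag
    "le grand show",    # plain-text mentions of the title
    "jean dupont",      # anchor (rarely changes)
    "marie martin",     # tonight's guest (changes every episode)
    "lgs",              # abbreviation used by viewers
}

def matches_show(tweet_text: str) -> bool:
    """Return True if the tweet is likely about the show."""
    text = tweet_text.lower()
    return any(keyword in text for keyword in SHOW_KEYWORDS)

tweets = [
    "Loving #LeGrandShow tonight!",
    "Marie Martin is hilarious on lgs",
    "Just had dinner, going to bed",
]
show_tweets = [t for t in tweets if matches_show(t)]
print(show_tweets)  # the third tweet is filtered out
```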
Data needs a method
Another problem to solve with data is its collection. Scientists talk of a new set of « digital methods », aimed at providing guidelines for working with internet data. Rogers (2010) states that the Internet is now a research field in itself, which demands specific methods (and not just « digitalizing » traditional research methods). You can check his webpage here.
Then, data collection needs improving. The « meme tracker » of Leskovec, Backstrom and Kleinberg (Cornell and Stanford), shown at the conference, is impressive: it presents, for the whole 2008 US presidential campaign, a data visualization of some 900,000 news stories from a million online sources… which, in turn, can be criticized (what Google News provides is biased, and so is « buying » blog sources from private providers).
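The core intuition behind that work, tracking short distinctive phrases as they spread across sources over time, can be sketched very roughly as below. This is only a toy illustration in the spirit of the meme tracker, not its actual algorithm, and the three « articles » are invented.

```python
import re
from collections import Counter, defaultdict

# Toy corpus of (date, text) pairs; the real study covered hundreds of
# thousands of news stories and blog posts.
articles = [
    ("2008-09-01", 'The candidate said "yes we can" at the rally.'),
    ("2008-09-01", 'Supporters chanted "yes we can" all evening.'),
    ("2008-09-02", '"Yes we can," he repeated in the debate.'),
]

# Count how often each quoted phrase appears on each day.
phrase_counts = defaultdict(Counter)
for date, text in articles:
    for phrase in re.findall(r'"([^"]+)"', text):
        phrase_counts[phrase.lower().strip(" ,.")][date] += 1

for phrase, by_day in phrase_counts.items():
    print(phrase, dict(by_day))
# yes we can {'2008-09-01': 2, '2008-09-02': 1}
```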
Data needs safeguards
It’s hard to talk about data without saying a few words about the ever-present risk of total control over our data by companies or states whose future is not stable. Facebook, with its tons of personal data, experiences from time to time what it feels like to be suspected of having too much control over data that most people don’t actively or consciously « give » to it.
This question, unfortunately, seems bound to be repeated like a motto or an amulet at every geeky hangout, without, for the time being, any massive consequence (anyone still on Diaspora?).
The future of data and vendor relationship management
However, there may be a way to get people to agree again on the data issue, with what is now called Vendor Relationship Management. This new trend puts your data back in your hands, in a kind of personal data store, giving you the ability to aggregate and « offer » parts of your data to vendors you trust, so that they can compete to make you their best offers.
Say you want a car. Instead of googling it, asking your friends on Facebook or crawling endless forum threads, you could simply pick 3 or 4 data points from your store (male, 3 kids, vintage Mini lover, fond of skiing) and have selected brands (GM, Nissan, Volkswagen) compete with personalized offers. Doc Searls is actively working on this at Harvard.
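A minimal sketch of the idea is below: a personal data store where only the attributes you explicitly choose ever leave your hands. All field names and the « request for offers » step are invented for illustration; actual VRM and personal-data-store projects define their own protocols.

```python
# A toy personal data store: the user decides which attributes are disclosed.
# Field names and the broadcast step are hypothetical, not a real VRM API.
personal_store = {
    "gender": "male",
    "children": 3,
    "hobbies": ["vintage Mini", "ski"],
    "address": "12 rue Imaginaire, Paris",  # stays private
    "salary": 42000,                        # stays private
}

def share(store: dict, fields: list[str]) -> dict:
    """Return only the attributes the user agreed to disclose."""
    return {field: store[field] for field in fields if field in store}

# The user picks 3-4 data points relevant to buying a car...
disclosed = share(personal_store, ["gender", "children", "hobbies"])

# ...and sends them to the vendors of their choice, who compete on offers.
for vendor in ["GM", "Nissan", "Volkswagen"]:
    print(f"Request for offers sent to {vendor} with data: {disclosed}")
```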
The next few years will be interesting, with new actors and giants most likely rising (Facebook still has a good lead, but a lot of data stays outside the social network), new rights and legal issues appearing, and, hopefully, opportunities to shape what’s next!
Martin Pasquier