The question of who owns the data we generate on the Internet is a common and challenging problem (e.g. the privacy controversies around Facebook). However, we talk much less about real life. We also generate digital traces when we interact with more traditional systems like banks, supermarkets, or telecommunication providers. Such data are stored, used for marketing purposes and sometimes sold by the companies. Just for fun, I recently started to think about new ways to capture and own the data I generate via these three big industrial sectors. I report here some techniques and low-cost instruments to gather and enrich such data.
First, let me suggest why self-monitoring, owning and sharing such data could be greatly valuable for you and for the whole community. Here are a few possible uses of food consumption data, for instance.
- Toward an augmented perception: By mixing such data with other sources of information, we could augment the perception of our consumption, moving from a single indicator (e.g. the price) to a diversity of indicators enabling a better understanding of its complexity.
- Healthcare. For example, by enriching food consumption data with related nutrition data, I could analyze and understand my consumption from a healthy-diet point of view (do I eat too much meat, not enough seasonal food? etc.) and receive proactive personal recommendations to support a change of behavior.
- Local obesity studies. At a collective level, sharing such data with others could also provide insights into obesity-related issues in specific communities or at a regional level.
- Economy/Market. Similarly, a participatory sensing approach could be used as a monitoring system to indirectly observe the evolution of prices and quantities consumed by aggregating [a representative view of] all the data: interesting for customers (which store is the cheapest given my daily consumption?) and for small/local producers negotiating with supermarket chains (what is the current consumption of tomatoes?).
- Sustainability. Mixing food data with environment-related data could generate new indicators: a) location of production (do I eat local or not?); b) waste indicators. If I can observe what I consume, I can indirectly observe my waste. Furthermore, if we have geographical information about the location of each participant, we can build a real-time monitoring system of the waste generated by a community.
- Distributed ownership. Instead of centralizing data in the hands of companies that use and sell it, we could use cloud computing to enable each individual to store and own their data, and to collectively create virtual databanks with distributed ownership.
- Scientific research. Such instruments would enable an alternative low-cost infrastructure for gathering data for large studies in healthcare, sociology, economics, environmental science, etc.: volunteers could participate in scientific projects by agreeing to share their data or by building a data commons, a much lower-effort approach than questionnaire-driven studies.
- Business model. The change of ownership could also lead to a shift in the business of selling marketing data. Each individual would be free to sell his or her consumption profile by participating in a collective sale (a group of people deciding to sell their current profiles) to a third-party company for market studies.
1- Grocery: Food consumption sensor
My experience with data openness
Let me start this article with an experience I had with the accessibility of data owned by the supermarket sector. Like any good customer, I have a loyalty card from my supermarket (InterMarché). I use this card when I purchase products and thus accumulate points I will never use. The supermarket therefore has a huge database describing the personalized consumption of its customers. However, as a customer, I do not have access to such data: no list, no graph, no raw data. Recently I sent several emails to InterMarché requesting the history of my purchases; under French law, you are allowed to request any personal data held about you. The only answer I received was: "Sorry, we don't have such data". I then sent an official letter, and received the same official answer: "We don't store such data". Come on, guys! Why do they offer me this card if not to collect personalized data about my consumption behavior? Unbelievable!
Your phone as a food sensor
After my unsuccessful request to get the data from InterMarché, I decided to bypass them. I wanted an easy and low-cost way to monitor my consumption.
Digital world solution: First, I thought about capturing purchases from online supermarkets, the future… Observing digital purchases is much easier. I developed a browser plug-in so my sister (the only person I know who uses online grocery shopping) could record her food consumption (a background and automatic process requiring no effort from the participant). However, online grocery shopping is still marginal in (French?) culture. People need to touch fresh food, to move around this social space, to live a full hands-on experience before buying.
Real world solution: In a real-world situation, existing applications offer barcode recognition, but 1/ their philosophy is different: they are about getting more information on a product during the purchase (augmented reality), not about capturing the user's consumption; 2/ they don't work for fresh food; 3/ you need a database of all the products, with up-to-date prices and quantities, to use them as sensors. A clear barrier. Therefore, I developed a C++ computer vision program trained to recognize and extract food information from a receipt. I used a set of image processing/computer vision techniques to extract information from a photo of a receipt (A) and output a table (B). This table contains the price, the quantity, the unit and the description of each article, a bit more information to bootstrap a database than the previous solutions :). We can imagine a mobile application that sends the picture to a server, which then runs my program to extract the recognized foods (a further step would be to ask the user to correct the final output when there are errors: crowdsourcing).
The application takes a picture of a receipt as input and uses several computer vision techniques (contour detection, adaptive thresholding, receipt-specific trained OCR) to extract food-related information.
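Once the OCR stage has produced text, the remaining work is turning raw lines into the table (B). Here is a minimal Python sketch of that parsing step (my actual implementation is in C++); the receipt line layout assumed here, "description, optional quantity and unit, price", is hypothetical and would need adapting to each chain's format:

```python
import re

# Assumed line shapes (hypothetical InterMarche-like layout):
#   "TOMATES GRAPPE 0.746 kg x 2.99 2.23"   (weighed item)
#   "LAIT DEMI ECREME 1.15"                 (unit item)
LINE = re.compile(
    r"^(?P<desc>.+?)\s+"
    r"(?:(?P<qty>\d+(?:[.,]\d+)?)\s*(?P<unit>kg|L)\s*"
    r"(?:x\s*\d+[.,]\d{2})?\s+)?"
    r"(?P<price>\d+[.,]\d{2})\s*$"
)

def parse_receipt(text):
    """Turn OCR'd receipt text into a list of article records."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # headers, loyalty-card noise, etc.
        if m.group("desc").upper().startswith("TOTAL"):
            continue  # skip the total line
        rows.append({
            "description": m.group("desc"),
            "quantity": float((m.group("qty") or "1").replace(",", ".")),
            "unit": m.group("unit") or "piece",
            "price": float(m.group("price").replace(",", ".")),
        })
    return rows
```

The regex is deliberately forgiving: unit items fall back to a quantity of 1, and any line without a trailing price is ignored rather than rejected.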
The recognition accuracy is quite high (very few errors) on InterMarché receipts (the only supermarket I have tried). Of course, the data are still semantically poor, but linking descriptions with food categories is, I think, manageable via crowdsourcing, because individuals have habits that limit repetitive effort. This solution obviously does not solve every monitoring issue, such as how to sense what I eat if I frequently go to restaurants (even if receipts are sometimes quite explicit) or if I buy my food at a local market (where generally no receipt is given). If you know an alternative low-effort, non-intrusive approach to sensing food consumption, let me know!
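The "habits limit repetitive effort" idea can be made concrete with a tiny label cache: once a description has been categorized by hand (or by the crowd), every later receipt reuses the label automatically. A hypothetical sketch, with made-up category names:

```python
# Hypothetical crowdsourced category cache: because shopping habits are
# repetitive, each distinct receipt description only needs labeling once.
class CategoryCache:
    def __init__(self):
        self.labels = {}  # normalized description -> category

    @staticmethod
    def _norm(desc):
        # collapse case and whitespace so OCR variants hit the same key
        return " ".join(desc.lower().split())

    def tag(self, desc):
        return self.labels.get(self._norm(desc))  # None if unknown

    def learn(self, desc, category):
        self.labels[self._norm(desc)] = category

cache = CategoryCache()
cache.learn("TOMATES GRAPPE", "vegetables")
# Next week's receipt reuses the label with no extra effort:
assert cache.tag("tomates  grappe") == "vegetables"
```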
2- Bank transaction sensor
Limited access to transaction data
The banking industry is still one of the last industries to restrict the openness of its data to the web, which is somewhat understandable. This lack of openness limits the potential to link data with other web data in order to build innovative services and develop ecosystems. Despite the thousands of existing public web APIs for weather, shopping and multimedia-related data, you will not find any API from a bank facilitating the automatic export of its customers' data. Well, there are a few web services for managing personal money, like Mint.com for US banks and moneydashboard.com for UK banks (launched this month), but 1/ no public API is provided and 2/ nothing exists for French banks. Imagine having personal artificial agents observing your bank stream, able to spot abnormal behavior or bank errors and alert you.
Passing the Turing Test
The main complexity lies in the security constraints. My bank is "La Banque Postale". The authentication process is a kind of improved captcha: to enter my digital password, I have to move my mouse over images representing the digits of my password. Once connected, it is easy to extract the last 30 transactions, but again I don't have access to all my transactions! I finally managed to bypass this captcha using computer vision techniques (I don't know if that's legal…). My program can now automatically log into my bank account and observe my transaction stream. As far as I know, this is the first (unofficial) web API for a personal bank account.
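The core trick behind defeating such a virtual keypad can be sketched as follows. This is a toy illustration, not my actual program: it assumes each digit's image bytes are identical across sessions (only their positions are shuffled), so a simple fingerprint identifies each digit; a real scraper would fall back to template matching on pixels. All names and the fake "images" below are hypothetical:

```python
import hashlib

def fingerprint(image_bytes):
    # Hash of the raw image bytes; stable across sessions by assumption.
    return hashlib.sha256(image_bytes).hexdigest()

def build_click_sequence(password, keypad_cells, known_digits):
    """keypad_cells: list of (position, image_bytes) for this session.
    known_digits: fingerprint -> digit, learned once by hand.
    Returns the keypad positions to 'click', in password order."""
    digit_at = {known_digits[fingerprint(img)]: pos
                for pos, img in keypad_cells}
    return [digit_at[d] for d in password]

# Toy demo: ten fake digit 'images' shuffled into session positions.
images = {d: ("digit-%s" % d).encode() for d in "0123456789"}
known = {fingerprint(img): d for d, img in images.items()}
session = [(pos, images[d]) for pos, d in enumerate("2907146853")]
assert build_click_sequence("1990", session, known) == [4, 1, 1, 2]
```

The one-time manual step (labeling each digit image once) plays the same role as training the receipt OCR: after that, login is fully automatic.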
Geolocalizing & tagging transactions
I enriched the data with automatic processes: tagging categories (e.g. transport, withdrawal, supermarket) and geolocating transactions. Mixing location and transaction data makes it possible to build a map of one's transactions (the first time I have seen such a visualization!).
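The category tagging can be done with simple keyword rules over the statement label; the merchant city that often appears in the label (e.g. "PARIS 15") can then be fed to a geocoder. A minimal sketch, where the rule list and category names are my own assumptions:

```python
import re

# Hypothetical keyword rules mapping a bank statement label to a category.
RULES = [
    (re.compile(r"RETRAIT|DAB", re.I), "withdrawal"),
    (re.compile(r"SNCF|RATP|AUTOROUTE", re.I), "transport"),
    (re.compile(r"INTERMARCHE|CARREFOUR|LECLERC", re.I), "supermarket"),
]

def tag(label):
    """Return the first matching category, or 'unknown'."""
    for pattern, category in RULES:
        if pattern.search(label):
            return category
    return "unknown"

assert tag("CB INTERMARCHE PARIS 15") == "supermarket"
assert tag("RETRAIT DAB 12/05") == "withdrawal"
```

New rules are cheap to add as unknown labels show up, which again exploits the fact that personal consumption is highly repetitive.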
Web interface showing the stream of my tagged transaction data
A geographical representation of my bank transaction activity
Artificial agents: my personal bank assistant
I have also programmed agents able to observe my transaction stream and alert me about salient events: top-down events (e.g. receiving, or not, money I am waiting for) and bottom-up events (e.g. when I am close to bankruptcy, or when there is a burst of transaction activity). These are simple scenarios; data mining could capture more complex temporal behavior patterns. The point is that I am free to create or extend scenarios. Furthermore, we could easily implement some of the money-saving tricks from the video "Making Money Less Abstract" by Dan Ariely, Professor of Behavioral Economics ("what will you not be able to do in the future if you buy this?"): translating purchases into the things you care about, automatic deductions, and creating an envelope for each category.
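Two of the bottom-up agents mentioned above can be sketched in a few lines. The thresholds and the transaction tuple format are assumptions for illustration, not my real configuration:

```python
from datetime import date, timedelta

# Each transaction is assumed to be (date, label, signed amount in euros).

def low_balance_alerts(transactions, opening_balance, threshold=100.0):
    """Emit an alert each time the running balance drops below threshold."""
    alerts, balance = [], opening_balance
    for when, label, amount in transactions:
        balance += amount
        if balance < threshold:
            alerts.append(("LOW_BALANCE", when, round(balance, 2)))
    return alerts

def burst_alerts(transactions, window_days=2, max_count=3):
    """Alert when more than max_count transactions fall in a sliding window."""
    alerts, window = [], []
    for when, label, amount in transactions:
        window = [w for w in window
                  if when - w <= timedelta(days=window_days)]
        window.append(when)
        if len(window) > max_count:
            alerts.append(("BURST", when, len(window)))
    return alerts
```

Each agent is just a function over the stream, so composing, extending or replacing scenarios is trivial.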
3- Phone call & TV sensors
You watch too much TV!
This part is dedicated to teenagers :). Do you remember when your parents told you to "stop making calls because it's expensive", or the famous "you watch too much TV!"? I wanted to observe my TV consumption. I am not a big fan of TV, but having some statistics would be fun. Besides, many agencies pay to get audience indicators. Concerning phones, I still don't have a mobile phone… (yes, I know :). I only use a landline. I would like a history of my phone calls, to analyze social patterns for instance. I first used my phone bill (a PDF) as a proxy sensor. However, it only lists paid calls, i.e. the calls I made, not the ones I received. How can I record every call (and every SMS, for normal people who own mobile phones)?
Hacking Internet box
Fortunately, we now have "triple play" Internet boxes. Such a box is, I think, an underestimated tool with lots of potential: a mini always-on server that could be used for distributed computing, or as an interface to control the home and collect sensor data… in theory. In practice, you don't have access to the internal OS. So I replaced the firmware with a hacked version to get shell access. After exploring its content, I am now able to access the history of phone calls and the currently watched TV channel, store the data on an external infrastructure, and thus trigger actions in real time.
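Once you have shell access, the call history is typically just a small text log to parse. The format below (semicolon-separated fields) is a hypothetical example; the real file name and layout depend on the firmware:

```python
import csv, io
from datetime import datetime

# Assumed log line: "2011-05-12 19:42;0142345678;in;missed;0"
# (timestamp; remote number; direction; status; duration in seconds).

def parse_call_log(raw):
    """Turn the box's raw call log into a list of call records."""
    calls = []
    for row in csv.reader(io.StringIO(raw), delimiter=";"):
        if len(row) != 5:
            continue  # skip malformed or partial lines
        calls.append({
            "when": datetime.strptime(row[0], "%Y-%m-%d %H:%M"),
            "number": row[1],
            "direction": row[2],   # "in" or "out"
            "status": row[3],      # "answered" or "missed"
            "seconds": int(row[4]),
        })
    return calls
```

A small cron job on the box (or a poller on an external server) can run this after each call and push the records to storage.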
Enriching Data and Acting via agents
I then enriched the data. I linked phone numbers to the related person's or organization's name by reverse lookup, using my Gmail contacts or external web services (PagesJaunes). Likewise for TV: I used TV program services to tag each show's category, actors, etc.
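The reverse lookup against a contact list boils down to building an index keyed on normalized numbers, so that "+33 1 42 34 56 78" and "0142345678" resolve to the same contact. A sketch under that assumption (the contact data here is invented):

```python
def normalize(number):
    """Keep digits only and fold the French +33 prefix into a leading 0."""
    digits = "".join(c for c in number if c.isdigit())
    if digits.startswith("33"):
        digits = "0" + digits[2:]
    return digits

def build_reverse_index(contacts):
    """contacts: iterable of (name, number) pairs, e.g. a Gmail export."""
    return {normalize(num): name for name, num in contacts}

contacts = [("Alice", "+33 1 42 34 56 78"), ("Bob", "06 87 65 43 21")]
index = build_reverse_index(contacts)
assert index[normalize("0142345678")] == "Alice"
```

Numbers missing from the index can then be sent to an external directory service (PagesJaunes) as a fallback.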
I also implemented some scenarios: when a call is missed, an artificial agent detects it and sends me an email with the name and number of the person who called my home. Concerning TV, proactive channel recommendations could be provided: imagine an agent that considers the show I am watching too stupid; it could turn off the TV or switch to a better channel, a documentary. It would be funny 🙂 We can also imagine a real-time multi-channel advertising system: advertising agents know what I am watching in real time and can deliver personalized ads, connected with a Facebook profile, etc.
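The missed-call agent is a good example of how little glue is needed once the call records and the reverse phone book exist. A sketch that builds the alert email (actually sending it, e.g. via smtplib, is left out; the record fields and address are assumptions):

```python
from email.message import EmailMessage

def missed_call_email(call, reverse_index, to_addr="me@example.org"):
    """Build an alert email for an incoming missed call, else None.
    call: dict with 'direction', 'status', 'number', 'when' fields."""
    if not (call["direction"] == "in" and call["status"] == "missed"):
        return None
    who = reverse_index.get(call["number"], "unknown caller")
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = "Missed call from %s" % who
    msg.set_content("%s (%s) called at %s." %
                    (who, call["number"], call["when"]))
    return msg
```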
I still don't know if such instruments are really interesting, but they have at least been useful for me to think about the empowerment and impact related to such data. They also gave me some insights about lightweight infrastructures of sensor-actor networks. Linking data forges a relationship between two kinds of information, two kinds of sensors: 1/ Numerical: machine-based sensors (e.g. camera, microphone, GPS, accelerometer, clock) capturing, separately, different physical elements of our world. 2/ Symbolic: people also play the role of semantic sensors when they add tags or comments to a picture to capture more symbolic elements of the user experience (e.g. event, social ties, person names). NOTE: social information can be used as a proxy sensor to measure physical reality, e.g. using Twitter as an environmental sensor to map phenomena such as weather or noise pollution by gathering people's statuses. How both types of sensor (numerical, symbolic) could be easily merged to create new hybrid sensors is, I think, a classical question in A.I.
I am not an expert in the Semantic Web, but representation frameworks and visualization tools mixing semantic and numerical data seem quite poor to me. For instance, numerical representations of phenomena fit badly with RDF, the standard concept-representation framework of the Semantic Web world. How can we link the world of classifiers and pattern recognition, which works in numerical spaces, with the less organic world of RDF? How do we link a numerical dataset with a concept in RDF? How do we integrate dynamics into the knowledge, i.e. a temporal dimension in RDF? How do we find statistical patterns within a dataset mixing symbolic and numerical data? Some of these questions have no clear answers for me.
(if you are interested in such research or in testing some prototypes, contact me :))