On the need for research on Citizens’ data, big and small

NB This post is not about Citizen Science, but about the data trail that each and every one of us generates, willingly or not, volunteered or not. It’s also a bit longer than usual. And yes, of course I focus on geographic data.

Isn’t there already a “Big Citizen Data” research bandwagon?

Yes, indeed, that’s true. There is a large and still rapidly growing body of research on the collection, analysis and utility of information from Citizens. The labels are just as diverse as the research, and include volunteered geographic information, neogeography, user-generated geographic content, or crowdsourced mapping – and that’s the geospatial domain only! The objectives range from improving humanitarian assistance for those in imminent danger and need, to improving your dining experience by removing spam from peer rating platforms.

What I am missing, though, is research that explicitly aims to help Citizens protect their political rights and their ability to determine what information about them is available to whom. Call it critical geographic information science or counter mapping 2.0. (Btw, I would be delighted by comments that prove me wrong on this one!)

Who cares? Well, everyone should. We lack a broad and informed public debate on the issue, despite

  • the ongoing disclosures on the various electronic surveillance programs of several prominent intelligence agencies,
  • the increasing demand by businesses that you reveal data about yourself if you want to do business with them, and
  • the carelessness of your social network friends when posting pictures in which you are depicted or posts in which they mention you. The confusion around the new Facebook Graph Search shows that few people have sufficient knowledge of the technical issues.

My argument is that research on and use of citizen data is more beneficial than risky, because we need knowledge and tools for citizens to help them manage “their” information in this information age.

So are you Post-Privacy or in Denial?

So it seems that we have almost lost control over who is able to know what about us (although I am not sure we ever had real control). We leak data and information about ourselves in many ways:

  • Involuntarily: When criminals or government agencies break into your accounts or eavesdrop on your communication. For those who observe a few basic precautions, this is probably the least likely cause, but also potentially the most harmful one. More common is the re-packaging and sharing of information about you by companies. Although that arguably belongs under the next bullet point, since we all accepted the TOS/EULAs, didn’t we?
  • Unknowingly: Who can say they fully understand Facebook’s privacy controls? The myriad options and the constant changes to them, coupled with the unpredictable behaviour of your friends, make it practically impossible to control who can read what. During my research using Twitter, I have often wondered whether people are actually aware that what they post is public and can be retrieved and read by anyone…
  • Willingly: For some, sharing information about oneself seems to be an addictive habit. However, often it is just fun and elicits interesting comments or even conversations.

So there are basically two responses to the problem: Going Post-Privacy, or Denial. In the first case, people just give up, or never cared in the first place, or value a diffuse feeling of security more highly than privacy. It’s the dream of businesses and law enforcement alike. A transparent customer is a customer well-exploited, er, well-served, and a transparent citizen is a citizen well-controlled, er, well-protected. Not very appealing, if you ask me. But the alternative seems just as bleak: Basically one would have to forgo electronic services altogether. It is still possible, but it will lead to social and political isolation and subsistence farming in the long run, because more and more of our social and business interactions will be based on electronic processes. Even the use of cash will become increasingly restricted to small transactions (as is already the case in Italy), with the justification that this limits money laundering.

And scientific research?

In a way, science has looked at the issue from the other side of the glass: There is a wealth of information out there, and it is growing. But is it ethical to use this information if it hasn’t been volunteered for this particular purpose [1]? Can we just use someone’s Tweets to find out more about political sentiments at a given location? Can we display a distressed person’s request for help in a crisis situation without their explicit consent? If so, for how long? Do we have to delete the information once the situation might have changed? When conducting small-scale (micro) socio-economic analyses, what level of detail can we go for? Usually it’s restricted to building blocks, but with modern ways of linking heterogeneous data together, even supposedly anonymous information can be de-anonymized with little or no ancillary information [2][3]. And with advanced machine learning algorithms, one can even predict where we are going to be [4]. To a certain extent, the response by the scientific community mirrors the public discourse: It is either a shoulder-shrugging “That’s the way it is, now let’s crunch those numbers and process that text”, or a refusal to use new electronic data, often accompanied by a defensive “There’s no valuable information to be learned anyway in this ocean of trivia.”

But science should do more than just teach us more about our fellow citizens: First of all, it should also enable them to learn something about the consequences of their behavior, and provide options and alternatives to change these consequences. In a way, this is only fair: academic research is funded largely by society through taxes, which is one obvious reason why research should benefit the members of society, a.k.a. citizens. Further, it should also critically investigate the use of Big Data. I have come to think of Big Data as the Positivist’s Revenge, because it seems we are repeating some positivist mistakes with Big Data: The illusory promise of some objective truth to be found amidst all the patterns. Because now that we have so much data, there must be so much information in it, and we can forget 20th century issues like sampling bias, base rate fallacy, reproduction of power structures, marginality, etc., right? Right?

Well, of course, no. The users and abusers of social media are a highly biased sample of the whole population, and those most likely to be in need of information and empowerment are those who are underrepresented. Examples include the content found on Google Maps vs the content found on OpenStreetMap [5][6], and the gender bias of Neogeographers [7]. The social components of the Web 2.0 and the availability of powerful open source software do not automatically result in a democratization of power [8].

So what now?

So all is lost? Do we really have to give up and become either digital emigrants, or exploited and controlled subjects? Is constant Sousveillance [9] the only way to fight back? Of course not. There are a number of ideas for turning Big Citizen Data into a citizen’s asset that goes beyond improving their restaurant experience or movie theater choice. Big Data can serve well in the context of crisis management [10]. Companies could become data philanthropists [11]. And we researchers can step up to the challenge and become a bit more idealist again. Yes, idealist researchers could approach the Big Data issues on two fronts:

First, research on methodology that allows the retrieval and analysis of large data sets with low hardware specs, so as to empower those with little knowledge and few resources, reducing any Digital Divide, and giving them at least the ability to monitor their own information output. This won’t, of course, enable them to find out about the information on them that is in private hands. For that, we are desperately in need of legislation: Anyone who uses de-anonymized information on me should be required to inform me about it.

Second, research on issues that expose the digital divide and approach the digital representation of citizens critically.

To be fair, the role and importance of citizens has already been acknowledged by various research funding schemes. But unless researchers step up to the challenges, really care, and do something about them, the ubiquitous and frequent mention of the term “Citizen” in research proposals will produce nothing but longer and more convoluted proposals.


