Just read an interesting editorial by Kenneth Field (aka The Marauding Carto-nerd). He examines his experiences from recently attending various conferences in Germany, the UK and the US. As subjective as these experiences may be, I think he makes some astute observations and valuable conclusions on the matter of Cartographic Tribalism, with a focus on the neo- vs. traditional and proprietary vs. open source cartographers. A bit lengthy, but definitely worth reading: Cartographic tribalism.
Instead of letting me work on my backlog of half-finished drafts, Big Data issues keep popping up. A while ago, I posted a longer piece on Big Citizen Data, and remarked that a lot of seemingly 20th century issues of data quality and sampling bias are being steadfastly ignored nowadays. Jonas Lerman has published an excellent argument in the Stanford Law Review on the matter of exclusion through digital invisibility. To cite the abstract:
“Legal debates over the “big data” revolution currently focus on the risks of inclusion: the privacy and civil liberties consequences of being swept up in big data’s net. This Essay takes a different approach, focusing on the risks of exclusion: the threats big data poses to those whom it overlooks. Billions of people worldwide remain on big data’s periphery. Their information is not regularly collected or analyzed, because they do not routinely engage in activities that big data is designed to capture. Consequently, their preferences and needs risk being routinely ignored when governments and private industry use big data and advanced analytics to shape public policy and the marketplace. Because big data poses a unique threat to equality, not just privacy, this Essay argues that a new “data antisubordination” doctrine may be needed.” (source: Stanford Law Review, 03.09.2013).
The article is well worth reading, even if the second part is unfamiliar territory for those not well-versed in US law (e.g. me).
It made me rethink (though not change) my attitude towards some of the popular means of getting citizen (customer) information: If no precautions and countermeasures are taken, the socially and financially disadvantaged may actually want to share as much data as possible on their shopping, leisure activities and other preferences in order to avoid being completely marginalized…
Just a brief addendum to last week’s post: I am puzzled how I could forget to mention initiatives like DataKind. Watch Jake Porway’s uplifting TEDx talk:
NB This post is not about Citizen Science, but about the data trail that each and everyone generates, willingly or not, volunteered or not. It’s also a bit longer than usual. And yes, of course I focus on geographic data.
Isn’t there already a “Big Citizen Data” research bandwagon?
Yes, indeed, that’s true. There is a large and still rapidly growing body of research on the collection, analysis and utility of information from Citizens. The labels are just as diverse as the research, and include volunteered geographic information, neogeography, user-generated geographic content, or crowdsourced mapping – and that’s the geospatial domain only! The objectives range from improving humanitarian assistance for those in imminent danger and need, to improving your dinner experience by removing spam from peer rating platforms.
What I am missing, though, is research that explicitly aims to help Citizens in protecting their political rights and their ability to determine what information on them is available to whom. Call it critical geographic information science or counter mapping 2.0. (btw, I would be delighted by comments that prove me wrong on this one!).
Who cares? Well, everyone should. We lack a broad and informed public debate on the issue, despite
- the ongoing disclosures on the various electronic surveillance programs of several prominent intelligence agencies,
- the increasing demand of businesses to reveal data on yourself if you want to do business with them, and
- the carelessness of your social network friends when posting pictures in which you are depicted or posts in which they mention you.
The confusion around the new Facebook Graph Search shows that few people have sufficient knowledge on the technical issues.
My argument is that research on and use of citizen data is more beneficial than risky, because we need knowledge and tools for citizens to help them manage “their” information in this information age.
So are you Post-Privacy or in Denial?
So it seems that we have almost lost control over who is able to know what about us (although I am not sure we ever had real control). We leak data and information on us in many ways:
- Involuntarily: When criminals or government agencies break into accounts of yours or eavesdrop on communication. For those who observe a few basic precautions, this is probably the least likely cause, but also potentially the most harmful one. More common is the re-packaging and sharing of information on you by companies. Although this rather belongs to the next bullet point, since we all accepted the TOS/EULAs, didn’t we?
- Unknowingly: Who can say they fully understood Facebook’s privacy controls? The myriad options and constant changes to them, coupled with the unpredictable behaviour of your friends, make it practically impossible to control who can read what. During my research utilizing Twitter, I have been wondering numerous times whether people are actually aware that what they post is public and can be retrieved and read by anyone…
- Willingly: For some, sharing information about oneself seems to be an addictive habit. However, often it is just fun and elicits interesting comments or even conversations.
So there are basically two responses to the problem: Going Post-Privacy, or Denial. In the first case, people just give up, or never cared in the first place, or value an indifferent feeling of security higher than privacy. It’s the dream of businesses and law enforcement alike. A transparent customer is a customer well-exploited, er, well-served, and a transparent citizen is a citizen well-controlled, er, well-protected. Not very appealing, if you ask me. But the alternative seems just as bleak: Basically one would have to forego all electronic services altogether. It is still possible, but it will lead to social and political isolation and subsistence farming in the long run, because more and more of our social and business interactions will be based on electronic processes. Even the use of cash will become much more restricted to small transactions (as is the case already in Italy) with the justification that it limits money-laundering.
And scientific research?
In a way, science has looked at the issue from the other side of the glass: There is a wealth of information out there, and it is growing. But is it ethical to use this information if it hasn’t been volunteered for this particular purpose? Can we just use someone’s Tweets to find out more about political sentiments at a given location? Can we display a distressed person’s request for help in a crisis situation without their explicit consent? If so, for how long? Do we have to delete the information once the situation might have changed? When making socio-economic small-scale (micro) analysis, what is the level of detail we can go for? Usually it’s restricted to building blocks, but with modern ways to link heterogeneous data together, even supposedly anonymous information can be de-anonymized with little or no ancillary information. And with advanced machine learning algorithms, one can even predict where we are going to be. To a certain extent, the response by the scientific community mirrors the public discourse. Either a shoulder-shrugging “That’s the way it is, now let’s crunch those numbers and process that text”, or a refusal to use new electronic data, often accompanied by a defensive “There’s no valuable information to be learned anyway in this ocean of trivia.”
But science should do more than just teach us more about our fellow citizens: First of all, it should also enable them to learn something about the consequences of their behavior, and provide options and alternatives to change these consequences. In a way, this is only fair, since academic research is funded largely by society through taxes, so that is one obvious reason why research should benefit the members of society, a.k.a. citizens. Further, it should also critically investigate the use of Big Data. I have come to think of Big Data as the Positivist’s Revenge, because it seems we are repeating some positivist mistakes with Big Data: The illusive promise of some objective truth to be found amidst all the patterns. Because now that we have so much data, it means there must be so much information in it, and we can forget 20th century issues like sampling bias, base rate fallacy, reproduction of power structures, marginality, etc., right? Right?
Well, of course, no. The users and abusers of social media are a highly biased sample of the whole population, and those most likely to be in need of information and empowerment are those who are underrepresented. Examples include the content found on GoogleMaps vs the content found on OpenStreetMap, and the gender bias of Neogeographers. The social components of the Web 2.0 and the availability of powerful open source software do not automatically result in a democratization of power.
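A toy simulation can make the point about sample size concrete. The sketch below uses entirely made-up numbers (an assumed population in which 30% of people are in need, and assumed participation rates under which those in need are far less likely to post online); it is an illustration of sampling bias in general, not a model of any real data set.

```python
import random

def simulate(sample_size, seed=42):
    """Estimate the share of people 'in need' from a biased social media sample."""
    rng = random.Random(seed)
    true_share = 0.30          # assumed share of the population in need
    p_online_in_need = 0.10    # assumed chance that a person in need posts online
    p_online_other = 0.60      # assumed chance that anyone else posts online

    in_need_seen = 0
    seen = 0
    while seen < sample_size:
        in_need = rng.random() < true_share
        p_online = p_online_in_need if in_need else p_online_other
        if rng.random() < p_online:    # only people who post end up in the sample
            seen += 1
            in_need_seen += in_need    # True counts as 1, False as 0
    return in_need_seen / seen

for n in (100, 10_000, 200_000):
    print(n, round(simulate(n), 3))
```

Under these assumptions the estimate settles near 0.03 / 0.45 ≈ 0.067 instead of the true 0.30. Collecting more messages only makes the wrong answer more precise.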
So what now?
So all is lost? Do we really have to give up and become either digital emigrants, or exploited and controlled subjects? Is constant Sousveillance the only way to fight back? Of course not. There are a number of ideas to turn Big Citizen Data into a citizen’s asset that goes beyond improving his restaurant experience or movie theater choice. Big Data can serve well in the context of crisis management. Companies could become data philanthropists. And we researchers can step up to the challenge and become a bit more idealist again. Yes, idealist researchers could approach the Big Data issues on two fronts:
First, research on methodology that allows the retrieval and analysis of large data sets with low hardware specs, so as to empower those with little knowledge and resources, reducing any Digital Divide, and giving them at least the ability to monitor their own information output. This won’t of course enable them to find out about the information on them that is in private hands. For that, we are desperately in need of legislation: Anyone who uses de-anonymized information on me should be required to inform me about it. Second, research on issues that show the digital divide and approach the digital representation of citizens critically. To be fair, the role and importance of citizens has been acknowledged already by various research funding schemes. But unless researchers step up to the challenges and really care and do something about them, the ubiquitous and frequent mentioning of the term “Citizen” in research proposals will produce nothing but longer and more convoluted proposals.
 Harvey, F. (2012). To Volunteer or to Contribute Locational Information? Towards Truth in Labeling for Crowdsourced Geographic Information. In D. Sui, S. Elwood, & M. Goodchild (Eds.), Crowdsourcing Geographic Knowledge: Volunteered Geographic Information (VGI) in Theory and Practice (pp. 31–42). Berlin: Springer.
New article focusing on the system design and architecture behind our approach to filtering volunteered social media information:
Spinsanti, L, & Ostermann, F. (2013). Automated Geographic Context Analysis for Volunteered Information. Applied Geography 43 (September): 36–44. doi:10.1016/j.apgeog.2013.05.005.
If you can’t access the article, don’t hesitate to drop me a line for a pre-print.
#hochwasser 2013 in Germany
I’d like to summarize my perception of the use of social media during the European floods of 2013, with a special emphasis on Germany (NB most of the links are for German sources; for an excellent blog post focused on Dresden, go here). Since I was travelling during the event, I could only gather my information recently, i.e. after the actual event. Therefore, the information certainly is incomplete, and I’d be happy for additional information and corrections by the gentle readers…
For those outside of Germany, here’s a brief overview of what happened:
- The floods were mainly caused by a cold and wet spring resulting in saturated soils, coupled with an abnormal meteorological situation and heavy rains for several days at the end of May and the beginning of June.
- The floods affected most countries of central Europe, but I will focus on Germany here.
- In Germany, several Länder were affected, with the worst damage occurring in the South and East.
- The two weeks saw a massive mobilisation of around 75,000 firefighters, plus 19,000 soldiers.
- Several cities reported record high water lines, several dams burst, and large areas were flooded. There were 14 deaths.
- The situation is now mostly under control, with only some areas still flooded. Compare the official information here.
Examples of social media use (Facebook pages, maps) include:
- A Google map for the city of Magdeburg curated by four collaborators, with over one million hits and a corresponding Facebook page.
- Another Google map for the city of Dresden, curated by eight collaborators and with almost four million hits.
- A third Google map for Halle, a bit smaller in scope with two contributors and half a million hits.
- Additionally, there are many pages on Facebook, usually focusing on a geographic area or place.
- On Twitter, the most used hashtag seems to be #hochwasser, but many others were also used. On a dedicated channel, requests and offers for help for Dresden could be posted (see also a corresponding website).
As I mentioned, I wasn’t able to collect any data – if someone has data and would like to attempt an analysis, I’d be happy to help out.
For Germany, the use of social media during a disaster was a new experience – fortunately, there are not that many large-scale disasters occurring, and the last one (floods of 2002) happened before the advent of social media. In consequence, the use of social media found an echo in more traditional broadcast media (e.g. Handelsblatt, Neue Osnabrücker Zeitung, and Spiegel Online).
Highlights and lowlights
In other words, what worked and what didn’t?
Positive experiences include:
- Many volunteers can be mobilised within little time.
- More information (and more channels) was available to everyone (with an internet connection).
- Self-organizing help (who does what) works overall, with volunteers gathering and providing information, helping in the deployment of sandbags, and aiding the volunteers through infrastructure and consumables.
Some negative experiences were:
- No weighting or ranking available, making it difficult to estimate the importance and urgency of information and requests. Subjective criteria like proximity and local knowledge can help but may be misleading.
- A blurring between private and official channels.
- A lack of feedback and checks led to occasional proliferation of wrong information.
- Too many helpers and a lack of coordination can have a negative impact (coordination, gawkers, …).
But apparently, a lack of coordination can also affect public authorities (article on Cicero).
Algorithms to the rescue?
It’s obvious that the problems described above are not specifically German or flood-related. They are problems that haunt any undertaking of a large crowd. In my humble opinion, there are two main avenues to overcome the problems and thereby increase the utility of social media: Improved filtering and ranking, and improved platforms.
I have been an advocate for algorithmic filtering and ranking of social media messages for some time now (see my research publications and this blog). Various studies show that even in critical situations like disasters, algorithmic approaches can provide two important advantages: First, they can filter out noise and redundant messages. And second, they can organize and enrich the remaining information to facilitate human curation. Examples of algorithmic approaches include Swiftriver and GeoCONAVI, with ongoing research for example at the QCRI. The Ushahidi platform and the Stand-By-Task-Force are examples of successful human (crowd sourced) filtering and curation.
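To give a feeling for the first of those two advantages, here is a minimal sketch of filtering out noise and redundant messages. It is a generic illustration, not how Swiftriver or GeoCONAVI actually work: the keyword test, the Jaccard word-overlap measure, the 0.8 threshold and the sample messages are all assumptions chosen for this example.

```python
import re

def tokens(text):
    """Lowercase word set, ignoring URLs and the retweet marker."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return {w for w in re.findall(r"[#\w]+", text) if w != "rt"}

def jaccard(a, b):
    """Share of words two messages have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def filter_messages(messages, keywords, max_similarity=0.8):
    """Drop off-topic messages, then drop near-duplicates of kept ones."""
    kept = []          # list of (original text, token set)
    for msg in messages:
        toks = tokens(msg)
        if not toks & keywords:
            continue   # noise: none of the topic keywords present
        if any(jaccard(toks, seen) >= max_similarity for _, seen in kept):
            continue   # redundant: almost the same words as an earlier message
        kept.append((msg, toks))
    return [msg for msg, _ in kept]

# Invented sample messages, for illustration only
messages = [
    "Sandbags needed at Elbe bridge #hochwasser",
    "RT Sandbags needed at Elbe bridge #hochwasser",
    "Great concert last night!",
    "Water level rising in Halle #hochwasser",
]
print(filter_messages(messages, keywords={"#hochwasser", "sandbags"}))
```

Here the retweet and the off-topic message are dropped, leaving the first and the last message. A real system would need far more than word overlap (language handling, spam detection, source credibility), but even a crude pass like this reduces what human curators have to read.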
I have also been a long-time skeptic of the utility of information streams, which are one of the dominating characteristics of Web 2.0 (from the proverbial Twitter streams to Facebook’s Timeline to the increasing number of “live tickers” on news sites that replace journalistic and editorial care taking with unfiltered and raw data). These relentless streams of information don’t stop for important news, and marginal (but nevertheless important) events risk being overlooked. He who shouts the loudest and the longest wins (the battle for attention). In order to organize the flood of information, a more interactive interface is necessary, such as … a map! Putting the textual information from Facebook posts, Tweets and other sources on a combined map and making the information searchable by place, time and content would be a significant improvement. While I wish to express my sincere congratulations and respect to the map makers linked above, it is also obvious that for larger events and more up-to-date information, more resources are needed. Either computing power and algorithms, or volunteers and professionals. Or even better, both.
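What “searchable by place, time and content” could mean in the simplest case is sketched below. The `Post` record, the `search` function, the bounding box and the sample posts are all hypothetical and invented for illustration; a real map application would sit on top of a spatial database rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Post:
    text: str
    lat: float
    lon: float
    time: datetime

def search(posts, bbox=None, start=None, end=None, term=None):
    """Return posts matching every criterion that is given.

    bbox is (min_lat, min_lon, max_lat, max_lon); all arguments are optional.
    """
    hits = []
    for p in posts:
        if bbox and not (bbox[0] <= p.lat <= bbox[2] and bbox[1] <= p.lon <= bbox[3]):
            continue                     # outside the requested area
        if start and p.time < start:
            continue                     # too early
        if end and p.time > end:
            continue                     # too late
        if term and term.lower() not in p.text.lower():
            continue                     # search term not in the text
        hits.append(p)
    return hits

# Invented sample posts, for illustration only
posts = [
    Post("Sandbags needed near the Elbe", 51.05, 13.74, datetime(2013, 6, 4, 9, 0)),
    Post("Road closed due to flooding", 52.13, 11.63, datetime(2013, 6, 8, 18, 30)),
    Post("Volunteers wanted for sandbags", 51.06, 13.75, datetime(2013, 6, 10, 7, 15)),
]
dresden_area = (50.9, 13.5, 51.2, 14.0)   # rough box, assumed for illustration
found = search(posts, bbox=dresden_area, end=datetime(2013, 6, 6), term="sandbags")
print([p.text for p in found])
```

Only the first post satisfies all three criteria: the second lies outside the box and the third is too late. The point is that each filter is trivial on its own; the value comes from combining them behind one interface instead of scrolling through a stream.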
Can we do it?
It seems that the current state of affairs in Germany resembles the situation of the Californian wildfires of 2007. I’m not trying to be condescending here – this is not surprising because there are fewer natural disasters in Germany, and the infrastructure for dealing with those is generally good (and it seems there is still room for improvement in the US, too).
However, simply tapping into the gigantic information stream is not the solution per se (as Patrick Meier argues as well), but a first step. There are many examples that show it’s possible, and our GeoCONAVI system used off-the-shelf hardware to monitor four European countries for social media on forest fires. In my opinion, the big problems are not computational, but ethical, legal and organizational. Legal implications include issues of privacy (although if only public messages are being used, this is less of a problem), and liability - what if wrong information leads to property damage, or even worse to the loss of human life? Organizational and political obstacles at least in Germany are the many agencies involved in civil protection: On the Federal level (strictly for defence issues), the Länder level (strictly for natural disasters and such, and each Land has its own agency), plus the various organizations such as (volunteer) fire departments, Technisches Hilfswerk, etc. Since disasters don’t stop at geographical or organizational borders, this could be a real problem, although it seems that during the 2013 flood the public authorities coordinated their work rather closely and well (with the exception mentioned above). The EU also has a new Emergency Response Centre based on the capabilities and knowledge from the JRC.
I’d like to recommend two excellent critical papers on user-generated geographic content and the geosocial web. The first one is by Muki Haklay and raises important issues on the democratizing effects of the Web 2.0 and neogeography, while the second one by Crampton et al. takes up the issue and suggests possible solutions to improve the study and analysis of geosocial media.
In his study, Haklay argues that neogeographic theory and practice assume an instrumentalist view of technology, i.e. that technology is value-free and that there is a clear separation between the means and the ends. Obviously, Haklay does not agree with this view and argues that there is less empowerment and democratization to be found than commonly assumed. In order to realize the full potential of neogeographic tools and practices, anyone implementing them needs to take into account economic and political aspects. There is a substantial body of work supporting Haklay, including the research by Mark Graham, which I recommended in my last post. Patrick Meier on iRevolution has an in-depth commentary of Haklay’s paper and provides a somewhat more optimistic interpretation. My own point of view is running along similar lines as Haklay’s, in that the contemporary digital divides are a continuation of old power divides that participatory GIS sought to overcome in the 90s. And while I have no ill will towards companies that add value to user-generated content, I am highly skeptical of such “involuntary crowdsourcing”, in which the crowd provides freely the raw material but in the end has to pay for access to derived products. There is some similarity to the argument for Open Government Data – why should the tax payers (and tax paying companies) pay again for the use of the data, when they already paid for the creation of it?
Crampton et al. investigate critically the hype around the “Big Data” geoweb. They remind the reader of (a) the limitations inherent in “big-data”-based analysis and (b) shortcomings of the simple spatial ontology of the geotag. Concerning (a), the data used often has limited explanatory value or informational richness, something our research has shown as well. Further, geocoded social media are still a non-representative sample, no matter how many of them one has collected. Concerning (b), Crampton et al. point out a number of problems with the geotag, e.g. that it is difficult to ascertain whether it refers to the origin of the content or the topic of the content, its lineage and accuracy, and its oversimplification of geography by limiting place geometry to points or lat/lon pairs. As a consequence of their analysis, the authors suggest that studies of the geoweb should try to take into account:
- social media that is not explicitly geographic
- spatialities beyond the “here and now”
- methodologies that are not focused on proximity only
- non-human social media
- geographic data from non-user generated sources.
I have to admit that I am a little bit proud to say that our research has addressed three of those suggestions: We haven’t limited our sample to geo-coded social media, instead we have re-geo-coded even those with existing coordinates to ensure that we capture the places the social media was about. We have also gone beyond the “here and now” by spatio-temporally clustering the data. Finally, a core concept of our approach is the enrichment of the social media data with explicitly geographic data from non-user generated (i.e. authoritative) sources (a paper describing the details has just been accepted but not published yet, an overview can be found here).
Crampton et al. conclude their paper with the important reminder that caution is needed regarding the surveillance potential of such research, with intelligence agencies around the world focusing more and more on open source intelligence (OSINT). Indeed it seems that even in Really Big Data, our spatial behaviour is unique enough to allow identification.
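For readers unfamiliar with spatio-temporal clustering, the following is a minimal sketch of the general idea, and explicitly not the method from our paper: messages are linked when they are close in both space and time, and a cluster is any group connected through such links. The function names, the 10 km and 6 hour thresholds, and the sample points are assumptions made up for this illustration.

```python
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def linked(p, q, max_km, max_gap):
    """True if two (lat, lon, time) tuples are close in space AND in time."""
    return haversine_km(p[0], p[1], q[0], q[1]) <= max_km and abs(p[2] - q[2]) <= max_gap

def cluster(points, max_km=10.0, max_gap=timedelta(hours=6)):
    """Label each point with a cluster id; linked points share an id."""
    labels = [None] * len(points)
    current = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        labels[i] = current
        stack = [i]
        while stack:                      # flood-fill over linked points
            a = stack.pop()
            for b in range(len(points)):
                if labels[b] is None and linked(points[a], points[b], max_km, max_gap):
                    labels[b] = current
                    stack.append(b)
        current += 1
    return labels

# Invented sample points, for illustration only
points = [
    (51.05, 13.74, datetime(2013, 6, 4, 9, 0)),
    (51.06, 13.75, datetime(2013, 6, 4, 11, 0)),   # nearby, two hours later
    (51.05, 13.74, datetime(2013, 6, 6, 9, 0)),    # same place, two days later
    (52.13, 11.63, datetime(2013, 6, 4, 10, 0)),   # same morning, far away
]
print(cluster(points))
```

The first two points end up in one cluster, while the other two each form their own: being in the same place days apart, or at the same time far apart, is not enough. This pairwise version compares every point with every other one, so it only suits small samples.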