This blog post is unfortunately a bit late – my attention and time were diverted by many other things. Still, I find the issues too important to just let them rest. The first part, on the political implications of research, was sparked by a research article I discovered, while the second part is motivated by the recent youth riots in London. That may seem a far stretch, yet the connection is closer (and potentially more worrisome) than one might think.
The research article I am referring to is titled “Social Media Analytics for Radical Opinion Mining in Hate Group Web Forums” (Yang, M., et al.; full info and article here). In short, the authors collected posts from two US neo-Nazi/racist forum sites, had them manually annotated, and then combined machine-learning and semantic methods to create data sets containing syntactic, stylistic, content-specific, and lexicon features. These were used to train Support Vector Machine (SVM), naive Bayes, and AdaBoost text classifiers, which in turn were validated on a second set of posts collected from a later period. While the results themselves are not surprising (the more features you include, the higher the F1-score, with SVM consistently being the best technique), I was surprised by the accuracy the authors report. But then I am still an amateur at machine learning and text classification techniques (though I am trying to catch up). My main point is not so much the methodology, which seems sound to me, but the implications. Replace “hate group” with “democratic dissidents”, racist slurs in the posts with calls for freedom of speech, and researchers with secret police, and you have the picture. I am fully aware that the benefits of research like this can be enormous when it helps us manage the flood of data and information we are producing. Hey, my own research has similar objectives. However, this article made me wonder how much research there is on protecting users’ privacy, anonymity, and intentions. (Hints are most welcome.) It seems an important research objective to me, considering for example the recent events in London.
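To make concrete how little machinery such a classification pipeline actually needs, here is a minimal sketch of one of the techniques the paper uses, a naive Bayes text classifier with Laplace smoothing, evaluated with an F1-score. This is a toy illustration, not the authors’ implementation: the labels, example posts, and the simple whitespace tokenizer are all my own invented stand-ins, and the paper’s richer feature sets (syntactic, stylistic, lexicon-based) are reduced here to bare word counts.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Toy tokenizer: lowercase whitespace split (the paper uses far richer features).
    return text.lower().split()

def train_nb(docs):
    """Train a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()               # how many documents per class
    word_counts = defaultdict(Counter)     # per-class word frequencies
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the most probable class under log-space naive Bayes with add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logprob = None, float("-inf")
    for label in class_counts:
        logprob = math.log(class_counts[label] / total_docs)     # class prior
        denom = sum(word_counts[label].values()) + len(vocab)    # Laplace denominator
        for w in tokenize(text):
            logprob += math.log((word_counts[label][w] + 1) / denom)
        if logprob > best_logprob:
            best_label, best_logprob = label, logprob
    return best_label

def f1_score(gold, predicted, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Invented toy corpus with hypothetical labels, purely for illustration.
training = [
    ("hate enemy attack", "radical"),
    ("attack hate now", "radical"),
    ("peace love friends", "neutral"),
    ("friends meet peace", "neutral"),
]
model = train_nb(training)
```

The same structure is what makes the dual-use worry so plausible: retraining on differently annotated posts requires changing nothing but the labels in the training data.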
While the youth riots seem mostly apolitical in nature to me, the response of the British government was no less frightening: lots of pressure on the company selling the device purportedly used to orchestrate the riots (RIM), and calls for monitoring or even shutting down social media entirely. I find it difficult and pointless to speculate on the capabilities of governments (or private companies…) to actually control and monitor communication, because I am convinced that even in democratic countries we are not being told the full picture. Yet the research presented above makes me doubt the optimistic opinion expressed on TechCrunch, for example, where this post argues that predicting a riot (or a peaceful demonstration…) on social media is impossible. Social media played an important and beneficial role during the Arab Spring because the authoritarian regimes were not very tech-savvy. The authoritarian regimes that survived will have learned their lesson.