Monthly Archives: August 2011

Of all the social media platforms I know, I find Twitter the hardest to put to good use. In fact, I haven’t succeeded yet, which is why you won’t find my Twitter account linked here, although I do have several… For those of you who know me a little, this may come as a surprise, since my current research project relies heavily on Tweets. However, having roughly 25 million Tweets at my disposal and mining them for information on forest fire events has only increased my scepticism. This is not to say that I believe it has no use at all (otherwise I would look for another job), just that it seems to suffer from all the known Facebook problems without offering (yet) any substantial added benefits. My main “problem” with Twitter, both professionally and for private use, is twofold. First, the incredibly low quality of many posts. This observation comes mainly from my professional exposure to Tweets, since, as I said, I am not particularly active on Twitter personally. The amount of junk being sent around is amazing. After filtering and sharding, I manually annotated (together with a colleague) roughly 6,000 Tweets. I think my brain has taken irreparable damage from that exercise. Now I understand why another research team used Amazon’s MTurk for that job (I have my issues with MTurk, but it can be a sanity-saver). The second problem is, of course, the sheer amount of information being posted. I would consider myself a fairly average FB user, with fewer than 200 “friends”, of whom only a fraction post anything. Yet it is already overwhelming. And of course, all my friends post only cool, relevant and useful stuff, so I don’t want to miss a single post ;-). On Twitter, this problem is exacerbated, at least in the set I investigated (you want numbers, my friend? Please, it’s Friday afternoon, some other day… after all, you trust me, don’t you?).

So, when I try to imagine a filter, I see three main approaches to building my little semi-permeable filter bubble. First, I can filter for content, e.g. using some keywords. That is easily done, but one of the reasons I am on a social network is that I want to learn about new, unknown stuff. The second filter is interaction. A person I interact with very often can be considered closer or more relevant than others with whom I exchange only a virtual birthday card once a year. This is probably one of the filters FB already employs to present your “main” messages. Even better would be if I could categorize my connections myself and prioritize their messages (circles, anyone?). But once you have several hundred connections, categorizing them can become a real pain in the ****, with lots of errors sneaking in. Or did you design a thorough logical schema of connection “types” or “categories” beforehand, and then meticulously assign your friends, acquaintances and colleagues to them? Please let me know, so I can make sure I put you on my ignore list... Ahem, just kidding, since you certainly put me in the “Don’t miss a single post!” category, right? Anyway, I would like a third-party opinion on this: I would like my connections to carry some measure of credibility, trustworthiness… or authority.
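To make the three filters a bit more concrete, here is a minimal sketch of how they might be combined into a single score per post. Everything in it is hypothetical: the `Post` class, the weights, the `min_score` threshold and the idea of simply adding the three signals are my assumptions for illustration, not something FB or Twitter is known to do. The "authority" values are whatever third-party score you trust (more on that below).

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    text: str


def filter_posts(posts, keywords, interactions, authority, min_score=1.0,
                 w_keyword=1.0, w_interaction=0.1, w_authority=1.0):
    """Hypothetical three-signal filter: content, interaction, authority.

    posts        -- list of Post objects
    keywords     -- words I care about (content filter)
    interactions -- author -> number of past interactions with me
    authority    -- author -> some authority score from a third party
    Returns posts scoring at least min_score, best first.
    """
    kw = {k.lower() for k in keywords}
    scored = []
    for post in posts:
        # Naive tokenization: lowercase and split on whitespace (no punctuation handling).
        words = set(post.text.lower().split())
        keyword_hits = len(words & kw)
        score = (w_keyword * keyword_hits
                 + w_interaction * interactions.get(post.author, 0)
                 + w_authority * authority.get(post.author, 0.0))
        if score >= min_score:
            scored.append((score, post))
    # Sort by score only, so Post objects never need to be compared.
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [post for _, post in scored]


posts = [Post("ann", "Forest fire near town"),
         Post("bob", "what I had for lunch"),
         Post("cat", "nothing much")]
result = filter_posts(posts, {"fire"}, {"bob": 20}, {"cat": 0.5})
print([p.author for p in result])
```

Note how a chatty friend (bob) can outrank a relevant post purely through interaction volume, which is exactly why a third, independent signal seems worth having.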

So finally, I have arrived at the original topic of this post. Sorry for getting a little carried away. I am also aware that I am not the first person to make these observations or comments. (In fact, I have been reading through a big stack of research literature dealing with this topic. I’ll take the liberty of not referencing it here, except for one paper, see below.) A few years ago there seems to have been a flame war raging that I totally missed (talk about parallel universes... if you want an overview, you can look, for example, at Techcrunch and Twittermaven). Suffice it to say that someone suggested the number of followers would be an indicator of authority, and therefore a suitable way to rank or filter Tweets. This sparked several developments. First came Twitority, which ranks search results based on the authority of the source. However, there are obvious problems with this approach, such as its vulnerability to relatively simple spamming techniques. Then there was another approach by Daniel Tunkelang, first posted on his blog thenoisychannel, which was implemented as TunkRank. Unfortunately, at the time of this writing I could not test it, as my queries were always queued. Fortunately, someone else has already done a great job of comparing different Twitter ranking algorithms: Daniel Gayo-Avello collected a large data set of Tweets and thoroughly investigated how well several ranking algorithms detect spammers. This blog entry gives an overview, and a link to the full research article (highly recommended!). The “winner” of the contest is TunkRank, which outperforms all the other ranking algorithms.
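For the curious, here is a minimal sketch of the TunkRank idea as I understand it from Tunkelang's description: a user's influence is the sum, over all of their followers, of (1 + p × the follower's influence) divided by the number of accounts that follower follows, where p is an assumed constant probability that a follower passes a Tweet on. This is my own toy re-implementation computed by simple fixed-point iteration, not the official TunkRank code; the value of `p`, the iteration limit and the tolerance are assumptions.

```python
def tunkrank(follows, p=0.05, max_iter=100, tol=1e-9):
    """Toy TunkRank-style influence scores (a sketch, not the official implementation).

    follows -- dict mapping each user to the users they follow
    p       -- assumed probability that a follower retweets (0 <= p < 1)
    """
    follows = {u: set(t) for u, t in follows.items()}
    users = set(follows)
    for targets in follows.values():
        users |= targets
    # Invert the graph: who follows each user?
    followers = {u: [] for u in users}
    for y, targets in follows.items():
        for x in targets:
            followers[x].append(y)

    influence = {u: 0.0 for u in users}
    for _ in range(max_iter):
        # Each follower y splits its attention across everyone it follows,
        # so following thousands of accounts adds little to each of them.
        new = {x: sum((1 + p * influence[y]) / len(follows[y])
                      for y in followers[x])
               for x in users}
        delta = max(abs(new[u] - influence[u]) for u in users)
        influence = new
        if delta < tol:
            break
    return influence


graph = {"alice": {"bob"}, "bob": {"alice"}, "carol": {"alice", "bob"}}
print(tunkrank(graph, p=0.5))
```

The division by the follower's "following" count is what makes this harder to game than raw follower counts: a spam account that follows everybody contributes only a tiny share to each account it follows. Because each follower's attention sums to 1 across the accounts it follows and p < 1, the iteration settles down to a fixed point.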

So is this the solution to my problems with Twitter? Will you have me as a new follower soon? I don’t know. First I’ll have to apply this new knowledge to our dataset…

On a final note, if you’d like to read up on the topic, both the journal paper and another blog post by Daniel Gayo-Avello give lots of references. As soon as we have managed to write up all our stuff, you’ll get a first glimpse here.