My summer project this year involves Wikipedia again. I’ve already studied the motivations and profiles of Wikipedia translators as well as revision trends in translated Wikipedia articles, so I’m now moving on to tracking translation flows in English Wikipedia. The things I’ll be looking at include how the demand for translation from any of the 290 Wikipedia languages into English has changed over time, how this demand matches up with Wikipedia activity in those languages, how often translations from a given language are flagged for revision, how these revision requests change over time, how often translated articles in need of revision are deleted and why, and whether the change in the number of active users mirrors the changes in translation activity over time.
Are these questions important? I would argue that they’re worth studying for various reasons, but mainly because the discourse about crowdsourcing often emphasizes that the diverse backgrounds of the volunteers who participate in crowdsourced translation initiatives mean that translations from and into “exotic” languages can take place and may even be more frequent than in traditional translation projects, where the cost of localizing a website into a language with just a few thousand speakers would be prohibitive (see this interview with Twitter’s engineering manager Gaku Ueda, for instance). So it’s worth asking questions like whether some languages are prioritized over others and whether some have more activity than others. The answer is likely “yes” in both cases, but if we look at which languages are receiving the most attention, we might be surprised by the results. After all, the five largest Wikipedias, based on the number of articles available in each version, are currently English, Swedish, German, Dutch and French, in that order, but just one year ago, Swedish was ranked 8th. Swedish Wikipedia is not, of course, composed solely of articles translated from other languages, but the fact that it currently has more articles (and far fewer native speakers) than, say, German or French led me to wonder whether more translations are flowing into and out of languages like Swedish, which are typically less well represented online than languages like English, Chinese and Spanish.
I’m not very far into this project yet, so I don’t have much data to share, but here’s a graph of the demand for translation from French into English, Spanish into English and German into English. I compiled the data by taking the current Category:Articles needing translation from French [or Spanish or German] Wikipedia page and comparing it with previous versions at roughly six-month intervals over the last six years, based on data from the Internet Archive’s Wayback Machine. (The gap in both the Spanish and German versions exists because the Wayback Machine did not crawl these two pages in 2010.)
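For anyone who wants to try something similar, here is a rough sketch of how the six-month sampling could be automated rather than done by hand. It is not the process I actually followed: it assumes the Wayback Machine's availability endpoint (which returns the archived snapshot closest to a requested date), and the category URL pattern and all of the function names are my own illustrative choices.

```python
import json
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint: returns the snapshot closest to a date.
AVAILABILITY_API = "https://archive.org/wayback/available"


def category_url(language):
    # Assumed URL pattern for the English Wikipedia category page.
    return ("https://en.wikipedia.org/wiki/"
            "Category:Articles_needing_translation_from_" + language + "_Wikipedia")


def sample_dates(start, end, step_months=6):
    # First-of-month dates from start to end, spaced step_months apart.
    dates = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        dates.append(date(year, month, 1))
        month += step_months
        year += (month - 1) // 12
        month = (month - 1) % 12 + 1
    return dates


def availability_query(page_url, day):
    # Build the request URL asking for the snapshot closest to the given day.
    params = {"url": page_url, "timestamp": day.strftime("%Y%m%d")}
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response_text):
    # Pull (snapshot_url, timestamp) out of the JSON reply, or None if absent.
    closest = json.loads(response_text).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def fetch_closest(page_url, day):
    # Network call: look up the archived copy nearest to the requested day.
    with urlopen(availability_query(page_url, day)) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling `fetch_closest(category_url("French"), d)` for each `d` in `sample_dates(...)` would give one archived page per sampling point. The returned timestamp is worth checking against the requested date, since the closest snapshot can be many months away when a page was not crawled, which is exactly what happened with the Spanish and German pages in 2010.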
As the graph shows, requests for translation into English from French, Spanish or German have increased substantially over the past six years. From what I’ve seen so far, French seems to be an anomaly, because the number of articles listed as being good candidates for translation has varied widely, sometimes increasing or decreasing by 1,000 in just six months. These numbers don’t tell us much yet, but I’ll be digging into them more over the summer: I want to see, for instance, how long it takes for an article to be removed from the list and whether demand for translation from English into these languages is similar. This could help explain why the number of articles listed as needing translation is increasing: because little translating is taking place and a backlog is accumulating, because Wikipedians are becoming more interested in translation and are therefore adding articles to the lists more frequently now than in the past, or because articles are being translated but are simply not being removed from the list, causing demand for translation to appear inflated. I hope to have more to share soon.
In the meantime, I’d certainly welcome any comments on the project, or thoughts on translation in Wikipedia!