Translation flows in English Wikipedia

My summer project this year involves Wikipedia again. I’ve already studied the motivations and profiles of Wikipedia translators as well as revision trends in translated Wikipedia articles, so I’m now moving on to tracking translation flows in English Wikipedia. The things I’ll be looking at include how the demand for translation from any of the 290 Wikipedia languages into English has changed over time, how this demand matches up with Wikipedia activity in those languages, how often translations from a given language are flagged for revision, how these revision requests change over time, how often translated articles in need of revision are deleted and why, and whether the change in the number of active users mirrors the changes in translation activity over time.

Are these questions important? I would argue that they’re worth studying for various reasons, but mainly because the discourse about crowdsourcing often emphasizes that the diverse backgrounds of the volunteers who participate in crowdsourced translation initiatives mean that translations from and into “exotic” languages can take place and may even be more frequent than in traditional translation projects, where the cost of localizing a website into a language with just a few thousand speakers would be prohibitive (see this interview with Twitter’s engineering manager Gaku Ueda, for instance). So it’s worth asking whether some languages are prioritized over others and whether some have more activity than others. The answer is likely “yes” in both cases, but if we look at which languages are receiving the most attention, we might be surprised by the results. After all, the five largest Wikipedias, based on the number of articles available in each version, are currently English, Swedish, German, Dutch and French, in that order, but just one year ago, Swedish was ranked 8th. Swedish Wikipedia is not, of course, comprised solely of articles translated from other languages, but the fact that it currently has more articles (and far fewer native speakers) than, say, German or French led me to wonder whether more translations are flowing into and out of languages like Swedish, which are typically less well represented online than languages like English, Chinese and Spanish.

I’m not very far into this project yet, so I don’t have much data to share, but here’s a graph of the demand for translation from French into English, Spanish into English and German into English. I compiled the data by taking the current Category:Articles needing translation from French [or Spanish or German] Wikipedia page and comparing it with previous versions captured approximately every six months over the last six years, drawn from the Internet Archive’s Wayback Machine. (The gap in both the Spanish and German series exists because the Wayback Machine did not crawl these two pages in 2010.)
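For anyone who wants to replicate this kind of snapshot comparison, it can be scripted rather than done by hand in a browser. Here’s a minimal Python sketch of the idea, using the Wayback Machine’s public CDX API; the six-month sampling interval and the category page title are just illustrative, and actually retrieving the snapshots would of course require network requests:

```python
from datetime import date
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"  # Wayback Machine CDX API

def sample_dates(start_year, end_year):
    """Yield one sample date per six months (Jan 1 and Jul 1) for the study window."""
    for year in range(start_year, end_year + 1):
        for month in (1, 7):
            yield date(year, month, 1)

def cdx_query_url(page_title, when):
    """Build a CDX query for captures of a Wikipedia page on a given date.

    A real script would widen the from/to window, since the page may not
    have been crawled on that exact day.
    """
    params = {
        "url": f"en.wikipedia.org/wiki/{page_title}",
        "from": when.strftime("%Y%m%d"),
        "to": when.strftime("%Y%m%d"),
        "output": "json",
        "limit": "1",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

dates = list(sample_dates(2009, 2015))
print(len(dates))  # 14 sample points over the seven calendar years 2009-2015
print(cdx_query_url("Category:Articles_needing_translation_from_French_Wikipedia",
                    dates[0]))
```

Each returned capture could then be fetched and its listed article titles counted, which is essentially what I did manually for the graph above.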

Number of pages listed in the Category:Articles needing translation from [French, Spanish or German] Wikipedia pages, 2009-2015.

As the graph shows, requests for translation into English from French, Spanish or German have increased substantially over the past six years. From what I’ve seen so far, French seems to be an anomaly: the number of articles listed as good candidates for translation varied widely, sometimes increasing or decreasing by 1,000 in just six months. These numbers don’t tell us much yet, but I’ll be digging into them more over the summer. I want to see, for instance, how long it takes for an article to be removed from the list and whether demand for translation from English into these languages is similar. This could help explain why the number of articles listed as needing translation is increasing: perhaps little translating is taking place and a backlog of translations is accumulating; perhaps Wikipedians are becoming more interested in translation and are therefore adding articles to the lists more frequently now than in the past; or perhaps articles are being translated but simply not removed from the list, making demand for translation appear inflated. I hope to have more to share soon.

In the meantime, I’d certainly welcome any comments on the project, or thoughts on translation in Wikipedia!

Should I fix this mistake or not? On the ethics of researching Wikipedia

I’ve just finished some of the final edits for an article that has just been published in Translation Studies, and it reminded me that I’ve been meaning to write a blog post about an ethical dilemma I faced when I was preparing my research. So before I turn to a new project and forget all about this one again, here’s what happened.

The paper focuses on a corpus of 94 Wikipedia articles that have been translated in whole or in part from French or Spanish Wikipedia. I wanted to see not just how often translation errors in the articles were caught and fixed, but also how long it took for errors to be addressed. It will probably not come as any surprise that almost all of the articles I studied initially contained both transfer problems (e.g. incorrect words or terms, omissions) and language problems (e.g. spelling errors, syntax errors), since they were posted on Wikipedia:Pages needing translation into English, which lists articles that are included in English Wikipedia but which contain content in another language, content that requires some post-translation editing, or both. Over the course of the two years leading up to May 2013, when I did the research, some of the errors I found in the initial translations were addressed in subsequent versions of the articles. In other cases, though, the errors were still there, even though the page had been listed as needing “clean-up” for weeks, months, or even years.

And that’s where my ethical dilemma arose: should I fix these problems? It would be very simple to do, since I was already comparing the source and target texts for my project, but it felt very much like I would be tampering with my data. For instance, in the back of my mind was the thought that I might want to conduct a follow-up study in a year or two, to see whether some of the errors had been resolved with more time. If I were to fix these problems, I wouldn’t be able to check on the status of these articles later, which would prevent me from finding out more about how quickly Wikipedians resolve translation errors.

And yet, I was torn, partly due to a Bibliotech podcast I’d listened to a few years ago that made a compelling argument for improving Wikipedia’s content:

When people tell me that they saw something inaccurate on Wikipedia, and scoff at how poor a source it is, I have to ask them: why didn’t you fix it? Isn’t that our role in society, those of us with access to good information and the time to consider it, isn’t it our role to help improve the level of knowledge and understanding of our communities? Making sure Wikipedia is accurate when we have the chance to is one small, easy way to contribute. If you see an error, you can fix it. That’s how Wikipedia works.

In the end, I didn’t make any changes, but this was mainly because I didn’t have the time. I didn’t want to tamper with my data while I was writing the paper, and after I had submitted it, I didn’t get around to going back through the list of errors I’d compiled to start editing articles. Most of the corrections would have been for very minor problems, such as changing a general word (“he worked for”) to a word that more specifically reflected the source text (“he volunteered for”), or replacing incorrect words with better translations in cases where the original would still have given users the gist of the meaning (e.g. “the caves have been exploited” vs. “the caves have been mined”). I had trouble justifying the need to invest several hours correcting details that wouldn’t really affect the overall meaning of the text, and yet this question still nagged at me. So I thought that instead I would write a blog post to see what others thought: what is more ethical, making the corrections myself, or leaving the articles as they are, to see how they change over time without my influence?

Wikipedia translation projects: Take 2

Last year was the first time I assigned a Wikipedia translation project in my Introduction to Translation into English course, and I was happy enough with the experience that I tried it again this year. Now that I’m more familiar with Wikipedia, I was able to change the assignment in ways that I hope improved both the student experience and the translations we produced. Here’s an overview of how I modified the assignment this year and what students thought about the project:

Overview of the assignment

For this assignment, students are required to work in groups to translate an article they choose. Like last year, I recommended they select from this list of 7000+ articles needing translation from French into English. Also like last year, as part of the assignment, students had to submit a report about how they divided the work and made their translation decisions. Finally, they had to do a presentation in front of the class to show their finished article and explain the challenges they faced when translating it.

First change: More training on Wikipedia policies and style guides

This year, I spent more class time talking about Wikipedia policies and discussing what would make a “good” English article. During our second week of class, for instance, we covered Wikipedia’s three Core Content Policies: neutral point of view, no original research, and verifiability of sources. I asked students to consider how these might affect the articles they chose to translate, and reminded them that even a single word in the source text (e.g. “talented”, “greatest”, “spectacular”) could run counter to these three policies. Last year, for instance, the group translating an article about a historic site in France found adjectives like “spectacular” and “great” used in the French article to describe a tower that stood on the site. In their translation, they deleted these adjectives, because they found them too subjective. After we discussed this example, I asked students to think of other evaluative words they might encounter in their source texts, and then we came up with some strategies for addressing these problems in their translations, including omitting the words and finding a reliable secondary source to quote instead (“X and Y have described the tower as ‘spectacular’”).

In Weeks 3 and 4, we took a closer look at the Wikipedia Manual of Style, and in particular, at the Manual of Style for articles about the French language or France and the Manual of Style for Canada-related articles. Though students could choose to translate articles on French-speaking regions other than France and Canada, only those two French-speaking countries have their own style guide. I pointed out the recommendations for accented characters and proper names, and we discussed what to do in cases where no rule existed, or where considerable controversy continues to exist, as is the case for capitalization of French titles and expressions. In this case, we created our own rule (follow typical English capitalization rules), but students could still choose to do something else: they just had to justify their decision in the commentary accompanying their translation.

Second change: revised marking scheme

Last year, I’d intended to mark the translations just like any other assignment: I told students I would give them a grade for the accuracy of their translation, based on whether they had any errors like incorrect words and shifts in meaning, and a grade for English-language problems like grammar errors, spelling mistakes, and ambiguous wordings. But a good Wikipedia article also needs to have hyperlinks to other articles, citations to back up any facts, and various other features that are mentioned in the Manual of Style. My marking scheme from last year couldn’t accommodate these things. This year, I marked the translations out of 50, broken down as follows: 15 marks for the accuracy of the translation, 15 marks for the language, 10 marks for conforming to the Manual of Style and adding relevant hyperlinks to other Wikipedia articles, 5 marks for citing references and ensuring hyperlinks are functional, and a final 5 marks for ensuring the translation is posted to Wikipedia, with the corrections I suggested. I also had students submit their translations earlier so I could start marking them before the end of the semester, giving them time to post their final versions before the course was over. Together, these changes made the assignment work much better, and I noticed a big improvement in the quality of the final articles.

Student reactions to the assignment

At first, some students were very nervous about working within the Wikipedia environment. In the first week of class, when I asked how many had ever edited a Wikipedia article, no one raised their hand. As the weeks went on, I heard comments from the groups about how they needed to spend some time figuring out the markup language and how to use the sandbox, but by the end of the term, everyone succeeded in posting their translations online.

During their presentations this week, some students even noted that the markup language was fairly easy to learn and that they were glad to have more experience with it because it’s a tool they might need to use in the future. As I’d hoped, many students discovered that researching an article is a lot of work and that just because you’re interested in a topic doesn’t mean it will be easy to translate an article about it. Some students commented that adapting their texts to an English audience was challenging, particularly when English sources about the people and places they’d chosen to write about weren’t readily available. And nearly all of them felt the assignment had made them look at Wikipedia more critically: some students said they would check how recently an article had been updated (since their French article had out-of-date tourism statistics, for instance, or dead hyperlinks), while others said they would be looking to see whether the article cited reliable sources.

Not all of the translations have been corrected and posted online yet, but here are a few that have. I’ll update the list later, when everyone’s done: [List updated April 19]:

  • Aubagne (Students translated the “History”, “Politics” and “Environment and Environmental Policies” sections)
  • Fundy National Park (Students translated the “Natural Environment” and “Tourism and Administration” sections)
  • Louis Calaferte (Students translated the introduction, along with the “early life” and “Career” sections)
  • Lyonnaise cuisine (Students translated the “Terroirs and culinary influences” and “The Mères” sections)
  • Die2Nite

Translation in Wikipedia

I’ve been busy working on a new paper about translation and revision in Wikipedia, which is why I haven’t posted anything here in quite some time. I’ve just about finished now, so I’m taking some time to write a series of posts related to my research, based on material I had to cut from the article. Later, if I have time, I’ll also write a post about the ethics and challenges of researching translation and revision trends in Wikipedia articles.

This post talks about the corpus I’ve used as the basis of my research.

Translations from French and Spanish Wikipedia via Wikipedia:Pages needing translation into English

I wanted to study translation within Wikipedia, so I chose a sample project, Wikipedia:Pages needing translation into English, and compiled a corpus of articles translated in whole or in part from French and Spanish into English. To do this, I consulted recent and previous versions of Wikipedia:Pages needing translation into English, which has a regularly updated list split into two categories: the “translate” section, which lists articles Wikipedians have identified as having content in a language other than English, and the “cleanup” section, which lists articles that have been (presumably) translated into English but require post-editing. Articles usually require cleanup for one of three reasons: the translation was done using machine translation software, the translation was done by a non-native English speaker, or the translation was done by someone who did not have a good grasp of the source language. Occasionally, articles are listed in the clean-up section even though they may not, in fact, be translations: this usually happens when the article appears to have been written by a non-native speaker of English. (The Aviación del Noroeste article is one example.) Although the articles listed on Wikipedia:Pages needing translation into English come from any of Wikipedia’s 285 language versions, I was interested only in the ones described as being originally written in French or Spanish, since these are my two working languages.

I started my research on May 15, 2013, and at that time, the current version of the Wikipedia:Pages needing translation into English page listed six articles that had been translated from French and ten that had been translated from Spanish. I then went back through the Revision History of this page, reading through the archived version for the 15th of every month between May 15, 2011 and April 15, 2013: that is, the May 15, 2011, June 15, 2011, July 15, 2011 versions, and so on, through to April 15, 2013, bringing the sample period to a total of two years. In that time, the number of articles of French origin listed in either the “translate” or “clean-up” sections of the archived pages came to a total of 34, while the total number of articles of Spanish origin listed in those two sections was 60. This suggests Spanish to English translations were more frequently reported on Wikipedia:Pages needing translation into English than translations from French. Given that the French version of the encyclopedia has more articles, more active users and more edits than the Spanish version, the fact that more Spanish to English translation was taking place through Wikipedia:Pages needing translation into English is somewhat surprising.
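The monthly sampling described above is straightforward to reproduce in code. This is a toy Python illustration (my actual workflow was manual): it generates the 24 snapshot dates and shows how distinct article titles would be tallied across snapshots, with made-up titles standing in for the real lists:

```python
from datetime import date

def monthly_snapshots(start, end):
    """List the 15th of every month from start to end inclusive.

    start and end are (year, month) pairs.
    """
    year, month = start
    out = []
    while (year, month) <= end:
        out.append(date(year, month, 15))
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return out

snapshots = monthly_snapshots((2011, 5), (2013, 4))
print(len(snapshots))  # 24 monthly versions of the page

# Tallying distinct articles across snapshots: each snapshot contributes the
# set of titles listed in its "translate"/"cleanup" sections (toy data here).
seen_french = set()
for titles in [{"A", "B"}, {"B", "C"}, {"C", "D"}]:
    seen_french |= titles
print(len(seen_french))  # 4 distinct articles across three toy snapshots
```

Taking the union of the sets, rather than summing their sizes, is what prevents an article that lingers on the list for months from being counted more than once.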

Does this mean only 94 Wikipedia articles were translated from French or Spanish into English between May 2011 and May 2013? Unlikely: the articles listed on this page were identified by Wikipedians as being “rough translations” or in a language other than English. Since the process for identifying these articles is not fully automated, many other translated articles could have been created or expanded during this time: Other “rough” translations may not have been spotted by Wikipedia users, while translations without the grammatical errors and incorrect syntax associated with non-native English might have passed unnoticed by Wikipedians who might have otherwise added the translation to this page. So while this sample group of articles is probably not representative of all translations within Wikipedia (or even of French- and Spanish-to-English translation in particular), Wikipedia:Pages needing translation into English was still a good source from which to draw a sample of translated articles that may have undergone some sort of revision or editing. Even if the results are not generalizable, they at least indicate the kinds of changes made to translated articles within the Wikipedia environment, and therefore, whether this particular crowdsourcing model is an effective way to translate.

So what are these articles about? Let’s take a closer look via some tables. In this first one, I’ve grouped the 94 translated articles by subject. Due to rounding, the percentages do not add up to exactly 100.

Subject                                     French articles   % (French)   Spanish articles   % (Spanish)
Biography                                   20                58.8%        20                 33.3%
Arts (TV, film, music, fashion, museums)    3                 8.8%         8                  13.3%
Geography                                   2                 5.8%         12                 20%
Transportation                              2                 5.8%         4                  6.7%
(includes company profiles)                 2                 5.8%         2                  3.3%
Politics                                    1                 2.9%         4                  6.7%
Technology (IT)                             1                 2.9%         1                  1.6%
Sports                                      1                 2.9%         1                  1.6%
Education                                   1                 2.9%         1                  1.6%
Science                                     1                 2.9%         0                  0%
Architecture                                0                 0%           2                  3.3%
Unknown                                     0                 0%           3                  5%
Other                                       0                 0%           2                  3.3%
Total                                       34                99.5%        60                 99.7%

Table 1: Subjects of translated articles listed on Wikipedia:Pages needing translation into English (May 15, 2011-May 15, 2013)
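The percentage columns in Table 1 are simple share calculations. This small Python sketch recreates the French side (the subject labels are shorthand I’ve chosen for illustration, not Wikipedia categories) and shows why separately rounded shares need not sum to exactly 100:

```python
from collections import Counter

# Raw subject counts for the 34 French-origin articles, matching Table 1.
french = Counter({"Biography": 20, "Arts": 3, "Geography": 2, "Transportation": 2,
                  "Business": 2, "Politics": 1, "Technology (IT)": 1, "Sports": 1,
                  "Education": 1, "Science": 1})

total = sum(french.values())
print(total)  # 34

# Each subject's share of the total, rounded to one decimal place.
shares = {subject: round(100 * count / total, 1) for subject, count in french.items()}
print(shares["Biography"])  # 58.8

# Because each share is rounded independently, the column sums to a value
# near, but not exactly, 100.
print(sum(shares.values()))
```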

As we can see from this table, the largest share of the translations from French and Spanish listed on Wikipedia:Pages needing translation into English are biographies—of politicians, musicians, actors, engineers, doctors, architects, and even serial killers. While some of these biographies are of historical figures, most are of living people. The arts—articles about TV shows, bands, museums, and fashion—were also a popular topic for translation. In the translations from Spanish, articles about cities or towns in Colombia, Ecuador, Spain, Venezuela and Mexico (grouped here under the label “geography”) were also frequent. So it seems that the interests of those who have been translating articles from French and Spanish as part of this initiative have focused on arts, culture and politics rather than specialized topics such as science, technology, law and medicine. That may explain why the articles are also visibly associated with French- and Spanish-speaking regions, as demonstrated by the next two tables.

I created these two tables by consulting each of the 94 articles, except in cases where the article had been deleted and no equivalent article could be found in the Spanish or French Wikipedias (marked “unknown” in the tables), and I identified the main country associated with the topic. A biography about a French citizen, for instance, was counted as “France”, as were articles about French subway systems, cities and institutions. Every article was associated with just one country. Thus, when a biography was about someone who was born in one country but lived and worked primarily in another, I labelled the article as being about the country where that person had spent the most time. One biography, for instance, was of a politician who was born in Barcelona but became a French citizen over thirty years ago and is active in France’s Parti socialiste, so this article was labelled “France.”

Country/Region Number of articles
France 24
Belgium 2
Cameroon 1
Algeria 2
Canada 1
Western Europe 1
Switzerland 1
Romania 1
n/a 1
Total: 34

Table 2: Primary country associated with translations from French Wikipedia


Country Number of articles
Spain 13
Mexico 10
Colombia 10
Argentina 7
Chile 3
Venezuela 3
Peru 2
Ecuador 2
Nicaragua 1
Guatemala 1
Uruguay 1
United States 1
Cuba 1
n/a 1
Unknown 4
Total 60

Table 3: Primary country associated with translations from Spanish Wikipedia

Interestingly, these two tables demonstrate a marked contrast in the geographic spread of the articles: roughly 70% of the French source articles dealt with a single country (France), while just over half of the Spanish source articles dealt with three (Spain, Colombia and Mexico), with nearly equal representation for each of the three. The two tables do, however, demonstrate that the vast majority of articles had strong ties to either French- or Spanish-speaking countries: only two exceptions (marked as “n/a” in the tables) did not have a specific link to a country where French or Spanish is an official language.

I think it’s important to keep in mind, though, that even though the French/Spanish translations in Wikipedia:Pages needing translation into English seem to focus on biographies, arts and politics from France, Colombia, Spain and Mexico, translation in Wikipedia as a whole might have other focuses. Topics might differ for other language pairs, and they might also differ in other translation initiatives within Wikipedia and its sister projects (Wikinews, Wiktionary, Wikibooks, etc.). For instance, the WikiProject:Medicine Translation Task Force aims to translate medical articles from English Wikipedia into as many other languages as possible, while the Category:Articles needing translation from French Wikipedia page lists over 9,000 articles that could be expanded with content from French Wikipedia, on topics ranging from French military units, government, history and politics to geographic locations and biographies.

I’ll have more details about these translations in the coming weeks. If you have specific questions you’d like to read about, please let me know and I’ll try to find the answers.

Wikipedia survey IV (Motivations)

While I’ve still got the survey open in my browser, I thought I’d finish writing about the results. This last post will look at the motivations the 76 respondents gave for translating, editing or otherwise participating in a crowdsourced translation initiative. (I should point out that although the question asked about the “last crowdsourced translation initiative in which [respondents] participated”, 63 of the 76 respondents (83%) indicated that Wikipedia was the last initiative in which they had participated, so their motivations are mainly for Wikipedia, with a few for Pirate Parties International, open-source software, iFixit, Forvo, and Facebook.)

The survey asked two questions about motivations. Respondents were first asked to select up to four motivations for participating.[*] They were then given the same list and asked to choose just one motivation. In both cases, they were offered motivations that can be described as either intrinsic (done not for a reward but rather for enjoyment or due to a sense of obligation to the community) or extrinsic (done for a direct or indirect reward). They were also allowed to select “Other” and add their own motivations to the list, as 11 respondents chose to do.

When I looked at the results, it became clear that most respondents had various reasons for participating: only 4 people chose one motivation when they were allowed to list multiple reasons (and one person skipped this question). All four wanted to make information available to others. Here’s a chart that shows which motivations were most commonly cited. (Click on the thumbnail to see a full-size image):
wikipedia translators-4 motivations

As the chart shows, intrinsic motivations (making information available to others, finding intellectual stimulation in the project, and supporting the organization that launched the initiative) were the motivations most often chosen by respondents. However, a significant number also had extrinsic reasons for participating: they wanted to gain more experience translating or practice their language skills. In the article I wrote about this survey, I broke these motivations down by type of respondent (those who had worked as professional translators vs. those who had not), so I won’t go into details here, except to say that there are some differences between the two groups.

Respondents who chose “Other” described various motivations: one was bored at work, one wanted “to be part of this network movement”, one wanted to improve his students’ translation skills by having them translate for Wikipedia, two thought it was fun, one wanted to quote a Wikipedia article in an academic work but needed the information to be in English, and three noted that they wanted to help or gain recognition within the Wikipedia community. Some more detailed motivations (often with a political/social emphasis) were also cited, either with this question, or in the final comments section:

I am not a developer of software, but I am using it for free. To translate and localise the software for developers is a way to say thank you – Only translated software has a chance to spread and prosper – I get to know new features and/or new software as soon as it is available

As a former university teacher I believe that fighting ignorance is an important way of making world a better place. Translating local knowledge into trans-national English is my personal gift for the humanity 🙂

I’m not sure how you found me because I’m pretty sure I only translated one Wikipedia page… I did it mainly because the subject of the article is almost unknown in the Jewish world, and I wanted more people to know about her and one of the few ways in which I can help make her story more widely known is by translating it into French. That being said I think I’ll try to do more!

The main reason I became involved in crowdsourced translation is that, in my opinion, the translation of science involves more than linguistic problems. It also requires an awareness of context; of why the scientific activities were undertaken, as well as how they fit into the “world” to which they belong. Many crowdsourced translation projects do not take this into account, treating the translation of science as a linguistic problem. This is fallacious. So I participate to fix the errors that creep in.

My translations are generally to make information freely available, especially to make Guatemalan cultural subjects available in Spanish to Guatemalan nationals.

I taught myself German, by looking up every single word in a couple of books I wanted to read about my passionate hobby. I have translated a couple of books in that hobby for the German association regarding that hobby (gratis). Aside from practice, practice, practice, I have had no training in translation. I began the Wiki translations when I was unemployed for a considerable amount of time and there was an article in the German Wiki on my hobby that had a tiny article in English. The rest is history. It’s been a few years since I’ve contributed to Wikipedia, but it was a great deal of fun at the time. Translation is a great deal of work for me (I have several HEAVY German/English dictionaries), but I love the outcome. Can I help English speakers understand the information and the beauty of the original text?

There were very few Sri Lankans editing on English Wikipedia at that time and I manage to bring more in and translate and put content to Wikipedia so other language speakers can get to know that information. I was enjoying my effort and eventually I got the administrator-ship of Sinhala Wikipedia. From then onwards I was working there till I had to quit as I was started to engage more with my work and studies. Well that’s my story and I’m not a full time translator and I have no training or whatsoever regarding that translating.

As these comments show, the respondents often had complex reasons for helping with Wikipedia translations. Some saw it as an opportunity to disseminate information about certain linguistic, cultural or religious groups (e.g. Guatemalans, Sri Lankans) to people within or outside these communities; others wanted to give back to communities or organizations they believed in (for instance, by helping other Wikipedians, or by giving free/open-source software a wider audience). But intrinsic reasons seem most prominent. This is undoubtedly why, when respondents were asked to select just one reason for participating in a crowdsourced translation initiative, 47% chose “To make information available to language speakers”, 21% said they found the project intellectually stimulating, and 16% wanted to support the organization that launched the initiative. No one said that all of their previous responses were equally important, which shows that while many motivations are a factor, some played a more significant role than others in respondents’ decisions to volunteer for Wikipedia (and other crowdsourced translation initiatives).

That’s apparent, too, in the responses I received for the question “Have you ever consciously decided NOT to participate in a crowdsourced translation initiative?” The responses were split almost evenly between Yes (49%) and No (51%). The 36 respondents who said Yes were then asked why they had decided not to participate, and what initiative they hadn’t wanted to participate in. Here’s a chart that shows why respondents did not want to participate:
wikipedia translators-4 motivations for not participating

Unlike last time, when only a few respondents chose one or two motivations for participating, 15 of the 36 respondents chose only one reason, and 11 chose only two, to explain why they decided not to participate (although they could have chosen up to four motivations). This means that almost 75% of respondents did not feel that their motives for not participating were as complex as their motives for participating. (Of course, it’s also possible that because this was one of the last questions on the survey, respondents were answering more quickly than they had earlier.) I had expected that ideological reasons would play a significant role in why someone would not want to participate in a crowdsourced translation initiative (i.e. that most respondents, being involved in a not-for-profit initiative like Wikipedia, would have reservations about volunteering for for-profit companies like Facebook), but the most common reason respondents offered was “I didn’t have time” (20 respondents, or 56%), followed by “I wasn’t interested” (12 respondents, or 33%). Only 7 didn’t want to work for free (in four cases, it was for Facebook; the 3 other respondents didn’t mention which initiative they were thinking of), and only 9 said they didn’t want to support the organization that launched the initiative (Facebook in four cases, a local question-and-answer service in another, and Wikia and Wikipedia in two others). There was some overlap between these last two responses: only 12 respondents in all indicated that they didn’t want to work for free and/or support a particular organization.

I think these responses show how attitudes toward crowdsourced translation initiatives are divided, even among those who have participated in the same ones. Although 16 respondents had translated for Facebook (as I discussed in this post), and therefore did not seem ideologically opposed to volunteering for a for-profit company, 12 others had consciously decided not to do so. And even though respondents most commonly said they didn’t participate because they didn’t have time, we have seen that many respondents participated in Wikipedia translation projects because they found it satisfying, fun, challenging, and because they wanted to help disseminate information to people who could not speak the language in which the information was already available. So factors like these must also play a role in why respondents might not participate in other crowdsourced translation initiatives.

On that note, I think I’ll end this series of blog posts. If you want to read more about the survey results, you’ll have to wait until next year, when my article appears in The Translator. However, I did write another article about the ethics of crowdsourcing, and that’s coming out in Linguistica Antverpiensia in December, so you can always check that one out in the meantime. Although I was hoping to conduct additional surveys with participants in other crowdsourced translation initiatives like the TED Open Translation Project, I don’t think I’ll have time to do so in the near future, unless someone wants to collaborate with me. If you’re interested, you can always email me to let me know.

[*] The online software I used for the survey didn’t allow me to prevent respondents from selecting more than four reasons. However, only 14 people did so: of the 76 respondents, 4 chose 5 reasons, 7 chose 6 reasons, and 3 chose 7 reasons. I didn’t exclude these 14 responses because the next question limited respondents to just 1 reason.

Wikipedia survey III (Recognition, Effects)

It’s been quite some time now since my last post about the Wikipedia survey results, and for that I must apologize. I was side-tracked by some unrelated projects and found it hard to get back to the survey. But I’ve just finished revising my article on this topic (which will be published in the November 2012 issue of The Translator), and that made me sit down to finish blogging about the survey results. This is the third of four posts. I had planned to look at motivations, effects and recognition all in one post, but it became too long, so I’ve split it into two. This one looks at the ways respondents were recognized for participating in crowdsourced projects and what impact (if any) their participation has had on their lives. The next one (which I will post later this week) looks at respondents’ motivations for participating in crowdsourced initiatives.

For anyone who comes across these posts after the article is published, I should mention that the discrepancy between the number of survey respondents in the article and on this blog (75 vs. 76) arose because I received another response to the survey after I’d submitted the article for peer review. It was easier to include all 76 responses here, since I’m creating new graphs and looking at survey responses I didn’t have space to explore in the Translator article, but I didn’t update the data in the article because the new response didn’t change much on its own (+/-0.5% here and there) and would have required several hours’ work to recalculate the figures I cited throughout the 20+ pages.

I also want to thank Brian Harris for discussing these survey results on his blog. You can read his entry here or visit his blog, Unprofessional Translation, to read a number of very interesting articles about translation by non-professionals, including those working in situations of conflict, war, and natural disasters.

And on to the survey results:

The survey asked respondents what (if any) recognition they had received for participating in a crowdsourced translation initiative. Although the question asked about the last initiative in which respondents had participated (rather than Wikipedia in particular), 63 of the 76 respondents indicated that Wikipedia was the last initiative in which they had been involved, so the responses are mainly representative of the recognition they received as Wikipedia translators. Here’s a chart summarizing the responses (click on it for a full-sized image):
wikipedia translators-recognition
As the chart illustrates, no respondents received financial compensation, either directly, by being paid for their work, or indirectly, by being offered a discount on their membership fees or other services. This really isn’t surprising, though, because most respondents were Wikipedia translators, and contributors to Wikipedia (whether translators or writers) are not paid for their work. In addition, since Wikipedia does not charge membership fees, there is nothing to discount. Unexpectedly, though, 20 respondents reported receiving no recognition at all–even though 17 of them listed Wikipedia as the last initiative in which they had been involved. Because Wikipedia records the changes made to its pages, anyone who had translated an article would have been credited on the history page. These 20 respondents may not have been aware of the history feature, or–more likely–they didn’t consider it a form of recognition.[*]

Receiving credit for the translation work (either directly beside the translation or via a profile page) was the most common type of recognition. Of the 18 respondents who selected “Other”, 10 reported being credited on the Wikipedia article’s history page, 1 said their name appeared in the translated software’s source code, 1 noted they had received some feedback on the Wiki Talk page, 1 mentioned receiving badges from Facebook, and the others mentioned their motivations (e.g. just wanting to help, making the translation better, being able to refer to the translation in other academic work) or the effect their involvement had on their careers (e.g. a higher rate of pay for translating/interpreting duties at work). I discuss the advantages and disadvantages of this enhanced visibility for translators and translation in an article that will appear in Linguistica Antverpiensia later this year, so I won’t elaborate here, except to say that crediting translators and providing a record of the changes made to translations make translation a more visible activity and provide researchers with a large corpus of texts that can be compared and analyzed. In fact, I think Wikipedia’s translations are an untapped wealth of material that can help us better understand how translations are produced and revised by both professional and non-professional translators.

Finally, I asked respondents whether/how their participation in a crowdsourced translation initiative had impacted their lives. Here’s another chart that summarizes the results (again, click on the image to see it in full size):
Wikipedia translators-impact
I was surprised to see that 38 respondents (or 51%) didn’t feel their participation had had any sort of impact: after all, why would they continue volunteering if they were not getting something (even personal satisfaction) out of the experience? However, this may be a problem with the question itself, as I hadn’t listed “personal satisfaction” as an option. If I had (and I would definitely make this change in a future survey), the responses might have been different. As it is, of the 16 respondents who selected “Other”, 8 indicated that participating gave them personal satisfaction, a sense of pride in their accomplishments, a feeling of gratification, etc. Here are a few of their comments:

Pride in my accomplishments, although I am an amateur translator. I did some cool stuff!

I have the immense satisfaction of knowing that my efforts are building a free information project and hope that my professionalism raises the quality bar for other contributors who see my work (e.g. edit summaries, citations of sources, etc.)

I was spending my spare time on Wikipedia and sharing my knowledge. Moreover I was enjoying what I was doing. That’s it.

As for the rest of the responses in the “Other” category: one person noted that they had been thanked by other Wikipedia users for the translation; another remarked that they had been thanked by colleagues for contributing to “open-source intellectual work”; two said they had learned new things; one had met new Facebook friends; one said they had been asked to do further translation work for the project; two noted they had been invited to participate in this survey; and one (a part-time translation professor) said “My students consider my classes as a useful and positive learning experience” because they help translate for Wikipedia together.

Nearly 1/3 of respondents (22 of the 76) felt they had received feedback that helped improve their translation skills, and I think this point is important: the open nature of Wikipedia (and many other crowdsourced projects) provides an excellent forum for exchanging ideas and commenting on the work of others. But this is also a point that deserves further study, since so few of the respondents reported having training or professional experience translating.

Interestingly, some of the more tangible effects of participating in a crowdsourced initiative, such as receiving job offers and meeting new clients or colleagues, were not often experienced by the survey respondents. I wonder whether the results would be the same if this survey were given to participants in other types of initiatives (translation of for-profit websites such as Facebook, or localization of Free/open-source software such as OmegaT). The results do show, however, that volunteering for crowdsourced translation initiatives has had some positive (and a few negative) effects on the careers and personal lives of the participants, and that personal satisfaction is also an important motivator.

An interesting aside is that of the 20 respondents who reported receiving no recognition, 5 also indicated they had received other forms of recognition, such as their names appearing beside the translation, an updated profile page, or feedback on their work. Respondents may have been thinking of all projects in which they had been involved, instead of the last one, which the question asked about. These 5 respondents all indicated that Wikipedia was the last initiative in which they had been involved.

Wikipedia survey II (Types of Participation)

This is a follow-up to last month’s post describing preliminary results from a survey of Wikipedia translators. To find out about the survey methodology and the respondent profiles, please read this post first.

I initially planned for this survey to be one of several with translators from various crowdsourced projects, so I wrote the participation-related questions hoping to compare the types of crowdsourced translation initiatives people decide to participate in and what roles they play in each one. I haven’t yet had time to survey participants in other initiatives (and, truth be told, I probably won’t have time to do so in the near future), so the responses to the next few questions offer only a partial glimpse of the kinds of initiatives crowdsourcing participants get involved in. Here’s a table illustrating the responses to the question about which crowdsourced translation initiatives respondents had participated in. As expected, virtually all respondents had helped translate for Wikipedia. The one respondent who did not report translating for Wikipedia had participated in a project focused on MediaWiki, the wiki platform originally designed for Wikipedia.

Initiative No. of respondents Percentage
Wikipedia 75 98.7%
Facebook 16 21.3%
Free/Open-source software projects (software localization and/or documentation translation for F/OSS projects such as OmegaT, Concrete5, Adium, Flock, Framasoft) 7 9.2%
TEDTalks 2 2.7%
The Kamusi Project 1 1.3%
Ifixit 1 1.3%
Forvo 1 1.3%
Anobii 1 1.3%
Science-fiction fandom websites 1 1.3%
Traduwiki 1 1.3%
Orkut 1 1.3%
Der Mundo (Wordwide Lexicon) 1 1.3%
The Lied, Art Song, and Choral Texts Page 1 1.3%

A few points I found interesting. First, I was surprised to see that respondents had participated in such a diverse range of projects. I had expected that because Wikipedia was a not-for-profit initiative, participants would be less likely to have helped translate for for-profit companies like Facebook and Twitter; however, after Wikipedia, Facebook was the initiative that had attracted the most participants. Second, I was intrigued by the fact that almost 10% of respondents were involved in open-source software translation/localization projects. I hypothesized that the respondents who had reported working in the IT sector or studying computer science would be the ones involved in the F/OSS projects, but that was not always the case: when I broke down the data, I found that people from a variety of fields (a high school student, an economics student, two medical students, a translator, a software developer, a fundraiser, etc.) had helped translate/localize F/OSS projects. I think these results really indicate a need to specifically study F/OSS translation projects to see whether the Wikipedia respondents are representative of the participants.

Next, I asked respondents how they had participated in crowdsourced translation projects (as translators, revisers, project managers, etc.) and how much time per week, on average, they had spent participating in their last crowdsourced translation initiative.

Here’s a graph illustrating how respondents had participated in various crowdsourced translation projects. They were asked to select all ways they had been involved, even if it varied from one project to another. This means that the responses are not indicative of participation in Wikipedia alone:
wikipedia translators-roles played

As the graph shows, translation was the most common means of participation, but that wasn’t surprising, because I had invited respondents based on whether they had translated for Wikipedia. However, a significant number of respondents had also acted as revisers/editors, and some had participated in other ways, such as providing links to web resources and participating in the forums. I think this graph shows how crowdsourced translation initiatives allow people with various backgrounds and experiences to participate in ways that match their skills: for instance, someone with weaker second-language skills can help edit the target text in his or her mother tongue, catching typos and factual errors. And someone with a background in a particular field can share links to resources or answer questions about concepts from that field, without necessarily having to do any translating. So when we speak of crowdsourced translation initiatives, it’s important to consider that these initiatives allow for more types of involvement than translating in the narrow sense of providing a target-language (TL) equivalent for a source text (ST).

Finally, I asked participants how many hours per week, on average, they spent participating in the last crowdsourced translation initiative in which they were involved. Here’s a graph that illustrates the answers I received:
wikipedia translators-hours per week

As you can see, most respondents spent no more than five hours per week participating in a crowdsourced translation initiative. On the surface, this may seem to provide some comfort to the professional translators who object to crowdsourcing as a platform for translation, since these Wikipedia respondents did not spend enough time per week on a translation to equal a full-time job; however, hundreds of people volunteering four or five hours per week can still produce enough work to replace several full-time professionals. Not-for-profit initiatives like Wikipedia, where article authors, illustrators and translators all volunteer their time, are probably not as problematic for the profession, since professional translators would probably never have been hired to translate the content anyway, but for-profit initiatives such as Facebook are more ethically ambiguous. I’ve discussed some of these ethical problems in an article that will be published in Linguistica Antverpiensia later this year, in an issue focusing on community translation.

In a few weeks, I’ll post the results of the last few survey questions, the ones focusing on motivations for participating, the rewards/incentives participants have received and the effect(s) their participation has had on their lives and careers.

Wikipedia survey I (Respondent profiles)

This is the first in a series of posts about the results of my survey of Wikipedians who have translated content for the Wikimedia projects (e.g. Wikipedia). Because I’ve already submitted an article analyzing the survey, these posts will be less analytical and more descriptive, although I will be able to discuss some of the survey questions I didn’t have space for in the paper. This post will look at the profiles of the 76 Wikipedians who responded to the survey (and whom I’d like to thank once again for their time).

Survey Methodology
I wanted to randomly invite Wikipedia translators to complete the survey, so I first consulted various lists of English translators (e.g. the Translators Available page and the Translation/French/Translators page) and added these usernames to a master list. Then, for each of the 279 language versions on the List of Wikipedias page*, I searched for a Category: Translators page for translations from that language into English (i.e. Category: Translators DE-EN, Category: Translators FR-EN, etc.). I added the usernames from the Category: Translators pages to the names on the master list and removed duplicate users. This process led to a master list of 1866 users who had volunteered to translate Wikipedia content into English. I then sent out invitations to 204 randomly selected users from the master list, and 76 (or 37%) of them responded.

A few caveats: additional Wikipedians have probably translated content for the encyclopedia without listing themselves on any of the pages I just mentioned. Moreover, anyone can generally edit (and translate) Wikipedia pages without creating an account, so the results of the survey probably can’t be generalized to all English Wikipedia translators, let alone Wikipedia translators into the other 280 languages, who are not necessarily listed on the English Wikipedia pages I consulted. Finally, although 76 Wikipedians may not seem like many respondents, it is important to note that many of the users on the master list did not seem to have ever translated anything for Wikipedia: when I consulted their user contribution histories, I found that some Wikipedians had added userboxes to their profile pages to indicate their desire to translate but had not actually done anything else. I was interested only in the views of people who had actually translated, so the 76 respondents represent a much larger share of actual Wikipedia translators than it might appear.
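For readers curious about the mechanics, the list-building and sampling procedure can be sketched in a few lines. This is only an illustration of the steps described above, not the script I actually used; the usernames and sample size below are placeholders:

```python
import random

# Usernames gathered from the Translators Available pages and the
# per-language Category: Translators pages (placeholder data here).
translators_available = ["UserA", "UserB", "UserC"]
category_translators = ["UserB", "UserD", "UserE"]

# Merge the two sources and remove duplicate usernames,
# keeping first-seen order.
master_list = list(dict.fromkeys(translators_available + category_translators))

# Randomly select users to invite (204 in the actual survey).
random.seed(42)  # fixed seed so the draw is reproducible
invited = random.sample(master_list, k=3)

print(len(master_list))  # 5 unique users after deduplication
```

With the real data, `master_list` would hold the 1866 deduplicated usernames and `k` would be 204.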

The vast majority of respondents (64, or 84%) were male, and most were 35 years of age or younger (57 respondents, or 75%, were under 36). This result is not surprising, given the findings of a 2008 general survey of more than 176,000 Wikipedia users, in which 50% of the respondents were 21 years of age or under (in all, 76% were under 30) and 75% were male.

When respondents were asked about translation-related training, most (51 respondents or 68%) responded that they had no formal training in translation. Here’s a graph with a breakdown for each response:
Wikipdia translators-training

Given that respondents were generally young and usually did not have formal training in translation, it’s not surprising that 52 of the 76 respondents (68.4%) had never worked as translators (i.e. they had never been paid to produce translations). Only 11 respondents (about 14%) were currently working as translators on a full- or part-time basis, while 13 (about 17%) had worked as translators in the past but were no longer doing so. So it’s not surprising either that only two respondents were members of a professional association of translators.

Finally, when asked about their current occupations, respondents reported working in a range of fields. I’ve grouped them as best I could, using the Occupational Structure proposed by Human Resources and Skills Development Canada. Two respondents did not answer this question, but here’s an overview of the 74 other responses:

Occupation No. of respondents Percentage
Students (6 high school; 4 college/university, languages; 17 college/university, other fields) 27 36%
Works in IT sector 11 15%
Works in language industry 9 12%
Works in another sector (e.g. graphic design, law, education) 8 11%
Works in business, finance or administration 7 9%
Unemployed/stay-at-home parent/retired 5 7%
Academic 3 4%
Engineer 2 3%
Works in sales and service industry 2 3%
Total: 74 100%

Later this week (or early next week), I’ll look at the types of crowdsourced translation initiatives the respondents were involved in (other than Wikipedia, of course), and the roles they played in these initiatives. After that, I’ll discuss respondent motivations for volunteering and the impact their participation has had on their lives.

* There are now 281 Wikipedia versions.

Crowdsourcing experiment with translation students

In Howe’s 2008 book Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business, which I reviewed here, Howe describes TopCoder Inc., a company that develops software for industry partners by administering programming competitions. Twice weekly, competitions are posted, and any of the 200,000+ members of the TopCoder community can compete to complete small components of a piece of software that will eventually be delivered to a client. As Howe describes the process, each project is broken down into manageable chunks that are then posted on the TopCoder website so that programmers can compete against one another to write the most efficient, bug-free solution to the problem. After the entries are submitted, TopCoder members try to find bugs in the submissions, and only bug-free versions are passed on to be vetted by TopCoder reviewers. Competitions are also posted to see who can compile the components into a single piece of software and run the program bug-free. Members are ranked according to how often they win competitions, and they also receive monetary rewards ranging from $25 to $300 for submitting winning entries.

I decided to try out this crowdsourcing model in the classroom by organizing a similar translation competition. Before we started, I spoke a little about translation and crowdsourcing, describing some of the pros and cons, and showing examples of some of the organizations that have relied on crowdsourcing for their translations (e.g. TED, Facebook, Twitter). Then we moved on to translating a text together, with the TopCoder competition as the model for the translation process.

A few days before the class, I had broken a short text up into one- or two-sentence chunks and posted these chunks on an online form, with an empty text box under each one for the translations. Here’s a screen capture of the form, which I created with Google Docs:
crowdsourcing activity form

All students were given two minutes to translate the first segment and then click the “Submit” button at the bottom of the page, which automatically uploaded the translated segments to a Google Docs spreadsheet so that we could view all the submissions together. Here’s a screen capture of the spreadsheet, to give you an idea of what we were working with in class:
crowdsourcing activity spreadsheet

Once the first sentence had been submitted, students were then able to vote on which translation they preferred. To help speed up the process, each student was allowed to vote only once for their favourite version and then, after one version was declared the “winner”, students were able to make any revisions they wanted, provided a majority of the class agreed with the change. The revised sentence was then added to a Word document so that a final translation could be pieced together from the winning sentences. Students were then given two minutes to translate the second sentence, and, once they had done so, they were invited to vote on the winner and make any corrections to the translation. After we had translated three or four sentences (and were almost out of time), students were asked to comment on the final version and the translation process.
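For what it’s worth, the voting rules we followed can be sketched as a tiny program. The student names and votes below are invented for illustration; the two rules are the ones described above: each student casts one vote for their favourite version, and a revision to the winning version is accepted only if a majority of the class agrees.

```python
from collections import Counter

def pick_winner(votes):
    """Return the translation that received the most votes.
    `votes` maps each student to the version they chose."""
    tally = Counter(votes.values())
    winner, _ = tally.most_common(1)[0]
    return winner

def revision_accepted(yes_votes, class_size):
    """A change to the winning version needs a strict majority."""
    return yes_votes > class_size / 2

# Hypothetical vote from a class of five students:
votes = {"Ana": "version 2", "Ben": "version 1", "Carla": "version 2",
         "Dev": "version 3", "Emma": "version 2"}
print(pick_winner(votes))       # version 2
print(revision_accepted(3, 5))  # True: 3 of 5 is a majority
```

As the students themselves pointed out, this is a pure popularity contest: nothing in the rules guarantees that the winning version is the best one.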

So what did the students think? Most noted that although our translation process (basically a popularity contest, with the possibility of adding a few touch-ups to the winner) worked well enough in our small group, it might not be as successful outside the classroom, since the most popular answer isn’t necessarily the best. Some raised very valid concerns about how such a process would work on a larger scale or in another context. For instance, they wondered how well the final text would hold together if it had been translated by multiple people, and how various linguistic groups (e.g. Brazilian and Iberian Portuguese speakers) would settle on an acceptable version.

It seemed, though, that many students enjoyed the exercise, regardless of whether they felt this method of translating would work outside the classroom. One student liked being able to compare the various versions laid out on the screen, because that way, when revisions/corrections were made to the winning translation, students could incorporate ideas from the versions that did not win. Another noted that one person translating alone might get stuck or lack inspiration at certain points, but that this problem would not arise if many people were working on the same text.

Overall, I think this experiment worked well. Using a Google Docs Form really simplified the setup on my end, since I needed no programming skills and was able (in less than 15 minutes) to create an interface we could work with in class. Next year, I’d do this exercise in a week when we had three hours together, rather than one where I had scheduled a test for the second half of class. I think this exercise lends itself well to a 2- or 3-hour class, with 20 to 30 minutes for a talk about crowdsourcing, 1 to 1.5 hours to translate, 15-20 minutes to go over the final translation (since I didn’t give any input while students were voting on and revising the submissions, and the final version did have some minor language and translation errors), and then 10-15 minutes for students to reflect on the process and the result.

Howe, Jeff. (2008). Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business. New York: Crown Publishing.

Survey on crowdsourced translation initiatives launched

This weekend, I finally began sending out the invitations for the survey I’ve been preparing on crowdsourced translation initiatives. It asks respondents about their backgrounds, whether they have any formal training in translation, why they have decided to participate (or not to participate) in crowdsourced translation projects, and whether their participation has impacted their lives (e.g. whether they received job offers or met new colleagues because of their participation).

I’ve begun with Wikipedia, but I plan to invite respondents who have participated in other crowdsourced translation initiatives, including TEDTalks, Kiva and Global Voices Online. I’ve just finished randomly sampling the Wikipedians who have helped translate content into English, but I will now start randomly sampling those who have translated from English into French, Spanish and/or Portuguese. I’m hoping to determine whether participant profiles differ from one project to another: for instance, does the average age of participants vary from one project to another? Do some projects seem to attract more people who have some formal training in translation? Do motivations differ when participants are translating for non-profit initiatives vs. for-profit companies?

Responses have started to trickle in, and I’m already starting to see some trends, but I won’t say anything more until all of the surveys have been submitted and I’ve had a chance to analyze the results. If you’re interested in finding out more details about the survey, please let me know. And if you want to see some of the results, check back in a few months. I expect to have some details to discuss by late March or early April.