Wikipedia translation projects: Take 2

Last year was the first time I assigned a Wikipedia translation project in my Introduction to Translation into English course, and I was happy enough with the experience that I tried it again this year. Now that I’m more familiar with Wikipedia, I was able to change the assignment in ways that I hope improved both the student experience and the translations we produced. Here’s an overview of how I modified the assignment this year and what students thought about the project:

Overview of the assignment

For this assignment, students are required to work in groups to translate an article they choose. Like last year, I recommended they select from this list of 7000+ articles needing translation from French into English. Also like last year, as part of the assignment, students had to submit a report about how they divided the work and made their translation decisions. Finally, they had to do a presentation in front of the class to show their finished article and explain the challenges they faced when translating it.

First change: More training on Wikipedia policies and style guides

This year, I spent more class time talking about Wikipedia policies and discussing what would make a “good” English article. During our second week of class, for instance, we covered Wikipedia’s three core content policies: neutral point of view, no original research, and verifiability of sources. I asked students to consider how these policies might affect the articles they chose to translate, and reminded them that even a single word in the source text (e.g. “talented”, “greatest”, “spectacular”) could run counter to these three policies. Last year, for instance, the group translating an article about a historic site in France found adjectives like “spectacular” and “great” used in the French article to describe a tower that stood on the site. In their translation, they deleted these adjectives because they found them too subjective. After we discussed this example, I asked students to think of other evaluative words they might encounter in their source texts, and then we came up with some strategies for addressing these problems in their translations, including omitting the words or finding a reliable secondary source to quote instead (“X and Y have described the tower as ‘spectacular’”).

In Weeks 3 and 4, we took a closer look at the Wikipedia Manual of Style, and in particular at the Manual of Style for articles about the French language or France and the Manual of Style for Canada-related articles. Though students could choose to translate articles on French-speaking regions other than France and Canada, only those two countries have their own style guides. I pointed out the recommendations for accented characters and proper names, and we discussed what to do in cases where no rule existed or where considerable controversy persists, as is the case for the capitalization of French titles and expressions. Here, we created our own rule (follow typical English capitalization rules), but students could still choose to do something else: they just had to justify their decision in the commentary accompanying their translation.

Second change: revised marking scheme

Last year, I’d intended to mark the translations just like any other assignment: I told students I would give them a grade for the accuracy of their translation, based on whether they had any errors like incorrect words and shifts in meaning, and a grade for English-language problems like grammar errors, spelling mistakes, and ambiguous wordings. But a good Wikipedia article also needs to have hyperlinks to other articles, citations to back up any facts, and various other features that are mentioned in the Manual of Style. My marking scheme from last year couldn’t accommodate these things. This year, I marked the translations out of 50, broken down as follows: 15 marks for the accuracy of the translation, 15 marks for the language, 10 marks for conforming to the Manual of Style and adding relevant hyperlinks to other Wikipedia articles, 5 marks for citing references and ensuring hyperlinks are functional, and a final 5 marks for ensuring the translation is posted to Wikipedia, with the corrections I suggested. I also had students submit their translations earlier so I could start marking them before the end of the semester, giving them time to post their final versions before the course was over. Together, these changes made the assignment work much better, and I noticed a big improvement in the quality of the final articles.

Student reactions to the assignment

At first, some students were very nervous about working within the Wikipedia environment. In the first week of class, when I asked how many had ever edited a Wikipedia article, no one raised their hand. As the weeks went on, I heard comments from the groups about how they needed to spend some time figuring out the markup language and how to use the sandbox, but by the end of the term, everyone succeeded in posting their translations online.

During their presentations this week, some students even noted that the markup language was fairly easy to learn and that they were glad to have more experience with it because it’s a tool they might need to use in the future. As I’d hoped, many students discovered that researching an article is a lot of work and that just because you’re interested in a topic doesn’t mean it will be easy to translate an article about it. Some students commented that adapting their texts to an English audience was challenging, particularly when English sources about the people and places they’d chosen to write about weren’t readily available. And nearly all of them felt the assignment had made them look at Wikipedia more critically: some students said they would check how recently an article had been updated (since their French article had out-of-date tourism statistics, for instance, or dead hyperlinks), while others said they would be looking to see whether the article cited reliable sources.

Not all of the translations have been corrected and posted online yet, but here are a few that have. I’ll update the list as the rest are finished [list updated April 19]:

  • Aubagne (Students translated the “History”, “Politics” and “Environment and Environmental Policies” sections)
  • Fundy National Park (Students translated the “Natural Environment” and “Tourism and Administration” sections)
  • Louis Calaferte (Students translated the introduction, along with the “Early life” and “Career” sections)
  • Lyonnaise cuisine (Students translated the “Terroirs and culinary influences” and “The Mères” sections)
  • Die2Nite

What translators can learn from the F/OSS community

Looking through my blog archives late last year, I was disappointed to discover I’d posted only seven articles in all of 2013: usually my goal is to get at least one post up every month, and last year was the first time since 2009 that I hadn’t been able to achieve that. So my goal for this year is to blog more frequently and more consistently. And with that, here is my first post of 2014:

In November, I came across a blog post that hit on a number of issues relevant to the translation industry, even though it was addressed to the Free/Open-Source Software (F/OSS) community. It’s called The Ethics of Unpaid Labour and the OSS Community, and it appeared on Ashe Dryden’s blog. Ashe writes and speaks about diversity in corporations, and so her post focused on how unpaid OSS work creates inequalities in the workforce. As she argues, the demographics of OSS contributors differ from those of proprietary software developers as well as the general population, with white males overwhelmingly represented among OSS contributors: one source Ashe cites, for instance, remarks that only 1.5% of F/OSS contributors are female, compared to 28% of contributors for proprietary software. Ashe notes that lack of free time among groups that are typically marginalized in the IT sector (women, certain ethnic groups, people with disabilities, etc.) is the main reason these groups are under-represented in OSS projects.

These demographics are problematic for the workforce because many software companies require their employees (and potential new hires) to have contributed to F/OSS projects. And while some large IT firms do allow employees to contribute to such projects during work hours, people from marginalized groups often do not work at these kinds of companies. This means people who would like to find paid employment as software developers probably need to devote unpaid hours to F/OSS projects so they have a portfolio of publicly available code for employers to consult.

So how is this relevant to translators, or the translation industry? It’s relevant because the same factors affecting the demographics of the F/OSS community are also likely to affect the demographics of the crowdsourced translation community. People can volunteer to translate the Facebook interface only if they have free time and access to a computer; likewise, people with physical disabilities that make interacting with a computer difficult are likely to spend less time participating in crowdsourced projects than people with no disabilities. And since, in many cases, the community of translators participating in a crowdsourced project will largely determine how quickly a project is completed, what texts are translated and what language pairs will be available, the profile of participants is important.

Unfortunately, we don’t have a lot of data about the profiles of people who participate in crowdsourced (or volunteer) translation projects. The studies that have been done do hint at a larger question worth exploring: O’Brien & Schäler’s 2010 article on the motivations of The Rosetta Foundation’s volunteer translators noted that the group of translators identifying themselves as “professionals” was overwhelmingly female (82%), while the gender of those identifying themselves as amateurs was more balanced (54% female). The source languages of the volunteers were mainly English, French, German and Spanish. My own survey of Wikipedia translators found that 84% of the respondents were male and 75% were younger than 36. Because both these projects show that people with certain profiles participated more than others, it’s clear there’s a need for more research. If we had a better idea of the profiles of those who participate in other crowdsourced translation projects, we would be able to see whether some projects seem more attractive to one gender, which language pairs are most often represented, and what kinds of content are being translated for which language communities. And we could then try to figure out whether (and if so, how) to make these projects more inclusive.

Since it’s still a point of debate whether relying on crowdsourcing to translate the Twitter interface, a Wikipedia article or a TED Talk is beneficial to the translation industry, let’s leave that question aside for a moment and just consider the following: one of the benefits both novice and seasoned translators are supposed to be able to reap from participating in a crowdsourced project is visible recognition for their work. Online, accessible user profiles are common in crowdsourcing projects (as in this example from the TED translator community), and translators are usually given visible credit for their contributions to a project (as in this Webflakes article, which credits the author, translator and reviewer). If certain groups of people are more likely to participate in crowdsourced projects, that means this kind of visibility is available only to them. If we look at where the software development industry is headed (employers actively seek out those who have participated in F/OSS projects, putting those who cannot freely share their code at a disadvantage), translators could eventually see a similar trend: those who are unable or unwilling to participate in crowdsourced projects would be at a disadvantage.

I think this is a point worth considering when crowdsourced projects are designed, and it’s certainly a point worth exploring in further studies, as it raises a host of ethical questions that deserve a closer look.

Translation in Wikipedia

I’ve been busy working on a new paper about translation and revision in Wikipedia, which is why I haven’t posted anything here in quite some time. I’ve just about finished now, so I’m taking some time to write a series of posts related to my research, based on material I had to cut from the article. Later, if I have time, I’ll also write a post about the ethics and challenges of researching translation and revision trends in Wikipedia articles.

This post talks about the corpus I’ve used as the basis of my research.

Translations from French and Spanish Wikipedia via Wikipedia:Pages needing translation into English

I wanted to study translation within Wikipedia, so I chose a sample project, Wikipedia:Pages needing translation into English, and compiled a corpus of articles translated in whole or in part from French and Spanish into English. To do this, I consulted recent and previous versions of Wikipedia:Pages needing translation into English, which has a regularly updated list split into two categories: the “translate” section, which lists articles Wikipedians have identified as having content in a language other than English, and the “cleanup” section, which lists articles that have been (presumably) translated into English but require post-editing. Articles usually require cleanup for one of three reasons: the translation was done using machine translation software, the translation was done by a non-native English speaker, or the translation was done by someone who did not have a good grasp of the source language. Occasionally, articles are listed in the cleanup section even though they may not, in fact, be translations: this usually happens when the article appears to have been written by a non-native speaker of English (the Aviación del Noroeste article is one example). Although the articles listed on Wikipedia:Pages needing translation into English come from any of Wikipedia’s 285 language versions, I was interested only in the ones described as being originally written in French or Spanish, since these are my two working languages.

I started my research on May 15, 2013, and at that time, the current version of the Wikipedia:Pages needing translation into English page listed six articles that had been translated from French and ten that had been translated from Spanish. I then went back through the Revision History of this page, reading the archived version for the 15th of every month between May 15, 2011 and April 15, 2013 (that is, the May 15, 2011, June 15, 2011 and July 15, 2011 versions of the page, and so on, up to April 15, 2013), bringing the sample period to a total of two years. In that time, the number of articles of French origin listed in either the “translate” or “cleanup” sections of the archived pages came to a total of 34, while the total number of articles of Spanish origin listed in those two sections was 60. This suggests Spanish-to-English translations were more frequently reported on Wikipedia:Pages needing translation into English than translations from French. Given that the French version of the encyclopedia has more articles, more active users and more edits than the Spanish version, the fact that more Spanish-to-English translation was taking place through Wikipedia:Pages needing translation into English is somewhat surprising.
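For anyone who would like to reproduce this kind of monthly sampling without clicking through the Revision History by hand, the snapshots can also be pulled through the public MediaWiki API. The sketch below is purely illustrative: the page title and date range come from this post, but the script itself is not part of how I compiled the corpus.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
TITLE = "Wikipedia:Pages needing translation into English"

def snapshot_revision(timestamp):
    """Return (revid, timestamp) of the last revision saved on or before `timestamp`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": TITLE,
        "rvlimit": 1,
        "rvdir": "older",      # walk backwards in time, starting from rvstart
        "rvstart": timestamp,  # e.g. "2011-05-15T23:59:59Z"
        "rvprop": "ids|timestamp",
        "format": "json",
    }
    page = next(iter(requests.get(API, params=params).json()["query"]["pages"].values()))
    rev = page["revisions"][0]
    return rev["revid"], rev["timestamp"]

# The 15th of every month from May 2011 to April 2013 (24 snapshots in all).
dates = [f"{2011 + (m - 1) // 12}-{(m - 1) % 12 + 1:02d}-15T23:59:59Z" for m in range(5, 29)]
for d in dates:
    print(d, snapshot_revision(d))  # each revid can then be viewed via index.php?oldid=...
```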

Does this mean only 94 Wikipedia articles were translated from French or Spanish into English between May 2011 and May 2013? Unlikely: the articles listed on this page were identified by Wikipedians as being “rough translations” or in a language other than English. Since the process for identifying these articles is not fully automated, many other translated articles could have been created or expanded during this time: other “rough” translations may not have been spotted by Wikipedia users, while translations without the grammatical errors and incorrect syntax associated with non-native English might have gone unnoticed by the Wikipedians who would otherwise have added them to this page. So while this sample group of articles is probably not representative of all translations within Wikipedia (or even of French- and Spanish-to-English translation in particular), Wikipedia:Pages needing translation into English was still a good source from which to draw a sample of translated articles that may have undergone some sort of revision or editing. Even if the results are not generalizable, they at least indicate the kinds of changes made to translated articles within the Wikipedia environment, and therefore, whether this particular crowdsourcing model is an effective way to translate.

So what are these articles about? Let’s take a closer look via some tables. In this first one, I’ve grouped the 94 translated articles by subject. Due to rounding, the percentages do not add up to exactly 100.

Subject Number of French articles Percentage of total (French) Number of Spanish articles Percentage of total (Spanish)
Biography 20 58.8% 20 33.3%
Arts (TV, film, music, fashion, museums) 3 8.8% 8 13.3%
Geography 2 5.8% 12 20%
Transportation 2 5.8% 4 6.7%
Business/Finance (includes company profiles) 2 5.8% 2 3.3%
Politics 1 2.9% 4 6.7%
Technology (IT) 1 2.9% 1 1.6%
Sports 1 2.9% 1 1.6%
Education 1 2.9% 1 1.6%
Science 1 2.9% 0 0%
Architecture 0 0% 2 3.3%
Unknown 0 0% 3 5%
Other 0 0% 2 3.3%
Total 34 99.5% 60 99.7%

Table 1: Subjects of translated articles listed on Wikipedia:Pages needing translation into English (May 15, 2011-May 15, 2013)

As we can see from this table, the majority of the translations from French and Spanish listed on Wikipedia:Pages needing translation into English are biographies—of politicians, musicians, actors, engineers, doctors, architects, and even serial killers. While some of these biographies are of historical figures, most are of living people. The arts—articles about TV shows, bands, museums, and fashion—were also a popular topic for translation. In the translations from Spanish, articles about cities or towns in Colombia, Ecuador, Spain, Venezuela and Mexico (grouped here under the label “geography”) were also frequent. So it seems that the interests of those who have been translating articles from French and Spanish as part of this initiative have focused on arts, culture and politics rather than specialized topics such as science, technology, law and medicine. The articles are also visibly associated with French- and Spanish-speaking regions, as the next two tables demonstrate.

I created these two tables by consulting each of the 94 articles and identifying the main country associated with the topic, except in cases where the article had been deleted and no equivalent article could be found in the Spanish or French Wikipedias (marked “unknown” in the tables). A biography of a French citizen, for instance, was counted as “France”, as were articles about French subway systems, cities and institutions. Every article was associated with just one country, so when a biography was about someone who was born in one country but lived and worked primarily in another, I labelled the article with the country where that person had spent the most time. Manuel Valls (http://en.wikipedia.org/wiki/Manuel_Valls), for instance, was born in Barcelona, but became a French citizen over thirty years ago and is a politician in France’s Parti socialiste, so this article was labelled “France.”

Country/Region Number of articles
France 24
Belgium 2
Cameroon 1
Algeria 2
Canada 1
Western Europe 1
Switzerland 1
Romania 1
n/a 1
Total: 34

Table 2: Primary country associated with translations from French Wikipedia

 

Country Number of articles
Spain 13
Mexico 10
Colombia 10
Argentina 7
Chile 3
Venezuela 3
Peru 2
Ecuador 2
Nicaragua 1
Guatemala 1
Uruguay 1
United States 1
Cuba 1
n/a 1
Unknown 4
Total 60

Table 3: Primary country associated with translations from Spanish Wikipedia

Interestingly, these two tables demonstrate a marked contrast in the geographic spread of the articles: roughly 70% of the French source articles dealt with a single country (France), while just over half of the Spanish source articles dealt with three (Spain, Colombia and Mexico), with nearly equal representation for each of those three countries. The two tables do, however, demonstrate that the vast majority of articles had strong ties to either French- or Spanish-speaking countries: only two exceptions (marked as “n/a” in the tables) did not have a specific link to a country where French or Spanish is an official language.

I think it’s important to keep in mind, though, that even though the French/Spanish translations in Wikipedia:Pages needing translation into English seem to focus on biographies, arts and politics from France, Colombia, Spain and Mexico, translation in Wikipedia as a whole might have other focuses. Topics might differ for other language pairs, and they might also differ in other translation initiatives within Wikipedia and its sister projects (Wikinews, Wiktionary, Wikibooks, etc.). For instance, the WikiProject Medicine Translation Task Force aims to translate medical articles from English Wikipedia into as many other languages as possible, while the Category:Articles needing translation from French Wikipedia page lists over 9,000 articles that could be expanded with content from French Wikipedia, on topics ranging from French military units, government, history and politics to geographic locations and biographies.

I’ll have more details about these translations in the coming weeks. If you have specific questions you’d like to read about, please let me know and I’ll try to find the answers.

Experimenting with Wikipedia in the classroom

Late last year, I came across a very insightful podcast series called BiblioTech on the University Affairs website. Each episode focuses on technology and higher education–Twitter in the classroom, for instance, or storage in the cloud–so of course I was immediately hooked. I had missed the first thirteen episodes, but they’re all quite short–usually between ten and fifteen minutes long–so I managed to catch up after two jogs and a commute to work.

Episodes 12 (Wikipedia) and 13 (Plagiarism) in particular piqued my interest and actually inspired me to change the format of the courses I’m teaching this term: an MA-level Theory of Translation and a BA-level Introduction to Translation into English course.

First, I listened to the Plagiarism episode, which mainly discussed how to design tests and assignments that discourage students from cheating. As host Rochelle Mazar, an emerging technologies librarian at the University of Toronto’s Mississauga campus, argued:

We need to create assignments that have students produce something meaningful to them, but opaque to everyone else.

Her suggestions included having students use material from the classroom lectures and discussions in their assignments (e.g. by blogging about each week’s lectures, and then using these blog posts to write their final paper), having students build on peer interactions via Twitter, Facebook or the course website to develop their assignments, or having students contribute to open-access textbooks through initiatives like Wikibooks.

I then listened to the Wikipedia episode, where Mazar made the following argument about why instructors should integrate Wikipedia into classroom assignments:

When people tell me that they saw something inaccurate on Wikipedia, and scoff at how poor a source it is, I have to ask them: why didn’t you fix it? Isn’t that our role in society, those of us with access to good information and the time to consider it, isn’t it our role to help improve the level of knowledge and understanding of our communities? Making sure Wikipedia is accurate when we have the chance to is one small, easy way to contribute. If you see an error, you can fix it. That’s how Wikipedia works.

Together, these two episodes got me thinking about the assignments I would be designing for my courses, and it didn’t take me long to decide that I would incorporate Wikipedia and blogging into my courses: translation of Wikipedia articles for the undergraduate translation course, and blogging as the medium for submitting, producing and collaborating on written work in the graduate theory course. Next month, I’ll write a post about how I decided to integrate blogs into my graduate theory class, but right now, I want to focus on Wikipedia and its potential as a teaching tool in translation classrooms.

But first, a short digression: A couple of years ago, I had students in my undergraduate translation classes work in groups or pairs to translate texts for non-profit organizations as a final course assignment. The students seemed to really like translating texts that would actually be used by an organization instead of texts that were nothing more than an exercise to be filed away at the end of term. And I enjoyed being able to submit a large project to a non-profit at the end of the term. But it was a lot of work on my part, mainly because I acted as a project manager: finding a non-profit with a text of just the right length and just the right difficulty, splitting up the text for the class, correcting the final submissions, and finally translating the rest of the text, since the documents we were given to translate were inevitably too long for me to assign entirely to the students. So after two years, I went back to having students translate less taxing texts, like newspaper or magazine articles, since it’s easier to correct twenty translations of the same text than it is to correct twenty excerpts from a longer project. But I did miss the authentic assignments.

So, when I listened to the BiblioTech podcasts, I realized Wikipedia might be a good solution to the problem. Students can choose their own articles to translate (freeing me from the project-management aspect), and the wide variety of subjects needing translation–Wikipedians have tagged over 9000 articles as possible candidates for French-to-English translation–means we should be able to find something to interest everyone, and something just the right length for the assignment (around 300 words per student). I still expect to have to spend more time correcting the translations, but I think this will be less work overall than the previous projects.

As I was planning out the project, I was pleasantly surprised to discover that the Wikimedia Foundation has established an education program in Canada, the United States, Brazil and Egypt. The Canada Education Program is intended to help university professors integrate Wikipedia projects into their courses, and it offers advantages like an online ambassador for every class to help students navigate the technical challenges of editing in the Wikipedia environment. In addition, there’s an adviser who works closely with professors who join the program. Fortunately for me, he’s based in Toronto, which means I was able to chat with him earlier this month about the program. His recent article in the Huffington Post offers some good arguments for why Wikipedia is a useful classroom tool. He suggests, for instance, that since organizations like the CIA use wikis in their work environments, students are likely to need to be familiar with wiki technology and culture after they graduate. In addition, students gain exposure by contributing to articles that are visible online, and they learn to engage in debates with classmates and Wikipedians as their contributions are reviewed and edited by others.

I’m still in the early stages of this experiment… I don’t yet know, for instance, whether students will have a lot of trouble editing their articles, or whether the technical challenges can all be solved by the online ambassador who will be working with us. I’ve asked students to use Google Documents for most of the translating work, and since they won’t add the final versions to Wikipedia until near the end of the term, many of these problems may crop up only in March or April. I also expect a lot of in-class discussion about Wikipedia’s Translation Guidelines, which encourage omission of irrelevant information and adaptation or explanation of cultural references:

Translation between Wikipedias need not transfer all content from any given article. If certain portions of an article appear to be low-quality or unverifiable, use your judgment and do not translate this content. Once you have finished translating, you may ask a proofreader to check the translation.
[…]
A useful translation may require more than just a faithful rendering of the original. Thus it may be necessary to explain the meaning of terms not commonly known throughout the English-speaking world. For example, a typical reader of English needs no explanation of The Wizard of Oz, but has no idea who Zwarte Piet might be. By contrast, for a typical reader of Dutch, it might be the other way around.

Because students may find they have more freedom to make their own judgements about the relevance of information, I’ve asked them to give an in-class presentation at the end of the term about their translation decisions and the experience of working in Wikipedia. I’ll be sure to post some of my own thoughts on this experiment after the term is over, the marking is complete and the translations are posted online. I’ll even post links to some of our work.

Has anyone else used (or thought about using) Wikipedia articles as translation assignments? If so, I’d certainly appreciate your comments.

Wikipedia survey IV (Motivations)

While I’ve still got the survey open in my browser, I thought I’d finish writing about the results. This last post will look at the motivations the 76 respondents gave for translating, editing or otherwise participating in a crowdsourced translation initiative. (I should point out that although the question asked about the “last crowdsourced translation initiative in which [respondents] participated”, 63 of the 76 respondents (83%) indicated that Wikipedia was the last initiative in which they had participated, so their motivations mainly concern Wikipedia, with a few relating to Pirate Parties International, nozebe.com, open-source software, iFixit, Forvo, and Facebook.)

The survey asked two questions about motivations. Respondents were first asked to select up to four motivations for participating.[*] They were then given the same list and asked to choose just one motivation. In both cases, they were offered motivations that can be described as either intrinsic (done not for a reward but rather for enjoyment or due to a sense of obligation to the community) or extrinsic (done for a direct or indirect reward). They were also allowed to select “Other” and add their own motivations to the list, as 11 respondents chose to do.

When I looked at the results, it became clear that most respondents had various reasons for participating: only 4 people chose a single motivation when they were allowed to list multiple reasons (and one person skipped this question). All four wanted to make information available to others. Here’s a chart that shows which motivations were most commonly cited. (Click on the thumbnail to see a full-size image):
[Chart: motivations for participating (respondents could select up to four)]

As the chart shows, intrinsic motivations (making information available to others, finding intellectual stimulation in the project, and supporting the organization that launched the initiative) were the motivations most often chosen by respondents. However, a significant number also had extrinsic reasons for participating: they wanted to gain more experience translating or practice their language skills. In the article I wrote about this survey, I broke these motivations down by type of respondent (those who had worked as professional translators vs. those who had not), so I won’t go into details here, except to say that there are some differences between the two groups.

Respondents who chose “Other” described various motivations: one was bored at work, one wanted “to be part of this network movement”, one wanted to improve his students’ translation skills by having them translate for Wikipedia, two thought it was fun, one wanted to quote a Wikipedia article in an academic work but needed the information to be in English, and three noted that they wanted to help or gain recognition within the Wikipedia community. Some more detailed motivations (often with a political/social emphasis) were also cited, either with this question, or in the final comments section:

I am not a developer of software, but I am using it for free. To translate and localise the software for developers is a way to say thank you – Only translated software has a chance to spread and prosper – I get to know new features and/or new software as soon as it is available

As a former university teacher I believe that fighting ignorance is an important way of making world a better place. Translating local knowledge into trans-national English is my personal gift for the humanity 🙂

I’m not sure how you found me because I’m pretty sure I only translated one Wikipedia page… I did it mainly because the subject of the article is almost unknown in the Jewish world, and I wanted more people to know about her and one of the few ways in which I can help make her story more widely known is by translating it into French. That being said I think I’ll try to do more!

The main reason I became involved in crowdsourced translation is that, in my opinion, the translation of science involves more than linguistic problems. It also requires an awareness of context; of why the scientific activities were undertaken, as well as how they fit into the “world” to which they belong. Many crowdsourced translation projects do not take this into account, treating the translation of science as a linguistic problem. This is fallacious. So I participate to fix the errors that creep in.

My translations are generally to make information freely available, especially to make Guatemalan cultural subjects available in Spanish to Guatemalan nationals.

I taught myself German, by looking up every single word in a couple of books I wanted to read about my passionate hobby. I have translated a couple of books in that hobby for the German association regarding that hobby (gratis). Aside from practice, practice, practice, I have had no training in translation. I began the Wiki translations when I was unemployed for a considerable amount of time and there was an article in the German Wiki on my hobby that had a tiny article in English. The rest is history. It’s been a few years since I’ve contributed to Wikipedia, but it was a great deal of fun at the time. Translation is a great deal of work for me (I have several HEAVY German/English dictionaries), but I love the outcome. Can I help English speakers understand the information and the beauty of the original text?

There were very few Sri Lankans editing on English Wikipedia at that time and I manage to bring more in and translate and put content to Wikipedia so other language speakers can get to know that information. I was enjoying my effort and eventually I got the administrator-ship of Sinhala Wikipedia. From then onwards I was working there till I had to quit as I was started to engage more with my work and studies. Well that’s my story and I’m not a full time translator and I have no training or whatsoever regarding that translating.

As these comments show, the respondents often had complex reasons for helping with Wikipedia translations. Some saw it as an opportunity to disseminate information about certain language, cultural or religious groups (e.g. Guatemalans, Sri Lankans) to people within or outside these communities; others wanted to give back to communities or organizations they believed in (for instance, by helping other Wikipedians or by giving free/open-source software a wider audience). But intrinsic reasons seem most prominent. This is undoubtedly why, when respondents were asked to select just one reason for participating in a crowdsourced translation initiative, 47% chose “To make information available to language speakers”, 21% said they found the project intellectually stimulating, and 16% wanted to support the organization that launched the initiative. No one said that all of their previous responses were equally important, which shows that while many motivations were factors, some played a more significant role than others in respondents’ decisions to volunteer for Wikipedia (and other crowdsourced translation initiatives).

That’s apparent, too, in the responses I received for the question “Have you ever consciously decided NOT to participate in a crowdsourced translation initiative?” The responses were split almost evenly between Yes (49%) and No (51%). The 36 respondents who said Yes were then asked why they had decided not to participate, and what initiative they hadn’t wanted to participate in. Here’s a chart that shows why respondents did not want to participate:
[Chart: reasons for not participating in a crowdsourced translation initiative]

Unlike last time, when only a few respondents chose one or two motivations for participating, 15 of the 36 respondents chose only one reason, and 11 chose only two, to explain why they decided not to participate (although they could have chosen up to four). This means that almost 75% of respondents did not feel that their motives for not participating were as complex as their motives for participating. (Of course, it’s also possible that because this was one of the last questions on the survey, respondents were answering more quickly than they had earlier.) I had expected that ideological reasons would play a significant role in why someone would not want to participate in a crowdsourced translation initiative (i.e. that most respondents, being involved in a not-for-profit initiative like Wikipedia, would have reservations about volunteering for for-profit companies like Facebook), but the most common reason respondents offered was “I didn’t have time” (20 respondents, or 56%), followed by “I wasn’t interested” (12 respondents, or 33%). Only 7 didn’t want to work for free (in four cases, it was for Facebook, while the 3 other respondents didn’t mention which initiative they were thinking of), and only 9 said they didn’t want to support the organization that launched the initiative (Facebook in four cases, a local question-and-answer service in another, and Wikia and Wikipedia in two other cases). There was some overlap between these last two responses: only 12 respondents in all indicated that they didn’t want to work for free and/or support a particular organization.

I think these responses show how attitudes toward crowdsourced translation initiatives are divided, even among those who have participated in the same ones. Although 16 respondents had translated for Facebook (as I discussed in this post), and therefore did not seem ideologically opposed to volunteering for a for-profit company, 12 others had consciously decided not to do so. And even though respondents most commonly said they didn’t participate because they didn’t have time, we have seen that many respondents participated in Wikipedia translation projects because they found it satisfying, fun, challenging, and because they wanted to help disseminate information to people who could not speak the language in which the information was already available. So factors like these must also play a role in why respondents might not participate in other crowdsourced translation initiatives.

On that note, I think I’ll end this series of blog posts. If you want to read more about the survey results, you’ll have to wait until next year, when my article appears in The Translator. However, I did write another article about the ethics of crowdsourcing, and that’s coming out in Linguistica Antverpiensia in December, so you can always check that one out in the meantime. Although I was hoping to conduct additional surveys with participants in other crowdsourced translation initiatives like the TED Open Translation Project, I don’t think I’ll have time to do so in the near future, unless someone wants to collaborate with me. If you’re interested, you can always email me to let me know.

[*] The online software I used for the survey didn’t allow me to prevent respondents from selecting more than four reasons. However, only 14 people did so: of the 76 respondents, 4 chose 5 reasons, 7 chose 6 reasons, and 3 chose 7 reasons. I didn’t exclude these 14 responses because the next question limited respondents to just 1 reason.

Wikipedia survey III (Recognition, Effects)

It’s been quite some time now since my last post about the Wikipedia survey results, and for that I must apologize. I was side-tracked by some unrelated projects and found it hard to get back to the survey. But I’ve just finished revising my article on this topic (which will be published in the November 2012 issue of The Translator), and that made me sit down to finish blogging about the survey results. This is the third of four posts. I had planned to look at motivations, effects and recognition all in one post, but it became too long, so I’ve split it into two. This one looks at the ways respondents were recognized for participating in crowdsourced projects and what impact (if any) their participation has had on their lives. The next one (which I will post later this week) looks at respondents’ motivations for participating in crowdsourced initiatives.

For anyone who comes across these posts after the article is published, I should mention that the discrepancy between the number of survey respondents in the article and on this blog (75 vs. 76) is because I received another response to the survey after I’d submitted the article for peer review. It was easier to include all 76 responses here, since I’m creating new graphs and looking at survey responses I didn’t have space to explore in the Translator article, but I didn’t update the data in the article because the new response didn’t change much on its own (+/-0.5% here and there) and would have required several hours’ work to recalculate the figures I cited throughout the 20+ pages.

I also want to thank Brian Harris for discussing these survey results on his blog. You can read his entry here or visit his blog, Unprofessional Translation, to read a number of very interesting articles about translation by non-professionals, including those working in situations of conflict, war, and natural disasters.

And on to the survey results:

Recognition
The survey asked respondents what (if any) recognition they had received for participating in a crowdsourced translation initiative. Although the question asked about the last initiative in which respondents had participated (rather than Wikipedia in particular), 63 of the 76 respondents indicated that Wikipedia was the last initiative in which they had been involved, so the responses are mainly representative of the recognition they received as Wikipedia translators. Here’s a chart summarizing the responses (click on it for a full-sized image):
[Chart: recognition received for participating in a crowdsourced translation initiative]
As the chart illustrates, no respondents received financial compensation, either directly, by being paid for their work, or indirectly, by being offered a discount on membership fees or other services. This really isn’t surprising, though, because most respondents were Wikipedia translators, and contributors to Wikipedia (whether translators or writers) are not paid for their work. In addition, since Wikipedia does not charge membership fees, there is nothing to discount. Unexpectedly, though, 20 respondents reported receiving no recognition at all–even though 17 of them listed Wikipedia as the last initiative in which they had been involved. Because Wikipedia records the changes made to its pages, anyone who had translated an article would have been credited on the history page. These 20 respondents may not have been aware of the history feature, or–more likely–they didn’t consider it a form of recognition.[*]

Receiving credit for the translation work (either directly beside the translation or via a profile page) was the most common type of recognition. Of the 18 respondents who selected “Other”, 10 reported being credited on the Wikipedia article’s history page, 1 said their name appeared in the translated software’s source code, 1 noted they had received some feedback on the wiki Talk page, 1 mentioned receiving badges from Facebook, and the others mentioned their motivations (e.g. just wanted to help, translation became better, could refer to the translation in other academic work) or the effect their involvement had on their careers (e.g. higher rate of pay for translating/interpreting duties at work). I discuss the advantages and disadvantages of this enhanced visibility for translators and translation in an article that will appear in Linguistica Antverpiensia later this year, so I won’t elaborate here, except to say that crediting translators and providing a record of the changes made to translations make translation a more visible activity and provide researchers with a large corpus of texts that can be compared and analyzed. In fact, I think Wikipedia’s translations are an untapped wealth of material that can help us better understand how translations are produced and revised by both professional and non-professional translators.
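Since every change is recorded, the before-and-after states of a translated article can also be retrieved programmatically. As a rough illustration (this was not part of my own workflow, and the revision IDs below are placeholders), the MediaWiki compare module returns the diff between any two revisions of a page:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def revision_diff(from_rev, to_rev):
    """Return the API's 'compare' result (an HTML diff) between two revision IDs."""
    params = {
        "action": "compare",
        "fromrev": from_rev,
        "torev": to_rev,
        "format": "json",
    }
    return requests.get(API, params=params).json()["compare"]

# Hypothetical revision IDs: a rough translation vs. its cleaned-up version.
# print(revision_diff(123456789, 123456999))
```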

Effects
Finally, I asked respondents whether/how their participation in a crowdsourced translation initiative had impacted their lives. Here’s another chart that summarizes the results (again, click on the image to see it in full size):
[Chart: impact of participating in a crowdsourced translation initiative]
I was surprised to see that 38 respondents (or 51%) didn’t feel their participation had had any sort of impact: after all, why would they continue volunteering if they were not getting something (even personal satisfaction) out of the experience? However, this may be a problem with the question itself, as I hadn’t listed “personal satisfaction” as an option. If I had (and I would definitely make this change in the next survey), the responses might have been different. As it is, of the 16 respondents who selected “Other”, 8 indicated that participating gave them personal satisfaction, a sense of pride in their accomplishments, a feeling of gratification, etc. Here are a few of their comments:

Pride in my accomplishments, although I am an amateur translator. I did some cool stuff!

I have the immense satisfaction of knowing that my efforts are building a free information project and hope that my professionalism raises the quality bar for other contributors who see my work (e.g. edit summaries, citations of sources, etc.)

I was spending my spare time on Wikipedia and sharing my knowledge. Moreover I was enjoying what I was doing. That’s it.

As for the rest of the responses in the “Other” category: One person noted that they had been thanked by other Wikipedia users for the translation, another remarked that they had been thanked by colleagues for contributing to “open-source intellectual work”, two said they had learned new things, one had met new Facebook friends, one said they had been asked to do further translation work for the project, two noted they had been invited to participate in this survey, and one (a part-time translation professor) said “My students consider my classes as a useful and positive learning experience” because they help translate for Wikipedia together.

Nearly 1/3 of respondents (22 of the 76) felt they had received feedback that helped improve their translation skills, and I think this point is important: the open nature of Wikipedia (and many other crowdsourced projects) provides an excellent forum for exchanging ideas and commenting on the work of others. But this is also a point that deserves further study, since so few of the respondents reported having training or professional experience translating.

Interestingly, some of the more tangible effects of participating in a crowdsourced initiative, such as receiving job offers and meeting new clients or colleagues, were not often experienced by the survey respondents. I wonder whether the results would be the same if this survey were given to participants in other types of initiatives (translation of for-profit websites such as Facebook, or localization of Free/open-source software such as OmegaT). The results do show, however, that volunteering for crowdsourced translation initiatives has had some positive (and a few negative) effects on the careers and personal lives of the participants, and that personal satisfaction is also an important motivator.

[*] An interesting aside is that of the 20 respondents who reported receiving no recognition, 5 also indicated they had received other forms of recognition, such as their names appearing beside the translation, an updated profile page, or feedback on their work. Respondents may have been thinking of all projects in which they had been involved, instead of the last one, which the question asked about. These 5 respondents all indicated that Wikipedia was the last initiative in which they had been involved.

Wikipedia survey II (Types of Participation)

This is a follow-up to last month’s post describing preliminary results from a survey of Wikipedia translators. To find out about the survey methodology and the respondent profiles, please read this post first.

I initially planned for this survey to be one of several with translators from various crowdsourced projects, so I wrote the participation-related questions hoping to compare the types of crowdsourced translation initiatives people decide to participate in and what roles they play in each one. I haven’t yet had time to survey participants in other initiatives (and, truth be told, I probably won’t have time to do so in the near future), so the responses to the next few questions will have to be only a partial glimpse of the kinds of initiatives crowdsourcing participants get involved in. Here’s a table illustrating the responses to the question about which crowdsourced translation initiatives respondents had participated in. As expected, virtually all respondents had helped translate for Wikipedia. The one respondent who did not report translating for Wikipedia participated in Translatewiki.net, with a focus on MediaWiki, the wiki platform originally designed for Wikipedia.

Initiative No. of respondents Percentage
Wikipedia 75 98.7%
Facebook 16 21.3%
Free/Open-source software projects (software localization and/or documentation translation for F/OSS projects such as OmegaT, Concrete5, Adium, Flock, Framasoft) 7 9.2%
Translatewiki.net 2 2.7%
TEDTalks 2 2.7%
The Kamusi Project 1 1.3%
Ifixit 1 1.3%
Forvo 1 1.3%
Translated.by 1 1.3%
Anobii 1 1.3%
Science-fiction fandom websites 1 1.3%
Traduwiki 1 1.3%
Orkut 1 1.3%
Der Mundo (Wordwide Lexicon) 1 1.3%
The Lied, Art Song, and Choral Texts Page 1 1.3%

A few points I found interesting. First, I was surprised to see that respondents had participated in such a diverse range of projects. I had expected that because Wikipedia was a not-for-profit initiative, participants would be less likely to have helped translate for for-profit companies like Facebook and Twitter; however, after Wikipedia, Facebook was the initiative that had attracted the most participants. Second, I was intrigued by the fact that almost 10% of respondents were involved in open-source software translation/localization projects. I hypothesized that the respondents who had reported working in the IT sector or studying computer science would be the ones involved in the F/OSS projects, but that was not always the case: when I broke down the data, I found that people from a variety of fields (a high school student, an economics student, two medical students, a translator, a software developer, a fundraiser, etc.) had helped translate/localize F/OSS projects. I think these results really indicate a need to specifically study F/OSS translation projects to see whether the Wikipedia respondents are representative of the participants.

Next, I asked respondents how they had participated in crowdsourced translation projects (as translators, revisers, project managers, etc.) and how much time per week, on average, they had spent participating in their last crowdsourced translation initiative.

Here’s a graph illustrating how respondents had participated in various crowdsourced translation projects. They were asked to select all ways they had been involved, even if it varied from one project to another. This means that the responses are not indicative of participation in Wikipedia alone:
[Chart: roles respondents played in crowdsourced translation projects]

As the graph shows, translation was the most common means of participation, but that wasn’t surprising, because I had invited respondents based on whether they had translated for Wikipedia. However, a significant number of respondents had also acted as revisers/editors, and some had participated in other ways, such as providing links to web resources and participating in the forums. I think this graph shows how crowdsourced translation initiatives allow people with various backgrounds and experiences to participate in ways that match their skills: for instance, someone with weaker second-language skills can help edit the target text in his or her mother tongue, catching typos and factual errors, and someone with a background in a particular field can share links to resources or answer questions about concepts from that field, without necessarily having to do any translating. So when we speak of crowdsourced translation initiatives, it’s important to remember that these initiatives allow for more types of involvement than translating in the narrow sense of providing a target-language equivalent for a source text.

Finally, I asked participants how many hours per week, on average, they spent participating in the last crowdsourced translation initiative in which they were involved. Here’s a graph that illustrates the answers I received:
[Chart: average hours per week spent participating]

As you can see, most respondents spent no more than five hours per week participating in a crowdsourced translation initiative. On the surface, this may seem to provide some comfort to the professional translators who object to crowdsourcing as a platform for translation, since these Wikipedia respondents did not spend enough time per week on translation to equal a full-time job; however, hundreds of people volunteering four or five hours per week can still produce enough work to replace several full-time professionals. Not-for-profit initiatives like Wikipedia, where article authors, illustrators and translators all volunteer their time, are probably not as problematic for the profession, since professional translators would probably never have been hired to translate the content anyway, but for-profit initiatives such as Facebook are more ethically ambiguous. I’ve discussed some of these ethical problems in an article that will be published in Linguistica Antverpiensia later this year, in an issue focusing on community translation.

In a few weeks, I’ll post the results of the last few survey questions, the ones focusing on motivations for participating, the rewards/incentives participants have received and the effect(s) their participation has had on their lives and careers.

Wikipedia survey I (Respondent profiles)

This is the first in a series of posts about the results of my survey of Wikipedians who have translated content for the Wikimedia projects (e.g. Wikipedia). Because I’ve already submitted an article analyzing the survey, these posts will be less analytical and more descriptive, although I will be able to discuss some of the survey questions I didn’t have space for in the paper. This post will look at the profiles of the 76 Wikipedians who responded to the survey (and whom I’d like to thank once again for their time).

Survey Methodology
I wanted to randomly invite Wikipedia translators to complete the survey, so I first consulted various lists of English translators (e.g. the Translators Available page and the Translation/French/Translators page) and added these usernames to a master list. Then, for each of the 279 language versions on the List of Wikipedias page*, I searched for a Category:Translators page for translations from that language into English (i.e. Category:Translators DE-EN, Category:Translators FR-EN, etc.). I added the usernames in the Category:Translators pages to the names on the master list and removed duplicate users. This process led to a master list with the names of 1866 users who had volunteered to translate Wikipedia content into English. I then sent out invitations to 204 randomly selected users from the master list, and 76 (or 37%) of them responded.

A few caveats: additional Wikipedians have probably translated content for the encyclopedia without listing themselves on any of the pages I just mentioned. Moreover, anyone can generally edit (and translate) Wikipedia pages without creating an account, so the results of the survey probably can’t be generalized to all English Wikipedia translators, let alone Wikipedia translators working into the other language versions, who are not necessarily listed on the English Wikipedia pages I consulted. Finally, although 76 Wikipedians may not seem like many respondents, it is important to note that many of the users on the master list did not seem to have ever translated anything for Wikipedia: when I consulted their user contribution histories, I found that some Wikipedians had added userboxes to their profile pages to indicate their desire to translate but had not actually done anything else. Since I was interested only in the views of people who had actually translated, the 76 respondents represent a much larger share of actual Wikipedia translators than it might appear.
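For readers curious about what the master-list step might look like in code, here is a simplified sketch using the MediaWiki API. The Category:Translators XX-EN naming scheme and the sample size of 204 come from the description above; the helper itself is illustrative (it ignores continuation, so very large categories would be truncated).

```python
import random
import requests

API = "https://en.wikipedia.org/w/api.php"

def category_members(category):
    """Return the member page titles (e.g. 'User:SomeName') of one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": 500,
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    return [m["title"] for m in data["query"]["categorymembers"]]

master = set()
for code in ["DE", "FR", "ES"]:  # in practice, one code per listed language version
    master.update(category_members(f"Category:Translators {code}-EN"))

invitees = random.sample(sorted(master), k=min(204, len(master)))
```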

Profiles
The vast majority of the respondents (64, or 84%) were male, and most were 35 years of age or younger (57 respondents, or 75%, were under 36). This result is not surprising given the findings of a 2008 general survey of more than 176,000 Wikipedia users, in which 50% of respondents were 21 years of age or younger (76% in all were under 30) and 75% were male.

When asked about translation-related training, most respondents (51, or 68%) reported that they had no formal training in translation. Here’s a graph with a breakdown of the responses:
[Graph: breakdown of respondents’ translation-related training]

Given that respondents were generally young and usually had no formal training in translation, it’s not surprising that 52 of the 76 respondents (68.4%) had never worked as translators (i.e. they had never been paid to produce translations). Only 11 respondents (about 14%) were working as translators on a full- or part-time basis at the time of the survey, while 13 (about 17%) had worked as translators in the past but were no longer doing so. Nor is it surprising, then, that only two respondents were members of a professional association of translators.

Finally, when asked about their current occupations, respondents reported working in a range of fields. I’ve grouped them as best I could, using the Occupational Structure proposed by Human Resources and Skills Development Canada. Two respondents did not answer this question, but here’s an overview of the other 74 responses:

Occupation: No. of respondents (percentage of 74)
Student: 27 (36%)
    High school students: 6
    College/University students (languages): 4
    College/University students (other fields): 17
Works in IT sector: 11 (15%)
Works in language industry: 9 (12%)
Works in another sector (e.g. graphic design, law, education): 8 (11%)
Works in business, finance or administration: 7 (9%)
Unemployed/stay-at-home parent/retired: 5 (7%)
Academic: 3 (4%)
Engineer: 2 (3%)
Works in sales and service industry: 2 (3%)
Total: 74 (100%)

Later this week (or early next week), I’ll look at the types of crowdsourced translation initiatives the respondents were involved in (other than Wikipedia, of course), and the roles they played in these initiatives. After that, I’ll discuss respondent motivations for volunteering and the impact their participation has had on their lives.


* There are now 281 Wikipedia versions.

Crowdsourcing experiment with translation students

In his 2008 book Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business, which I reviewed here, Jeff Howe describes TopCoder Inc., a company that develops software for industry partners by administering programming competitions. Twice a week, competitions are posted, and any of the 200,000+ members of the TopCoder community can compete to complete small components of a piece of software that will eventually be delivered to a client. As Howe describes the process, each project is broken down into manageable chunks, which are posted on the TopCoder website so that programmers can compete against one another to write the most efficient, bug-free solution to each problem. After the entries are submitted, TopCoder members try to find bugs in the submissions, and only bug-free versions pass on to be vetted by TopCoder reviewers. Further competitions determine who can assemble the components into a single piece of software and run the program bug-free. Members are ranked according to how often they win competitions, and they also receive monetary rewards ranging from $25 to $300 for winning entries.

I decided to try out this crowdsourcing model in the classroom by organizing a similar translation competition. Before we started, I spoke a little about translation and crowdsourcing, describing some of the pros and cons, and showing examples of some of the organizations that have relied on crowdsourcing for their translations (e.g. TED, Facebook, Twitter). Then we moved on to translating a text together, with the TopCoder competition as the model for the translation process.

A few days before the class, I had broken a short text into one- or two-sentence chunks and posted these chunks in an online form, with an empty text box under each one for the translations. Here’s a screen capture of the form, which I created with Google Docs:
[Screen capture: the Google Docs form used for the crowdsourcing activity]

Students were given two minutes to translate the first segment and then click the “Submit” button at the bottom of the page, which automatically uploaded their translations to a Google Docs spreadsheet so that we could view all the submissions together. Here’s a screen capture of the spreadsheet, to give you an idea of what we were working with in class:
[Screen capture: the Google Docs spreadsheet collecting the submitted translations]

Once the first segment had been submitted, students voted on which translation they preferred. To keep things moving, each student could vote only once for their favourite version; after one version was declared the “winner”, students could propose any revisions they wanted, provided a majority of the class agreed with the change. The revised sentence was then added to a Word document so that a final translation could be pieced together from the winning segments. Students were next given two minutes to translate the second segment and, once they had done so, were again invited to vote on a winner and make corrections to the translation. After we had translated three or four segments (and were almost out of time), students were asked to comment on the final version and on the translation process.
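
For anyone who wants to see the per-segment voting step spelled out more formally, here is a toy Python sketch: one submission per student, one ballot per student, and the most popular version appended to the growing translation. In class this was all done by hand with the Google Docs spreadsheet and a Word document, so the student names and sentences below are invented purely for illustration.

```python
from collections import Counter

def pick_winner(submissions, votes):
    """Return the translated segment that received the most votes.

    submissions maps each student's name to their translation of the segment;
    votes is a list of student names, one entry per ballot cast.
    """
    tally = Counter(votes)
    winning_student, _ = tally.most_common(1)[0]
    return submissions[winning_student]

# Invented data for a single segment (three submissions, three ballots).
submissions = {
    "Student A": "The tower was built in the twelfth century.",
    "Student B": "Construction of the tower began in the 1100s.",
    "Student C": "The tower dates from the twelfth century.",
}
votes = ["Student A", "Student C", "Student C"]  # each student votes once

final_translation = []                              # winning segments, in order
final_translation.append(pick_winner(submissions, votes))
print(" ".join(final_translation))                  # prints Student C's version
```

The revision step (touch-ups approved by a majority of the class) isn’t modelled here; in practice that part worked better as a discussion than as anything automated.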

So what did the students think? Most noted that although our translation process (basically a popularity contest, with the possibility of a few touch-ups to the winner) worked well enough in our small group, it might not be as successful outside the classroom, since the most popular answer isn’t necessarily the best one. Some raised valid concerns about how such a process would work on a larger scale or in another context. For instance, they wondered how well the final text would hold together if it had been translated by multiple people, and how various linguistic groups (e.g. Brazilian and Iberian Portuguese speakers) would settle on an acceptable version.

It seemed, though, that many students enjoyed the exercise, regardless of whether they felt this method of translating would work outside the classroom. One student liked being able to compare the various versions laid out on the screen: when revisions or corrections were made to the winning translation, students could incorporate ideas from the versions that did not win. Another noted that a person translating alone might get stuck or lack inspiration at certain points, a problem that would not arise if many people were working on the same text.

Overall, I think this experiment worked well. Using a Google Docs form really simplified the setup on my end: I needed no programming skills and was able to create an interface we could work with in class in less than 15 minutes. Next year, I’d run this exercise in a week when we had three hours together rather than one where I had scheduled a test in the second half of class. The exercise lends itself well to a two- or three-hour class, with 20 to 30 minutes for a talk about crowdsourcing, 1 to 1.5 hours to translate, 15 to 20 minutes to go over the final translation (since I didn’t give any input while students were voting on and revising the submissions, the final version did have some minor language and translation errors), and then 10 to 15 minutes for students to reflect on the process and the result.

References:
Howe, Jeff. (2008). Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business. New York: Crown Publishing.

Survey on crowdsourced translation initiatives launched

This weekend, I finally began sending out the invitations for the survey I’ve been preparing on crowdsourced translation initiatives. It asks respondents about their backgrounds, whether they have any formal training in translation, why they have decided to participate (or not) in crowdsourced translation projects, and whether their participation has affected their lives (e.g. whether they have received job offers or met new colleagues as a result).

I’ve begun with Wikipedia, but I plan to invite respondents who have participated in other crowdsourced translation initiatives, including TED Talks, Kiva and Global Voices Online. I’ve just finished randomly sampling the Wikipedians who have helped translate content into English, and I will now start randomly sampling those who have translated from English into French, Spanish and/or Portuguese. I’m hoping to determine whether participant profiles differ from one project to another: for instance, does the average age of participants vary? Do some projects attract more people with formal training in translation? Do motivations differ when participants are translating for non-profit initiatives rather than for-profit companies?

Responses have started to trickle in, and I can already see some trends, but I won’t say anything more until all of the surveys have been submitted and I’ve had a chance to analyze the results. If you’re interested in finding out more details about the survey, please let me know. And if you want to see some of the results, check back in a few months: I expect to have some details to discuss by late March or early April.