Fall semester wrap-up I

This has been an unusually busy Fall semester, which is why I haven’t posted anything here in several months. With classes over and all of the outstanding grading almost finished, I thought I’d take advantage of the temporary lull to post a few thoughts on some of the courses I taught this term. In this post, I’ll be talking about a third-year undergraduate course called Theory of Translation.

Last year, I’d experimented with having students prepare Wikipedia articles as part of their coursework, and since the results were largely successful, I made the project mandatory this year. This time, though, students submitted their projects in three stages: 1) a 100-word proposal, in which students had to describe the topic they wanted to cover, justify why it needed a new or expanded Wikipedia article, and demonstrate that they would be able to find relevant secondary sources to draw on, 2) a draft version of their article posted to their Wikipedia user page sandbox, and 3) a final version published in Wikipedia that incorporated the feedback I’d given them on their drafts. In total, the Wikipedia project was worth 45% of the final grade (10% for the proposal, 15% for the draft, and 20% for the final version). About 20 students were enrolled in Theory of Translation this semester, which means that together, these students added about 10,000 words to Wikipedia.

For the most part, the articles turned out very well. I tried to prepare students for the research and drafting process by spending time as a class thinking about how to write a good Wikipedia article. Early on, we reviewed resources like WikiProject Translation Studies so we could think about what topics needed new or expanded articles. Three weeks into the course, we walked over to the university library to explore possible resources, and a week later, we took a look at the Wikipedia article on Computer-assisted translation, which has a number of quality issues. We then spent about fifteen minutes in class trying to improve its references, structure and content: this activity doubled as a way to apply some of the readings from our unit on translation technology. These preparation sessions seem to have been worthwhile: most of the draft versions my students submitted a few weeks ago needed only fairly minor revisions to be added to Wikipedia. With only a few exceptions, all of the final articles made it into Wikipedia.

In future years, though, I will ask students to write a longer proposal so they can better assess the potential need for an article and the resources they have access to. A few students did not fully explore the feasibility of their topic during the proposal stage and then had trouble drafting an article that relied on at least three secondary sources meeting Wikipedia’s verifiability and reliability criteria. This happened most frequently when students wanted to write a biographical article on a translator or Translation Studies researcher. If the person was not well known to the general public, students could usually find only primary sources such as a CV or personal website for the biographical details, and these are not considered reliable by Wikipedia standards. (Incidentally, this was the most common reason that articles my students had prepared were rejected by Wikipedia editors, although in one case a student had prepared an excellent biography relying only on secondary sources, but the translator was still deemed “not notable enough” to merit a Wikipedia page.) I’d like to help students avoid these problems in the future.

Here’s a sample of the Wikipedia articles students added or expanded this term:

Biographies:

Translation institutions:

Other translation-related topics:

Games, games and more games II

In my last blog post, I described The Twitter Race, the game my Introduction to Translation into English students most enjoyed playing last semester, and I promised to follow up with a post about the game my students enjoyed the least: Wikipedia Level Up. Since I’m presenting a paper about the games at the didTRAD Conference in Barcelona this afternoon, I think now is a good time to write this post, so I can share a few thoughts about why this game was not as effective as it could have been and how I will improve it in the future.

Wikipedia Level Up

For this game, students had to edit and revise a translated Wikipedia article I had selected from a list of articles needing cleanup after French translation. To complete Level 1, students had to identify and correct at least five language errors in the English version, without consulting the source text. Once they had finished, they could then move on to Level 2, where they had to compare the French ST and English TT to identify and correct at least four transfer errors. Level 3 involved identifying at least three violations of Wikipedia’s core content policies, and Level 4 involved identifying and correcting at least four instances where the English translation did not conform to Wikipedia’s Manual of Style. To successfully complete each level, students had to show me the errors they had identified and the corrections they were proposing. I would then award points for each error they had correctly identified and resolved: they earned 1 point per Level 1 error, 2 points for Level 2 errors, etc.

Students could win either by completing all four levels in 45 minutes or by accumulating the most points in 45 minutes. This meant we could have multiple winners.

So what went wrong? Well, when designing this game, I had assumed students would try to get through the levels as quickly as possible, finding just the minimum number of errors (or perhaps one or two more to earn a few extra points) and then moving on–particularly since I had weighted the points to make Level 4 more lucrative. After all, a student who whipped through all four levels resolving just the minimum number of errors would finish with 38 points, whereas a student who stayed at Level 1 looking for as many errors as possible would have to find and correct 39 problems to beat their classmate. As a player, I would want to head to Level 4 as quickly as possible so I could accumulate points up to four times faster than students in lower levels and be all but guaranteed to win the game. As it turned out, though, some students focused on correcting as many mistakes as possible at each level, and since I had purposely chosen a translation that had a lot of problems (to give students a better chance of finding errors quickly), only 2 of the 10 students who attended class that day were able to win the game. When I surveyed my students about the games after the course had ended, I wasn’t surprised to read a comment that the Wikipedia Level Up game was too complicated to be easily completed in the time allowed.
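
The arithmetic behind that comparison can be sketched in a few lines. This is just a restatement of the rules above (points per error equal the level number, with per-level minimums of 5, 4, 3 and 4 errors), not anything students actually ran:

```python
# Per-level minimum number of errors to find, from the game rules.
min_errors = {1: 5, 2: 4, 3: 3, 4: 4}

def points_for(level, errors_found):
    """Points earned at one level: level number times errors corrected."""
    return level * errors_found

# A student clearing all four levels with only the minimum errors:
minimum_run = sum(points_for(lvl, n) for lvl, n in min_errors.items())
print(minimum_run)  # 5 + 8 + 9 + 16 = 38

# A student staying at Level 1 needs 39 one-point errors to beat that:
print(points_for(1, 39))  # 39
```

The weighting makes climbing levels the dominant strategy on paper, which is exactly why the all-errors-at-every-level behaviour I actually observed was so costly in a 45-minute session.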

For next year, I will double the time I had allotted for the game (from 45 to 90 minutes) so that students are better able to complete the existing levels. I will also add “Level 5: Enter your corrections into Wikipedia”, so students become more familiar with the platform, and a bonus “Level 6: Decide whether the translated content should be adapted to better target English-speaking readers” to help students develop their subject-matter expertise.

Finally, since the most important objective of this game is that students are exposed to as many aspects of Wikipedia translation as possible, I will change the criteria for winning the game so that points are collected only as an extra challenge: students will be trying to set a high score that future students may want to beat, but the points won’t count toward winning the game. Instead, students will win if they successfully complete all five levels within the allotted 90 minutes, and they will earn an additional 50 bonus points if they can complete the bonus round.

Next year, after I’ve implemented these changes, I’ll write a follow-up post about whether this game was more successful with my next group of students, and I’ll offer a few thoughts about how the rest of the games fared the second time around.

More than 2000 words about translation and translators added to Wikipedia

As I mentioned in my last blog post, this year, my undergraduate Theory of Translation students wrote 500-word Wikipedia articles on translation-related topics as one of their course assignments. About two thirds of the class opted to cover one of these topics instead of creating a game for us to play, so Wikipedia now has half a dozen new (or significantly expanded) articles profiling Canadian translators, professional translator associations, term bases and more.

Although I’ve incorporated Wikipedia translation projects into my courses before, I’d never assigned a Wikipedia writing project. However, I’ve been working with The Wiki Education Foundation for a few years now, and I’ve seen what other instructors have done with their Wikipedia assignments. I also have a good idea of what features a good Wikipedia article should include, so I gave my students the following goals to achieve:

Final Wikipedia articles should:

Not every article that was written made it into Wikipedia, mainly due to the extensions I gave for the assignment: with many students submitting their articles close to the end of the term, I wasn’t able to mark everything until after the semester was over. Next year, I’ll have students submit their first drafts as mid-term assignments, and then have a follow-up assignment requiring students to revise their articles and post them to Wikipedia before the end of the term. This will make sure that the articles are revised and published to Wikipedia within the 12-week time frame of our course.

Interested in seeing some of the newly created or significantly expanded articles? Here are a few that are now available:

Translation flows in English Wikipedia

My summer project this year involves Wikipedia again. I’ve already studied the motivations and profiles of Wikipedia translators as well as revision trends in translated Wikipedia articles, so I’m now moving on to tracking translation flows in English Wikipedia. The things I’ll be looking at include:

  • how the demand for translation from any of the 290 Wikipedia languages into English has changed over time,
  • how this demand matches up with Wikipedia activity in those languages,
  • how often translations from a given language are flagged for revision, and how these revision requests change over time,
  • how often translated articles in need of revision are deleted, and why, and
  • whether the change in the number of active users mirrors the changes in translation activity over time.

Are these questions important? I would argue that they’re worth studying for various reasons, but mainly because the discourse about crowdsourcing often emphasizes that the diverse backgrounds of the volunteers who participate in crowdsourced translation initiatives mean that translations from and into “exotic” languages can take place and may even be more frequent than in traditional translation projects, where the cost of localizing a website into a language with just a few thousand speakers would be prohibitive (see this interview with Twitter’s engineering manager Gaku Ueda, for instance). So it’s worth asking questions like whether some languages are prioritized over others and whether some have more activity than others. The answer is likely “yes” in both cases, but if we look at which languages are receiving the most attention, we might be surprised by the results. After all, the five largest Wikipedias, based on the number of articles available in each version, are currently English, Swedish, German, Dutch and French, in that order, but just one year ago, Swedish was ranked 8th. Swedish Wikipedia is not, of course, composed solely of articles translated from other languages, but the fact that it currently has more articles (and far fewer native speakers) than, say, German or French led me to wonder whether more translations are flowing into and out of languages like Swedish, which are typically less well represented online than languages like English, Chinese and Spanish.

I’m not very far into this project yet, so I don’t have much data to share, but here’s a graph of the demand for translation from French into English, Spanish into English and German into English. I compiled the data using the current Category:Articles needing translation from French [or Spanish or German] Wikipedia page and comparing it with previous versions captured approximately every six months over the last six years, based on data from the Internet Archive’s Wayback Machine. (The gap in both the Spanish and German versions exists because the Wayback Machine did not crawl these two pages in 2010).
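
For anyone curious how such a series can be compiled, here is a minimal sketch of the approach (my own illustration, not the actual script behind the graph): it builds one Wayback Machine availability-API lookup per six-month interval for the French category page. Fetching each snapshot and counting the articles it lists is omitted here:

```python
from datetime import date

# The real English Wikipedia category page tracked in the graph.
PAGE = ("https://en.wikipedia.org/wiki/"
        "Category:Articles_needing_translation_from_French_Wikipedia")

def snapshot_timestamps(start_year=2009, end_year=2015):
    """YYYYMMDD timestamps at six-month intervals (January and July)."""
    return [date(year, month, 1).strftime("%Y%m%d")
            for year in range(start_year, end_year + 1)
            for month in (1, 7)]

def availability_queries(page_url=PAGE):
    """One Wayback Machine availability-API query per target timestamp.

    The API returns the closest archived snapshot to each timestamp,
    which is why the sampling is only "approximately" every six months.
    """
    return [f"https://archive.org/wayback/available?url={page_url}"
            f"&timestamp={ts}" for ts in snapshot_timestamps()]

print(len(availability_queries()))  # 14 target snapshots, 2009-2015
```

Because the API snaps to the nearest crawl, a year with no crawls at all (like the 2010 gap mentioned above) simply yields no usable data point.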

Number of pages listed in the Category:Articles needing translation from [French, Spanish or German] Wikipedia pages, 2009-2015.


As the graph shows, requests for translation into English from French, Spanish or German have increased substantially over the past six years. From what I’ve seen so far, French seems to be an anomaly: the number of articles listed as good candidates for translation varied widely, sometimes increasing or decreasing by 1,000 in just six months. These numbers don’t tell us much yet, but I’ll be digging into them more over the summer: I want to see, for instance, how long it takes for an article to be removed from the list and whether demand for translation from English into these languages is similar. This could help explain whether the number of articles listed as needing translation is increasing because little translating is taking place and a backlog is accumulating, because Wikipedians are becoming more interested in translation and are adding articles to the lists more frequently than in the past, or because articles are being translated but simply not removed from the list, making the demand for translation appear inflated. I hope to have more to share soon.

In the meantime, I’d certainly welcome any comments on the project, or thoughts on translation in Wikipedia!

How do you translate “expulsé” or “chanson”? Another Wikipedia challenge

“I realize now why professors say ‘don’t use Wikipedia’”, one of my students remarked during a group presentation about the challenges of translating a Wikipedia article from French into English. Once again, I had assigned Wikipedia articles as a final translation assignment for the 22 students enrolled in Introduction to Translation into English this past winter. This student’s remark came after the group became frustrated with the French article’s lack of sources and occasionally promotional tone. In their final translation, they took a 2,800-word text about French-Canadian singer-songwriter Pierre Lapointe with 21 references and turned it into a 1,600-word article with 47 documented sources and a more neutral tone. Another group’s article on reasonable accommodation went from 14 references in the source text to 66 in the English translation.

But references weren’t the only challenges my students faced this year. When they reflected on their projects, one group discussed the trouble they had trying to find the right translation for “la chanson française” as a genre (they finally opted for French chanson, since there is a Wikipedia article with that title), while another group struggled with “expulsé”, as in:

En février 2005, deux ambulanciers ont été expulsés d’une cafétéria de l’Hôpital général juif de Montréal parce qu’ils mangeaient un repas qu’ils s’étaient préparé.

and

Le 24 février 2007, à Laval, une jeune musulmane ontarienne de 11 ans est expulsée d’un match de soccer auquel elle participe et qui réunit de jeunes joueuses canadiennes.

(Both of which came from this article on reasonable accommodation in Quebec)

In this case, they settled on “ejected” in the first example and “sent off” in the second, after consulting various English media reports. A graduate assistant helped me correct these assignments, and since we each recommended a different translation, I’d certainly agree with my students that this was a tricky case. In the end, I suggested “asked to leave” for the first example, because it seemed to depict the event in the most neutral way, in keeping with the neutral point of view required by Wikipedia’s core content policies.

Most groups took various liberties with their source material, rewriting and reorganizing the article to present information more effectively, remove details that couldn’t be verified, and update details that were several years old. In almost every case, these decisions made the final English articles much higher in quality than the original French articles, and they helped students look more critically at their source material (as the comment from the student I cited earlier suggests).

Due to a lengthy strike at the university this year, I had to extend the submission deadline for these projects, and consequently had to finish marking the translations long after the term had ended instead of doing so in mid-March. This meant that only a few groups have been dedicated enough to post their final, corrected translations to English Wikipedia over the past few weeks, which is perfectly understandable given that most students are now working, travelling or taking summer courses and would have trouble finding free time to revise their work. So I was very happy to see that at least three articles have made it to Wikipedia. Want to check them out? Here they are:

Should I fix this mistake or not? On the ethics of researching Wikipedia

I’ve recently finished the final edits for an article that has just been published in Translation Studies, and it reminded me that I’ve been meaning to write a blog post about an ethical dilemma I faced when I was preparing my research. So before I turn to a new project and forget all about this one again, here’s what happened.

The paper focuses on a corpus of 94 Wikipedia articles that have been translated in whole or in part from French or Spanish Wikipedia. I wanted to see not just how often translation errors in the articles were caught and fixed, but also how long it took for errors to be addressed. It will probably not come as any surprise that almost all of the articles I studied initially contained both transfer problems (e.g. incorrect words or terms, omissions) and language problems (e.g. spelling errors, syntax errors), since they were posted on Wikipedia:Pages needing translation into English, which lists articles that are included in English Wikipedia but which contain content in another language, content that requires some post-translation editing, or both. Over the course of the two years leading up to May 2013, when I did the research, some of the errors I found in the initial translations were addressed in subsequent versions of the articles. In other cases, though, the errors were still there, even though the page had been listed as needing “clean-up” for weeks, months, or even years.

And that’s where my ethical dilemma arose: should I fix these problems? It would be very simple to do, since I was already comparing the source and target texts for my project, but it felt very much like I would be tampering with my data. For instance, in the back of my mind was the thought that I might want to conduct a follow-up study in a year or two, to see whether some of the errors had been resolved with more time. If I were to fix these problems, I wouldn’t be able to check on the status of these articles later, which would prevent me from finding out more about how quickly Wikipedians resolve translation errors.

And yet, I was torn, partly due to a Bibliotech podcast I’d listened to a few years ago that made a compelling argument for improving Wikipedia’s content:

When people tell me that they saw something inaccurate on Wikipedia, and scoff at how poor a source it is, I have to ask them: why didn’t you fix it? Isn’t that our role in society, those of us with access to good information and the time to consider it, isn’t it our role to help improve the level of knowledge and understanding of our communities? Making sure Wikipedia is accurate when we have the chance to is one small, easy way to contribute. If you see an error, you can fix it. That’s how Wikipedia works.

In the end, I didn’t make any changes, but this was mainly because I didn’t have the time. I didn’t want to tamper with my data while I was writing the paper, and after I had submitted it, I didn’t get around to going back through the list of errors I’d compiled to start editing articles. Most of the corrections would have been for very minor problems, such as changing a general word (“he worked for”) to a word that more specifically reflected the source text (“he volunteered for”), or replacing incorrect words with better translations in cases where the original version would still have given users the gist of the meaning (e.g. “the caves have been exploited” vs. “the caves have been mined”). I had trouble justifying the need to invest several hours correcting details that wouldn’t really affect the overall meaning of the text, and yet this question still nagged at me. So I thought that instead I would write a blog post to see what others thought: what is more ethical, making the corrections myself, or leaving the articles as they are, to see how they change over time without my influence?

Wikipedia translation projects: Take 2

Last year was the first time I assigned a Wikipedia translation project in my Introduction to Translation into English course, and I was happy enough with the experience that I tried it again this year. Now that I’m more familiar with Wikipedia, I was able to change the assignment in ways that I hope improved both the student experience and the translations we produced. Here’s an overview of how I modified the assignment this year and what students thought about the project:

Overview of the assignment

For this assignment, students were required to work in groups to translate an article of their choosing. Like last year, I recommended they select from this list of 7000+ articles needing translation from French into English. Also like last year, students had to submit a report about how they divided the work and made their translation decisions as part of the assignment. Finally, they had to give a presentation in front of the class to show their finished article and explain the challenges they faced when translating it.

First change: More training on Wikipedia policies and style guides

This year, I spent more class time talking about Wikipedia policies and discussing what would make a “good” English article. During our second week of class, for instance, we covered Wikipedia’s three core content policies: neutral point of view, no original research, and verifiability of sources. I asked students to consider how these might affect the articles they chose to translate, and reminded them that even a single word in the source text (e.g. “talented”, “greatest”, “spectacular”) could run counter to these three policies. Last year, for instance, the group translating an article about a historic site in France found adjectives like “spectacular” and “great” used in the French article to describe a tower that stood on the site. In their translation, they deleted these adjectives, because they found them too subjective. After we discussed this example, I asked students to think of other evaluative words they might encounter in their source texts, and then we came up with some strategies for addressing these problems in their translations, including omitting the words and finding a reliable secondary source to quote instead (“X and Y have described the tower as ‘spectacular’”).

In Weeks 3 and 4, we took a closer look at the Wikipedia Manual of Style, and in particular at the Manual of Style for articles about the French language or France and the Manual of Style for Canada-related articles. Though students could choose to translate articles on French-speaking regions other than France and Canada, only those two French-speaking countries have their own style guide. I pointed out the recommendations for accented characters and proper names, and we discussed what to do in cases where no rule existed, or where considerable controversy continues to exist, as is the case for capitalization of French titles and expressions. In this case, we created our own rule (follow typical English capitalization rules), but students could still choose to do something else: they just had to justify their decision in the commentary accompanying their translation.

Second change: revised marking scheme

Last year, I’d intended to mark the translations just like any other assignment: I told students I would give them a grade for the accuracy of their translation, based on whether they had any errors like incorrect words and shifts in meaning, and a grade for English-language problems like grammar errors, spelling mistakes, and ambiguous wordings. But a good Wikipedia article also needs to have hyperlinks to other articles, citations to back up any facts, and various other features that are mentioned in the Manual of Style. My marking scheme from last year couldn’t accommodate these things. This year, I marked the translations out of 50, broken down as follows: 15 marks for the accuracy of the translation, 15 marks for the language, 10 marks for conforming to the Manual of Style and adding relevant hyperlinks to other Wikipedia articles, 5 marks for citing references and ensuring hyperlinks are functional, and a final 5 marks for ensuring the translation is posted to Wikipedia, with the corrections I suggested. I also had students submit their translations earlier so I could start marking them before the end of the semester, giving them time to post their final versions before the course was over. Together, these changes made the assignment work much better, and I noticed a big improvement in the quality of the final articles.
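
Restated as a quick tally (just the figures from the paragraph above, with shortened labels of my own), the revised scheme breaks down as:

```python
# The revised marking scheme, out of 50.
marks = {
    "accuracy of the translation": 15,
    "quality of the English": 15,
    "Manual of Style conformance and relevant hyperlinks": 10,
    "cited references and functional hyperlinks": 5,
    "corrected translation posted to Wikipedia": 5,
}
print(sum(marks.values()))  # 50
```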

Student reactions to the assignment

At first, some students were very nervous about working within the Wikipedia environment. In the first week of class, when I asked how many had ever edited a Wikipedia article, no one raised their hand. As the weeks went on, I heard comments from the groups about how they needed to spend some time figuring out the markup language and how to use the sandbox, but by the end of the term, everyone succeeded in posting their translations online.

During their presentations this week, some students even noted that the markup language was fairly easy to learn and that they were glad to have more experience with it because it’s a tool they might need to use in the future. As I’d hoped, many students discovered that researching an article is a lot of work and that just because you’re interested in a topic doesn’t mean it will be easy to translate an article about it. Some students commented that adapting their texts to an English audience was challenging, particularly when English sources about the people and places they’d chosen to write about weren’t readily available. And nearly all of them felt the assignment had made them look at Wikipedia more critically: some students said they would check how recently an article had been updated (since their French article had out-of-date tourism statistics, for instance, or dead hyperlinks), while others said they would be looking to see whether the article cited reliable sources.

Not all of the translations have been corrected and posted online yet, but here are a few that have. I’ll update the list as the remaining translations are posted [list updated April 19]:

  • Aubagne (Students translated the “History”, “Politics” and “Environment and Environmental Policies” sections)
  • Fundy National Park (Students translated the “Natural Environment” and “Tourism and Administration” sections)
  • Louis Calaferte (Students translated the introduction, along with the “early life” and “Career” sections)
  • Lyonnaise cuisine (Students translated the “Terroirs and culinary influences” and “The Mères” sections)
  • Die2Nite

What translators can learn from the F/OSS community

Looking through my blog archives late last year, I was disappointed to discover I’d posted only seven articles in all of 2013: usually my goal is to get at least one post up every month, and last year was the first time since 2009 that I hadn’t been able to achieve that. So my goal for this year is to blog more frequently and more consistently. And with that, here is my first post of 2014:

In November, I came across a blog post that hit on a number of issues relevant to the translation industry, even though it was addressed to the Free/Open-Source Software (F/OSS) community. It’s called The Ethics of Unpaid Labour and the OSS Community, and it appeared on Ashe Dryden’s blog. Ashe writes and speaks about diversity in corporations, and so her post focused on how unpaid OSS work creates inequalities in the workforce. As she argues, the demographics of OSS contributors differ from those of proprietary software developers as well as from the general population, with white males overwhelmingly represented among OSS contributors: one source Ashe cites, for instance, remarks that only 1.5% of F/OSS contributors are female, compared to 28% of contributors for proprietary software. Ashe notes that a lack of free time among groups that are typically marginalized in the IT sector (women, certain ethnic groups, people with disabilities, etc.) is the main reason these groups are under-represented in OSS projects.

These demographics are problematic for the workforce because many software companies require their employees (and potential new hires) to have contributed to F/OSS projects. And while some large IT firms do allow employees to contribute to such projects during work hours, people from marginalized groups often do not work at these kinds of companies. This means people who would like to find paid employment as software developers probably need to be able to devote unpaid hours to F/OSS projects so they have a portfolio of publicly available code for employers to consult.

So how is this relevant to translators, or the translation industry? It’s relevant because the same factors affecting the demographics of the F/OSS community are also likely to affect the demographics of the crowdsourced translation community. People can volunteer to translate the Facebook interface only if they have free time and access to a computer; likewise, people with physical disabilities that make interacting with a computer difficult are likely to spend less time participating in crowdsourced projects than people with no disabilities. And since, in many cases, the community of translators participating in a crowdsourced project will largely determine how quickly a project is completed, what texts are translated and what language pairs will be available, the profile of participants is important.

Unfortunately, we don’t have a lot of data about the profiles of people who participate in crowdsourced (or volunteer) projects. The studies that have been done do hint at a larger question worth exploring: O’Brien & Schäler’s 2010 article on the motivations of The Rosetta Foundation’s volunteer translators noted that the group of translators identifying themselves as “professionals” was overwhelmingly female (82%), while the gender of those identifying themselves as amateurs was more balanced (54% female). The source languages of the volunteers were mainly English, French, German and Spanish. My own survey of Wikipedia translators found that 84% of the respondents were male and 75% were younger than 36. Because both these projects show that people with certain profiles participated more than others, it’s clear there’s a need for more research. If we had a better idea of the profiles of those who participate in other crowdsourced translation projects, we would be able to see whether some projects seem more attractive to one gender, which language pairs are most often represented, and what kinds of content are being translated for which language communities. And we could then try to figure out whether (and if so, how) to make these projects more inclusive.

Since it’s still a point of debate whether relying on crowdsourcing to translate the Twitter interface, a Wikipedia article or a TED Talk is beneficial to the translation industry, let’s leave that question aside for a moment and just consider the following: one of the benefits both novice and seasoned translators are supposed to be able to reap from participating in a crowdsourced project is visible recognition for their work. Online, accessible user profiles are common in crowdsourcing projects (as in this example from the TED translator community), and translators are usually given visible credit for their contributions to a project (as in this Webflakes article, which credits the author, translator and reviewer). If certain groups of people are more likely to participate in crowdsourced projects, then this kind of visibility is available only to them. And if we look at where the software development industry is headed (with employers actively seeking candidates who have participated in F/OSS projects, putting those who cannot freely share their code at a disadvantage), translators could eventually see a similar trend, one that disadvantages those who are unable (or choose not) to participate in crowdsourced projects.

I think this is a point worth considering when crowdsourced projects are designed, and it’s certainly a point worth exploring in further studies, as it raises a host of ethical questions that deserve a closer look.

Translation in Wikipedia

I’ve been busy working on a new paper about translation and revision in Wikipedia, which is why I haven’t posted anything here in quite some time. I’ve just about finished now, so I’m taking some time to write a series of posts related to my research, based on material I had to cut from the article. Later, if I have time, I’ll also write a post about the ethics and challenges of researching translation and revision trends in Wikipedia articles.

This post talks about the corpus I’ve used as the basis of my research.

Translations from French and Spanish Wikipedia via Wikipedia:Pages needing translation into English

I wanted to study translation within Wikipedia, so I chose a sample project, Wikipedia:Pages needing translation into English, and compiled a corpus of articles translated in whole or in part from French and Spanish into English. To do this, I consulted recent and previous versions of Wikipedia:Pages needing translation into English, which has a regularly updated list split into two categories: the “translate” section, which lists articles Wikipedians have identified as having content in a language other than English, and the “cleanup” section, which lists articles that have been (presumably) translated into English but require post-editing. Articles usually require cleanup for one of three reasons: the translation was done using machine translation software, the translation was done by a non-native English speaker, or the translation was done by someone who did not have a good grasp of the source language. Occasionally, articles are listed in the cleanup section even though they may not, in fact, be translations: this usually happens when the article appears to have been written by a non-native speaker of English. (The Aviación del Noroeste article is one example.) Although the articles listed on Wikipedia:Pages needing translation into English come from any of Wikipedia’s 285 language versions, I was interested only in the ones described as being originally written in French or Spanish, since these are my two working languages.

I started my research on May 15, 2013, and at that time, the current version of the Wikipedia:Pages needing translation into English page listed six articles that had been translated from French and ten that had been translated from Spanish. I then went back through the Revision History of this page, reading the archived version for the 15th of every month between May 15, 2011 and April 15, 2013: that is, the May 15, 2011, June 15, 2011, July 15, 2011 versions of the page and so on, all the way to April 15, 2013, bringing the sample period to a total of two years. In that time, the number of articles of French origin listed in either the “translate” or “cleanup” sections of the archived pages came to a total of 34, while the total number of articles of Spanish origin listed in those two sections was 60. This suggests Spanish to English translations were more frequently reported on Wikipedia:Pages needing translation into English than translations from French. Given that the French version of the encyclopedia has more articles, more active users and more edits than the Spanish version, the fact that more Spanish to English translation was taking place through Wikipedia:Pages needing translation into English is somewhat surprising.
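For readers curious about the mechanics, the monthly-snapshot sampling described above can be sketched in a few lines of Python. The revision history below is entirely invented for illustration; the selection rule is the one I used: for each month in the sample period, take the page as it stood on the 15th, i.e. the latest revision made on or before that date.

```python
from datetime import date

def monthly_sample_dates(start, end, day=15):
    """Yield the `day`-th of every month from start to end, inclusive."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield date(y, m, day)
        y, m = (y, m + 1) if m < 12 else (y + 1, 1)

def snapshot_on(sample_date, revisions):
    """Return the id of the latest revision made on or before sample_date.

    `revisions` is a list of (date, revision_id) pairs, oldest first,
    standing in for a page's revision history.
    """
    current = None
    for rev_date, rev_id in revisions:
        if rev_date > sample_date:
            break
        current = rev_id
    return current

# A purely hypothetical revision history:
history = [(date(2011, 5, 10), "rev-1"),
           (date(2011, 6, 1), "rev-2"),
           (date(2011, 6, 20), "rev-3")]

samples = [snapshot_on(d, history)
           for d in monthly_sample_dates(date(2011, 5, 15), date(2011, 7, 15))]
# samples -> ["rev-1", "rev-2", "rev-3"]
```

In practice I read the archived versions by hand through the page’s Revision History rather than scripting it, but the logic of the sample is the same.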

Does this mean only 94 Wikipedia articles were translated from French or Spanish into English between May 2011 and May 2013? Unlikely: the articles listed on this page were identified by Wikipedians as being “rough translations” or in a language other than English. Since the process for identifying these articles is not fully automated, many other translated articles could have been created or expanded during this time: some “rough” translations may simply not have been spotted, while translations free of the grammatical errors and awkward syntax associated with non-native English might have passed unnoticed by the Wikipedians who would otherwise have added them to this page. So while this sample group of articles is probably not representative of all translations within Wikipedia (or even of French- and Spanish-to-English translation in particular), Wikipedia:Pages needing translation into English was still a good source from which to draw a sample of translated articles that may have undergone some sort of revision or editing. Even if the results are not generalizable, they at least indicate the kinds of changes made to translated articles within the Wikipedia environment, and therefore, whether this particular crowdsourcing model is an effective way to translate.

So what are these articles about? Let’s take a closer look via some tables. In this first one, I’ve grouped the 94 translated articles by subject. Due to rounding, the percentages may not add up to exactly 100.

Subject Number of French articles Percentage of total (French) Number of Spanish articles Percentage of total (Spanish)
Biography 20 58.8% 20 33.3%
Arts (TV, film, music, fashion, museums) 3 8.8% 8 13.3%
Geography 2 5.9% 12 20%
Transportation 2 5.9% 4 6.7%
Business/Finance (includes company profiles) 2 5.9% 2 3.3%
Politics 1 2.9% 4 6.7%
Technology (IT) 1 2.9% 1 1.7%
Sports 1 2.9% 1 1.7%
Education 1 2.9% 1 1.7%
Science 1 2.9% 0 0%
Architecture 0 0% 2 3.3%
Unknown 0 0% 3 5%
Other 0 0% 2 3.3%
Total 34 99.8% 60 100%

Table 1: Subjects of translated articles listed on Wikipedia:Pages needing translation into English (May 15, 2011-May 15, 2013)
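The percentage columns are simple share-of-total arithmetic, rounded to one decimal place. Recomputing the shares directly from the French counts shows how the rounding works (and why the column totals may drift slightly from 100):

```python
# Subject counts for the 34 French-origin articles in Table 1
french = {"Biography": 20, "Arts": 3, "Geography": 2, "Transportation": 2,
          "Business/Finance": 2, "Politics": 1, "Technology": 1,
          "Sports": 1, "Education": 1, "Science": 1}

total = sum(french.values())  # 34
shares = {subject: round(100 * n / total, 1) for subject, n in french.items()}

print(shares["Biography"])             # 58.8
print(round(sum(shares.values()), 1))  # 99.8 -- the rounded shares don't sum to exactly 100
```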

As we can see from this table, the majority of the translations from French and Spanish listed on Wikipedia:Pages needing translation into English are biographies—of politicians, musicians, actors, engineers, doctors, architects, and even serial killers. While some of these biographies are of historical figures, most are of living people. The arts—articles about TV shows, bands, museums, and fashion—were also a popular topic for translation. In the translations from Spanish, articles about cities or towns in Colombia, Ecuador, Spain, Venezuela and Mexico (grouped here under the label “geography”) were also frequent. So it seems that the interests of those who have been translating articles from French and Spanish as part of this initiative have focused on arts, culture and politics rather than on specialized topics such as science, technology, law and medicine. That may also explain why the articles are so visibly associated with French- and Spanish-speaking regions, as the next two tables demonstrate.

I created these two tables by consulting each of the 94 articles, except in cases where the article had been deleted and no equivalent article could be found in the Spanish or French Wikipedias (marked “unknown” in the tables), and identifying the main country associated with the topic. A biography of a French citizen, for instance, was counted as “France”, as were articles about French subway systems, cities and institutions. Every article was associated with just one country. Thus, when a biography was about someone who was born in one country but lived and worked primarily in another, I labelled the article as being about the country where that person had spent the most time. For instance, Manuel Valls (http://en.wikipedia.org/wiki/Manuel_Valls) was born in Barcelona, but became a French citizen over thirty years ago and is a politician in France’s Parti socialiste, so this article was labelled “France.”

Country/Region Number of articles
France 24
Algeria 2
Belgium 2
Cameroon 1
Canada 1
Romania 1
Switzerland 1
Western Europe 1
n/a 1
Total 34

Table 2: Primary country associated with translations from French Wikipedia

 

Country Number of articles
Spain 13
Mexico 10
Colombia 10
Argentina 7
Chile 3
Venezuela 3
Peru 2
Ecuador 2
Nicaragua 1
Guatemala 1
Uruguay 1
United States 1
Cuba 1
n/a 1
Unknown 4
Total 60

Table 3: Primary country associated with translations from Spanish Wikipedia

Interestingly, these two tables demonstrate a marked contrast in the geographic spread of the articles: roughly 70% of the French source articles (24 of 34) dealt with a single country, France, while the Spanish source articles were spread more widely, with the top three countries (Spain, Mexico and Colombia) together accounting for just over half (33 of 60), in nearly equal shares. The two tables do, however, demonstrate that the vast majority of articles had strong ties to French- or Spanish-speaking countries: only two exceptions (marked as “n/a” in the tables) did not have a specific link to a country where French or Spanish is an official language.

I think it’s important to keep in mind, though, that even though the French/Spanish translations in Wikipedia:Pages needing translation into English seem to focus on biographies, arts and politics from France, Colombia, Spain and Mexico, translation in Wikipedia as a whole might have other focuses. Topics might differ for other language pairs, and they might also differ in other translation initiatives within Wikipedia and its sister projects (Wikinews, Wiktionary, Wikibooks, etc.). For instance, the WikiProject:Medicine Translation Task Force aims to translate medical articles from English Wikipedia into as many other languages as possible, while the Category:Articles needing translation from French Wikipedia page lists over 9,000 articles that could be expanded with content from French Wikipedia, on topics ranging from French military units, government, history and politics to geographic locations and biographies.

I’ll have more details about these translations in the coming weeks. If you have specific questions you’d like to read about, please let me know and I’ll try to find the answers.

Students translating for Wikipedia

Well, the Fall term is officially ending this week, and I’ve just taught my last Introduction to Translation into English class, where my students presented the Wikipedia translation projects they’ve been working on for the past month.

I really enjoyed listening to the students describe the challenges they encountered during the translation process, their experiences using the wiki markup language, and their justifications for adapting the French articles for the English version of the encyclopedia.

They had a lot of positive things to say about the assignment, which involved working in pairs or small groups (of up to four students) to translate all or part of an article of their choice; I recommended they select one from this list of 9,000+ articles needing translation. They liked the fact that Wikipedia has (very broad) translation guidelines to follow, as well as advice about writing in an encyclopedic style. One of these recommendations was that translators should “avoid being overly influenced by the style of the original” and that “a useful translation may require more than just a faithful rendering of the original.” My students really seemed to like this flexibility: if they found some information irrelevant for English readers, they omitted it; if they found a word or section too subjective for an encyclopedia article (e.g. adjectives like “spectacular” and “great” to describe the historic site of Aigues-Mortes), they left it out of the translation; if they found that important details about a subject’s life were missing (e.g. Octave Crémazie’s bankruptcy and subsequent flight from Quebec), they added them in. I haven’t marked the assignments yet, so this flexibility may prove a little challenging for me, but I was happy to see the students taking such an interest in really making the texts fit the expectations of an imagined English-speaking audience.

On the other hand, students did find some aspects of the project frustrating: one group was annoyed that their translation was modified by another Wikipedian shortly after they posted it. They had spent a lot of time debating stylistic preferences such as hyphenation, spelling, and capitalization, and they felt that the changes the other user was proposing were not justified by the style guides they had consulted and–even worse–were not applied evenly throughout the article. Other students found that editing within the Wikipedia environment was tedious, and not everyone was able to figure out how to add references, post images and add hyperlinks to relevant English articles. (Others, though, were happy with the Wikipedia cheat sheet, which outlines most of the mark-up code for things like adding links, headings, and italics.)

In general, though, the students seemed to have enjoyed the assignment. They were able to choose articles that interested them, collaborate with others in the class to solve problems and research terms, and post their translations online for other Wikipedia users to see–although as one student mentioned, the collaborative nature of Wikipedia means that the translation might ultimately change, and it might eventually be hard for individual students to show exactly how they contributed to the version that is available online.

Once I’ve had a chance to mark these assignments, I’ll post a few thoughts about my experience with the project, in case other instructors might be interested in integrating a Wikipedia translation exercise into their classes.

Would you like to take a look at some of the translations? Here are a few of the articles students contributed to:
Russell Bowie
Old Quebec
Octave Crémazie
Aigues-Mortes
Hippogriff