Fall semester wrap-up I

This has been an unusually busy Fall semester, which is why I haven’t posted anything here in several months. With classes over and all of the outstanding grading almost finished, I thought I’d take advantage of the temporary lull to post a few thoughts on some of the courses I taught this term. In this post, I’ll be talking about a third-year undergraduate course called Theory of Translation.

Last year, I’d experimented with having students prepare Wikipedia articles as part of their coursework, and since the results were largely successful, I made the project mandatory this year. This time, though, students submitted their projects in three stages: 1) a 100-word proposal, in which students had to describe the topic they wanted to cover, justify why it needed a new or expanded Wikipedia article, and demonstrate that they would be able to find relevant secondary sources to draw on, 2) a draft version of their article posted to their Wikipedia user page sandbox, and 3) a final version published in Wikipedia that incorporated the feedback I’d given them on their drafts. In total, the Wikipedia project was worth 45% of the final grade (10% for the proposal, 15% for the draft, and 20% for the final version). About 20 students were enrolled in Theory of Translation this semester, which means that together, these students added about 10,000 words to Wikipedia.

For the most part, the articles turned out very well. I tried to prepare students for the research and drafting process by spending time as a class thinking about how to write a good Wikipedia article. Early on, we reviewed resources like WikiProject Translation Studies so we could think about what topics needed new or expanded articles. Three weeks into the course, we walked over to the university library to explore possible resources, and a week later, we took a look at the Wikipedia article on Computer-assisted translation, which has a number of quality issues. We then spent about fifteen minutes in class trying to improve its references, structure and content: this activity doubled as a way to apply some of the readings from our unit on translation technology. These preparation sessions seem to have been worthwhile: most of the draft versions my students submitted a few weeks ago needed only fairly minor revisions before being added to Wikipedia. With only a few exceptions, all of the final articles made it into Wikipedia.

In future years, though, I will ask students to write a longer proposal so they can better assess the potential need for an article and the resources they have access to. A few students did not fully explore the feasibility of their topic during the proposal stage and then had trouble drafting an article that relied on at least three secondary sources meeting Wikipedia’s verifiability and reliability criteria. This happened most frequently when students wanted to write a biographical article on a translator or Translation Studies researcher. If the person was not well known to the general public, students could usually find only primary sources such as a CV or personal website for the biographical details, and these are not considered reliable by Wikipedia standards. (Incidentally, this was the most common reason Wikipedia editors rejected the articles my students had prepared, although in one case a student prepared an excellent biography relying only on secondary sources, and the translator was still deemed “not notable enough” to merit a Wikipedia page.) I’d like to help students avoid these problems in the future.

Here’s a sample of the Wikipedia articles students added or expanded this term:


Translation institutions:

Other translation-related topics:

Translation flows in English Wikipedia

My summer project this year involves Wikipedia again. I’ve already studied the motivations and profiles of Wikipedia translators as well as revision trends in translated Wikipedia articles, so I’m now moving on to tracking translation flows in English Wikipedia. The things I’ll be looking at include how the demand for translation from any of the 290 Wikipedia languages into English has changed over time, how this demand matches up with Wikipedia activity in those languages, how often translations from a given language are flagged for revision, how these revision requests change over time, how often translated articles in need of revision are deleted and why, and whether the change in the number of active users mirrors the changes in translation activity over time.

Are these questions important? I would argue that they’re worth studying for various reasons, but mainly because the discourse about crowdsourcing often emphasizes that the diverse backgrounds of the volunteers who participate in crowdsourced translation initiatives mean that translations from and into “exotic” languages can take place and may even be more frequent than in traditional translation projects, where the cost of localizing a website into a language with just a few thousand speakers would be prohibitive (see this interview with Twitter’s engineering manager Gaku Ueda, for instance). So it’s worth asking whether some languages are prioritized over others and whether some have more activity than others. The answer is likely “yes” in both cases, but if we look at which languages are receiving the most attention, we might be surprised by the results. After all, the five largest Wikipedias, based on the number of articles available in each version, are currently English, Swedish, German, Dutch and French, in that order, but just one year ago, Swedish was ranked 8th. Swedish Wikipedia is not, of course, composed solely of articles translated from other languages, but the fact that it currently has more articles (and far fewer native speakers) than, say, German or French led me to wonder whether more translations are flowing into and out of languages like Swedish, which are typically less well represented online than languages like English, Chinese and Spanish.

I’m not very far into this project yet, so I don’t have much data to share, but here’s a graph of the demand for translation from French into English, Spanish into English and German into English. I compiled the data by taking the current Category:Articles needing translation from French [or Spanish or German] Wikipedia page and comparing it with previous versions captured approximately every six months over the last six years, drawing on data from the Internet Archive’s Wayback Machine. (The gap in both the Spanish and German series exists because the Wayback Machine did not crawl these two pages in 2010.)
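For anyone curious about the mechanics of the “approximately every six months” step, it can be sketched in a few lines of Python. This is only an illustrative sketch of my own workflow, not any official Wikipedia or Internet Archive tooling: the function name, the six-month targets and the 90-day matching window are all assumptions I’ve made for the example.

```python
from datetime import date

def pick_semiannual(snapshots, start, end, window_days=90):
    """Align Wayback Machine snapshots of a category page to
    approximately six-month intervals.

    snapshots: list of (snapshot_date, page_count) pairs.
    Returns a list of (target_date, snapshot_date, page_count);
    the last two are None when no crawl falls within window_days
    of a target (e.g. the 2010 gap for the Spanish and German pages).
    """
    # Build the six-month target dates (the day-of-month is kept from
    # `start`, so use a day <= 28 to stay valid in every month).
    targets, current = [], start
    while current <= end:
        targets.append(current)
        month = current.month + 6
        year = current.year + (month - 1) // 12
        month = (month - 1) % 12 + 1
        current = date(year, month, current.day)

    series = []
    for t in targets:
        # Closest crawl to this target date, if any is near enough.
        best = min(snapshots, key=lambda s: abs((s[0] - t).days), default=None)
        if best is not None and abs((best[0] - t).days) <= window_days:
            series.append((t, best[0], best[1]))
        else:
            series.append((t, None, None))
    return series
```

A gap in the crawl record then simply shows up as a missing point in the series rather than silently pulling in a snapshot from the wrong year.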

Number of pages listed in the Category:Articles needing translation from [French, Spanish or German] Wikipedia pages, 2009-2015.


As the graph shows, requests for translation into English from French, Spanish or German have increased substantially over the past six years. From what I’ve seen so far, French seems to be an anomaly: the number of articles listed as good candidates for translation varied widely, sometimes increasing or decreasing by 1,000 in just six months. These numbers don’t tell us much yet, but I’ll be digging into them more over the summer: I want to see, for instance, how long it takes for an article to be removed from the list and whether demand for translation from English into these languages is similar. This could help explain whether the number of articles listed as needing translation is increasing because little translating is taking place and a backlog is accumulating, because Wikipedians are becoming more interested in translation and are adding articles to the lists more frequently than in the past, or because articles are being translated but simply not removed from the list, making the demand for translation appear inflated. I hope to have more to share soon.

In the meantime, I’d certainly welcome any comments on the project, or thoughts on translation in Wikipedia!

Snippets from the Monterey Forum 2015

I’ve just returned from Monterey, California, where I was at the Educating Translators, Interpreters and Localizers in an Evolving World conference at the Middlebury Institute of International Studies at Monterey. As the title suggests, the talks focused on translation and interpreting pedagogy, and I came away with some new ideas after a number of very interesting presentations. Most of the day consisted of parallel sessions, so obviously I wasn’t able to attend everything. I’ll just summarize a few of the talks I particularly enjoyed. I’ve grouped them into three broad categories: those that discussed how to design new course offerings or fundamentally reshape the way a course is offered, those that touched on online teaching, and those that offered ideas or activities that could be integrated into existing translation courses.

Course design

On Saturday morning, I listened to Kayoko Takeda from Rikkyo University in Japan speak about developing a general education course on “Translation and interpretation literacy” for undergraduate students at her university. It focused on topics like the roles translators and interpreters play in society and professional issues translators and interpreters face without actually having students practice translation. What I found most intriguing was the way the course was designed: three instructors co-taught the course, and 14 guest speakers came to the Saturday-morning class to speak about topics like crowdsourcing and machine translation, business practices, subtitling, Bible translation, and community translation. These guest speakers would give students tasks to do before their talks (e.g. consulting a website, reading blog posts or articles), and then students would participate in the lectures, often by producing in-class essays on topics like rules, remuneration and rewards, or technology.

Methods and activities for online teaching

Saturday morning and afternoon included several presentations about teaching translation and interpretation in an online environment. Here are some of my favourites:

Suzanne Zeng talked about the Hawaii Interactive Television System (HITS) at the University of Hawaii, which allows her to teach up to three groups of students simultaneously via an interactive closed-circuit TV system. While Suzanne teaches a group of students in one room, other groups of students at university campuses located on various Hawaiian islands sit in similar classrooms and participate in the class via video. All students have microphones at their desks, and when they push the button to talk, the video cameras are programmed to zoom in on the speakers so everyone else can hear what their peers have to say and see them clearly while they say it. The shared screens also allow everyone to see any PowerPoint presentations the instructor might use, and any notes he or she might write on the whiteboard. I liked the way this system works because instructors can teach both in person and online at the same time. Having all students together simultaneously allows everyone to participate in discussions and group exercises, regardless of which island they might live on. Suzanne did mention that she has to monitor the video feeds while she is teaching to make sure the students in the other locations are fully engaged: addressing them directly helps remind these students that she is able to see them and is paying attention to them as well as to the students in her classroom. She also noted that the system doesn’t allow for any flexibility in timing: class has to begin and end at a precise time because that’s when the video feed starts and stops; if she is in the middle of a sentence with just a few seconds left on the counter, she’ll be cut off as soon as the clock reaches zero, and students in the remote locations won’t hear the end of what she had to say. That means instructors need to be very conscious of the clock with a system like this.

Qjinti Oblitas and Andrew Clifford, from my department at York University, offered some insight into how their interpretation students develop close ties with their peers, even though the first year of our Master’s in Conference Interpreting program is offered online. Through a variety of sometimes humorous examples, Qjinti and Andrew showed that students engaged with one another outside of the virtual classroom via private Facebook groups, text messages, Skype chats, and apps like WeChat, and they argued that the students felt a real sense of community with their peers—so much so, in fact, that many of the students found ways to meet one another in person if they lived in the same country or were travelling to a place near one of their classmates.

Finally, Cristina Silva said that every strategy for offline teaching could be adapted for the online classroom. She offered a variety of examples, some of which I use already, and others that I will consider using in the future—though I should point out that many of these ideas would work just as easily in a face-to-face classroom. Cristina’s suggestions included having students translate together via Google Docs, having students practice editing machine translations while using screen-sharing software so that their classmates can see their results, and encouraging students to use Dragon NaturallySpeaking to record themselves while dictating a sight translation, to see whether their productivity increased compared with simply typing out the translations.

Activities for the translation classroom

Kent State University’s Erik Angelone offered a new way of having students assess their translation process. Arguing that other process-focused research methods like think-aloud protocols and keystroke logging are too time-consuming or too complicated to integrate into the classroom, Erik proposed using screen recorders like Blueberry Flashback Express to have students record their computer screens while they work. Then, when students look back at these recordings, they can see, for instance, whether they hesitated before translating a word but did not consult an electronic resource, which might indicate that the translation needs to be double-checked. Integrating screen recordings into the classroom would also allow students to learn from the methods other students or even professional translators have used: how do others deal with distractions like email alerts, for instance? Or how did others research a problematic word or phrase? I thought this was a very helpful idea for getting students to think about how they translate and whether their method could be more effective. One audience member did mention that the disadvantage of screen recordings is that they don’t show what students are doing off their computers (e.g. consulting a paper dictionary), but Erik suggested that students could comment on their screen recordings afterwards in a retrospective interview. Of course, they could comment more informally as well, by adding a few written remarks at the end of their recording to describe any research techniques that wouldn’t show up on screen. I’m going to integrate an activity like this into my introductory translation class next term, and after I do, I’ll write a short post about the results.

There were other talks I enjoyed as well, but this post is getting quite long. I think I’ll end with a link to the tweets that came out of the conference, which, though short, give a good overview of a larger selection of talks. You can take a look at the tweet compilation on Storify here.

Summer breaks and productivity

After presenting papers at three conferences in St. Catharines, Edmonton, and Barcelona in May, June and July, I spent the past two months on activities largely unrelated to my research, teaching or translating. I read six novels (more than I’ve managed to read in the last three years combined). I repainted parts of the house. I weeded and watered the vegetable garden in our backyard. And I spent a lot of time at playgrounds, splash pads and parks with my children. I did finish correcting two other articles that will soon appear in print, but aside from a book review, I didn’t submit any new texts to journals. In short, I’ve enjoyed the past two months, which were the first extended break I’ve taken from academic activities in several years.

And yet, I’ve still wrestled with the thought that I should be more productive. From time to time, I’ve wondered whether I will regret not getting further along in the survey I’m designing to find out more about what our undergraduate students think about internships. On more than one occasion, I got annoyed that I hadn’t returned to the major research project I’ve been tackling on and off for the past three years. I should be transcribing interviews! Visiting archives! Blogging! And, as the start of the Fall term has drawn closer, I’ve regretted not having the syllabi for all three courses completely finalized.

But working just one day a week for the past eight weeks has taught me that I can spend part of my summer break finishing off just the most pressing tasks without dire consequences. And in those moments when twinges of guilt did make themselves felt, I asked myself whether I would rather my children think of me as the mother who published a thirteenth article this summer, or as the mother who helped them make popsicles with the raspberries from our garden, who guided them through making a jar of pesto with our basil leaves, who showed them how to toss a salad made from the tomatoes that just a few months ago were tiny seeds in our kitchen. I’ll work on that thirteenth article in the fall, and it probably won’t matter at all that I didn’t start it sooner. In fact, I may even try this again next year.

Should I fix this mistake or not? On the ethics of researching Wikipedia

I recently finished the final edits for an article that has just been published in Translation Studies, and the process reminded me that I’ve been meaning to write a blog post about an ethical dilemma I faced when I was preparing my research. So before I turn to a new project and forget all about this one again, here’s what happened.

The paper focuses on a corpus of 94 Wikipedia articles that have been translated in whole or in part from French or Spanish Wikipedia. I wanted to see not just how often translation errors in the articles were caught and fixed, but also how long it took for errors to be addressed. It will probably not come as any surprise that almost all of the articles I studied initially contained both transfer problems (e.g. incorrect words or terms, omissions) and language problems (e.g. spelling errors, syntax errors), since they were posted on Wikipedia:Pages needing translation into English, which lists articles that are included in English Wikipedia but which contain content in another language, content that requires some post-translation editing, or both. Over the course of the two years leading up to May 2013, when I did the research, some of the errors I found in the initial translations were addressed in subsequent versions of the articles. In other cases, though, the errors were still there, even though the page had been listed as needing “clean-up” for weeks, months, or even years.

And that’s where my ethical dilemma arose: should I fix these problems? It would be very simple to do, since I was already comparing the source and target texts for my project, but it felt very much like I would be tampering with my data. For instance, in the back of my mind was the thought that I might want to conduct a follow-up study in a year or two, to see whether some of the errors had been resolved with more time. If I were to fix these problems, I wouldn’t be able to check on the status of these articles later, which would prevent me from finding out more about how quickly Wikipedians resolve translation errors.

And yet, I was torn, partly due to a Bibliotech podcast I’d listened to a few years ago that made a compelling argument for improving Wikipedia’s content:

When people tell me that they saw something inaccurate on Wikipedia, and scoff at how poor a source it is, I have to ask them: why didn’t you fix it? Isn’t that our role in society, those of us with access to good information and the time to consider it, isn’t it our role to help improve the level of knowledge and understanding of our communities? Making sure Wikipedia is accurate when we have the chance to is one small, easy way to contribute. If you see an error, you can fix it. That’s how Wikipedia works.

In the end, I didn’t make any changes, but this was mainly because I didn’t have the time. I didn’t want to tamper with my data while I was writing the paper, and after I had submitted it, I didn’t get around to going back through the list of errors I’d compiled to start editing articles. Most of the corrections would have been for very minor problems, such as changing a general word (“he worked for”) to a word that more specifically reflected the source text (“he volunteered for”), or replacing incorrect words with better translations, even though the original version would have given users the gist of the meaning (e.g. “the caves have been exploited” vs. “the caves have been mined”). I had trouble justifying the need to invest several hours correcting details that wouldn’t really affect the overall meaning of the text, and yet the question still nagged at me. So I thought I would write a blog post to see what others thought: which is more ethical, making the corrections myself, or leaving the articles as they are, to see how they change over time without my influence?

Some of my favourite talks from the CATS conference at Brock University

I’ve just returned from the 27th annual conference organized by the Canadian Association for Translation Studies, which was held at Brock University in St. Catharines, Ontario this year. The theme was “Translation: Territories, Memory, History”, and although a number of the talks addressed topics you might expect to find in this theme, namely the history of translated texts in regions like Asia, Latin America and Brazil, others were more broadly related, addressing subjects like the history of language technologies in Canada, or “new territories” like fansubbing norms. Since many of these topics are likely to be of interest to people who weren’t able to attend, I thought I would summarize some of my favourite presentations and offer a few thoughts on the wider implications of these research questions. Very roughly, the talks I most enjoyed can be grouped into three broad, and somewhat overlapping, categories that also match my own research interests: technological, professional and pedagogical concerns.

Technological Concerns

Two talks on technology-related topics were particularly intriguing: Geneviève Has, a doctoral candidate at Université Laval, spoke about the history of language technologies in Canada, focusing particularly on the role of the federal government in projects like TAUM-MÉTÉO, the very successful machine-translation system for meteorology texts, and RALI, a lab that developed programs like the bilingual concordancer TransSearch. Has explored some of the reasons why entire research labs or specific research projects had been dismantled, and noted that when emphasis is placed on producing marketable results within a set period of time, funding is often pulled from projects if the results are not what the funders are looking for, even if useful research is being produced by the lab. For instance, the quest to develop a machine translation system as successful as TAUM-MÉTÉO led to later systems being abandoned when the results were not as impressive.

Valérie Florentin, a doctoral candidate at the Université de Montréal, meanwhile, gave a fascinating talk on fansubbing norms, noting that in the English to French community she studied, online forum discussions between the fansubbers showed how they wanted to ensure the subtitles would be easily understood by francophones in various countries. Thus, they avoided regionalisms as well as expressions and cultural references they thought typical viewers would not understand. They also followed style guidelines to ensure the subtitles, on which various people had collaborated, would be consistent in terms of things like whether characters should use tu or vous to address one another. In her conclusions, she wondered whether the collaborative model used by this fansubbing community (in which about eight people translate and review the subtitles for any given episode) could be useful in professional communities. Recognizing that it would be unfeasible to expect companies to pay this many people to work on a project (even if each person was doing less work than they would if they prepared the subtitles alone), she argued that the model could be useful in training contexts, allowing students to debate with one another about cultural concerns and equivalents, while also following a set of style guidelines to ensure consistency in the final product. I found this suggestion particularly relevant to my own teaching, since I like to try collaborative models with my students, and since I have argued in other talks that crowdsourcing models often offer elements that could be adopted in professional translation, such as greater visibility for the translators who work on projects.

Professional Concerns

Marco Fiola, from Ryerson University, and Aysha Abughazzi, from Jordan University of Science and Technology, both spoke on translation quality. While Marco’s presentation explored competing definitions of translation quality and specifically addressed issues like understandability and usability, Aysha spoke about translation quality in Jordan, discussing the qualifications of translators and the quality of translations she obtained from various agencies. Both of these talks underscored for me the difficulty translators and translation scholars continue to have in defining quality and in determining what “professional” translation should look like.

Pedagogical Concerns

Philippe Caignon, an associate professor at Concordia University, offered an excellent presentation on concept mapping and cognitive mapping, illustrating how these can be useful for students in terminology courses as an alternative to tree diagrams. Although he didn’t show the software itself, he did mention that Cmap Tools can be used to create concept maps fairly easily. As I listened to his talk, I decided I could incorporate concept mapping into the undergraduate Theory of Translation course I usually teach, to help students think about the terms translation and translation studies. I think examples like this one would help students see how they can visualize translation, and if they had a few minutes to work on their concept map individually before discussing their map with the rest of the class, I think we would be able to explore the different ways translation can be understood. More on this after I’ve tried it out in class.

On academic blogging

Some recent online articles weighing the pros and cons of academic blogging and academic publishing more broadly led me to reflect on my own reasons for blogging over the past 4 1/2 years.

One of the concerns academic bloggers have mentioned is that the writing they do for their blogs does not count as academic research: the posts are not peer-reviewed, so they will typically be counted as professional service rather than research in tenure and promotion assessments, even though blogs–being freely accessible online–are likely to reach a wider audience than a typical academic journal article. As one blogger noted, any time spent writing her blog was time not spent writing a peer-reviewed essay or a book that would “count” as research. And this is certainly something I have considered as well.

When I started this blog in 2009, I had a lot more time on my hands: I had just finished my PhD, was getting ready to teach three courses in the next academic year, and was looking forward to finally being able to write short posts in a single sitting, rather than trying to plow through a major project like a dissertation. Not unexpectedly, I posted much more actively than I did last year, for instance, when I taught five courses, wrote three journal articles and edited the book review section of another journal. But I still enjoy blogging, even if I don’t have as much time for it. And, in case any other academics are trying to decide whether it’s worth starting a blog, here are a few reasons why I continue to post articles on this one:

  1. First, this blog has helped me connect with many people I would probably not otherwise have met: other researchers, of course, but also graduate students and non-academics from around the world. Over the last four years, several thousand people have visited the site. Some bloggers, of course, can attract that many visitors in a much shorter period, but I don’t have the time to write content more frequently and to promote the website more efficiently. And I’m happy with my readership figures: without this blog, I would not have been able to reach several thousand people who were otherwise interested in the topics I write about.
  2. Second, the blog is a great way to archive things I’m likely to want to look up again later. For instance, because I try to write about the conferences I’ve attended, I’m able to go back months or years later and double-check who said what at which event. I can also review what I was doing in my classes a few years ago and what I thought about it at the time. Without the blog, I probably wouldn’t have that kind of information at hand, since my conference notes would likely have ended up somewhere among the many stacks of papers covering my desk and filing cabinets.
  3. If, like me, you integrate your blog into a website (and WordPress allows you to do so very easily), you can also keep your CV up to date and provide links to (or full versions of) your articles. I realize that you could also do this via sites like Academia.edu, but I like having my own site, which gives me more control over the layout, structure, and kind of content I would like to include.
  4. Finally, with a blog, you can post material you’ve had to cut from longer papers but wouldn’t be able to develop into another full-length article. You can also work out ideas for projects you might later develop into a larger project, or reflect on topical issues that you’re never going to have time to develop into a full-length article. If you use your blog in this way, as I sometimes do, it becomes an extension of your writing activities, fodder for new work, and a platform to test out new ideas rather than a side-project taking you away from your “real” research.

These are my primary motivations for blogging, but I’m sure other bloggers could add more reasons to this list. In case you’d like to read other blogs about translation written from an academic’s perspective, here are a few of the blogs I follow that are written by people who are or were actively involved in Translation Studies:

Know of others? I’d be happy to update the list.

What translators can learn from the F/OSS community

Looking through my blog archives late last year, I was disappointed to discover I’d posted only seven articles in all of 2013: usually my goal is to get at least one post up every month, and last year was the first time since 2009 that I hadn’t been able to achieve that. So my goal for this year is to blog more frequently and more consistently. And with that, here is my first post of 2014:

In November, I came across a blog post that hit on a number of issues relevant to the translation industry, even though it was addressed to the Free/Open-Source Software (F/OSS) community. It’s called The Ethics of Unpaid Labour and the OSS Community, and it appeared on Ashe Dryden’s blog. Ashe writes and speaks about diversity in corporations, and her post focused on how unpaid OSS work creates inequalities in the workforce. As she argues, the demographics of OSS contributors differ from those of proprietary software developers as well as the general population, with white males overwhelmingly represented among OSS contributors: one source Ashe cites, for instance, remarks that only 1.5% of F/OSS contributors are female, compared to 28% of contributors for proprietary software. Ashe notes that a lack of free time among groups that are typically marginalized in the IT sector (women, certain ethnic groups, people with disabilities, etc.) is the main reason these groups are under-represented in OSS projects.

These demographics are problematic for the workforce because many software companies require their employees (and potential new hires) to have contributed to F/OSS projects. And while some large IT firms do allow employees to contribute to such projects during work hours, people from marginalized groups often do not work at these kinds of companies. This means people who would like to find paid employment as software developers probably need to devote unpaid hours to F/OSS projects so they have a portfolio of publicly available code for employers to consult.

So how is this relevant to translators, or the translation industry? It’s relevant because the same factors affecting the demographics of the F/OSS community are also likely to affect the demographics of the crowdsourced translation community. People can volunteer to translate the Facebook interface only if they have free time and access to a computer; likewise, people with physical disabilities that make interacting with a computer difficult are likely to spend less time participating in crowdsourced projects than people with no disabilities. And since, in many cases, the community of translators participating in a crowdsourced project will largely determine how quickly a project is completed, what texts are translated and what language pairs will be available, the profile of participants is important.

Unfortunately, we don’t have a lot of data about the profiles of people who participate in crowdsourced (or volunteer) translation projects. The studies that have been done do hint at a larger question worth exploring: O’Brien & Schäler’s 2010 article on the motivations of The Rosetta Foundation’s volunteer translators noted that the group of translators identifying themselves as “professionals” was overwhelmingly female (82%), while the gender of those identifying themselves as amateurs was more balanced (54% female). The source languages of the volunteers were mainly English, French, German and Spanish. My own survey of Wikipedia translators found that 84% of the respondents were male and 75% were younger than 36. Because both these projects show that people with certain profiles participated more than others, it’s clear there’s a need for more research. If we had a better idea of the profiles of those who participate in other crowdsourced translation projects, we would be able to see whether some projects seem more attractive to one gender, which language pairs are most often represented, and what kinds of content are being translated for which language communities. And we could then try to figure out whether (and if so, how) to make these projects more inclusive.

Since it’s still a point of debate whether relying on crowdsourcing to translate the Twitter interface, a Wikipedia article or a TED Talk is beneficial to the translation industry, let’s leave that question aside for a moment and just consider the following: one of the benefits both novice and seasoned translators are supposed to be able to reap from participating in a crowdsourced project is visible recognition for their work. Accessible online user profiles are common in crowdsourcing projects (as in this example from the TED translator community), and translators are usually given visible credit for their contributions to a project (as in this Webflakes article, which credits the author, translator and reviewer). If certain groups of people are more likely to participate in crowdsourced projects, then this kind of visibility is available only to them. Judging by where the software development industry is headed (with employers actively seeking those who have participated in F/OSS projects, putting those who cannot freely share their code at a disadvantage), translators could eventually see a similar trend, one that puts those who are unable to participate in crowdsourced projects (or who choose not to) at a disadvantage.

I think this is a point worth considering when crowdsourced projects are designed, and it’s certainly a point worth exploring in further studies, as it raises a host of ethical questions that deserve a closer look.

Translation in Wikipedia

I’ve been busy working on a new paper about translation and revision in Wikipedia, which is why I haven’t posted anything here in quite some time. I’ve just about finished now, so I’m taking some time to write a series of posts related to my research, based on material I had to cut from the article. Later, if I have time, I’ll also write a post about the ethics and challenges of researching translation and revision trends in Wikipedia articles.

This post talks about the corpus I’ve used as the basis of my research.

Translations from French and Spanish Wikipedia via Wikipedia:Pages needing translation into English

I wanted to study translation within Wikipedia, so I chose a sample project, Wikipedia:Pages needing translation into English, and compiled a corpus of articles translated in whole or in part from French and Spanish into English. To do this, I consulted recent and previous versions of Wikipedia:Pages needing translation into English, which has a regularly updated list split into two categories: the “translate” section, which lists articles Wikipedians have identified as having content in a language other than English, and the “cleanup” section, which lists articles that have (presumably) been translated into English but require post-editing. Articles usually require cleanup for one of three reasons: the translation was done with machine translation software, by a non-native English speaker, or by someone who did not have a good grasp of the source language. Occasionally, articles are listed in the cleanup section even though they may not, in fact, be translations: this usually happens when the article appears to have been written by a non-native speaker of English. (The Aviación del Noroeste article is one example.) Although the articles listed on Wikipedia:Pages needing translation into English come from any of Wikipedia’s 285 language versions, I was interested only in the ones described as being originally written in French or Spanish, since these are my two working languages.

I started my research on May 15, 2013, and at that time, the current version of the Wikipedia:Pages needing translation into English page listed six articles that had been translated from French and ten that had been translated from Spanish. I then went back through the Revision History of this page, reading through the archived version for the 15th of every month between May 15, 2011 and April 15, 2013 (that is, the May 15, 2011, June 15, 2011, July 15, 2011 and so on through to the April 15, 2013 versions of the page), bringing the sample period to a total of two years. In that time, the number of articles of French origin listed in either the “translate” or “cleanup” sections of the archived pages came to a total of 34, while the total number of articles of Spanish origin listed in those two sections was 60. This suggests Spanish to English translations were more frequently reported on Wikipedia:Pages needing translation into English than translations from French. Given that the French version of the encyclopedia has more articles, more active users and more edits than the Spanish version, the fact that more Spanish to English translation was taking place through Wikipedia:Pages needing translation into English is somewhat surprising.
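I did this sampling by hand, but for anyone curious about the mechanics, the monthly snapshots described above could also be scripted. Here is a rough Python sketch: the `sample_dates` helper (my own naming) generates the 25 snapshot dates, and `revision_query` builds a MediaWiki API URL that asks for the last revision of the page on or before each date. Actually fetching and parsing the archived pages is left out.

```python
from datetime import date

def sample_dates(start=date(2011, 5, 15), end=date(2013, 5, 15)):
    """Return the 15th of every month from start to end, inclusive."""
    dates, y, m = [], start.year, start.month
    while date(y, m, 15) <= end:
        dates.append(date(y, m, 15))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return dates

def revision_query(d):
    """Build a MediaWiki API query for the latest revision of the page
    at or before the end of day d (rvdir=older walks back in time)."""
    return ("https://en.wikipedia.org/w/api.php?action=query&prop=revisions"
            "&titles=Wikipedia:Pages%20needing%20translation%20into%20English"
            "&rvlimit=1&rvdir=older&format=json"
            f"&rvstart={d.isoformat()}T23:59:59Z")

print(len(sample_dates()))  # 25 monthly snapshots (May 2011 through May 2013)
```

Each URL would then be fetched and the “translate” and “cleanup” sections of the returned revision scanned for articles tagged as French or Spanish in origin.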

Does this mean only 94 Wikipedia articles were translated from French or Spanish into English between May 2011 and May 2013? Unlikely: the articles listed on this page were identified by Wikipedians as being “rough translations” or in a language other than English. Since the process for identifying these articles is not fully automated, many other translated articles could have been created or expanded during this time: some “rough” translations may simply not have been spotted, while translations free of the grammatical errors and unidiomatic syntax associated with non-native English would likely have passed unnoticed by the Wikipedians who might otherwise have added them to this page. So while this sample group of articles is probably not representative of all translations within Wikipedia (or even of French- and Spanish-to-English translation in particular), Wikipedia:Pages needing translation into English was still a good source from which to draw a sample of translated articles that may have undergone some sort of revision or editing. Even if the results are not generalizable, they at least indicate the kinds of changes made to translated articles within the Wikipedia environment, and therefore, whether this particular crowdsourcing model is an effective way to translate.

So what are these articles about? Let’s take a closer look via some tables. In this first one, I’ve grouped the 94 translated articles by subject. Due to rounding, the percentages do not add up to exactly 100.

Subject Number of French articles Percentage of total (French) Number of Spanish articles Percentage of total (Spanish)
Biography 20 58.8% 20 33.3%
Arts (TV, film, music, fashion, museums) 3 8.8% 8 13.3%
Geography 2 5.9% 12 20%
Transportation 2 5.9% 4 6.7%
Business (includes company profiles) 2 5.9% 2 3.3%
Politics 1 2.9% 4 6.7%
Technology (IT) 1 2.9% 1 1.7%
Sports 1 2.9% 1 1.7%
Education 1 2.9% 1 1.7%
Science 1 2.9% 0 0%
Architecture 0 0% 2 3.3%
Unknown 0 0% 3 5%
Other 0 0% 2 3.3%
Total 34 99.8% 60 100%

Table 1: Subjects of translated articles listed on Wikipedia:Pages needing translation into English (May 15, 2011-May 15, 2013)

As we can see from this table, the majority of the translations from French and Spanish listed on Wikipedia:Pages needing translation into English are biographies—of politicians, musicians, actors, engineers, doctors, architects, and even serial killers. While some of these biographies are of historical figures, most are of living people. The arts—articles about TV shows, bands, museums, and fashion—were also a popular topic for translation. In the translations from Spanish, articles about cities or towns in Colombia, Ecuador, Spain, Venezuela and Mexico (grouped here under the label “geography”) were also frequent. So it seems that the interests of those who have been translating articles from French and Spanish as part of this initiative have focused on arts, culture and politics rather than specialized topics such as science, technology, law and medicine. That may also explain why the articles are so visibly associated with particular French- and Spanish-speaking regions, as the next two tables demonstrate.

I created these two tables by consulting each of the 94 articles and identifying the main country associated with the topic, except in cases where the article had been deleted and no equivalent article could be found in the French or Spanish Wikipedias (marked “unknown” in the tables). A biography of a French citizen, for instance, was counted as “France”, as were articles about French subway systems, cities and institutions. Every article was associated with just one country: when a biography was about someone who was born in one country but lived and worked primarily in another, I labelled the article with the country where that person had spent the most time. For instance, Manuel Valls (http://en.wikipedia.org/wiki/Manuel_Valls) was born in Barcelona, but became a French citizen over thirty years ago and is a politician in France’s Parti socialiste, so his article was labelled “France.”

Country/Region Number of articles
France 24
Algeria 2
Belgium 2
Cameroon 1
Canada 1
Romania 1
Switzerland 1
Western Europe 1
n/a 1
Total 34

Table 2: Primary country associated with translations from French Wikipedia


Country Number of articles
Spain 13
Mexico 10
Colombia 10
Argentina 7
Chile 3
Venezuela 3
Peru 2
Ecuador 2
Nicaragua 1
Guatemala 1
Uruguay 1
United States 1
Cuba 1
n/a 1
Unknown 4
Total 60

Table 3: Primary country associated with translations from Spanish Wikipedia

Interestingly, these two tables demonstrate a marked contrast in the geographic spread of the articles: over 70% of the French source articles (24 of 34) dealt with a single country, France, while the Spanish source articles were spread more evenly, with the three most frequent countries (Spain, Mexico and Colombia) together accounting for just over half (33 of 60), in nearly equal shares. The two tables do, however, demonstrate that the vast majority of articles had strong ties to French- or Spanish-speaking countries: only two exceptions (marked as “n/a” in the tables) did not have a specific link to a country where French or Spanish is an official language.

I think it’s important to keep in mind, though, that even though the French/Spanish translations in Wikipedia:Pages needing translation into English seem to focus on biographies, arts and politics from France, Colombia, Spain and Mexico, translation in Wikipedia as a whole might have other focuses. Topics might differ for other language pairs, and they might also differ in other translation initiatives within Wikipedia and its sister projects (Wikinews, Wiktionary, Wikibooks, etc.). For instance, the WikiProject:Medicine Translation Task Force aims to translate medical articles from English Wikipedia into as many other languages as possible, while the Category:Articles needing translation from French Wikipedia page lists over 9,000 articles that could be expanded with content from French Wikipedia, on topics ranging from French military units, government, history and politics to geographic locations and biographies.

I’ll have more details about these translations in the coming weeks. If you have specific questions you’d like to read about, please let me know and I’ll try to find the answers.

ACFAS Conference

I’ve just returned from Quebec City, where I was attending the 81st Congress of the Association francophone pour le savoir (ACFAS), which took place at the Université Laval this year. It was the first time I’d been to an ACFAS event, which, for those of you who might not know, is similar to the Congress of the Humanities and Social Sciences in that a number of conferences from different disciplines take place there, each organized by a different group of scholars. Unlike the Congress of the Humanities and Social Sciences, which is held at universities across Canada and is bilingual, ACFAS is usually hosted by Quebec universities and takes place entirely in French.

This year, three translation-related conferences were taking place at ACFAS, and I was able to attend two of them: La formation aux professions langagières : nouvelles tendances (Training Language Professionals: New Trends), which took place on Wednesday, and La traduction comme frontière (Translation as Borders), which took place Thursday and Friday. Unfortunately, I had to miss the third conference, Langues et technologies : chercheurs, praticiens et gestionnaires se donnent rendez-vous (Languages and Technologies: A Meeting of Researchers, Practitioners and Managers), because it was taking place at the same time as the conference on translation as borders, where I was presenting a paper. But here are a few points I found interesting and useful at the two conferences I did manage to attend:

La formation aux professions langagières: Nouvelles tendances
This conference gave me a lot of practical ideas to integrate into my courses next year. For instance, I really enjoyed the presentation by Mathieu Leblanc, who carried out an ethnographic study at three Language Service Providers (one public and two private) several years ago. These three LSPs each had at least 35 employees, including new and experienced translators, and he spent one month at each, conducting interviews and observing workplace practices. (Mathieu presented some of the data from this study at the CATS conference last year; I wrote about it in this post.) Although his research goal had been to study translator attitudes toward tools like Translation Memories, the data he gathered during his fieldwork also allowed him to explore questions like “What do translators think about university training programs?” He found that although both novice and experienced translators felt their university training was good overall, some areas could still be improved: students could be better prepared for the productivity demands they will face in the workplace, taught not to rely so extensively on tools like Translation Memories, and encouraged to be more critical of sources and translations.

The presentation by Université de Sherbrooke doctoral candidate Fouad El-Karnichi focused on converting traditional courses to online environments, and I learned that other universities are using a variety of platforms to offer real-time translation courses online. At Glendon, we’ve adopted Adobe Connect for the Master of Conference Interpreting, but the Université du Québec à Trois-Rivières is using Via for its new online BA in translation. I’ll have to take a look at it to see how it works. Fouad has just posted a few of his own thoughts on the ACFAS conference; you can read them on his blog here.

Finally, Éric Poirier, from the Université du Québec à Trois-Rivières, described a number of activities that could be integrated into a translation course to help familiarize students with online documentary resources like dictionaries, corpora, and concordancers. Here are a few of the activities I found interesting:

  • Have students use a corpus to find collocations for a base word (e.g. Winter + ~cold = harsh)
  • Have students read one of the language columns in Language Update and then translate the word that’s been discussed
  • Have students practice using dictionaries to distinguish between paronyms like affect and effect

In an online course, these kinds of activities could be integrated into the course website via an online form or a quiz that needs to be completed.
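The first of these activities (finding collocates of a base word in a corpus) is easy to prototype even without dedicated corpus software. Here is a toy Python sketch; the `collocates` helper and the sample sentences are my own invention for illustration, not part of Éric’s materials:

```python
from collections import Counter
import re

def collocates(text, base, window=2):
    """Count the words appearing within `window` tokens of `base`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == base:
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

sample = ("A harsh winter is coming. The harsh winter lasted months. "
          "A mild winter followed.")
print(collocates(sample, "winter")["harsh"])  # 2
```

A real corpus would of course be much larger, and a concordancer would also rank collocates by statistical association rather than raw frequency, but the underlying windowed counting is the same.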

Other presentations were very interesting as well, but this post is getting a little long, and I also wanted to discuss some of the talks from the second conference.

La traduction comme frontière
Although several presenters cancelled their talks on the first day, we still had some very stimulating discussions about translation as borders, whether these borders are real, imagined, pragmatic, semantic, political, ideological or something else entirely. Two papers were particularly thought-provoking (at least to me). Chantal Gagnon, from the Université de Montréal, spoke about Canadian Throne Speeches since 1970, with particular emphasis on the words “Canada”, “Canadien/canadien” and “Canadian” in these speeches. The fact that the number of occurrences of these words differed between the English and French versions was not really surprising, since Chantal had found similar differences in other Canadian speeches, but the fact that the 2011 Throne Speech under Prime Minister Harper differed from the others was very intriguing. And Alvaro Echeverri, also from the Université de Montréal, raised some very illuminating questions about the limits of translation, particularly with respect to how we might define the term translation. Drawing on work by Maria Tymoczko, he proposed studying the corpus of texts before trying to determine what should be considered a translation: that way, researchers will know what kinds of translations, adaptations and inspirations to include.

So all in all, these three days in Quebec City were very stimulating, and I’m anxious to incorporate some of these ideas into my courses next year and my research this summer.