Crowdsourcing: One of the top two threats to professional translators?

According to a recent article in Translorial, the journal of the Northern California Translators Association, the American Translators Association Board has declared crowdsourcing one of the top two threats to the profession and the association, tied with the economic downturn.

A companion piece that was also part of the February 2010 issue of Translorial offers a brief summary—and a link to the video recording—of a talk from the 2009 general meeting of the Northern California Translators Association. The talk was entitled “New Trends in Crowdsourcing: The Kiva/Idem Case Study,” and it was given by Monica Moreno, localization manager at Idem Translations, and Naomi Baer, Director of Micro-loan Review and Translation at a not-for-profit microfinancing organization called Kiva. (Baer, incidentally, is also the author of the first Translorial article I cited).

Despite the ATA’s rather dour opinion of crowdsourcing, both the Translorial article and the presentation by Moreno and Baer offer a fairly positive view of the opportunities crowdsourcing provides not just to the companies that turn to volunteers for their translation needs, but also to web users, minority-language communities, and even professional translators. After all, as Moreno and Baer noted, languages that are considered Tier 2 or lower by corporations are often used in crowdsourcing initiatives. Just look at the TED Open Translation Project, one of the crowdsourcing initiatives cited in the presentation.

As of March 26, 2010, TEDTalks have been subtitled into more than 70 languages, including Swahili, Tagalog, Tamil, Icelandic and Hungarian. More than 400 talks have been subtitled in Bulgarian, nearly 300 in Arabic, and more than 200 in Romanian, Polish and Turkish. And these figures compare favourably with traditional Tier 1 languages: French (304 talks), Italian (263 talks), German (195 talks) and Spanish (575 talks). By comparison, large localization projects by commercial organizations don’t usually offer as many languages: of Google, Microsoft and Coca-Cola, which topped the 2009 Global Brands ranking published in the Financial Times, Coca-Cola appears to have been localized for the most country and language pairs, with a whopping 124 countries and 141 locales, while Microsoft is a close second at 124 locales. However, many of the links to Coca-Cola sites (e.g. nearly all of the 44 African locales) actually take users to the US English site, so Coca-Cola probably offers just under 100 locales, many of which (e.g. 13 of the 30 Eurasia locales) are actually English-language versions. Likewise, IBM, the fourth-ranked brand, offers 100 locales, but 49 of them are English-language versions, and another 10 are in Spanish. So, while some of the largest brands initially appear to have targeted more linguistic groups, the TEDTalks have actually been made available in more languages.

In addition, smaller linguistic communities within a region are not often targeted by the larger corporations, as these groups may not have the purchasing power to justify translation costs. Microsoft, Coca-Cola, IBM, Procter & Gamble, and Ikea, for instance, all offer their Spain websites only in Spanish, while some TED videos (as well as Google) are available in Catalan and Galician. With non-profit initiatives, where users may feel driven to contribute their time to support a particular cause or to make available information (like the TED talks) that would otherwise be inaccessible to those who don’t speak the source language, crowdsourcing can help reduce the language hierarchy that for-profit localization initiatives encourage. Because the translations are user-generated and sometimes user-initiated, as long as enough members of a community feel committed to making information available, they will provide translations into so-called major and minor languages without worrying about a return on investment. What we need now, then, is more research into the quality of the translations produced by volunteer, crowdsourced efforts. Making information available in more languages is laudable, but if the translations are inaccurate, contain omissions or have added information, then the crowdsourcing model may not be as advantageous as it appears.

The presentation by Moreno and Baer also offered a few insights into the motivations of volunteer translators: some wanted to give back to the community, others wanted to mentor student or amateur translators without having to make a significant time commitment, while still others saw it as a networking opportunity. As Baer noted, her volunteer efforts for Kiva eventually landed her a paid job with the organization. These anecdotal details about translator motivations underscored (at least for me) the need to systematically research the motivations of the people involved in crowdsourced translation projects. I think it’s worth comparing the motivations of those involved in not-for-profit initiatives like TED, Kiva, or Global Voices (which I’ve discussed in a previous post) with those involved in initiatives launched by for-profit companies such as Facebook. I suspect that motivations would differ, but a survey of the volunteers could confirm or refute this hypothesis.

Overall, the presentation by Moreno and Baer is definitely worth watching if you’re at all interested in crowdsourcing and translation. It’s available on Vimeo at this address:

An app for web localization research

This morning, while reading the Globe and Mail, I came across an article reviewing a Firefox application that could be useful to localization researchers.

One of the difficulties in localization research is the unstable nature of the object being studied: when researchers study a print source text, it generally stays the same. True, some printed texts, like Don Quixote, do not have one definitive source text edition, and others, such as annual reports, are updated on a regular basis; however, the websites of large corporations and global brands pose a particular problem because they rarely stay the same for long, and previous versions are sometimes lost to the public.

For instance, when I was researching my last paper comparing how global and local brands depict Canada on their websites, I found spelling errors in the French version of the Pampers advertisements. When I went back to check on those errors a month later, they had been corrected. Microsoft had an ad I considered discussing, but when I went back the next day to take a better look at it, it had been replaced by something else. Granted, I could have downloaded the websites to my computer or to a server so that I could examine them at my leisure without worrying about when they might be updated, but that requires a great deal of time and disk space (since many of the websites I studied contained sizeable graphics and video files). Moreover, I would always wonder whether I should delete the archives or not… after all, who knows when I might need to refer back to a site, or even compare the previous version with the new one?
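The bookkeeping half of that do-it-yourself archiving approach can be sketched in a few lines of Python. The function name and file-naming scheme below are my own invention, and actually mirroring a site (with its graphics and video files) would still fall to a dedicated tool such as wget or HTTrack; this only shows how dated copies might be filed so that earlier versions survive later updates.

```python
import pathlib

def save_snapshot(url, html, date, archive_dir="site_archive"):
    """Store one dated copy of a page's HTML under a filename that
    encodes both the URL and the capture date, e.g.
    site_archive/example.com_fr_2010-03-26.html."""
    # Turn the URL into a filesystem-safe name.
    name = url.replace("https://", "").replace("http://", "").replace("/", "_")
    path = pathlib.Path(archive_dir) / f"{name}_{date}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path

# save_snapshot("http://example.com/fr", "<html>…</html>", "2010-03-26")
# files the page as site_archive/example.com_fr_2010-03-26.html
```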

The Internet Archive’s Wayback Machine allows Internet users to browse through archived versions of various websites. Simply enter a URL, and you’ll be taken to a page that lists all available versions in chronological order. A new application for Firefox, which can be downloaded here, improves on the Wayback Machine: it consults the Internet Archive database on your behalf and indicates in the lower-right-hand corner of your browser whether an archived version of the site you’re currently browsing is available, and if so, how many snapshots can be found. By moving the scroll bar, you can select the date of the archive you’d like to view. This means that you don’t have to go directly to the Internet Archive site every time you’d like to find out whether an older version of a particular website is available.
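Researchers who prefer scripting their own lookups can query the Internet Archive directly: it exposes a public availability endpoint that reports the archived snapshot closest to a given date. The sketch below assumes the JSON layout that endpoint returns (an `archived_snapshots`/`closest` object); treat it as a starting point rather than a polished tool.

```python
import json
import urllib.parse
import urllib.request

def availability_query(url, timestamp=None):
    """Build the request URL for the Internet Archive's availability
    endpoint; `timestamp` is an optional YYYYMMDD target date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return ("https://archive.org/wayback/available?"
            + urllib.parse.urlencode(params))

def closest_snapshot(url, timestamp=None):
    """Return the snapshot metadata closest to `timestamp`, or None
    if the Archive has no copy of the page."""
    with urllib.request.urlopen(availability_query(url, timestamp)) as resp:
        data = json.load(resp)
    return data.get("archived_snapshots", {}).get("closest")

# Example (requires network access):
# snap = closest_snapshot("theglobeandmail.com", "20050101")
# if snap and snap.get("available"):
#     print(snap["timestamp"], snap["url"])
```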

The only drawback is that the WaybackFox application is still in the experimental stage. It’s quite slow, for one thing: I waited over three minutes for a 2005 version of the Globe and Mail homepage to load. And it has some bugs: some of the images are just broken files, and as the Globe and Mail reviewer pointed out, the dates on the scroll bar don’t always match the archived versions that actually load up. However, I see a lot of potential in this tool for localization researchers. For one thing, even in its current pre-alpha release, the WaybackFox application allows researchers to quickly determine whether an archived version of a site exists, how many versions are available, and how far back the archives stretch. This information alone allows someone to decide whether to include a particular website in a longitudinal study of how localization habits have changed over time. As the application is improved, I hope it will help researchers compare how the localized versions of a website have changed over time, and then compare these results with the changes on other sites to determine whether certain localization trends can be identified and whether such trends are particular to certain locales, brands, industries, etc.

Translation in Global News

Recently, I’ve been reading Esperança Bielsa and Susan Bassnett’s Translation in Global News, which I’m reviewing for TTR. I came across the following paragraph about news translation, which also applies to website localization:

What research in this field [news translation] is starting to show is that translation is one element in a complex set of processes whereby information is transposed from one language into another and then edited, rewritten, shaped and repackaged in a new context, to such a degree that any clear distinction between source and target ceases to be meaningful. This is in total contrast to more established research into translation practice, particularly in the field of literary translation, where discussion is always in some way focussed around the idea of the binary distinction between source and target texts. Research into news translation poses questions about the very existence of a source and hence challenges established definitions of translation itself (Bielsa & Bassnett 2009: 11).

One of the challenges when studying localized websites is comparing the “source” site with that of the target locale. In some cases, various versions of a company or brand website were clearly developed with a general template. For example, the Pampers Canada and Pampers US sites have nearly identical layouts, images, colours and advertisements:

This is the French-Canadian site:

Here’s the English-Canadian site:

And the US English site:

As you can see, apart from the purple “Shop” tab missing from the French Canada menu bar and the banners in the upper portion of each site (which differ only because I took the three screen shots at different points in the four-ad cycle), the sites are visually identical. In this case, it’s probably safe to assume that the US English site served as the source for the Canadian English site, since the Pampers brand is owned by US-based Procter & Gamble. The French Canada site was likely translated from the English-Canadian content (an assumption supported by the fact that the French-Canadian banner ad still has some English text in it, as the first screen shot shows).
But what happens with other sites, when the layouts are not as similar?
Consider the AT&T US website:
And the AT&T Canada site:

Can we still talk about a source website? A researcher I met at the LISA@Berkeley conference told me that he looks at the source code of websites localized for Spanish-speaking locales to see whether it contains traces of English (e.g. comments by the programmers) to help confirm that the local Spanish-language websites were created based on the US/UK English-language sites. That works when the source and target locales have different languages, but not with English Canada and the United States.
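That source-code check could be roughly automated. The heuristic below is a hypothetical sketch of the idea, not the researcher’s actual method: it extracts the HTML comments from a localized page and flags any that contain common English function words, which may be leftovers from an English-language development template.

```python
import re

# Hypothetical set of tell-tale English words; real research would
# need a proper stopword list and per-language filtering.
ENGLISH_HINTS = {"the", "and", "for", "with", "this", "todo", "end"}

def english_comment_traces(html):
    """Return the HTML comments in `html` that appear to contain
    English words (possible traces of an English source site)."""
    comments = re.findall(r"<!--(.*?)-->", html, re.DOTALL)
    traces = []
    for c in comments:
        words = set(re.findall(r"[a-z]+", c.lower()))
        if words & ENGLISH_HINTS:
            traces.append(c.strip())
    return traces

sample = ('<html><!-- begin the nav section -->'
          '<body>Hola</body><!-- fin --></html>')
# english_comment_traces(sample) → ['begin the nav section']
```

Of course, as noted above, this trick only works when the source and target locales use different languages; it tells us nothing about an English Canada site derived from a US English one.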

Website localization does pose some problems for translation studies research, since the source locale cannot always be confirmed. In some cases, it’s easier just to analyze the site of a particular target locale (e.g. English Canada) without comparing it to the assumed source website. This reduces the kinds of analysis that can be done (studies based on parallel corpora, for instance, become difficult), but it still allows one to draw conclusions about how a given locale is being targeted through images and colours, or even through the text prepared for the target-locale users.


Bielsa, Esperança & Susan Bassnett. (2009) Translation in Global News. London; New York: Routledge.

LISA @ Berkeley

Recently, I returned from the Berkeley Globalization Conference at the University of California. The conference was co-hosted by LISA and the University of California, Berkeley, and it was the first LISA event to be targeted at both academics and industry professionals rather than just the latter. The format worked well, I think, because, unlike many of the academic conferences I’ve attended in the past, industry professionals were in the audience and could give feedback with a different perspective. I know that some of the questions and comments I received after my presentation on websites localized for Canadians were very useful, and I’ll be addressing these views in the revised paper I’m polishing up for Translation Studies.

I really enjoyed two presentations in particular, as they were closely related to my own area of research: one studied websites localized for two French-speaking locales (Belgium and France), while the other studied websites localized for Spanish-speaking users in various countries. The first presentation relied heavily on Hofstede’s cultural dimensions, with which I have some quibbles (but more on that in a future post), and the second used corpus linguistics to compare the content of localized and domestic websites, demonstrating that localized websites rarely look and feel the same as locally produced sites. Both gave me some new ideas for methodological approaches to analyzing localized sites, something I came to the conference seeking.

A third presentation pointed out some works I hadn’t heard of before, while a fourth brought up some more examples of for-profit websites seeking crowdsourced localization. This is an area that has intrigued me since I started studying translator blogs and learned about the LinkedIn controversy. I think there’s great potential for some scholarly work in this area. Two questions come to mind: what are translator attitudes toward crowdsourcing by for-profit and not-for-profit companies, and how does crowdsourcing affect translation quality (i.e. does community approval help improve the translations)?

Localizing for Quebec

One of my many in-progress-but-on-the-back-burner projects is studying the ways in which websites are localized for Quebec. I am particularly interested in how and when Quebec is targeted separately from the rest of (French) Canada. Yahoo!, for example, has been localized for English Canada and French Quebec, which ignores the official-language minority groups: the French speakers outside Quebec and the English speakers inside it. And while the latest Yahoo! International page does list Yahoo! Quebec as a locale within Canada (in the map) and as a special Yahoo! homepage within the Americas (in the list of country names), not too long ago, the page was a little different:

Yahoo! Quebec map

In this version, the table above the map lists available locales in alphabetical order. In the Americas column are: Argentina, Brazil, Canada, Mexico, Quebec and the United States. Unlike the rest of these locales, however, Quebec is not an independent country. And unlike the various US locales (US in Chinese, US in Russian), which are clearly variations of the larger US-EN locale, Québec was not listed as a subset of Canada. What is interesting is that only Canada and the United States had separate links for the various languages in which their sites are available. Even though Yahoo! Switzerland is available in French, German and Italian, only one hyperlink (to the Swiss-German site) was listed in the text box, and this has stayed the same in the updated Yahoo! International page. Click on the link in the first paragraph to see for yourself.

By contrast, in the map, a part/whole relationship between Quebec and Canada was apparent, as a dotted arrow led from Canada to Y! Québec, just as dotted arrows led from United States to US in Chinese and Y! Telemundo. Switzerland, by contrast, had only one hyperlink on the interactive map, and no indication that three locales were actually available: Swiss-French, Swiss-German and Swiss-Italian.

Does this have political implications? Possibly. In the previous table, Yahoo! was depicting Quebec not just as a distinct locale (one in which French is spoken) but as a nation with the same status as independent countries like Canada and Argentina. And, in both the old and new Yahoo! International pages, Yahoo! has indicated that Quebec is either the only area in Canada where French is spoken, or the only one important enough to target, since it is the only French-language version available for Canadian Internet users. Where does that leave the French-speaking minorities outside the province? There’s some intriguing room for research here, but I’d like to collect some more examples of Quebec as a targeted locale before I draw any conclusions. Any thoughts would be welcome.