Translation Quality -- WannaCry?

For as long as I have been engaged with the professional translation industry, I have seen great confusion and ambiguity around the concept of "translation quality". This is a services industry in which nobody has been able to coherently define "quality" in a way that makes sense to a new buyer and potential customer of translation services. Unfortunately, it is also the basis of many of the differentiation claims made by translation agencies in competitive situations. Is it surprising, then, that many buyers of translation services are mystified and confused about what this really means?

To this day it is my sense that the best objective measures of "translation quality", imperfect and flawed though they may be, come from the machine translation community. The computational linguistics community has very clear definitions of adequacy and fluency that can be reduced to a number, and these have the tidy order that mathematics provides.
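As an aside, here is a minimal sketch, in Python, of what "reduced to a number" looks like in practice: a simplified sentence-level BLEU score, the MT community's workhorse automatic metric. This toy version (real evaluations use tools like sacreBLEU) just combines modified n-gram precision with a brevity penalty; the example sentence is the ungrammatical line from the WannaCry ransom note discussed further below.

```python
# A toy sentence-level BLEU: modified n-gram precision combined with a
# brevity penalty, reduced to a single number between 0 and 1.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each n-gram's count to how often it occurs in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        log_precisions.append(math.log(max(clipped, 1e-9) / total))  # smooth zeros
    geo_mean = math.exp(sum(log_precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * geo_mean

# The ungrammatical WannaCry sentence scores poorly against a corrected one.
print(simple_bleu("But you have not so enough time",
                  "But you do not have enough time"))
```

Whatever one thinks of BLEU's blind spots, this is a definition a machine can apply consistently, which is more than the industry's prose definitions of quality can claim.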

The translation industry, however, is reduced to confusing discussions in which, ironically, the words and terms used in the descriptions are themselves ambiguous and open to multiple interpretations. It is really hard to just say, "We produce translations that are accurate, fluent and natural," since we have seen that these words mean different things to different people. To add to the confusion, discussions of translation output quality are often conflated with translation process issues. I maintain that the most articulate and generally useful discussion of this issue comes from the MT and NLP communities.

I feel compelled to provide something on this subject below that might be useful to a few, but I acknowledge that this remains an unresolved issue, one that undermines the perceived value of the primary product this industry produces.

Here are the basic criteria that a Translation Service Provider offering a quality service should fulfill:

a) Translation

  • Correct transfer of information from the source text to the target text.
  • Appropriate choice of terminology, vocabulary, idiom, and register in the target language.
  • Appropriate use of grammar, spelling, punctuation, and syntax, as well as the accurate transfer of dates, names, figures, etc. in the target language.
  • Appropriate style for the purpose of the text.

b) Work process

  • Certification in accordance with national and/or international quality standards.

Gábor Ugray provides an interesting perspective on "translation quality" below and again raises some fundamental questions about the value of newfangled quality assessment tools when we have yet to clarify why we do what we do. He also provides very thoughtful guidance on the way forward and suggests some things that IMO might actually improve the quality of the translation product.
 
Quality definitions based on error counts and the like are possibly useful to the dying bulk market, as Gábor points out. As he says, "real quality" comes from clarifying intent, understanding the target audience, and long-term communication and writing experience, and from new in-situ, in-process tools that enhance the translator's work and the knowledge gained through doing it. Humans learn and improve by watching carefully when they make mistakes (how, why, where), not by keeping meticulous counts of the errors they made.

We desperately need new tools that go beyond the TM and MT paradigms as we know them today, tools grounded in a real understanding of what is useful and valuable to a translator or to an evolving translation process. Fortunately, Gábor is in a place where he might get some people to listen to these new ideas, and even to try new implementations that actually produce higher quality.
 
The emphasis and callouts in his post below are almost all mine.



================


An idiosyncratic mix of human and machine translation might be the key to tracing the notorious WannaCry ransomware back to its authors. What does the incident tell us about the translating profession’s prospects? A post on translation quality.


Quality matters, and it doesn’t

Flashpoint’s stunning linguistic analysis[1] of the WannaCry malware was easily the most intriguing piece of news I read last week (and we do live in interesting times). This one detail by itself blows my mind: WannaCry’s ransom notice was dutifully localized into no less [2] than 28 languages. When even the rogues are with us on the #L10n bandwagon, what other proof do you need that we live in a globalized age?

But it gets more exciting. A close look at those texts reveals that only the two Chinese versions and the English text were authored by a human; the other 25 are all machine translations. A typo in the Chinese suggests that a Pinyin input method was used. Substituting 帮组 bāngzǔ for 帮助 bāngzhù is indicative of a speaker of a southern Chinese topolect. Other vocabulary choices support the same theory. The English, in turn, “appears to be written by someone with a strong command of English, [but] a glaring grammatical error in the note suggests the speaker is non-native or perhaps poorly educated.” According to Language Log[3], the error is “But you have not so enough time.”

I find all this revealing for two reasons. One, language matters. With a bit of luck (for us, not the hackers), a typo and an ungrammatical sentence may ultimately deliver a life sentence for the shareholders of this particular venture. Two, language matters only so much. In these criminals’ cost-benefit analysis, free MT was exactly the amount of investment those 25 languages deserved.

This is the entire translating profession’s current existential narrative in a nutshell. One, translation is a high-value and high-stakes affair that decides lawsuits; it’s the difference between lost business and market success. Two, translation is a commodity, and bulk-market translators will be replaced by MT real soon. Intriguingly, the WannaCry story seems to support both of these contradictory statements.

Did the industry sidestep the real question?

I remember how 5 to 10 years ago panel discussions about translation quality were the most amusing parts of conferences. Quality was a hot topic and hotly debated. My subjective takeaway from those discussions was that (a) everyone feels strongly about quality, and (b) there’s no consensus on what quality is. It was the combination of these two circumstances that gave rise to memorable, and often intense, debates.

Fast-forward to 2017, and the industry seems to have moved on from this debate, perhaps admitting through its silence that there’s no clear answer.

Or is there? The heated debates may be over, but quality assessment software seems to be all the rage. There’s TAUS’s DQF initiative[4]. Its four cornerstones are (1) content profiling and knowledge base; (2) tools; (3) a quality dashboard; (4) an API. CSA’s Arle Lommel just wrote [5] about three new QA tools on the block: ContentQuo, LexiQA, and TQAuditor. Trados Studio has TQA, and memoQ has LQA, both built-in modules for quality assessment.

I have a bad feeling about this. Could it be that the industry simply forgot that it never really answered the two key questions, What is quality? and How do you achieve it? Are we diving headlong into building tools that record, measure, aggregate, compile into scorecards and visualize in dashboards, without knowing exactly what and why?


A personal affair with translation quality

I recently released a pet project, a collaborative website for a German-speaking audience. It has a mix of content that’s partly software UI, partly long-form, highly domain-specific text. I authored all of it in English and produced a rough German translation that a professional translator friend reviewed meticulously. We went over dozens of choices ranging from formal versus informal address to just the right degree of vagueness where vagueness is needed, versus compulsive correctness where that is called for.

How would my rough translation have fared in a formal evaluation? I can see the right kind of red flags raised for my typos and lapses in grammar, for sure. But I cannot for the life of me imagine how the two-way intellectual exchange that made up the bulk of our work could be quantified. It’s not a question of correct vs. incorrect. The effort was all about clarifying intent, understanding the target audience, and making micro-decisions at every step of the way in order to achieve my goals through the medium of language.

Lessons from software development

The quality evaluation of translations has a close equivalent in software development.

CAT tools have automatic QA that spots typos, incorrect numbers, deviations from terminology, wrong punctuation and the like. Software development tools have on-the-fly syntax checkers, compiler errors, code style checkers, and static code analyzers. If that’s gobbledygook for you: they are tools that spot what’s obviously wrong, in the same mechanical fashion that QA checkers in CAT tools spot trivial mistakes.
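To make the analogy concrete, here is a minimal sketch of the kind of mechanical check such a QA module performs: flagging numbers that appear in the source segment but not in the target. This assumes nothing about any particular CAT tool's internals; the function name and logic are purely illustrative.

```python
# Illustrative sketch of a mechanical CAT-tool-style QA check: numbers
# present in the source segment should reappear in the target segment.
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)?")

def check_numbers(source: str, target: str) -> list[str]:
    """Return warnings for numbers found in source but missing from target."""
    target_numbers = set(NUMBER.findall(target))
    return [f"number '{n}' missing from target"
            for n in NUMBER.findall(source)
            if n not in target_numbers]

# The check is crude by design: it flags '3' here even though the German
# target correctly spells it out as 'drei' -- a typical QA false positive.
print(check_numbers("Pay $300 within 3 days.",
                    "Zahlen Sie 300 $ innerhalb von drei Tagen."))
```

Real QA modules layer terminology, punctuation and tag checks on top, but they all share this shape: fast, mechanical, and blind to meaning.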

With the latest surge of quality tools, CAT tools now have quality metrics based on input from human evaluators. Software developers have testers, bug tracking systems and code reviews that do the same.

But that’s where the similarities end. Let me let you in on a secret. No company anywhere evaluates or incentivizes developers through scorecards that show how many bugs each developer produced.

Some did try, 20+ years ago. They promptly changed their minds or went out of business.[6]

Ugly crashes notwithstanding, the software industry as a whole has made incredible progress. It is now able to produce more and better applications than ever before. Just compare the experience of Gmail or your iPhone to, well, anything you had on your PC in the early 2000s.

The secret lies in better tooling, empowering people, and in methodologies that create tight feedback loops.

Tooling, empowerment, feedback

In software, better tooling means development environments that understand your code incredibly well, give you automatic suggestions, allow you to quickly make changes that affect hundreds of files, and to instantly test those changes in a simulated environment.

No matter how you define quality, in intellectual work, it improves if people improve. People, in turn, improve through making mistakes and learning from them. That is why empowerment is key. In a command-and-control culture, there’s no room for initiative; no room for mistakes; and consequently, no room for improvement.

But learning only happens through meaningful feedback. That is a key ingredient of methodologies like agile. The aim is to work in short iterations; roll out results; observe the outcome; adjust course. Rinse and repeat.


Takeaways for the translation industry

How do these lessons translate (no pun intended) to the translation industry, and how can technology be a part of that?

The split. It’s a bit of an elephant in the room that the so-called bulk translation market is struggling. Kevin Hendzel wrote about this in very dramatic terms in a recent post[7]. There is definitely a large amount of content where clients are bound to decide, after a short cost-benefit analysis, that MT makes the most sense. Depending on the circumstances it may be generic MT or the more expensive specialized flavor, but it will definitely not be human translators. Remember, even the WannaCry hackers made that choice for 25 languages.

But there is, and will always be, a massive and expanding market for high-quality human translation. Even from a purely technological angle, it’s easy to see why: MT systems don’t translate from scratch. They extrapolate from existing human translations, and those need to come from somewhere.

My bad feeling. I am concerned that the recent quality assessment tools make the mistake of addressing the fading bulk market. If that’s the case, the mistake is obvious: no investment will yield a return if the underlying market disappears.
[Image] Source: TAUS Quality Dashboard [link]


Why do I think that is the case? Because the market that will remain is the high-quality, high-value market, and I don’t see how the sort of charts shown in the image above will make anyone a better translator.

Let’s return to the problems with my own rough translation. There are the trivial errors of grammar, spelling and the like. Those are basically all caught by a good automatic QA checker, and if I want to avoid them, my best bet is a German writing course and a bit of thoroughness. That would take me to an acceptable bulk translator level.

As for the more subtle issues – well, there is only one proven way to improve there. That way involves translating thousands of words every week, for 5 to 10 years on end, and having intense human-to-human discussions about those translations. With that kind of close reading and collaboration, progress doesn’t come down to picking error types from a pre-defined list.

Feedback loops. Reviewer-to-translator feedback would be the equivalent of code reviews in software development, and frankly, that is only part of the picture. That process takes you closer to software that is beautifully crafted on the inside, but it doesn’t take you closer to software that solves the right problems in the right way for its end users. To achieve that, you need user studies, frequent releases and a stable process that channels user feedback into product design and development.

Imagine a scenario where a translation’s end users can send feedback, which is delivered directly to the person who created that translation. I’ll let you in on one more secret: this is already happening. For instance, companies that localize MMO (massively multiplayer online) games receive such feedback in the form of bug reports. They assign those straight to translators, who react to them in a real-time collaborative translation environment like memoQ server. Changes are rolled out on a daily basis, creating a really tight and truly agile feedback loop.
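As a sketch of what such routing could look like under the hood, here is a minimal, hypothetical example; the data model and function names are mine, not memoQ server's actual API.

```python
# Hypothetical sketch: route an end-user bug report straight to the
# translator who produced the segment it refers to.
from dataclasses import dataclass

@dataclass
class Segment:
    segment_id: str
    target_text: str
    translator: str  # who produced this translation

# Toy ownership table; a real system would query the translation server.
SEGMENTS = {
    "ui.quest.accept": Segment("ui.quest.accept", "Quest annehmen", "anna"),
}

def route_bug_report(segment_id: str, report: str) -> str:
    """Assign an end-user report directly to the segment's translator."""
    seg = SEGMENTS[segment_id]
    print(f"assigned to {seg.translator}: {report!r} on {seg.target_text!r}")
    return seg.translator

route_bug_report("ui.quest.accept", "Button label is truncated in German")
```

The point is not the code but the shape of the loop: the report carries enough context to land on the desk of the one person who can act on it, the same day.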

Technology that empowers and facilitates. For me, the scenario I just described is also about empowering people. If, as a translator, you receive direct feedback from a real human, say a gamer who is your translation’s recipient, you can see the purpose of your work and feel ownership. It’s the agile equivalent of naming the translator of a work of literature.

If we put metrics before competence, I see a world where the average competence of translators stagnates. Instead of an upward quality trend throughout the ecosystem, all you have is a fluctuation, where freelancers are data points that show up on this client’s quality dashboard today, and a different client’s tomorrow, moving in endless circles.

I disagree with Kevin Hendzel on one point: technology definitely is an important factor that will continue to shape the industry. But it can only contribute to the high-value segment if it sees its role in empowerment, in connecting people (from translators to end users), in facilitating communication, and in establishing tight and actionable feedback loops. The only measure of translation quality that everyone agrees on, after all, is fitness for purpose.

References

[1] Attribution of the WannaCry ransomware to Chinese speakers. Jon Condra, John Costello, Sherman Chu
https://www.flashpoint-intel.com/blog/linguistic-analysis-wannacry-ransomware/
[2] Fewer, for the pedants.
[3] Linguistic Analysis of WannaCry Ransomware Messages Suggests Chinese-Speaking Authors. Victor Mair
http://languagelog.ldc.upenn.edu/nll/?p=32886
[4] DQF: Quality benchmark for our industry. TAUS
https://www.taus.net/evaluate/dqf-background
[5] Translation Quality Tools Heat Up: Three New Entrants Hope to Disrupt the Industry. Arle Lommel, Common Sense Advisory blog.
http://www.commonsenseadvisory.com/Default.aspx?Contenttype=ArticleDetAD&tabID=63&Aid=39177&moduleId=390
[6] Incentive Pay Considered Harmful. Joel On Software, April 3, 2000
https://www.joelonsoftware.com/2000/04/03/incentive-pay-considered-harmful/
[7] Creative Destruction Engulfs the Translation Industry: Move Upmarket Now or Risk Becoming Obsolete. Kevin Hendzel, Word Prisms blog.
http://www.kevinhendzel.com/creative-destruction-engulfs-translation-industry-move-upmarket-now-risk-becoming-obsolete/


Gábor Ugray is co-founder of Kilgray, creators of the memoQ collaborative translation environment and TMS. He is now Kilgray’s Head of Innovation, and when he’s not busy building MVPs, he blogs at jealousmarkup.xyz and tweets as @twilliability.
