The Building Momentum for Post-Edited Machine Translation (PEMT)

This is an (opinionated) summary of interesting findings from a flurry of conferences that I attended earlier this month: the TAUS User Conference, Localization World and tekom. Having so many events so close together is tiring, but it is interesting to see what still stands out a few weeks later. For me, TAUS and tekom were clearly worthwhile and Localization World was not. I believe #LWSV is an event that is losing its mojo in spite of big attendance numbers.

Some of the big themes that stand out (mostly from TAUS) were:
  • Detailed case studies that provide clear and specific evidence that customized MT improves the productivity of traditional translate-edit-proof (TEP) translation processes
  • The parade of instant, on-demand Moses MT engines
  • Initial attempts from memoQ and Memsource at defining post-editing effort and difficulty
  • A session on the future of the multilingual web from speakers who are actually involved with large-scale, web-wide changes and requirements
  • More MT hyperbole
  • The bigger context and content production chain for translation that is visible at tekom
  • Post-editor feedback at tekom
  • The lack of innovation in most of the content presented at Localization World
The TAUS Twitter stream (#tausuc11) has been archived, the tekom tag is #tcworld11 and the Localization World tag is #lwsv. Many of the TAUS presentations will be available as web video shortly, and I recommend that you check some of them out.


PEMT Case Studies
In the last month I have seen several case studies that document the time savings, cost savings and consistency benefits of good customized MT systems. At TAUS, Don Johnson of Caterpillar said that their demand for translation was rising rapidly, so they built an MT-based translation production process around their well-known controlled language, Caterpillar English. The MT process was initially more expensive because 100% of the segments needed to be reviewed, but on their quality measurements they are now seeing better results from MT than from human translators for Brazilian Portuguese and Russian. They expect to expand to new kinds of content as these engines mature.

Catherine Dove of PayPal described how their human translation process got bogged down in review and rework cycles (needed to keep the PayPal brand's tone and style intact) and could not meet the production requirement of 15K words per week with a 3-day turnaround in 25 languages. They found that "machine-aided human translation" delivers better, more consistent terminology in the first pass, which lets reviewers focus more on style and fluency. Deadlines are easier to meet, and she also commented that MT handles tags better than humans do. PayPal also focuses on cleaning up and improving the source to get more out of MT, and, interestingly, the MT is also useful for catching errors in the authoring phase. PayPal uses an "edit distance" measurement to quantify the amount of rework, and found that the MT process reduces this effort by 20% on 8 of the 10 languages where they use MT. An additional benefit is that a new quality improvement process is now in place that should continue to yield increasing benefits.
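PayPal's exact edit distance metric was not described, so the sketch below is only an illustration of the general idea, assuming a word-level Levenshtein distance between the raw MT output and the final post-edited text, normalized by the length of the post-edited text. The function names `word_edit_distance` and `rework_ratio` are hypothetical, and real post-editing metrics (TER, for example) also account for moved blocks of words, which this sketch does not.

```python
def word_edit_distance(mt_output: str, post_edited: str) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn mt_output into post_edited (word-level Levenshtein)."""
    src = mt_output.split()
    tgt = post_edited.split()
    # prev[j] = distance between the words of src seen so far and the first j words of tgt
    prev = list(range(len(tgt) + 1))
    for i, s_word in enumerate(src, start=1):
        curr = [i] + [0] * len(tgt)
        for j, t_word in enumerate(tgt, start=1):
            cost = 0 if s_word == t_word else 1
            curr[j] = min(
                prev[j] + 1,         # delete s_word from the MT output
                curr[j - 1] + 1,     # insert t_word added by the post-editor
                prev[j - 1] + cost,  # keep the word, or substitute it
            )
        prev = curr
    return prev[-1]


def rework_ratio(mt_output: str, post_edited: str) -> float:
    """Edits per post-edited word (hypothetical normalization);
    0.0 means the MT output was accepted unchanged."""
    return word_edit_distance(mt_output, post_edited) / max(len(post_edited.split()), 1)


if __name__ == "__main__":
    # One substituted word out of four post-edited words.
    print(rework_ratio("your payment were sent", "your payment was sent"))  # 0.25
```

Averaging such a ratio over many segments, once for a human first pass and once for an MT first pass, is one plausible way to arrive at a comparative figure like the 20% rework reduction PayPal reported.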

Asia Online and Sajan also presented a PEMT user case study at the Localization Research Conference in September 2011. The customer is a global enterprise: a major information technology software developer, hardware/IT OEM manufacturer, and comprehensive IT services provider for mission-critical enterprise systems in 100+ countries. The company had an internally developed legacy MT system that the key customer stakeholders had used in the past. Sajan and Asia Online customized English-to-Chinese and English-to-Spanish engines for this customer. Because the content contains highly technical terminology, these MT systems have been delivering output that even beats the first-pass output from their human translators, especially in Chinese. A summary of the use case is provided below:
  • 27 million words have been processed by this client using MT
  • Large amounts of high-quality TM (many millions of words) and glossaries were provided, and these engines are expected to continue to improve with additional feedback.
  • The customized engine was focused on the broad IT domain and was intended to translate new documentation and support content from English into Chinese and Spanish.
  • A key objective of the project was to eliminate full human translation and replace it with MT plus post-editing as the new production process.
  • The custom engine output delivered higher quality than their first-pass human translators, especially in Chinese
  • All output was proof read to deliver publication quality.
  • Using Asia Online Language Studio the customer saved 60% in costs and 77% in time over previous production processes based on their own structured time and cost measurements.
  • The client also produces an MT product, but the business units prefer to use Asia Online because of considerable quality and cost differences.
  • The client was extremely impressed with the result, especially when compared to the output of their own engine.
  • The new pricing model enabled by MT creates a situation where the higher the volume the more beneficial the outcome.
In the video of this presentation, the Sajan portion begins at 27 minutes (in case you want to skip over the Asia Online part), and even if you only watch the Sajan presentation for 5 minutes you will get a clear sense of the benefit delivered by the PEMT process.
