Posts

Showing posts from April, 2019

Understanding MT Quality - What Really Matters?

This is the second post in our series on machine translation quality. Again, this is a slightly less polished, rawer variant of a version published on the SDL site. The first one focused on BLEU scores, which are often improperly used to infer MT quality, a purpose for which BLEU is clearly not the best metric. In practice, many companies and LSPs today select MT systems to translate IT, customer support, pharma, and financial services content based on scores from publicly available (i.e., not blind) news-domain tests. Clearly, this can only result in sub-optimal choices. The use of machine translation (MT) in the translation industry has historically focused heavily on localization use cases, with the primary aim of improving efficiency, that is, speeding up turnaround and reducing unit word cost. Indeed, machine translation post-editing (MTPE) has been instrumental in helping loca

Understanding MT Quality: BLEU Scores

This is the first in a series of posts discussing various aspects of MT quality from the perspective of enterprise use and value, where linguistic quality is important but is not the only determinant of suitability in a structured MT technology evaluation process. A cleaner, more polished, and shorter studio version of this post is available here. You can consider this post a first draft, or the live stage performance (stream of consciousness) version. What is BLEU (Bilingual Evaluation Understudy)? As the use of enterprise machine translation expands, it becomes increasingly important for users and practitioners to understand MT quality issues in a relevant, meaningful, and accurate way. The BLEU score is a string-matching algorithm that provides basic output quality metrics for MT researchers and developers. In this first post, we will take a closer look at the BLEU score, which is probably the most widely used MT quality assessment metric among MT researchers and de
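To make "string-matching algorithm" concrete, here is a minimal sketch of corpus-level BLEU in its standard form: clipped ("modified") n-gram precision for n = 1 to 4, combined by geometric mean and scaled by a brevity penalty. It assumes whitespace tokenization, exactly one reference translation per segment, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical, chosen for illustration only. Production tools such as sacreBLEU standardize tokenization and other details, so their scores will not match this sketch exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Sketch of corpus-level BLEU: one reference per segment,
    whitespace tokens, no smoothing (all assumptions). Returns 0-100."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams produced by the MT output, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Modified precision: an n-gram is credited at most as many
            # times as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no matches
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    ref = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], ref))        # exact match
    print(corpus_bleu(["the cat sat on the mat today"], ref))  # one extra word
    print(corpus_bleu(["a cat was sitting on the rug"], ref))  # paraphrase
```

Because the score only counts word sequences shared with the reference, the paraphrase in the last call scores 0.0 even though it is arguably an acceptable translation. This is one reason a BLEU score on its own is a weak basis for judging MT quality.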