Showing posts with the label BLEU

In a Funk about BLEU

This is a more fleshed-out version of a blog post by Pete Smith and Henry Anderson of the University of Texas at Arlington, already published on SDL.com. They describe initial results from a research project they are conducting on MT system quality measurement and related issues. MT quality measurement, like human translation quality measurement, has been a difficult subject for both the translation industry and for many MT researchers and systems developers, as the most commonly used metric, BLEU, is now quite widely understood to be of especially limited value with NMT systems. Most of the other text-matching NLP scoring measures are just as suspect, and practitioners are reluctant to adopt them because they are either difficult to implement or their interpretation pitfalls and nuances are not well understood. They can all generate a numeric score based on various calculations of Precision and Recall that need to be interpreted with great c...

Understanding MT Quality - What Really Matters?

This is the second post in our series on machine translation quality. Again, this is a slightly less polished, rawer variant of a version published on the SDL site. The first one focused on BLEU scores, which are often improperly used to make decisions on inferred MT quality, even though BLEU is clearly not the best metric from which to draw this inference. The reality of many of these comparisons today is that scores based on publicly available (i.e. not blind) news domain tests are being used by many companies and LSPs to select MT systems that translate content in the IT, customer support, pharma, and financial services domains. Clearly, this can only result in sub-optimal choices. The use of machine translation (MT) in the translation industry has historically been heavily focused on localization use cases, with the primary intention of improving efficiency, that is, speeding up turnaround and reducing unit word cost. Indeed, machine translation post-editing (MTPE) has been instrumental in helping loca...

Understanding MT Quality: BLEU Scores

This is the first in a series of posts discussing various aspects of MT quality in the context of enterprise use and value, where linguistic quality is important, but not the only determinant of suitability in a structured MT technology evaluation process. A cleaner, more polished, and shorter studio version of this post is available here. You can consider this post a first draft, or the live stage performance (stream of consciousness) version. What is BLEU (Bilingual Evaluation Understudy)? As the use of enterprise machine translation expands, it becomes increasingly important for users and practitioners to understand MT quality issues in a relevant, meaningful, and accurate way. The BLEU score is a string-matching algorithm that provides basic output quality metrics for MT researchers and developers. In this first post, we will look more closely at the BLEU score, which is probably the most widely used MT quality assessment metric in use by MT researchers an...

The Problem with BLEU and Neural Machine Translation

There has been a great deal of public attention given to the subject of Neural Machine Translation in 2016. While experimentation with Neural Machine Translation (NMT) has been going on for the last several years, 2016 has proven to be the year that NMT broke through and became a big deal. It is now widely understood to be of great merit outside the academic and research community, where its promise had already been recognized for some years. The reasons for the sometimes excessive exuberance around NMT are largely based on BLEU (not BLUE) score improvements on test systems, which are sometimes validated by human quality assessments. However, it has been understood by some that BLEU, which is still the most widely used measure of quality improvement, can be misleading when it is used to compare some kinds of MT systems. The basis for the NMT optimism is related both to the very slow progress in recent years with improving ...