Posts

Showing posts from November, 2016

The Critical Importance of Simplicity

This is a post by Luigi Muzii that was initially triggered by this post and this one, but I think it has grown into a broader comment on a key issue in the successful professional use of MT, i.e., the assessment of MT quality and the extent, scope, and management of the post-editing effort. Being able to get a quick and accurate assessment of the specific quality at any given time in a production scenario is critical, but the assessment process itself cannot be so cumbersome and complicated that the measurement effort becomes a new problem in itself. While industry leaders and academics continue to develop well-meaning metrics like MQM and DQF that are very difficult to deploy efficiently and cost-effectively, most practitioners are left with BLEU and TER as the only viable and cost-effective measures. However, these easy-to-do metrics have well-known bias issues with RbMT and now with NMT. And given that this estimati...

The Thanksgiving Myth

Thanksgiving is fundamentally about giving thanks, though, according to Wikipedia and what we are generally told in the US, it is associated with Pilgrims and Puritans and with being a harvest festival. For Native Americans, the story of Thanksgiving is not a very happy one. “Thanksgiving” has become a time of mourning for many Native people. It serves as a period of remembering how a gift of generosity was rewarded by theft of land and seed corn, extermination of many Native people from disease, and near total elimination of many more from forced assimilation. As celebrated in America, “Thanksgiving” is a reminder of 500 years of betrayal. To many Native Americans, the Thanksgiving Myth amounts to the settlers’ justification for the genocide of Indigenous peoples. Native Americans see this official U.S. holiday as a celebration of the survival of early arrivals in a European invasion that culminated in the death of 10+ million Native people. Here is a view of how one Na...

Understanding Your Data Using Corpus Analysis

If you were surprised by the outcome of the recent US Presidential elections, you can imagine the surprise of the “expert” pollsters whose alleged expertise is predicting these events. These predictions were based on an understanding of the population (the data), which in this case meant predicting how 120 million people would vote based on a sample of 25,000 or maybe 100,000 people who are assumed to be representative. They were all wrong because the sample was simply not representative of the actual voting population. So it goes. It is very easy to go wrong with big data even when you have deep expertise. This is not so different from Google claiming “human quality MT” based on a sample of 500 sentences. Unfortunately for them, it is just not true once you step away from these 500 sentences. The real world is much more unpredictable. This is a guest post by Juan Rowda about Corpus Analysis, which is a technical way of saying ...