Understanding the Realities of Language Data
This is a guest post by Luigi Muzii that focuses mostly on the various questions surrounding Language Data, which by most “big data” definitions and volumes is not what most in the community would consider big data. As the world hurtles into the brave new world being created by a growing volume of machine learning and AI applications, the question of getting the data right is often brushed aside. Many assume that data is a solved problem or presume that it is easily available. However, those of us who have been working seriously on MT over the last decade understand that this is far from a solved problem. Machines learn from data, and smart engineers can find ways to leverage the patterns in data in innumerable ways. Properly used, data can make knowledge work easier or more efficient, e.g. in machine translation, recommendation, and personalization. The value and quality of this pattern learning can only be as good as the data used, and however exciting all this technology seem...