The Evolution in Corpus Analysis Tools

This is a guest post by Ondřej Matuška, the Sales & Marketing Manager of Lexical Computing, a company that develops a corpus and language data analysis product called Sketch Engine. I was first made aware of Sketch Engine by Jost Zetzsche's newsletter (276th Edition of the Tool Box) a few weeks ago. As relatively clean text corpora proliferate and grow in data volume, it becomes necessary to use new kinds of tools to understand this huge volume of text data, which may or may not be under consideration for translation. These new tools help us to accurately profile the most prominent linguistic patterns in large collections of textual language data and to extract useful knowledge from these new corpora to help in many translation-related tasks. For those of us in the MT world, there have always been student-made tools (mostly built by graduate students in NLP and computational linguistics programs) that were used and needed to understand the corpus for better M...