Posts

Showing posts from April, 2018

The Data Security Issues Around Public MT - A Translator Perspective

This is a guest post by Mats Linder on the data privacy and security issues around the use of public MT services in professional translation scenarios. As I put this post together, I can hear Mark Zuckerberg giving his testimony on Capitol Hill in response to shockingly ignorant questions from legislators who don't really have a clue. This is not so different from the naive and somewhat ignorant comments on the data privacy issue with MT that I also see in translation industry blogs. The looming deadlines of the GDPR legislation have raised the volume of discussion on the privacy issue, but unfortunately not the clarity. GDPR will now result in some companies being fined, and since it is possible to calculate what non-compliance costs, many companies are being much more careful, at least in Europe. But as the Guardian said: "If it’s rigorously enforced (which could be a big “if” unless data protection authorities are properly resourced) it could blow a massive hole i

UTH - Another Chinese Translation Memory Data Utility

This is a guest post by Henry Wang of UTH. I include a brief interview I conducted before Henry wrote this post. I think this focus on developing a data marketplace is interesting, as I happen to believe that the data used to train machine learning systems is often more important than the algorithms themselves. The number of open source toolkits available for building Neural MT systems is now almost 10. I do not have a sense of whether the quality of the UTH data is better than that of other data utilities that exist, and this post is not an endorsement of UTH by me. They do, however, appear to be investing much more effort in cleaning the data, but I feel that the metadata is still sorely lacking for real value to come from this data. And metadata is not just about domain classification. It will be interesting to see the quality of the MT systems that are built using this data, and that evidence will be the best indicator of the quality and value of this data to the MT community. These