Feedback on the Google Neural MT Deception Post
There was an interesting discussion thread on Reddit about the Google deception post, with a user going by the alias oneasasum, that I thought was worth highlighting here, since it was the most coherent criticism of my original post.
Google makes MASSIVE progress on Machine Translation -- "We show that our GNMT system approaches the accuracy achieved by average bilingual human translators on some of our test sets."
This is a slightly cleaned-up version of just our banter; the whole thread, which also has other fun comments, can be seen at the link above:
KV: Seriously exaggerated -- take a look at this for a more accurate overview: The Google Neural Machine Translation Marketing Deception ow.ly/Ii57304JV6S
HE: You should have also posted this article, as you did on another Reddit forum:
https://slator.com/technology/hyperbolic-experts-weigh-in-on-google-neural-translate/
That's a much better take, in my opinion.
....
I saw the blog posting myself the other day. This isn't marketing deception, and most of what this guy covers in his piece, I also covered in mine -- with the exception of pointing out the "60%" and "87%" claims as not being meaningful. (My title may have given you a different impression, however.)
People in NLP are not impressed by the advances in theory or algorithms, as the results amount to a repackaging of methods developed over the past two years by the wider academic community; but they are impressed by the scale of the effort, and by the results. See, for example, what Yoav Goldberg said on Twitter -- he said he's impressed by the results:
https://twitter.com/yoavgo/status/780849971407024128
"The GNMT results are cool. The BLEU not so much, only the human evals. But this is very hard to compare to other systems."

Another example is Kyunghyun Cho, known for his work on neural machine translation:
http://spectrum.ieee.org/tech-talk/computing/software/google-translate-gets-a-deep-learning-upgrade
“I am extremely impressed by their effort and success in making the inference of neural machine translation fast enough for their production system by quantized inference and their TPU,” Cho says.

The second thing I would say is that the research article is written by researchers, not Google marketing people. The Google marketing people have no sway over how researchers pitch their results in research articles.
My read of what these researchers have written (and also what a Google software engineer or two wrote on Twitter, before deleting their comments), is that they are very excited by their work, and feel they have made genuine progress. What you are seeing is not "hype", but "excitement". But there is always a price to pay for showing emotion -- somebody will always try to bring you back down to earth.
The third thing I would say is that this is the first example of a large deployment of neural machine translation, according to Cho again:
http://spectrum.ieee.org/tech-talk/computing/software/google-translate-gets-a-deep-learning-upgrade
That, in and of itself, is praiseworthy.
But he confirmed that Google seems to be the first to publicly announce its use of neural machine translation in a translation product.

The fourth thing I would say is to take with a grain of salt comments by people from either a competing product or school of thought. Perhaps this doesn't apply here; but it's still good to keep it in mind. An example of this might be something like the following: say you have one group working on classical knowledge representation using small data. And then say a machine learning method with large amounts of data makes progress on a problem they care about. What are they going to say? Are they going to say, "That's really great that we are now making progress on this old, stubborn problem!"? No, more likely they'll say, "That's just empty hype. They're nowhere near to solving that problem, and if they really want to make progress they'll drop what they're doing and use some classical knowledge representation."
KV: While the sheer scale of the initiative is impressive, both in terms of training data volume and the ability to provide translations to millions of users at production scale, the actual translation quality results are really not that impressive and certainly do not warrant claims such as "nearly indistinguishable from human translation" and "GNMT reduces translation errors by more than 55%-85% on several major language pairs."
The problem lies with the translation improvement claims based on the human evaluation. The validity of that human evaluation is the biggest question mark over the whole report. This is well known to people in the MT research community, so making the claims they did is disingenuous, even deceptive.

I agree they are doing it on a massive scale, but it is actually surprising how little benefit in translation quality they seem to have gotten, as Rico Sennrich at the University of Edinburgh notes in this post: https://slator.com/technology/hyperbolic-experts-weigh-in-on-google-neural-translate/
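As I read the paper, those headline percentages are relative, not absolute: they measure how much of the gap between the old phrase-based system (PBMT) and human translations is closed, as judged on the paper's 0-6 side-by-side rating scale. A toy calculation, with invented scores, shows how a small absolute difference turns into a dramatic-sounding percentage:

```python
# Toy illustration (invented numbers) of how a "reduces translation
# errors by X%" headline can come out of side-by-side ratings on a
# 0-6 scale, if X measures the fraction of the PBMT-to-human gap closed.

def relative_gap_closed(pbmt: float, gnmt: float, human: float) -> float:
    """Fraction of the PBMT-to-human gap closed by GNMT."""
    return (gnmt - pbmt) / (human - pbmt)

# Hypothetical average side-by-side scores for one language pair.
pbmt, gnmt, human = 3.6, 4.6, 4.8

print(f"{relative_gap_closed(pbmt, gnmt, human):.0%} of the gap closed")
# -> 83% -- yet GNMT is still 0.2 below "human" on a 6-point scale,
# and "human" here is just another set of rated translations,
# not an error-free ceiling.
```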
HE: Well, I suppose they will work harder next time to find a better way to measure the quality of their system. Again, I don't think they were trying to deceive.
One thing I would say, however, is that BLEU scores have problems, too. One problem is that even human translators sometimes have low BLEU scores (I had a better reference for this, but lost it, so I will give this one):
http://homepages.inf.ed.ac.uk/pkoehn/publications/tralogy11.pdf
"Recent experiments computed so-called human BLEU scores, where a human reference translation is scored against other human reference translations. Such human BLEU scores are barely higher (if at all) than BLEU scores computed for machine translation output, even though the human translations are better."
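To make the human-BLEU point concrete, here is a minimal sketch using NLTK's sentence_bleu; the sentences are invented for illustration, and any pair of adequate but differently worded translations shows the same effect:

```python
# pip install nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Two adequate human translations of the same (hypothetical) source
# sentence, worded differently.
reference = ["the economy has grown rapidly in recent years".split()]
candidate = "economic growth has been fast over the past few years".split()

# Smoothing avoids zero scores when a higher-order n-gram never matches.
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
# Prints a very low score, even though a human judge would call the
# candidate a perfectly acceptable translation.
```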
KV: Absolutely, BLEU scores are deeply flawed, but they are UNDERSTOOD, and so they continue to be used, as all the other metrics are even worse. I have written about this on my blog. So here is an example of the "nearly indistinguishable from human translation" GNMT output: a Chinese web page I just ran through the new NMT engine, which happens to talk about work that Baidu, Alibaba and Microsoft are doing. It is definitely better than looking at a page of Chinese characters (for me anyway), but clearly a very long way from human translation.
https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.chinatoday.com.cn%2Fchinese%2Feconomy%2Fnews%2F201606%2Ft20160603_800058589.html&edit-text=&act=url
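Anyone who wants to repeat this kind of spot check without the web interface can call the Cloud Translation REST API directly. A minimal sketch, assuming a valid API key in the GOOGLE_API_KEY environment variable (this is the public v2 Basic API, not whatever Google runs internally):

```python
# Minimal sketch: spot-check a translation via the Cloud Translation
# v2 REST API instead of the translate.google.com web UI.
import os
import requests

def translate(text: str, source: str = "zh-CN", target: str = "en") -> str:
    resp = requests.post(
        "https://translation.googleapis.com/language/translate/v2",
        params={"key": os.environ["GOOGLE_API_KEY"]},
        data={"q": text, "source": source, "target": target, "format": "text"},
    )
    resp.raise_for_status()
    return resp.json()["data"]["translations"][0]["translatedText"]

if __name__ == "__main__":
    # The headline from the sample page above.
    print(translate('语言服务支持"一带一路"且行且远'))
```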
HE: Yes, I saw those examples. This guy had posted a link to Twitter, before he deleted it:
https://mobile.twitter.com/WAWilsonIV/status/780824172624687104
He is a software engineer at Google, and was very excited by the results. But, yes, those particular examples weren't great. Not clear whether they were a random sample, or a sample showing the range of quality.
Also, another fun thing he noticed in the press coverage:
Here's a Technology Review article about it: Google’s New Service Translates Languages Almost as Well as Humans Can
This quote is priceless:
“It can be unsettling, but we've tested it in a lot of places and it just works,” he [Googler and co-author on the paper Quoc Le] says.
KV: One more EMPTY PROMISE.
Take a look at that Chinese newspaper sample above, which I ran today. Seriously, what are these guys smoking, and are they really so deluded? Yes, clearly they have done something few can do, in terms of using thousands of computers and solving a tough computing challenge. But it is of very little benefit to the guy who does not speak Chinese, as the English it produces is STILL pretty hard to follow. This is the source page, and this is the translation I got today from the super duper human-like GNMT:
https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.chinatoday.com.cn%2Fchinese%2Feconomy%2Fnews%2F201606%2Ft20160603_800058589.html&edit-text=&act=url
Original Chinese Text:
语言服务支持"一带一路"且行且远 (roughly: "Language services support the 'One Belt, One Road' initiative, going ever further")