A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology
One of the wonderful things about my current independent status is the ability to engage deeply with other MT experts who were previously off limits because competing MT vendors don't usually chat with open hearts and open cloaks. MT is tough to do well and I think the heavy lifting should be left to people who are committed for the long run, and who are willing to play, invest and experiment in spite of regular failure. This is how humans who endure and persist, learn and solve complex problems.
This is Part 1 of a two-part post on the SYSTRAN NMT product announcement. The second part will focus on comparing NMT with RBMT and SMT and also with the latest Adaptive MT initiatives. It can be found here: Comparing Neural MT, SMT and RBMT – The SYSTRAN Perspective
Press releases are so filled with marketing-speak as to be completely useless to most of us. They have a lot of words, but after you read them you realize you really don't know much more than you got from the headline. So, I recently had a conversation with Jean Senellart, Global CTO and SYSTRAN SAS Director General, to find out more about their new NMT technology. He was very forthcoming and responded to all my questions with useful details, anecdotes, and enthusiasm. The conversation only reinforced in my mind that "real MT system development" is something best left to experts, and not something that even large LSPs should dabble in. The reality and complexity of NMT development push the limits of MT even further away from the DIY mirage.
In the text below, I have put quotes around everything that I have gotten directly from SYSTRAN material or from Jean Senellart (JAS) to make it clear that I am not interpreting. I have done some minor editing to facilitate readability and "English flow" and added comments in italics within his quotes where this is done.
The New Product Line
JAS clarified several points about the overall evolution of the SYSTRAN product line. SYSTRAN intends to keep all of its existing MT system configurations in addition to the new NMT options, so the product line will include all of the following:
- RBMT: the rule-based legacy technology
- SMT: the Moses-based generation of engines released for some language pairs over the last few years
- SPE: Statistical Post-Editing engines, introduced in 2007 as the first implementation combining a rule-based system with phrase-based statistical post-editing
- NMT: the purely neural machine translation engines they have just announced
- NPE: "Neural Post-Editing," a replication of the SPE approach that uses neural machine translation instead of SMT for the second step, i.e., a neural network now corrects and improves the output of a rule-based engine
They will preserve exactly the same set of APIs and features (such as support for a user dictionary) around the new NMT modules, so that these historical linguistic investments remain fully interchangeable across the product line.
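The post does not describe how user dictionaries are wired into a neural engine, but one common technique in the field is placeholder substitution: protected terms are swapped for tags before translation and restored afterward, so the engine cannot mistranslate them. A minimal sketch of that general idea, with a hypothetical dictionary entry (this is my illustration, not SYSTRAN's actual implementation):

```python
# Illustrative placeholder scheme for enforcing a user dictionary around
# an MT engine. Not SYSTRAN's actual implementation.

def protect_terms(source, user_dict):
    """Replace dictionary source terms with numbered placeholder tags."""
    mapping = {}
    for i, (src_term, tgt_term) in enumerate(user_dict.items()):
        tag = f"__TERM{i}__"
        if src_term in source:
            source = source.replace(src_term, tag)
            mapping[tag] = tgt_term
    return source, mapping

def restore_terms(translation, mapping):
    """Swap placeholders back for the user's preferred target terms."""
    for tag, tgt_term in mapping.items():
        translation = translation.replace(tag, tgt_term)
    return translation

# Hypothetical entry: force "Gangnam" to render as "Gangnam District".
user_dict = {"Gangnam": "Gangnam District"}
src, mapping = protect_terms("Visit Gangnam today", user_dict)
# ...src would go through the MT engine; tags pass through unchanged...
out = restore_terms(src, mapping)  # src stands in for the MT output here
```

The appeal of this scheme is that the engine itself needs no retraining when the dictionary changes, which is consistent with the interchangeability the post describes.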
JAS said: "My intuition is that there will still be situations where we will prefer to continue offering the older solutions: for instance, when we need high throughput on a standard CPU server, for low-resource languages for which we already have an RBMT solution, or for customers currently using heavily customized engines." However, they expect that NMT will proliferate even in small-memory-footprint environments, and even though they expect NMT to eventually prevail, they will keep the other options available for their existing customer base.
The NMT initiative focused on languages that were most important to their customers, were known to be difficult historically, or presented special challenges not easily solved with the legacy solutions. So, as expected, the initial focus was on EN<>FR, EN<>AR, EN<>ZH, EN<>KO, and FR<>KO. All of these already show promise, especially the KO<>EN and KO<>FR combinations, which showed the most dramatic improvements and can be expected to improve further as the technology matures.
However, DE<>EN is one of the most challenging language pairs. As Jean said: "we have found a way to deal with the morphology, but the compounding is still problematic. Results are not bad, but we don't yet see the same jump in quality for this language pair."
The Best Results
So where have they seen the most promising results? As Jean said: "The most impressive results I have seen are in complicated language pairs like English-Korean; however, even for Arabic-English or French-English, the difference in quality between our legacy engines, online engines, and this new generation is impressive. What I found the most spectacular is that the translation is naturally fluent at the full-sentence level - while we have been (historically) used to some feeling of local fluency that did not sound fully right at the sentence level. Also, there are some cases where the translation moves quite far away from the source structure - and we can see some real "rewriting" going on."
Here are some examples comparing KO>EN sentences with NMT, SYSTRAN V8 (the current generation) and Google:
And here are some examples where the NMT seems to make linguistically informed decisions, changing the sentence structure away from the source to produce a better translation.
The Initial Release
When the NMT technology is released in October, SYSTRAN expects to release about 40 language pairs (mostly European and major Asian languages paired with English and French), with an additional 10 still in development to be released shortly after. As JAS stated: "We will be delivering high-quality generic NMT engines that will be instantly ready for "specialization" (I draw a distinction with customization, which implies training, because the nature of the adaptation to the customer domain is very different with NMT)."
Also very important for the existing customer base: all the old dictionaries developed over many years for RBMT/SMT systems will be useful for NMT systems. As Jean confirmed: "Yes - all of our existing resources are being used in the training of the NMT engines. It is worth noting that dictionaries are not the only components from our legacy modules we are re-using; the morphological analysis and named entity recognition are also key parts of our models."
With regard to the user interface for the new NMT products, JAS confirmed: "the first generation will fully integrate into the current translation infrastructure we have - we had, of course, to replace the back-end engines, but also some intermediate components. However, the GUI is preserved. We have started thinking about the next generation of UI, which will fully leverage the new features of this technology, and will be targeting a release next year."
The official SYSTRAN marketing blurb states the following:
"SYSTRAN exploits the capacity NMT engines have to learn from qualitative data by allowing translation models to be enriched each time the user submits a correction. SYSTRAN has always sought to provide solutions adjusted to the terminology and business of its customers by training its engines on customer data. Today SYSTRAN offers a self-specialized engine, which is continuously learning on the data provided."
Driving MT Engine Improvements
Jean also informed me that NMT has a simple architecture, but the number of options available to tune the engines is huge, and he has not found one single approach that is suitable for all languages. Options that can make a significant difference include "type of tokenization, the introduction of additional features, for instance for guiding the alignment, etc. ... So far we have not found one single paradigm that works for all languages, and each language pair seems to have its own preference. What we can observe is that unlike SMT, where the nature of the parameters was numerical and not really intuitive, here it seems that we can get major improvements by really considering the nature of the language pair we are dealing with."
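To make "type of tokenization" concrete: one widely used subword tokenization scheme in NMT research is byte-pair encoding, which splits rare words into learned subword units. This is a general technique, not necessarily what SYSTRAN uses; the sketch below shows only the merge-learning step, on a toy word-frequency dictionary:

```python
# Minimal byte-pair-encoding (BPE) merge learning: repeatedly merge the
# most frequent adjacent symbol pair. A generic illustration of one
# tokenization choice for NMT, not SYSTRAN's pipeline.
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn merge rules from a {word: frequency} dict."""
    vocab = {tuple(w): f for w, f in word_freqs.items()}  # word -> symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for syms, freq in vocab.items():
            for a, b in zip(syms, syms[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for syms, freq in vocab.items():  # apply the merge everywhere
            out, i = [], 0
            while i < len(syms):
                if i + 1 < len(syms) and (syms[i], syms[i + 1]) == best:
                    out.append(syms[i] + syms[i + 1])
                    i += 2
                else:
                    out.append(syms[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

merges = learn_bpe({"lower": 5, "lowest": 2, "newer": 6}, 3)
# frequent fragments like "we" and "wer" become reusable subword units
```

Because the learned merges depend on the character statistics of each language, a scheme like this naturally behaves differently per language pair, which fits Jean's observation that each pair seems to have its own preference.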
So do these corrective changes require re-training, or is there an instant dictionary-like capability that works right away? "Yes - this is a cool new feature. We can introduce feedback to the engine, sentence by sentence. It does not need retraining; we just feed the extra sentence and the model instantly adapts. Of course, the user dictionary is also a quick and easy option. The ability of an NMT engine to "specialize" very easily, and even to adapt from one single example, is very impressive."
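Adapting from a single feedback sentence without a full retraining amounts to taking an incremental update step on one example. As a toy illustration of that principle only (a one-parameter regression, nothing like a real NMT network, and not SYSTRAN's method):

```python
# Toy illustration of single-example adaptation: one gradient step on a
# single feedback pair shrinks the model's error on that example.

def adapt_once(w, x, y, lr=0.1):
    """One SGD step on squared error for a single (x, y) feedback pair."""
    pred = w * x
    grad = 2 * (pred - y) * x   # d/dw of (w*x - y)^2
    return w - lr * grad

w = 1.0                          # the "generic" model
x, y = 2.0, 5.0                  # one user correction: we want f(2) = 5
error_before = abs(w * x - y)    # 3.0
w = adapt_once(w, x, y)          # w moves from 1.0 to 2.2
error_after = abs(w * x - y)     # 0.6 - a single example already helps
```

The real mechanism in a neural engine updates millions of weights rather than one, but the shape of the operation is the same: no retraining from scratch, just an immediate update driven by the new sentence.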
Detailed MT Quality Metrics
"What is interesting is that we get major score improvements for systems that have not been tuned for the metrics they are evaluated against - for instance, here are some results on English-Korean using the RIBES metric." "In general, we see BLEU improvements of more than 5 points over current baselines."
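For readers unfamiliar with the metric, BLEU combines modified n-gram precision with a brevity penalty. Here is a minimal single-sentence sketch of the arithmetic (my illustration only; real evaluations like the ones quoted use corpus-level BLEU with standard tooling and smoothing):

```python
# Minimal sentence-level BLEU: geometric mean of modified n-gram
# precisions (n = 1..4), scaled by a brevity penalty for short outputs.
from collections import Counter
from math import exp, log

def sentence_bleu(hyp, ref, max_n=4):
    """hyp, ref: token lists. Crude smoothing; for illustration only."""
    precisions = []
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((h & r).values())       # clipped n-gram matches
        total = max(sum(h.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    # brevity penalty: punish hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else exp(1 - len(ref) / max(len(hyp), 1))
    return bp * exp(sum(log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
print(sentence_bleu(ref, ref))   # a perfect match scores 1.0
```

A "5 point" gain means roughly 0.05 on this 0-to-1 scale (BLEU is conventionally reported multiplied by 100), which is a substantial jump between mature systems.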
"The most satisfying result, however, is that the human evaluation always confirms the results - for instance, for the same language pair shown below, pair-wise human ranking produced the following results (RE is the human reference translation, NM is NMT, BI is Bing, GO is Google, NA is Naver, and V8 is our current generation). It reads: when a system A was in a ranking comparison with a system B (or the reference), how many times was it preferred by the human?"
"What is interesting in the cross-comparison is that we rank engines pair by pair - when we blindly show a Google and a V8 translation, we see which one the user prefers. The most interesting row, however, is the NM row:
       RE     BI     GO     NA     V8
NM     46.4   74.5   73.9   72.0   63.1
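The percentages in a table like this come from straightforwardly tallying blind pairwise judgments. A sketch of that tallying, with hypothetical judgment data (not SYSTRAN's):

```python
# Tallying blind pairwise preference judgments into win percentages,
# as in the table above. The judgment data here is hypothetical.
from collections import defaultdict

def preference_table(judgments):
    """judgments: list of (system_a, system_b, winner) tuples.
    Returns {(a, b): percent of a-vs-b comparisons won by a}."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for a, b, winner in judgments:
        totals[(a, b)] += 1
        if winner == a:
            wins[(a, b)] += 1
    return {pair: 100.0 * wins[pair] / totals[pair] for pair in totals}

judgments = [("NM", "GO", "NM"), ("NM", "GO", "NM"),
             ("NM", "GO", "GO"), ("NM", "GO", "NM")]
table = preference_table(judgments)   # {("NM", "GO"): 75.0}
```

Because evaluators see the two outputs blind, a cell like 74.5 for NM vs GO means the NMT output won roughly three out of four head-to-head comparisons.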
When comparing NMT output with the human reference translation, NMT is preferred 46% of the time (which is not bad - it means that for about one sentence out of two, the human does not prefer the reference HT over NMT!); when comparing NMT and Google, the preference goes to NMT 74% of the time, and so on."
The Challenges
The computing requirements have been described by many as a particular challenge. Even with GPUs, training an NMT engine is a long task. As Jean says: "when we have to wait 3 weeks for a full training, we do need to be careful with the training workflow and explore as many options as possible in parallel." "Artificial neural networks have terrific potential, but they also have limitations, particularly in understanding rare words. SYSTRAN mitigates this weakness by combining artificial neural networks with its current terminology technology, which feeds the machine and improves its ability to translate."
"It is important to point out that graphics processing units (GPUs) are required to operate the new engine. Also, to make this technology quickly available, SYSTRAN will provide the market with a ready-to-use appliance (that is to say, hardware and software integrated into a single offering). In addition, the overall trend is for desktops to integrate GPUs in the near future, as some smartphones already do (the latest iPhone can run neural models). As [server] size becomes less and less of an issue, NMT engines will easily be able to run locally on an enterprise server."
As mentioned earlier, there are some languages, e.g. DE<>EN, where the optimal NMT formula is still being worked out, but these are early days, and I think we can expect that the research community will zero in on these tough problems; at some point at least partial solutions will be available even if complete solutions are not.
Production User Case Studies
When asked about real-life production use of the NMT systems, Jean provided two key examples. "We have several beta users - but two of them are most significant. For the first one, our goal is to translate a huge tourism-related database from French into English, Chinese, Korean, and Spanish. We intend to use and publish the translation without post-editing. The challenge was to introduce support for named entity recognition in the model - since geographical entities were quite frequent [in the content] and a bit challenging for NMT. The best model was a generic model, meaning that we did not even have to adapt to a tourism model - and this seems to be a general rule: while in previous-generation MT the customization was doing 80% of the job, for NMT the customization is only interesting and useful for slight final adaptation.
The second [use case] is about technical documentation in English>Korean for an LSP. The challenge was that the available "in-domain" data was only 170K segments, which is not enough to train a full engine, but seems to be good enough to specialize a generic engine."
From everything I understand from my conversations, SYSTRAN is far along the NMT path, and miles ahead of any other MT vendor in terms of actually having something to show and sell. They are not just writing puff pieces about how cool NMT is to suggest some awareness of the technology. They have tested scores of systems and have identified many things that work and many that don't. As with many innovative things in MT, it takes a thousand or more attempts before you start developing real competence. They have been carefully measuring the relative quality improvements against competitive alternatives, which is always a sign that things are getting real. The product is not out yet, but based on my discussions so far, I can tell they have been at this for a while. They have reason to be excited, but all of us in MT have been down this path before, and as many of us know, the history of MT is filled with empty promises. As the Wolf character warns us (NSFW link, do NOT click on it if you are easily offended) in the movie Pulp Fiction, after fixing a somewhat impossible problem: let's not get carried away just yet. Let's wait to hear from actual users, and let's wait to see how it works in more production use scenarios, before we celebrate.
The goal of the MT developer community has always been to get really useful automated translation in a professional setting, since perfection, it seems, is a myth. SYSTRAN has seriously upped their ability to do this: they are getting continuously better translation output from the machine. If I were working with an enterprise with a significant interest in CJK<>E content, I would definitely take a closer look, as I have also gotten validation from Chris Wendt at Microsoft of their own success with NMT on J<>E content. I look forward to hearing more feedback about the NMT initiative at SYSTRAN, and if they keep me in the loop I will share it on this blog in future. I encourage you to come forward with your questions, as that is a great way to learn and get to the truth, and Jean Senellart seems willing and able to share his valuable insights and experience.