Localization and Language Quality
This is a guest post by David Snider, Globalization Architect at LinkedIn, reprinted here with permission. I thought the article was interesting because it points out that MT quality is now quite adequate for several types of enterprise applications, even though MT may well be a force that influences and accelerates the "crapification" (a word I wish I had invented) of overall language quality. While this might seem like horror to some, for a lot of business content that has a very short shelf life, and value only while the information is current, this MT quality is sufficient for most of the people who have an interest in the specific content. While David thinks that language quality will improve, I doubt very much that this MT content will improve much beyond what the raw technology itself makes possible. Business content that has value for a short time and is then forgotten simply cannot justify the effort needed to raise it to the level of "proper" written material.
If you go to the original post there are several comments that are worth reading as well.
-------
People have been complaining recently about the decline of language quality (actually, they’ve been complaining for decades – or make that centuries!). I have to admit that I sympathize: I’m from a generation that was taught to value good writing, and I still react with horror when I see obvious errors, like using “it’s” instead of “its”, or confusing ‘their’, ‘there’ and ‘they’re’. (I’m even more horrified when I make mistakes myself, which happens more often than I’d like to admit.)
But for my son’s generation? Not so much. Grammar, spelling, and punctuation aren’t that important to them; what matters is whether the other person understands them, and vice versa. My son is already 25 (wow, time flies!), so there’s another generation coming up behind him that’s even less concerned about ‘good’ writing; in fact, this new generation is so accustomed to seeing bad writing that for the most part they don’t even realize there are errors. This makes for a vicious circle: people grow up surrounded by bad writing, so they, in turn, write badly, which in turn exacerbates the problem. I’ve heard this referred to as the ‘crapification of language’.
Why is this happening?
Ease of publishing: in the old days, the cost of publishing content - typesetting it, grinding up trees and making paper, printing the content onto the paper, binding it, shipping it to a store and selling it - was immense. For this reason most published content was thoroughly edited and proofread, as there was no second chance. So if you read printed content like books, magazines and newspapers, you were generally exposed to correct grammar, spelling and punctuation. Since most of what people read was correctly written (even if not always well-written), people who read a lot generally learned to write well. But now anyone can create and publish content, with no editing or proofreading. The result is just what you’d expect.
Informal communications: email, texting, Twitter – they all favor speed, and when people are in a hurry, quality usually suffers.
Machine-generated content: this includes content that’s created by computers – for example, machine-generated support content created by piecing together user traffic about problems – as well as machine-translated content. Machine-generated content, and especially MT content, is, as we localization people know, often of very poor quality.
What does this mean for Localization?
Being in the localization business myself, I want to tie this in to its effect on localization. In some ways this ‘crapification’ works against us: garbage in, garbage out, after all, and if the source content is badly written, then it’s harder for the translators to do a good job, be they humans or machines. But at the same time, it can work for us – especially when it comes to Machine Translation, where a couple of things are making even raw MT more acceptable:
MT engine improvements: MT quality has steadily improved over the past 50 years (yes, it’s been around at least that long!). Major improvements, like statistical MT and now neural MT, seem to occur every 10 years or so. Perfect human-quality MT is still ‘only 5 years out’ and will undoubtedly continue to be so for a long time, but quality is steadily improving.
User expectations: the good news for MT is that, thanks to the crapification of language, the expectations bar has been coming down, and people are much more willing to accept raw MT, warts and all. Despite the quality problems, more and more people are using web-based MT services like Google Translate, Bing Translator, etc., to read and write content in other languages. As with texting above, they’re more concerned with content than with form: they’re OK with errors as long as they can understand the content, or at least get the gist of it. This seems to be true even in countries that have traditionally had a high bar for language quality, like Japan and France. As shown in the chart below, we’ve already passed the point at which raw MT is acceptable for some types of content. (Note that this chart is purely illustrative and is not based on hard data.)