
STAY CONSERVATIVE OR EMBRACE THE FUTURE? ABOUT TRANSLATIONS, STATISTICS AND COMPUTERS

We are living in an increasingly digitized world, where computers have changed, and continue to change, our lives. Every day we need to adjust to new technological developments in order to stay competitive and keep up. I believe very few people would argue with that.

Well, against this background, the translation process has not been spared the need to adapt. A lot has changed since the discovery of the Rosetta Stone back in 1799, if we consider that nowadays we use CAT tools to make our lives as translators easier.

But what would be the link between translations, statistics and computers? Is there any relation whatsoever between the three?

Let’s look, in basic terms, at the reasoning behind automated translation. Well, supposedly, you write a program that examines each sentence and tries to understand its grammatical structure – more specifically, it searches for verbs, the nouns that go with those verbs and the adjectives that modify those nouns. Once the program “understands” the structure, it converts that structure into the target language and uses a dictionary for the selected language pair to translate the individual words – and poof! – the magic happens.
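To make this a little more concrete, here is a minimal, purely illustrative sketch of that rule-based pipeline in Python. Everything in it – the tiny French–English lexicon, the part-of-speech tags and the single adjective-noun reordering rule – is invented for the example; real systems of that era relied on far larger dictionaries and rule sets.

```python
# Toy sketch of the old rule-based approach: tag each word, apply one
# hand-written reordering rule, then look every word up in a bilingual
# dictionary. The mini-lexicon and the rule are invented for illustration.

LEXICON = {            # word -> (part of speech, English translation)
    "le":   ("DET",  "the"),
    "chat": ("NOUN", "cat"),
    "noir": ("ADJ",  "black"),
    "dort": ("VERB", "sleeps"),
}

def rule_based_translate(sentence: str) -> str:
    words = sentence.lower().strip(".").split()
    tagged = [(w, *LEXICON.get(w, ("UNK", w))) for w in words]

    # One structural rule: French often places the adjective after the noun,
    # English places it before, so swap NOUN + ADJ pairs.
    reordered, i = [], 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ"):
            reordered += [tagged[i + 1], tagged[i]]
            i += 2
        else:
            reordered.append(tagged[i])
            i += 1

    # Dictionary lookup for the individual words.
    return " ".join(t[2] for t in reordered) + "."

print(rule_based_translate("Le chat noir dort."))  # -> "the black cat sleeps."
```

Even in this toy form, the weakness is visible: every new grammatical pattern needs another hand-written rule, which is exactly the burden the IBM team decided to drop.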

This was the initial approach, before an IBM team drastically changed it in the late 1980s. The team gave up the lengthy lists of rules expressing linguistic structure and built a statistical model instead. They did this by obtaining a copy of the transcripts of the Canadian Parliament, from a collection known as Hansard (by Canadian law, Hansard is available in both English and French). They then used a computer to compare the corresponding English and French texts and spot relationships.

Michael Nielsen explains this in his article by providing an example: “For instance, the computer might notice that sentences containing the French word bonjour tend to contain the English word hello in about the same position in the sentence. The computer didn’t know anything about either word—it started without a conventional grammar or dictionary. But it didn’t need those. Instead, it could use pure brute force to spot the correspondence between bonjour and hello. […] More precisely, the computer used Hansard to estimate the probability that an English word or phrase will be in a sentence, given that a particular French word or phrase is in the corresponding translation. It also used Hansard to estimate probabilities for the way words and phrases are shuffled around within translated sentences.” Furthermore, he concludes that “Using this statistical model, the computer could take a new French sentence — one it had never seen before — and figure out the most likely corresponding English sentence. And that would be the program’s translation.”
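To give a rough feel for the counting Nielsen describes, here is a minimal sketch of the idea, with a three-sentence toy corpus standing in for Hansard. It is not the actual IBM model – the real models also estimate alignment and reordering probabilities – just the bare co-occurrence counting that the quote hints at.

```python
# Minimal sketch: estimate P(English word | French word) by counting how often
# the two co-occur in aligned sentence pairs, with no grammar or dictionary.
# The tiny parallel "corpus" below is invented purely for illustration.
from collections import Counter, defaultdict

parallel_corpus = [
    ("bonjour monsieur", "hello sir"),
    ("bonjour madame",   "hello madam"),
    ("merci monsieur",   "thank you sir"),
]

# Every English word in a sentence is a candidate translation for every
# French word in the corresponding sentence.
cooc = defaultdict(Counter)
for fr_sent, en_sent in parallel_corpus:
    for fr_word in fr_sent.split():
        cooc[fr_word].update(en_sent.split())

def translation_probs(fr_word):
    """P(English word | French word), estimated from raw counts."""
    counts = cooc[fr_word]
    total = sum(counts.values())
    return {en: n / total for en, n in counts.items()}

print(translation_probs("bonjour"))
# {'hello': 0.5, 'sir': 0.25, 'madam': 0.25} – 'hello' already stands out

# A crude "translation" of an unseen sentence: pick the most probable English
# word for each French word, ignoring word order entirely.
def naive_translate(fr_sentence):
    return " ".join(cooc[w].most_common(1)[0][0] for w in fr_sentence.split())

print(naive_translate("bonjour monsieur"))  # -> "hello sir"
```

With millions of Hansard sentence pairs instead of three toy ones, the same brute-force counting is what lets the right translations rise to the top.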

Most traditionally trained linguists would find this idea intriguing, and alarming, given that it does not consider – not in the least – what is already known about the structure of language, and that the corresponding statistical models do not seem to give a fig about the meaning of a sentence. Perhaps one of the most telling examples in this sense was Noam Chomsky declaring himself skeptical about this notion of success viewed as “approximating unanalyzed data”. Not to mention the exaggerated rumors that soon there will be no need for human translators, since computers will take over.

However, the IBM team discovered that the statistical approach works better than the older approaches built around complex linguistic concepts. It may come as no surprise that systems like the “infamous” Google Translate are built on similar ideas.

Furthermore, it is no secret that most computer speech recognition systems – both iOS and Android users might relate to this – are based on statistical models of human language. And, according to Mr. Nielsen, online search engines make use of statistical models to understand search queries and find the best responses. So, how long has it been since you last talked to Siri and she replied to you intelligibly?

To conclude: yes, translations and linguistics are not only about letters, words and sentences… I remember that one of the first things they tell you in linguistics lectures at university is that languages evolve like living organisms: they are continuously shifting and changing. As striking as it may seem, maybe it’s high time for us translators to at least keep up with the latest developments and be open to new tools and new ways of adjusting to the technologies that may help us in our work, instead of fearing that artificial intelligence and computers may take over our jobs. I often wonder: if language is continuously evolving, and technology is continuously developing and improving, can we really afford to maintain an old-school, dismissive stance towards computers, CAT tools and, implicitly, statistical models? Wouldn’t it be better for us to embrace an open, “how-can-technology-help-me” attitude?

I leave it to you to answer this; I know I have already made my choice.
