Well, let me introduce myself first. I’m a Bangladeshi, and my mother tongue is Bengali. There are people in India too whose mother tongue is Bengali. Thanks to the politicians, Bengal was divided in two: when the colonial British left India in 1947, they left behind a partitioned subcontinent. One part became Pakistan and the other remained India. A plebiscite was held before the partition, and based on the result, the western part of Bengal, where the majority of the population was Hindu, became part of India, while the Muslim-majority eastern part joined Pakistan under a new name: East Pakistan. Later, in 1971, East Pakistan became the new nation of Bangladesh after a nine-month War of Independence.
Well, this is not a history blog. What I’ve mentioned is necessary to understand what I’m going to write about in the sections that follow. So, readers, please excuse me for the way I’ve started. I’m actually here to tell you something about translation. Let me come to the point quickly.
My son, a fifth grader, is fond of watching certain programs on TV that are aired either by an Indian version of Discovery Channel or by Asian TV. These programs were originally made in English but were repackaged in dubbed Bengali for the West Bengal and Bangladesh markets. He often asks, “Well, Pa, why do these people speak Bengali in this way?” One day, when I replied, “What’s your problem?”, he said, “Their Bengali sounds weird. We don’t speak that way.”
Well, it’s not my son’s reaction alone; many people in my community feel the same. So, what’s the problem, really? Are the translators not good enough to capture the language? Not really. The elephant in the room is something different, and I’m here to discuss that. But before doing so, I want to tell you a little more about myself, and for that I beg to be excused again.
I’m a linguist by training, and as such, translation has been my companion since the days of my undergraduate program in English Language and Literature. I had to translate literary pieces as part of my course. Those were the days when translation (from English to Bengali or Bengali to English) was done manually, handwritten from printed texts. I was never one of the best translators, but I had the opportunity to learn from my teachers and editors. Though I translated many pieces from diverse disciplines in the meantime, I entered the world of online translation only after retiring from active service forty years later. I’m now seeing how “cheap” translation has become; I’m seeing why it has become so, and possibly how humble it has become, thanks to the advent of software-based machine translation.
Today, I see that translating text online is as “easy” as clicking a button. But will automated services ever really be enough to break down the boundaries between us in all language pairs? Not likely, especially when the translation is English <> Bengali. Let me explain why.
Yes, I know that machine translation is one of the most popular and successful applications of artificial intelligence, and that it has gone through many ups and downs after incidents in which it completely mistranslated what was meant. But today, machine translation has become statistical. Statistical machine translation leverages the vast translated corpora available in the major language pairs, which draw on large volumes of text from governments, embassies, and the United Nations, all of which publish in both English and other languages.
As these sources contain good-quality texts, some languages, especially the major global languages used by the UN system, are well served by machine translation systems. While there is still room for improvement, machine translation of some language pairs has made significant progress, and the translators of those pairs use these resources extensively through various CAT tools.
But what about other languages, such as Bengali? Though, for example, Google Translate claims that it can handle over 100 languages, let’s see how it handles English to Bengali. Our model test case (case #1) would use this two-slot sentence that is generated automatically by converting voice to text:
Now, let’s see what we get when I translate this two-slot sentence myself, using subtitling software.
But, in my judgment, the sentence can be translated better simply like this, which is closer to what we Bengalis usually say: আমরা ব্রিজটি ভাঙ্গার শেষ বড় বাধাটি অতিক্রম করতে চেয়েছিলাম।
Take another example (case #2), where the English source text was taken from a published article and the translation was done by a human (no software was used, and the English word order was avoided to make the Bengali translation more reader-friendly). The English passage was:
"Most days, Julio César Imperatori’s tiny restaurant in Old Havana is packed with tourists who have flocked to the Cuban capital since the Obama administration moved to end a half century of enmity and normalized relations with the Communist island."
And the Bengali translation done by a human was: “ওবামা প্রশাসন যখন কিউবার সাথে অর্ধ-শতবর্ষের শত্রুতার সম্পর্ক শেষ করার পদক্ষেপ গ্রহন করে ও কম্যুনিস্ট দ্বীপটির সাথে সম্পর্ক স্বাভাবিক করার উদ্যোগ নেয় তখন থেকে পর্যটকরা দলবেঁধে সেদেশটির রাজধানীর দিকে ছুটতে থাকে, ও প্রায় সব দিনই জুলিও সিজার ইম্পারেতরির পুরাতন হাভানায় অবস্থিত ছোট্ট রেস্তোরাঁটি পর্যটকে ভরা থাকে।”
Had the translation been done keeping the English order, the Bengali translation would be: “সব দিনই জুলিও সিজার ইম্পারেতরির পুরাতন হাভানায় অবস্থিত ছোট্ট রেস্তোরাঁটি পর্যটকে ভরা থাকে যারা দলবেঁধে কিউবার রাজধানীর দিকে ছুটে আসছিল ঠিক তখন থেকে যখন ওবামা প্রশাসন অর্ধ-শতবর্ষের শত্রুতার সম্পর্ক শেষ করার পদক্ষেপ গ্রহন করে ও কম্যুনিস্ট দ্বীপটির সাথে সম্পর্ক স্বাভাবিক করার উদ্যোগ নেয়।”
Had it been machine translated (using Google Translate), the translation would be: "বেশিরভাগ দিন, প্রাচীন হাভানা হুলিও সিজার Imperatori ক্ষুদ্র রেস্টুরেন্টে পর্যটক যারা কিউবার রাজধানী ভিড় করত পর ওবামা প্রশাসন শত্রুতা ও কমিউনিস্ট দ্বীপ সঙ্গে সাধারণ সম্পর্কের একটি অর্ধ শতাব্দীর শেষ সরানো সঙ্গে বস্তাবন্দী হয়."
You’ve seen the examples of machine translation in these cases, and the judgment is yours. My impression is that both machine translations are syntactically and semantically wrong. As for the other translation examples, there is no absolute right or wrong, but the aesthetic qualities of the target language are better preserved when the natural ‘order’ of sentences in that language is honored.
Order is important not only in language but also in math, you know. Take this simple problem, for example: 6÷2(1+2). The answer one gets depends on how one calculates it. The order-of-operations rule we learned in childhood, BODMAS, says we should solve a problem by working through the brackets first, then the orders (powers), then division and multiplication, and finally addition and subtraction. If we ignore the left-to-right order and multiply 2 by the bracket before dividing, we get 6÷2(1+2) = 6÷2(3) = 6÷6 = 1 (the wrong result). But if we follow the correct order (left to right, after solving whatever is inside the bracket), we get the correct result: 6÷2(1+2) = 6÷2×(1+2) = 6÷2×3 = 3×3 = 9.
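The two readings of that expression can be checked in a few lines of Python. This is only a small sketch: the variable names are mine, and it relies on the fact that Python gives `/` and `*` equal precedence and evaluates them left to right, which matches the BODMAS reading described above.

```python
# Solve the bracket first, as BODMAS requires.
inner = 1 + 2

# Reading 1: division and multiplication taken left to right.
# Python groups this as (6 / 2) * inner.
left_to_right = 6 / 2 * inner

# Reading 2: the 2 is multiplied by the bracket before dividing.
# This needs explicit parentheses, because it is not the default order.
multiply_first = 6 / (2 * inner)

print(left_to_right)   # 9.0
print(multiply_first)  # 1.0
```

The same three numbers give two different answers purely because of the order in which they are combined, which is the point I am making about sentences.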
But how can one maintain ‘order’ when the software only understands lines (first things first), and those lines also have character limits? An obvious cultural conflict arises here, and most active online translators simply go by the rules to stay in business, worrying less about the aesthetics of the sentences they write.
After all, translation outputs aren’t computer code. Those outputs aren’t rejected as long as they are delivered within the allotted time and character limits, no matter whether they’re syntactically or semantically wrong.
Anyway, I’ve had similar experiences while bidding on job postings on ProZ.com. I’ve seen how companies post machine-translated pieces of source material that were themselves captured automatically through voice recognition (similar to the examples above, though not limited to those) and ask translators to proofread or edit them at INR 0.50 per source word (psw). (INR = Indian Rupee; 1 INR = US$0.015.) I have also observed that most English <> Bengali jobs originate from Indian translation companies and/or are routed through them, and most of these companies can’t afford to pay translators more than INR 1 to INR 1.15 psw for translating content from English to Bengali or Bengali to English. Not only that, they also expect Indian Bengalis to bid, since they’re only willing to pay in INR and unwilling to pay in foreign currency to Bangladeshi translators (foreigners) like me, who are in fact translating research materials at home at BDT 4 psw (BDT = Bangladesh Taka; 1 BDT = US$0.013). They also want translators to complete these jobs within a very limited time: say, 12,000 words in 3 days.
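To put those rates side by side, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it treats the exchange rates quoted above as fixed, and the function and variable names are my own.

```python
# Exchange rates as quoted in this post (assumed fixed for illustration).
USD_PER_INR = 0.015
USD_PER_BDT = 0.013


def usd_per_word(rate, usd_per_unit):
    """Convert a per-source-word rate in local currency to US dollars."""
    return rate * usd_per_unit


proofreading = usd_per_word(0.50, USD_PER_INR)      # editing machine output
translation_low = usd_per_word(1.00, USD_PER_INR)   # lower end of offers
translation_high = usd_per_word(1.15, USD_PER_INR)  # upper end of offers
domestic = usd_per_word(4.00, USD_PER_BDT)          # my rate at home

# The example deadline: 12,000 words in 3 days.
words_per_day = 12_000 / 3
job_total_high = 12_000 * translation_high

print(f"Proofreading: ${proofreading:.4f} per word")
print(f"Translation:  ${translation_low:.4f} to ${translation_high:.4f} per word")
print(f"Domestic:     ${domestic:.4f} per word")
print(f"Pace needed:  {words_per_day:.0f} words per day")
print(f"Whole job at the top rate: ${job_total_high:.2f}")
```

On these figures, even the best online offer is roughly a third of what the same work earns at home, and the deadline demands about 4,000 words a day.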
I take no pleasure in writing all these things in my blog, but the purpose is to let you know that, perhaps, everything is choreographed so that these jobs are done on a machine using this or that software. Most employers may therefore have made up their minds already: if everything will be done by machine, why should they allow translators enough time and money?
Well, I don’t want to test your patience by repeating all this again and again. I only sat down to write this piece to answer my son’s question as to why the Bengali spoken by characters in films and TV serials “sounds weird”. Is there any way out, then? I’m afraid not, because “translators” are, by default, inclined to use software and machines, since that lets them make some quick money, while translation companies are also able to make money by paying less. So, consumers will continue to swallow what suppliers give them, and unless and until English to Bengali translation becomes statistical, we are to remain consumers of “bad quality” artificial intelligence rather than actors.
In today’s business environment, the scope for human translation is diminishing, and so my going online has gone awry. Now I fear that the job of the translator will no longer exist beyond 2030.