I'm sure you have noticed that translation software seems to perform better between certain languages. For example, going from Spanish to English is usually pretty accurate, and translating German or Dutch to English is even better. But going from Japanese to English, or vice versa, is remarkably bad.
I have a few questions. First, why do you think this is? And second, do you think languages that are easier for machines to translate are also easier to learn (at least for English speakers)?
Finally, which one do you think is more accurately translated to English, Japanese or Mandarin?
I think the two main reasons why Japanese<->English machine translation is so difficult are that the sentence structure is so completely different, and that Japanese is so highly context-dependent, not even requiring that sentences have explicitly stated subjects. (Even in languages that allow pronouns to be dropped in certain contexts, like Spanish, verb inflection makes up for that; there's no need to say 'yo hablo' because 'hablo' is already inflected for 1st-person-singular.)
Translation is a matter of understanding the structure of the original sentence, understanding how the words interact with each other within the sentence (and what they're referring to outside the sentence!) and then writing something in your target language that has the same meaning and sounds natural.
What machine translation can't do is
1) understand the meaning of the original sentence
2) write something in your target language that has the same meaning and sounds natural.
Because the Germanic languages (including English) are so close together, and the Romance languages are fairly close to those, it's easy to do dictionary-based lookup plus fine-tuning based on collocations and comparisons to other translation samples. For languages that aren't related, you can't rely on that. You actually have to understand the structure of a sentence.
For example:
そのたびに、私が取得したのは何の免許だったのかなと、免許証を取り出して思わず確かめてしまう。
Each time [that I rub up against guard rails or bump into road signs], I end up automatically taking out my license just to check -- what kind of license did I get, anyway?
Here's Google's translation:
Each time, did I get and I guess it's what was the license, make sure they instinctively take out a license.
Without understanding the sentence structure, how do you know that 免許 is the object of 取得, or that 免許証 is the object of 取り出して and 確かめて, or that 思わず modifies 確かめて? And how do you know that the subject of 取り出して is "I" rather than "they"?
So Google ends up translating like a 1st-year student who just looks up every word in the dictionary and makes up a sentence based on that, except that Google doesn't know how to write good English, either.
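That "look up every word and string the glosses together" approach can be sketched in a few lines. The glossary here is a toy stand-in (the entries are illustrative, not any real system's data), which is exactly the point: without structural analysis, all you get is a bag of glosses in source-language order.

```python
# A toy word-by-word "translator": look each token up in a glossary and
# concatenate the results, with no analysis of sentence structure.
# The glossary entries below are illustrative only.
GLOSSARY = {
    "私": "I",
    "免許": "license",
    "取得": "get",
}

def word_by_word(tokens):
    # Unknown tokens are passed through unchanged, as early MT systems did.
    return " ".join(GLOSSARY.get(t, t) for t in tokens)

print(word_by_word(["私", "免許", "取得"]))
# Nothing here knows that 免許 is the object of 取得, so the output
# is just the glosses in their original order: "I license get".
```

This is the "1st-year student with a dictionary" failure mode: each word is right in isolation, and the sentence is still wrong.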
This is just speculation, but I don't think you can get good Japanese<->English translation without a computer that actually understands natural human language. And I think we'll be very lucky if we can make that much progress on artificial intelligence in the next 30-40 years.
(Or unlucky. I for one welcome our new robot overlords?)
Last edited by Fillanzea (2012 July 19, 9:09 am)
Well, if you're curious: Korean <-> Japanese is better than Japanese <-> English because of the similar grammar and the many, many words that are based on Chinese. I read somewhere that something like 75% of the vocabulary is shared or common, or whatever.
What about Chinese <-> English?
Computational linguistics is still a young field, so it hasn't been perfected yet. Still, if you compare how Google Translate works now to, say, how Babelfish worked 10 years ago, the improvement is astounding. It's still not perfect by any means, and most Google-translated documents have bad or unnatural grammar; even a beginner can easily find mistakes in the translations.
English-German-Spanish etc. work a bit better than the rest since they get far more attention, thanks to there being more speakers working on them; the moment you want a translation involving less common languages, the result will be so sub-par it's almost unintelligible. Since Japanese fits that description, and its sentence structure is considerably different as well, it's pretty easy to see why the accuracy is so low.
IMHO, if automated translation keeps advancing at the current rate, it would not be crazy to assume that in a decade or two it will reach the level a human translator can get to... as if we didn't have enough jobs being taken over by machines.
As to your second question... it's not about the ease of learning; it's about the ease of conversion, along with the number of people working on the particular pair of languages. The more rule-based the language, the easier it is for a machine to handle; that doesn't mean it takes less time for a human to learn, since a human has to manually learn all those rules whereas a machine just has them in its database. Same for vocab: 5000 words take about the same amount of time for a human to learn in any language, but a computer will have no issue with learning them.
inb4 "5000 words in Spanish take less time than 5000 words in Japanese thanks to similarities": 5000 words with common roots can still take more time to learn than 5000 loanwords in Japanese (la chimenea = learning the gender + the word, though the shared root does help... ベッド is an instant learn), so let's not get started on that.
Zgarbas: you don't address Emily's second point, which is why I think it will never happen. At what rate has MT advanced, exactly?
[anecdote] When I was at uni, I had an acquaintance at the gym who had served in Vietnam and was high up in PC maintenance for Liverpool. I asked him about MT in the future: "The future? They can do it now, we don't need ******* humans."
I asked the guy in charge of languages at uni: "No, never."
The native Japanese lecturer who taught me said: "Today quite good, but it will probably never replace humans."
[/anecdote]
If we look at something like chatbots over the past 10-20 years, there has been some huge progress. It's now possible for English-speaking machines to actually fool humans into thinking they are real, quite a lot of the time. Now, I'm not sure how much of that correlates with the ability to make a good machine translation, but it would seem to indicate that computers are getting better and better at understanding what our language means.
Also, Google has a huge database of what correct Japanese looks like, basically since they have access to just about every Japanese website, ever. With that information, it's certainly possible for them to make some improvements beyond what we see today. They are also trying some crowdsourcing techniques; did you ever notice, when doing a machine translation, that it may give you an option to suggest a better translation?
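One common way to use a big database of "what correct text looks like" is a statistical language model: count how often word sequences occur in the corpus, then prefer candidate translations whose sequences are frequent. A minimal bigram sketch (the three-sentence corpus here is a toy stand-in for the web-scale data described above):

```python
from collections import Counter

# Toy corpus standing in for "just about every website, ever".
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
]

def bigram_pairs(sentence):
    words = sentence.split()
    return list(zip(words, words[1:]))

# Count every adjacent word pair in the corpus.
bigrams = Counter()
for sentence in corpus:
    bigrams.update(bigram_pairs(sentence))

def score(candidate):
    # Higher score = the candidate's word sequences were seen more often.
    return sum(bigrams[pair] for pair in bigram_pairs(candidate))

print(score("the cat sat on the mat"))   # fluent order: every pair attested
print(score("cat the on sat mat the"))   # scrambled order: no pair attested
```

A system can generate several rough candidate translations and keep the one this kind of model scores highest, which is one way corpus size translates into more natural-sounding output.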
There is a lot of research going into this, and I think we will definitely see lots of improvements in the future. As good as a human though? Never. But I'd hope to be proven wrong.
Having access to every Japanese website ever? Sure.
Being able to use the information properly and sort the good from the bad (legitimate sources vs. poorly written, misspelled, grammatically broken ones) is a different matter, though.
Also, regarding replacing translators with Google Translate: even though the quality is horribly lacking, companies are already using Google Translate for a good part of their papers (at least the ones that are not too important), so it is already affecting the translation industry. My sister's company would use MT for pretty much all their documents and only hire translators to proofread it, which from the start means less pay.
What makes you think that once MT has evolved to the point where it can replace a barely average translator (doable soon, given current progress), companies will bother hiring translators for anything but the most important documents (legal stuff, mostly)?
Zarxrax wrote:
If we look at something like chatbots over the past 10-20 years, there has been some huge progress there. Its now possible for English speaking machines to actually fool humans into thinking they are real, quite a lot of the time.
I'm not sure what chatbots you've been talking to... the technology still sucks and probably always will until an Einstein of AI comes along. They all reply with irrelevant, completely off-the-wall comments and show almost no understanding of what you're talking about. It's obvious they're running off crude statistical text analysis. A program that could pass the Turing test would need human level AI plus a deep understanding of the world we live in, and that's still the stuff of science fiction.
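"Crude statistical text analysis" in a chatbot often amounts to something like the sketch below: pick the canned reply whose trigger words best overlap the user's input, with no model of meaning at all. The trigger sets and replies are made up for illustration:

```python
# A minimal retrieval-style chatbot: choose the canned response whose
# trigger keywords overlap the input the most. No understanding involved,
# which is why such bots derail on anything outside their canned pairs.
RESPONSES = [
    ({"hello", "hi"}, "Hi there! How are you?"),
    ({"weather", "rain", "sunny"}, "I love talking about the weather."),
    ({"chess", "game"}, "Chess is a fascinating game."),
]

def reply(user_input):
    words = set(user_input.lower().split())
    overlap, best = max(
        (len(triggers & words), response) for triggers, response in RESPONSES
    )
    # With zero overlap the bot has nothing relevant, so it emits a stock
    # line -- the "irrelevant, off-the-wall" behaviour described above.
    return best if overlap > 0 else "Interesting. Tell me more."

print(reply("hello computer"))                # keyword match: greeting reply
print(reply("what is the meaning of life"))   # no match: canned deflection
```

Anything novel falls straight through to the deflection line, which is exactly the tell a Turing-test judge can probe for.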
Last edited by dizmox (2012 July 21, 10:29 am)
dizmox wrote:
the technology still sucks and probably always will until an Einstein of AI comes along.
Right...
http://www.nytimes.com/2011/02/17/scien … atson.html
...but that's just a sophisticated search engine... it doesn't have any concept of what's being talked about.
http://en.wikipedia.org/wiki/Watson_(computer)
Last edited by dizmox (2012 July 21, 6:35 pm)
Before 1997, when Deep Blue beat then-World Chess Champion Garry Kasparov, people were speculating that computers would never achieve an understanding of the strategic concepts of the game, and so a world-class human player would always beat the machine (on the premise that the human's understanding of the game would give him an edge).
Just like Watson, Deep Blue was a big search engine with enormous processing capabilities, but these days nobody questions the way computers play the game. Advances since Deep Blue have come in both processing power and better algorithms, and the once-fuzzy strategic concepts are now part of those algorithms. The result is that, 15 years after 1997, humans don't stand a chance against the machine, and nobody questions computers' supremacy at the game.
Whether Watson, like Deep Blue, is a search engine or not will hardly matter when, in the future, commonly available machines used daily by ordinary people can run programs similar to what Watson was loaded with, just as today your PC can run Junior, Shredder, Rybka and others.
dizmox wrote:
A program that could pass the Turing test would need human level AI plus a deep understanding of the world we live in, and that's still the stuff of science fiction.
http://www.newscientist.com/blogs/onepe … ty-wi.html
1% away. Pretty damn close.
Off topic but I feel like getting some facts right.
http://www.newscientist.com/blogs/onepe … ty-wi.html
“Held at Bletchley Park near Milton Keynes, UK, where Turing cracked the Nazi Enigma code during the second world war”
The author of this article should have done his homework rather than relying on anecdotal evidence or misinformed sources. Turing was part of the British team working on breaking Enigma, and they were successfully breaking encrypted messages during WWII, but the actual breakthrough in cracking Enigma was made by the Poles.
http://en.wikipedia.org/wiki/Polish_Cipher_Bureau
“Five weeks before the outbreak of World War II, on 25 July 1939, in Warsaw, the Polish Cipher Bureau revealed its Enigma-decryption techniques and equipment to representatives of French and British military intelligence, which had been unable to make any headway against Enigma.”
This article deals in general with what was involved in cracking Enigma:
http://en.wikipedia.org/wiki/Cryptanaly … the_Enigma
Inny Jan wrote:
Just like Watson, Deep Blue was a big search engine with enormous processing capabilities but these days nobody questions the way the computers play the game.
I think there's a considerable difference between a chess machine and Watson. Chess AIs are programmed to "understand" the rules of chess and exhaust the various possibilities. Watson is really just a very powerful search engine that finds a bunch of data for a search term and responds with a set of words that pop up a lot.
Watson is a far cry from something that could pass a Turing test.
Last edited by JimmySeal (2012 July 22, 12:21 am)
JimmySeal wrote:
Inny Jan wrote:
Just like Watson, Deep Blue was a big search engine with enormous processing capabilities but these days nobody questions the way the computers play the game.
I think there's a considerable difference between a chess machine and Watson. Chess AIs are programmed to "understand" the rules of chess and exhaust the various possibilities. Watson is really just a very powerful search engine that finds a bunch of data for a search term and responds with a set of words that pop up a lot.
Watson is a far cry from something that could pass a Turing test.
Despite what it may seem like, most (if not all) chess programs are search engines as well. In any position they generate all possible candidate moves and, based on evaluation of those candidates, either reject them or pass them into subsequent iterations of evaluation. There is no fixed set of answers to look for, I agree, but nevertheless the questions (candidate moves) are posed and the highest-scoring one (after a search) is selected. The strength of a chess engine is proportional to the depth of its search, which is greater when a) processing power is higher, or b) fewer candidate moves are analysed because the engine has a better pruning algorithm (the search is more efficient).
Even if you argue that this is still not searching in the classical sense, there is also a part of the game (the endgame) where lookup tables are involved. For the endgame, where there is only a small number of pieces on the board, there are tables that record whether each position is a win, draw or loss. Analysing the endgame, from the machine's perspective, is just matching the current position to one from those tables and selecting the move with the highest pre-calculated score.
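The generate-evaluate-prune loop described above is essentially minimax search with alpha-beta pruning. A minimal sketch over an abstract game tree (nested lists are branching moves, numbers are leaf evaluations; a real engine would generate these from board positions):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    # Leaves are precomputed evaluations; list nodes are candidate moves.
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:   # prune: the opponent would never allow this line
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:       # prune from the minimizing side
            break
    return value

# Three candidate moves, each answered by two opponent replies.
tree = [[3, 5], [2, 9], [0, 1]]
print(alphabeta(tree, True))   # best guaranteed evaluation: 3
```

Better pruning lets the engine skip whole subtrees (here, the second leaf of the losing branches), which is exactly why a smarter algorithm buys search depth just as extra processing power does.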
As for Watson, I would imagine that when working out the answer, after the initial search, it would still have several options to choose from for the final answer, so maybe the similarities with chess engines are greater than you would think. Anyway, I don't claim to know how Watson works, so anything short of a paper on its inner workings is just speculation.
(I'm not a programmer, just my two cents; I might be wrong about this.)
I always thought it was like this.
Chess machine:
1. calculate the current situation (all pieces' positions on the chessboard)
2. calculate all possible moves for both players (each piece's ability to move according to the implemented rules, as well as all empty squares)
3. calculate the best possible move? (this is where I get lost, but I assume it takes the opponent's possible moves into account, as well as the highest increase in win %; maybe with special moves and situations in its database it can see an opportunity when it has one). So I assume it will know to avoid situations which could lead to a checkmate, as well as know when to make unusual moves for the sake of a quicker win based on the opponent's mistakes.
(e.g. if the AI sees an opening for a 3-move win, instead of pursuing a normal game it will probably make a nonstandard move for the sake of that 3-move winning manoeuvre).
Of course, I'm only assuming that's what it does in order to win a chess game. I could be wrong but the basic gist of it should be similar either way.
Now, let's take a translation machine.
1. It takes the given L1 words into account
2. It automatically looks up their equivalents
3. It calculates the best way to put them together in one go, based on its given information about L2: where the verb should go, whether any special expression is involved, phrasal verb use, and so on.
It sounds like a similar process; it's just that step 3 has a larger and harder-to-input database than its chess equivalent. It requires much more work, making the process harder on the people maintaining it and on the speed/memory consumption of the device; increase the number of people working on its database and run it on a supercomputer that can take the load, and it might actually work. Of course, problems would occur regarding nuance and natural speech, but having all sorts of real situations in its DB would help it choose the best way to handle those as well.
No?
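Those three steps map onto a very small pipeline: glossary lookup plus a hand-written reordering rule for the target language. Everything here (the glossary entries, the assumption that the input is a three-token subject-object-verb sentence) is a toy illustration, not how a real system is built:

```python
# Steps 1-2: look up each source token in a glossary.
# Step 3: apply a target-language word-order rule.
GLOSSARY = {"私": "I", "りんご": "apple", "食べる": "eat"}

def translate_sov_to_svo(tokens):
    glosses = [GLOSSARY.get(t, t) for t in tokens]   # steps 1-2: lookup
    if len(glosses) == 3:
        subject, obj, verb = glosses                 # assume S-O-V input
        return f"{subject} {verb} {obj}"             # step 3: emit S-V-O
    return " ".join(glosses)                         # fallback: no rule fits

print(translate_sov_to_svo(["私", "りんご", "食べる"]))
# Word order is fixed ("I eat apple"), but articles, inflection and
# nuance still need the much larger rule database described above.
```

The gap between this sketch and usable output is precisely the giant step-3 database the post is talking about.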
It seems to me that computers are quickly getting better at literal and idiomatic translation, and that progress may create the impression they're nearing human levels of ability, but I don't think that's the case. Nothing I've seen suggests they've even made a dent in the biggest problem, which is their inability to infer meaning from the source text. I'm not just talking about how you have to understand the source text to make the right decisions about how to express it in the target language, though that is a problem too; a search engine may allow the computer to find several possible ways of translating a word or phrase, but what criteria does it use to evaluate those choices? In a game of chess the criterion is obvious (whether a move leads to a win, a loss or a draw), but no such criterion exists for translation, so I fail to see how the computer could make a reliably good decision. (Though I admit it would likely be good enough for many situations.)
The main reason I think this is a problem is that there are often times when the target language requires that the implicit meaning of the source text be made explicit in the translation. Japanese-to-English translation often necessitates including a subject where there was none, or choosing between a plural and a singular when the source text specifies neither. Many problems cannot be solved through linguistic analysis, which is what computers are getting good at, and instead require specific understanding of the source text, in which, as far as I'm aware, they have yet to make any progress.
The chess analogy does demonstrate the fact that computers can exceed our expectations, and don't necessarily have to do things the same way we do, but the fact remains that the computers never learned to understand strategy, and the limitations of this are obvious when looking at other games.
Chess computers decide their moves by calculating every possible move they can make, then every reply their opponent could make, and so on, evaluating the resulting positions as deeply as they can. The move that leads to the best evaluated outcome is considered the strongest, and so the computer picks that one.
This only works because of the relatively limited number of moves available in chess. Games such as Go have a much larger number of possible moves, and last time I checked (a few years ago), even supercomputers were still unable to apply the same brute-force method to beat a human opponent. Instead they used a scaled-back version in which they pick random moves and then simulate random games based on those moves, but this was obviously much less effective. Most beginners could beat a home-computer Go bot on a full-sized board.
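The "pick random moves and simulate random games" method is Monte Carlo evaluation: estimate a move's strength by the win rate of random playouts that start from it. A sketch on tic-tac-toe rather than Go, purely to keep it short; the principle is the same:

```python
import random

# All eight winning lines on a 3x3 board (indices 0-8).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_playout(board, player):
    # Play uniformly random legal moves until the game ends.
    board = board[:]
    while True:
        w = winner(board)
        if w:
            return w
        moves = [i for i, cell in enumerate(board) if cell is None]
        if not moves:
            return None  # draw
        board[random.choice(moves)] = player
        player = "O" if player == "X" else "X"

def move_win_rate(board, move, player, playouts=500):
    # Estimate a move's strength as its win rate over random playouts.
    wins = 0
    for _ in range(playouts):
        b = board[:]
        b[move] = player
        if random_playout(b, "O" if player == "X" else "X") == player:
            wins += 1
    return wins / playouts

empty = [None] * 9
# With enough playouts, the centre square's estimated win rate should
# tend to come out ahead of an edge square -- but the estimate is noisy.
print(move_win_rate(empty, 4, "X"), move_win_rate(empty, 1, "X"))
```

The weakness the post describes follows directly: random playouts are a noisy stand-in for real evaluation, so on a board as large as Go's this plain version plays far below strong human level.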
Last edited by Splatted (2012 July 22, 6:12 am)
You can have the fastest computers in the world, but if the coders suck you don't get anywhere. Right now the coding for Google's J-E translation seems so sucky that I bet koohii forum members could do a better job of writing J-E translation code -- on a laptop. I think this is a case where coding skill and the coder's mastery of the languages trump CPU power.
Every time I use Google translate, I feel so much better about my Japanese language level, which is terrible as it stands.
Also, I look up words on Google translate and, behold, it spits out EDICT definitions.
I guess this is straying from the topic of machine translation, but I don't think anything nearly as powerful as an AI that could beat the Turing test would be needed for 99%-accurate machine translation.
Back to the Turing test, though: feed all the communication history of mankind into a machine with the most sophisticated text analysis algorithms conceivable if you want - it might be a fantastic oracle, but it will still be easily outed as a machine by someone asking the right questions - ones that rely on its lack of understanding of the real world (and, more generally, anything) and its inability to deal with novel scenarios.
Maybe it could fool you into thinking it's an extremely dull-witted human who didn't have an intelligent opinion on any new situation described to it mid-conversation (and thus couldn't just mindlessly search its data banks for one), but I'd argue that's not really in the spirit of the test.
dizmox wrote:
it might be a fantastic oracle, but it will still be easily outed as a machine by someone asking the right questions - ones that rely on its lack of understanding of the real world (and more generally, anything) and its inability to deal with novel scenarios.
But, but... this has been done before and the answer was... 42
Inny Jan wrote:
dizmox wrote:
it might be a fantastic oracle, but it will still be easily outed as a machine by someone asking the right questions - ones that rely on its lack of understanding of the real world (and more generally, anything) and its inability to deal with novel scenarios.
But, but... this has been done before and the answer was... 42
No, the answer was Rick Deckard and a list of soon-to-be-deactivated replicants.
*(For those who have no idea)
Last edited by vileru (2012 July 22, 8:55 am)
Zarxrax wrote:
dizmox wrote:
A program that could pass the Turing test would need human level AI plus a deep understanding of the world we live in, and that's still the stuff of science fiction.
http://www.newscientist.com/blogs/onepe … ty-wi.html
1% away. Pretty damn close.
Sorry, but the judges in that test must have been incompetent or working under overly stringent conditions. Eugene Goostman is pretty unconvincing.
What happens if you ask one of the AIs "When did WW2 start?" or something anyone with access to Wikipedia can easily check?
Can you really answer questions like that just with statistical text analysis?
nadiatims wrote:
What happens if you ask one of the AIs "When did WW2 start?" or something anyone with access to Wikipedia can easily check?
Can you really answer questions like that just with statistical text analysis?
An AI doesn't have to know the answer to every question you might ask it; it just needs to provide a convincingly human response in order to pass the Turing test. Of course, any decent chat AI would have a wealth of facts that a typical human would know, and be able to associate each of them with other concepts.

