[WISH] An idea for a future built-in translator.

David Woolley forums at david-woolley.me.uk
Wed Jan 6 03:04:10 EST 2010


Kristoffer Grundström wrote:

> My idea is that Pidgin could have a built-in script or function that 
> doesn't really needs to be enabled to work that translates the incoming 
> text into YOUR language & the same back to the other person & the script 

That presupposes an open source machine translator.  I'm not aware of
any such project.  Machine translation, even between Western European
languages, is a difficult problem and may well be beyond the resources of
any reasonable open source development project, which would have to
re-invent a lot of the work done for proprietary programs.

> would also recognize if this text is written like a document or letter 
> or just a plain conversation so that the text will turn out correct when 
> translated. I know that it's a big project, but think a bit.
> 
> If a person writes in their native language to you and you don't understand.
> If you want this text translated you have to spend time after time after 
> time after.........you know the drill......to find what that particular 
> word or sentence means if you DO find it.

What might be possible would be the facility to look up individual words
in a dictionary, but maintaining those dictionaries would have to be a
project in its own right.  Dictionaries require a lot of work to 
compile, and you cannot use existing dictionaries in the process, 
because that would infringe on the dictionary compilers' copyrights. 
CEDICT has existed for some time, but only gives the translation, with 
no explanation on usage, and is missing many words.


> Google translate doesn't translate whole sentences correctly since many 
> words can mean many things.

Most words have multiple meanings!  Google and Babelfish will have cost
a lot to develop (they may be based on commercial, standalone
products).  They do not translate word for word, but the fact that they
still produce very bad translations demonstrates how difficult the
problem is, even for someone with lots of commercial resources, including
the ability to license existing dictionary data.



-- 
David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam,
that is no longer good advice, as archive address hiding may not work.


