copyright issues: wordlists/dictionaries/google translate

Reply #1 - 2012 May 10, 5:06 am
nadiatims Member
Registered: 2008-01-10 Posts: 1676

Are wordlists covering vocabulary for tests such as Eiken, JLPT and so on covered by copyright?
Is it possible to use such lists in things that will be sold for money?

How about services that make use of translations sourced from Google Translate or data taken from online dictionaries?

Any advice would be greatly appreciated.

Reply #2 - 2012 May 10, 7:24 am
Blahah Member
From: Cambridge, UK Registered: 2008-07-15 Posts: 715

My understanding is that under UK law, if you create your own word lists based on examination of JLPT (or whatever) tests, then you own the copyright to those lists. I know that in the USA there is precedent for the creator of a word list owning the copyright to that list, but that covers only the complete list, as anyone can reasonably create a list of words conforming to certain conditions. The Scrabble word lists have tested this in court.

So you will need to take into account the license for any data you plan to use. Jonathan Waller has JLPT word lists released under the CC-BY license, meaning you can use them commercially as long as he is given some credit somewhere.

Any translation service will be subject to terms of service, rather than copyright, and you'll need to study those for any service you plan to use.

The Google Translate API is now a commercial (paid-for) service, as laid out in the TOS. You can use it for any sort of content, commercial or otherwise, unless it's to 'improve ... a substantially similar product or service', which basically means don't make your own translation website which secretly uses Google's. The TOS forbids any attempt to use the service while avoiding the fee-paying mechanism, which means you couldn't, for example, make HTTP requests to the Translate web page and scrape the results. The prices are very fair, so you should consider signing up and paying: if you're using the API enough to incur big fees, your app should be making big money.

Dictionaries are certainly copyrightable in most countries which respect that sort of thing, being true creative works which involve generating new content and not just subsetting some other source. They will therefore always be under copyright unless the rights have been waived by the author, and you'll need to look for a data license for each dictionary.

Online dictionaries generally have their own licenses published on their websites. For example, the licenses for JMDICT, ENAMDICT, KANJIDIC, etc. are under CC-BY SA license, as explained in full here. That means you can use them with attribution, including in commercial apps, but in this case they recommend (but don't require) making a small donation to the project if you do so. The SA part means Share-Alike, i.e. if you create a new list which is largely derived from the original one, you must release it under the same license (you don't have to release any software which uses the list under that license, though).

Last edited by Blahah (2012 May 10, 8:42 am)

Reply #3 - 2012 May 10, 8:48 am
nadiatims Member
Registered: 2008-01-10 Posts: 1676

Thanks a lot :)

I'll have a read through all those links.

Reply #4 - 2012 May 11, 8:36 am
HelenF Member
From: UK Registered: 2012-04-11 Posts: 39

Blahah wrote:

Online dictionaries generally have their own licenses published on their websites. For example, the licenses for JMDICT, ENAMDICT, KANJIDIC, etc. are under CC-BY SA license, as explained in full here.

I noticed that Heisig isn't mentioned in "Special Conditions for the KANJIDIC, KANJD212 and KANJIDIC2 Files". The Heisig numbers have their own field. Most of the RTK keywords were added to the kanji meanings. I'm wondering about possible copyright issues with that.

Found the full early history. Heisig agreed to the inclusion of the numbers, but after they had already been included for some time.
I'm puzzled about how some of those things are allowed, e.g. the other meanings which "appear to have been based on Nelson".
http://www.csse.monash.edu.au/~jwb/kanj … tml#IREF07

Partially explained:
http://www.csse.monash.edu.au/~jwb/edic … tml#IREF04

Dictionary copyright is a difficult point, because clearly the first lexicographer who published "inu means dog" could not claim a copyright violation over all subsequent Japanese dictionaries. While it is usual to consult other dictionaries for "accurate lexicographic information", as Nelson put it, wholesale copying is, of course, not permissible, and contributors have been advised to avoid direct copying from other sources. What makes each dictionary unique (and copyright-able) is the particular selection of words, the phrasing of the meanings, the presentation of the contents (a very important point in the case of this project), and the means of publication.

Last edited by HelenF (2012 May 11, 9:21 am)
