Inquiry on feasibility/usefulness of my project
kanji koohii FORUM (http://forum.koohii.com) > Learning Japanese > Learning resources (/thread-11777.html)
Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-21

November 1st, 2014 update: the project is now uploaded on GitHub: https://github.com/mifunetoshiro/kanjium

Long post ahead...

Over the past 4 years I've been working on and off on a custom "kanji database," because I wasn't happy with the various clones that simply reuse Jim Breen's databases. All in all it took hundreds of hours of my time and my girlfriend's patience, but today I can finally say I've done it. Everything I envisioned and wanted to add is now done, including extra information that serves no purpose to me personally but might be useful or interesting to somebody else.

It started back when Jisho.org still had no stroke order images, so I used a Greasemonkey script with the KanjiStrokeOrders font to see the stroke order. I was also disappointed that kanji had so many readings but no indication of which ones were actually useful. I downloaded KANJIDIC, stripped out all the useless things and added the official readings in a separate field. So my database began, 1 MB in size.

Later I noticed other oddities and annoyances, such as a kanji listed with "12,13" strokes. So which one is it? Some would say it's irrelevant, but the exams in my college Japanese writing class included tasks where you had to write the correct stroke order, so I was determined to confront this issue. I checked various Japanese sources to find which version is "correct," and manually fixed all such cases in the database. Then I noticed the stroke order font said otherwise! I emailed Tim Eyre about these mistakes so he could correct them in a future version, and he did, but eventually I came across so many mistakes (wrong stroke order, wrong element shape or wrong elements used) that I decided to fix them myself without waiting. So I generated 6355 images from the font and fixed the mistakes in Photoshop.
I came across new mistakes all the time and eventually lost track of all the corrections, so I stopped forwarding them to Tim Eyre. I went "all in": if I found a kanji with a mistake in a certain stroke or element, I went through every kanji containing that element to make sure my corrections were thorough, and compared them to kakijun.jp to make sure they were in fact correct. So I'd say these stroke order images are now as accurate as kakijun.jp.

Still later, I noticed discrepancies between the radical data in KANJIDIC and my textbooks, so I compared them to a digital version of Kanjigen, which would surely be a more accurate source. In it I noticed that the radicals also include the radical variant, which I thought was useful, so I added that to my database. Many variants were missing, however, or weren't distinguished from each other (e.g. ⺌ and ⺍), so I looked up all the possible radical variants in Unicode, went through all 6355 kanji and gave each one the correct radical variant.

Then I learned about phonetics, and was disappointed that KANJIDIC doesn't provide this information. That's how I discovered RTK. Thanks to RTK, I could include the phonetic characters in my database, but since so many were outside the scope of KANJIDIC or JIS X 0213, I had to hunt through the tens of thousands of CJK characters in Unicode to find them. Later still, I found a few phonetics that aren't in RTK at all, so my complete database now has 438 phonetic characters in total.

Well, I thought, I might as well add the RTK indices now, since someone would probably find that useful. Then I thought I might as well add other cool features to make this database really stand out from the other clones: antonyms, synonyms, homonyms, look-alike kanji, etc. See the list below for all the changes/additions. Little by little, my database grew from 1 MB to 34 MB.
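Both the correction workflow described above (spot a faulty element, then check every kanji that contains it) and the position-aware KRADFILE search in the feature list below come down to a reverse index from element to kanji. Here is a minimal Python sketch, assuming a hypothetical decomposition table that maps each kanji to (element, position) pairs. The position labels, the four sample entries and the function names are illustrative only and are not the database's actual schema.

```python
from collections import defaultdict

# Hypothetical decomposition table: kanji -> list of (element, position).
# The position labels are illustrative, not the database's real schema.
DECOMPOSITION = {
    "査": [("木", "top"), ("且", "bottom")],
    "李": [("木", "top"), ("子", "bottom")],
    "柿": [("木", "left"), ("市", "right")],
    "横": [("木", "left"), ("黄", "right")],
}


def build_index(decomposition):
    """Reverse index: (element, position) -> set of kanji containing it."""
    index = defaultdict(set)
    for kanji, parts in decomposition.items():
        for element, position in parts:
            index[(element, position)].add(kanji)
    return index


def kanji_with(index, element, position=None):
    """Every kanji containing `element`, optionally only at `position`."""
    return {
        kanji
        for (elem, pos), members in index.items()
        if elem == element and (position is None or pos == position)
        for kanji in members
    }


def search(index, *queries):
    """Multi-element look-up: intersect the results of (element, position) queries."""
    results = [kanji_with(index, element, position) for element, position in queries]
    return set.intersection(*results) if results else set()
```

With this layout, `kanji_with(index, "木")` is the "go through all kanji with that element" check, `kanji_with(index, "木", "top")` mimics picking 木 from the kanmuri row, and `search()` with several elements is the multi-element look-up. Keeping the position in the key is what lets one element appear in several rows of the UI.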
Since a kanji-only database isn't very helpful for vocabulary, I merged EDICT into it. However, this was first and foremost a kanji dictionary focused on correct kanji information and stroke order, and it was never meant to replace Jisho.org (or any other alternative) for vocabulary, so I chose to include only official readings and common jukugo for each kanji. That way, each kanji comes with words that are actually useful, common and/or "official." This means that instead of EDICT's 200,000+ entries, this database has only some 12,000. Vocabulary is just something extra I added over time to make it more useful.

With so few entries compared to EDICT, though, I knew I had to add something to make it stand out from the other clones. So I added pitch accent information for the words, 64 different conjugations plus pitch accent information for conjugatable words and, what I consider the most useful, particle information for verbs, so that you know which verbs take which particle. I also included percentages, so that if a verb can take several particles, you'll know which one is the most commonly used. Clicking on a particle shows 5 sentences where that verb is used with that particle. Note, however, that mistakes are possible, as some of the sentences come from Tatoeba.org. See the list below for all the changes/additions.

Anyway, the database is complete, but I'm not a designer or web developer, so I can't make it publicly, freely and easily accessible on my own. I considered just uploading the database and making it available to everyone, but after spending hundreds of hours on this, I thought I should at least host a website for it first. Since the database has so many features now, I thought it would be best to make it completely customizable/user-configurable. Don't want to see homonyms? Turn them off.
Don't want to see any images other than the stroke order image? Turn them off. Don't want to see etymology links pointing to KanjiNetworks.com or ChineseEtymology.org? Turn them off. I also want to implement customizable personal kanji/vocabulary lists which would be exportable to Anki. For example, you could save the kanji you come across and all their information that you want (radical, radical variant, phonetic, strokes, etc.) into your personal list (and include notes) and later export and review them in Anki. You could do the same with words, jukugo, etc., and include the pitch accent and particle information.

So this is what this thread is about. If this seems useful, I'd like to start an Indiegogo campaign to raise the funds to hire a designer/web developer and pay for hosting to make this a reality. Unfortunately, I'm not in a position to spend ~1000 € of my own money on this. If I get positive replies, I'll ask the users on reddit for their opinions too, because raising that kind of money from the Koohii community alone isn't possible. If the campaign does go online, I'd also appreciate it being shared among fellow Japanese learners and other communities/forums. It would be a "Fixed funding campaign," by the way, which means that unless the required amount is met, nobody will be charged a penny, but the project/website won't go live either (yet, at least).

Here is an ugly representation of some of the features, which I hope the website won't look anything like, lol! And here's how I envision the multi-element look-up UI.

KANJIDIC differences:
● Includes 445 more kanji (for a total of 6800), most of which are jinmeiyō kanji, various kokuji and kanji used as phonetics
○ Includes all 861 jinmeiyō kanji
○ Includes 503 kokuji
● Radicals corrected to match 漢字源 (Kanjigen)
● Includes appropriate radical variants (e.g. for 羊: ? or ⺶)
● Includes phonetic components (only for jōyō and jinmeiyō kanji)
○ 438 different phonetics (where 2 or more kanji share the same phonetic and reading)
○ Shows how many kanji use a particular phonetic, e.g. 者: 24, 古: 12, etc.
○ Clicking on a phonetic shows all kanji that use that phonetic
○ Grouped into mixed-reading phonetics (e.g. 工: コウ, ク), single-reading phonetics (e.g. 古: コ) and absolute phonetics (e.g. 夆: ホウ). Absolute phonetics are phonetics where a kanji will have ONLY that single reading, no other onyomi or kunyomi (referring to official readings, that is). Quick breakdown: mixed-reading phonetics: 66, single-reading phonetics: 250, absolute phonetics: 122
○ 3 phonetics aren't encoded in Unicode yet (and won't be), so I've assigned them Private Use Area code points and made the appropriate stroke order images to represent them
● Includes kanji shape. This does NOT use Jack Halpern's SKIP data with its 4 possible shapes; see the image for the 22 variations
● Includes kanji type (pictograph, ideograph, compound ideograph, phono-semantic compound)
● Includes official jōyō readings in a separate field
● Onyomi readings include information about when that reading was introduced to Japan (go-on 呉, kan-on 漢, tō-on 唐, kan'yō-on 慣)
● Shows how many jōyō kanji have the same (official) onyomi reading (e.g. 別: ベツ (2); clicking on "(2)" shows 蔑 as the only other kanji with that reading)
● Includes thousands of extra kunyomi readings that are otherwise not in KANJIDIC, especially the obscure ones found on the KanKen tests
● Stroke information corrected to match the JIS X 0213 standard
● "Grade" information is updated; besides the standard grade 1-6 kanji, the changes include:
○ Secondary school kanji are further grouped into grade 7 (1st year of junior high school), grade 8 (2nd year of junior high school) and grade 9 (high school). Grades 7 and 8 are approximate estimates based on one Japanese school's education manual/pamphlet, KanKen level and frequency, and are therefore not "official."
○ Jinmeiyō kanji are further grouped into: Jinmeiyō (former Jōyō kanji), Jinmeiyō (traditional variant of Jōyō), Jinmeiyō (traditional variant of Jinmeiyō)
○ Hyōgaiji kanji are further grouped into: Hyōgaiji, Kyūjitai-Hyōgaiji, Hyōgaiji (tolerated Jōyō variant), Hyōgaiji (former Jōyō candidate), Hyōgaiji (former Jinmeiyō candidate)
● Includes antonym (521) and synonym (628) kanji (limited to jōyō kanji)
● Includes look-alike kanji (limited to jōyō kanji), 487 kanji in total
● Includes homonym information for all kunyomi readings (and some onyomi exceptions), including whether the homonyms have the same or a different pitch accent (limited to jōyō kanji), 690 in total
● Includes variant kanji:
○ Japanese variants, 118 common ryakuji, kyūjitai, shinjitai, Traditional Chinese variants, Simplified Chinese variants, ultra-simplified (unofficial) Chinese variants (basically Chinese ryakuji)
○ Common (3,500 daily-use hanzi) Simplified Chinese variants are marked
○ Common (4,808 daily-use hanzi) Traditional Chinese variants are marked
○ Ryakuji and ultra-simplified Chinese variants are rendered with custom woff/svg fonts (a few kB in size)
● Includes new JLPT information (since no official list exists any longer, there is some guesswork involved, taking the KanKen level and frequency into account)
● Includes KanKen information
● Frequency is based on several averages (Wikipedia, novels, newspapers, ...)
● Besides the standard KANJIDIC definitions, includes "compact meanings" in a separate field (only for jōyō kanji), which contain only the most common definitions
● Initially I stripped pinyin out, but somebody with a Chinese background said it's really helpful, so I included pinyin with tone marks (not numbers), including 145 cases where the pronunciation differs in Taiwan (shown in brackets)
○ Meh, decided I'd throw hangul in there as well...
● Includes indices for RTK 1 & 3 (old and new editions), 2001 & 2301 Kanji Odyssey, White Rabbit Press' Japanese Kanji Flashcards and the custom indices for my printable flashcards
● Codepoints include: decimal, hexadecimal, UTF-8, UTF-16, JIS level, minkuten. All other indices and codepoints have been stripped out
● Also includes braille (rokutenkanji, kantenji) information in Unicode, because why the hell not. E.g. 亜: ⠠⠁⠃, ⠃⠊

KRADFILE (search by kanji parts) differences:
● Completely revamped, with thousands of additions and corrections
● 163 more elements to choose from (415 in total instead of 252)
● Includes a shape filter with 22 possible shapes to make searching more accurate, faster and easier
● Groups kanji into 3 levels (jōyō, jinmeiyō and hyōgaiji), so you can easily filter which results you want to see
○ The levels are also shown in darker/lighter fonts in the results window to make them easier to distinguish
● The UI layout on the website would take radical types into account: kanmuri radicals at the top, rare kanmuri below them, hen radicals on the left, tsukuri radicals on the right, etc.
● 41 elements on the UI layout are rendered with a custom ~10 kB woff/svg font, so there is no need to replace them with images for computers/devices that don't have huge CJK fonts installed (they won't render correctly on devices running Symbian or Windows Phone 7, though, because those don't support woff/svg, sorry)
● Takes element position into account. E.g. selecting 木 from the kanmuri elements shows 査, 李, etc., while selecting 木 from the hen radicals shows 柿, 横, etc.
● Differentiates between elements that the default KRADFILE treats as the same. E.g. selecting 辶 would only show kanji that have the "road" radical with 1 dot, selecting 辶 would only show kanji that have 2 dots, the same with ⺌, ⺍ and all other such cases
● Limit results to kanji with more than/less than X strokes, +-
● Option to display compact kanji meanings in the results window
● Includes a "part of" field listing the kanji that contain a particular kanji, e.g. for 阿: 婀, 痾
● You can also look up kanji by any element/kanji they consist of, whether or not it is among the 415 elements. E.g. 右 and 若 are not among the 415 elements, but entering either of them into a special kanji look-up search box returns 匿, entering 能 returns 熊, etc.

JMdict/EDICT differences:
● Words for each kanji are grouped into: regular words (e.g. 一: イチ, 上: あげる), compound verbs (e.g. 受: 受け入れる), jukugo (e.g. 一: 一部), yojijukugo (e.g. 一: 一人一人)
○ The first 5 compound verbs, jukugo and yojijukugo for each kanji are the 5 most common ones; the rest are in random order
○ 801 distinct idiomatic expressions (yojijukugo)
○ 214 distinct compound verbs
○ 7771 distinct jukugo
○ 3449 distinct regular words
● Does NOT include obscure and "unofficial" words (e.g. there are no entries for 食む, 藩祖, etc.)
○ Only jōyō kanji (with a few exceptions) have words associated with them, so e.g. 烹 has no words. Remember, this was not supposed to be a word dictionary or a Jisho.org replacement; this is first and foremost a kanji dictionary, and everything else is extra
● Compound verb, jukugo and yojijukugo readings are segmented so that you can find compound words that use any possible (official) kanji reading. E.g. 火 か: 火事, 火 ひ: 火鉢, 火 び: 下火. All possible (official) readings (including rendaku) are listed for every kanji, and each has at least one word associated with it
● Includes pitch accent information with several (customizable) ways of displaying it: annotated (by inserting two Unicode arrows in the text), CSS (HTML styling), binary (e.g. L-H, Low-High) and accented mora position
● For regular words, includes whether that word is taught in primary school, junior high school or high school
● Includes JLPT information for regular words, compound verbs and jukugo
○ Source is tanos.co.uk with corrections by me. Note that since the JLPT doesn't release official lists any more, some of the corrections might not be corrections at all, since it's all more or less guesswork
● Includes frequency information for regular words, compound verbs and jukugo (very common, common, uncommon, rare)
● Includes particle information for verbs, compound verbs and jukugo+する verbs, so that you know which particle usually goes in front of that verb. E.g. 飼う: を, 一致: と (56%), が (44%)
○ Clicking on a particular particle shows 5 sentences where that verb is used with that particle
○ The sentences come from smart.fm/inknow.co.jp (the Core 6000 deck) and the Tatoeba.org project. They are sorted so that the Core 6000 sentences take priority over the Tatoeba ones, meaning they are shown first whenever possible. Both are licensed under CC. Contains a total of 12,975 sentences
○ The sentences include the Japanese sentence, the Japanese sentence with furigana, the English translation, and a list of all the unique kanji in the sentence
● All regular verbs and い adjectives have conjugations available (plain polite, negative plain polite, past polite, passive, negative passive, past passive, negative past passive...). If it's a valid conjugation, it's there. A bit of an overkill, but who cares
○ Clicking on a conjugatable word shows a list of 64 conjugated forms
○ Conjugated forms ALSO have pitch accent marked with CSS (not all of them, however)
○ All conjugations of irregular godan verbs and irregular する verbs have been taken care of and properly marked

Word/kanji search:
● You can search in any conjugated form, whether in Hepburn, Kunrei or kana. E.g. isogashikunai, isogasikunai or いそがしくない will return 忙しい and 忙. Spelling differences such as tu/tsu, du/zu and hu/fu are also taken into account. Note that various irregular Hepburn-Kunrei hybrid searches will show no results; this is to "enforce" proper form
● You can also limit your search to just the official readings, so that e.g. しるし returns 印, but not 験, 璽, 徽, etc.
● You can search by entering a Simplified Chinese hanzi and it will redirect to the appropriate Japanese kanji variant(s) (if they exist)
● Search by indices/codepoints is possible as well

Images:
● The Kanji Stroke Order images (KSO) were generated using the KanjiStrokeOrders font by Tim Eyre. Copyright is held by Ulrich Apel and the Wadoku project. The images were NOT generated using the KanjiVG project
○ Some ~400 were made by me in Photoshop and are not part of the font
○ Includes hundreds of corrections and modifications to match the JIS X 0213 standard. kakijun.jp was very helpful with that, and I noticed a couple of mistakes even there!
● Gyōsho, tensho and sōsho images were generated with freeware fonts whose names I've forgotten...
● The "origin" images (oracle, bronze, large seal, seal) are composite images generated from Richard Sears' website (chineseetymology.org). Please ask for permission before using them.
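The "accented mora position" and "binary (L-H)" display modes listed in the JMdict/EDICT section above are two views of the same data, related by the standard Tokyo-dialect rules: with accent 0 (heiban) the first mora is low and the rest are high, and a following particle stays high; with accent 1 (atamadaka) only the first mora is high; with accent n > 1 the first mora is low, morae 2 through n are high, and everything after (including a particle) is low. Below is a minimal Python sketch of that conversion, assuming hiragana/katakana input where small ゃ/ゅ/ょ-type kana merge with the preceding kana into one mora, while ん, っ and ー count as morae of their own. The function names are hypothetical, not taken from the database.

```python
# Small kana that merge with the previous kana into a single mora.
# っ/ッ are deliberately absent: they count as a mora of their own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_morae(reading):
    """Split a kana reading into morae (e.g. きょう -> きょ, う)."""
    morae = []
    for ch in reading:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def pitch_pattern(reading, accent, particle=False):
    """Convert an accented mora position into an L/H string.

    accent == 0: heiban, L then H...; a following particle is H
    accent == 1: atamadaka, H then L...
    accent == n: L, H up to mora n, then L; a following particle is L
    """
    n = len(split_morae(reading))
    if not 0 <= accent <= n:
        raise ValueError(f"accent {accent} out of range for {n} morae")
    pattern = []
    for i in range(1, n + 1):
        if accent == 0:
            pattern.append("L" if i == 1 else "H")
        elif accent == 1:
            pattern.append("H" if i == 1 else "L")
        else:
            pattern.append("L" if i == 1 or i > accent else "H")
    if particle:
        pattern.append("H" if accent == 0 else "L")
    return "".join(pattern)
```

The classic minimal triple shows why the particle matters: 箸 (はし, 1) is HL, 橋 (はし, 2) and 端 (はし, 0) are both LH on their own, and only the following particle (L vs. H) tells them apart. Storing just the mora position and deriving the other notations from it keeps the display modes consistent with each other.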
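The romaji search in the "Word/kanji search" section above has to treat Kunrei-shiki (and Nihon-shiki) spellings such as si, tu, hu or du as their Hepburn equivalents before looking anything up, and it has to undo conjugation. Below is a minimal Python sketch under loud assumptions: a hypothetical two-entry dictionary and a single deinflection rule (the negative -kunai/くない of い adjectives). The real search covers all 64 conjugated forms, and this sketch does not attempt that; the rule table and function names are illustrative only.

```python
import re

# Kunrei-shiki / Nihon-shiki spellings rewritten to Hepburn.
# Hepburn input passes through unchanged (shi, chi, tsu, fu, ji contain none of these).
KUNREI_TO_HEPBURN = [
    (r"sy", "sh"),
    (r"ty", "ch"),
    (r"zy", "j"),
    (r"dy", "j"),
    (r"si", "shi"),
    (r"ti", "chi"),
    (r"tu", "tsu"),
    (r"(?<![sc])hu", "fu"),  # leave the "hu" inside shu/chu alone
    (r"zi", "ji"),
    (r"di", "ji"),
    (r"du", "zu"),
]

# Hypothetical deinflection rules: negative of い adjectives only.
DEINFLECT = [("kunai", "i"), ("くない", "い")]

# Hypothetical mini dictionary keyed by Hepburn romaji and by kana.
DICTIONARY = {
    "isogashii": ("忙しい", "忙"),
    "いそがしい": ("忙しい", "忙"),
}


def to_hepburn(text):
    """Normalise Kunrei/Nihon-shiki romaji to Hepburn; kana is left as is."""
    text = text.lower()
    for pattern, replacement in KUNREI_TO_HEPBURN:
        text = re.sub(pattern, replacement, text)
    return text


def lookup(query):
    """Return (word, kanji) for a Hepburn, Kunrei or kana query, or None."""
    normalized = to_hepburn(query)
    candidates = [normalized]
    for suffix, base in DEINFLECT:
        if normalized.endswith(suffix):
            candidates.append(normalized[: -len(suffix)] + base)
    for candidate in candidates:
        if candidate in DICTIONARY:
            return DICTIONARY[candidate]
    return None
```

Normalising to a single romanization first means the dictionary only needs one romaji key per word. The negative lookbehind on `hu` keeps Hepburn shu/chu intact, so a Kunrei query like syuhu still comes out as shufu.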
Double-sided printable flashcards (example: http://www.mediafire.com/download/ywz6kg48dl47gvy/example_flashcards.pdf):
● The front side of the card contains the stroke order image, 5 jukugo and the card index
○ The jukugo are (as much as possible) sorted so that for every new card, you should already know the other kanji in its jukugo. E.g. once you come to card 66 (生), you should have already seen the other kanji its jukugo consist of (一, 年, 先, 大, 学)
○ Where no such jukugo are possible, they are chosen so that their new kanji are as close as possible to the current card index. E.g. the jukugo on card 1 (一) use the following kanji: 人, 日, 本, 大, 手
○ The jukugo are common compounds only
● The back side of the card contains: radical, phonetic, stroke information, RTK (6th ed.) and White Rabbit Press indices, official onyomi with the number of other cards sharing that onyomi in superscript, official kunyomi with their accented mora in subscript, compact kanji meanings, jukugo readings with their accented mora and meanings, homonyms and their card indices, look-alike kanji and their card indices, and word meanings for each kunyomi
● Includes all 2136 jōyō kanji + 57 extra cards
● The cards are sorted first by grade and then by KanKen level and frequency
● The flashcard data is a little outdated in certain respects, because I stopped updating it during the 4 years I've been working on the database. E.g. some cards are missing the phonetic data, homonyms or look-alikes, use less common jukugo, etc. I don't intend to ever update them because it's too much work
○ I stopped updating them because of the prevalence of handheld devices (tablets, smartphones) that can do the same with Anki and don't require you to carry paper cards in your pockets. The concept of printable flashcards just seemed outdated, but maybe there are still some people who prefer them...
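The jukugo ordering rule for the front of the cards can be expressed as a single sort key. This Python sketch is one interpretation of the rule, not the script actually used for the cards. For each jukugo it takes the highest card index among the *other* kanji. Jukugo built only from earlier cards therefore get a key below the current card and come first, and the rest follow in order of how soon their newest kanji appears. The card indices in the usage test are hypothetical, except 一 = 1 and 生 = 66, which come from the post.

```python
import math


def rank_jukugo(card_kanji, jukugo_list, card_index):
    """Order jukugo for the card that introduces `card_kanji`.

    Sort key: the highest card index among the jukugo's *other* kanji.
    Jukugo made only of kanji from earlier cards get a key below the
    current card, so they come first; the rest follow in order of how
    soon their newest kanji is introduced. Kanji outside the deck sort last.
    """
    def newest_other_kanji(word):
        # 々 repeats the previous kanji, so it is not a separate kanji to learn.
        others = [ch for ch in word if ch not in (card_kanji, "々")]
        return max((card_index.get(ch, math.inf) for ch in others), default=-1)

    # sorted() is stable, so ties keep their incoming (e.g. frequency) order.
    return sorted(jukugo_list, key=newest_other_kanji)
```

Because `sorted()` is stable, feeding the jukugo in frequency order means that, among equally "learnable" compounds, the more common ones stay on top.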
Edit: First reddit thread for feedback, Fundraising reddit thread, Indiegogo page

Inquiry on feasibility/usefulness of my project - Flamerokz - 2014-04-21
I'm pretty blown away by all the work that seems to have gone into this. Looks like a kanji-lover's paradise; if you decide to run the Indiegogo campaign and it fails, I hope you'll make it available to learners by some other means anyway.

Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-21
Yeah, I guess I'll save some money each month from my student part-time jobs, so I'd probably have enough by the end of the year or sometime next year, unless something unexpected pops up, that is. So I'll definitely have a website ready eventually; it's just a question of when.

Inquiry on feasibility/usefulness of my project - Northern_Lord - 2014-04-21
I salute you. You have gone and sacrificed so much of your own time to make something of this scale all alone, for all the Japanese learners that may come after you. This looks like a tremendous collection of useful information, and I'd love to see it available to everyone.

Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-21
Northern_Lord Wrote: I salute you. You have gone and sacrificed so much of your own time to make something of this scale all alone
Thanks. I actually went even further and did something else, inspired by this textbook. There are exercises where you "build" a kanji by circling and combining the correct elements, for example かた (model): ⺍ (开) 木 十 豆 口 氏 一 亻 (刂) 日 歹 田 龷 (土) 宀 女 丸 = 型

I thought that would make a great game (either a Flash version or for Android/iOS), so I built a database for all the jōyō kanji with kunyomi (where possible, at least: you can't break down 一, etc., and it's kind of futile to do this with kanji that only have onyomi readings). I made it with two "difficulties": one where you only need to find two elements (刑 + 土 = 型, 亻 + 立 = 位), and a "hard" mode where you need to find three elements (广 + 亻 + 寸 = 府, 氵 + 厶 + 口 = 治). A total of 1290 kanji could be broken down like that, and I even made a custom font for it, because I had to break some kanji down into elements that don't even exist in Unicode. E.g. 辰 can't be broken down any further, so you couldn't play with 震 on "hard" mode, but I took advantage of Unicode's Private Use Areas, created that element and assigned it a code point, so now that's not a problem. The custom font contains 938 unique elements, and I used freeware fonts as a base (IPAexGothic, WenQuanYi Zen Hei, HAN NOM A & B). I also made it possible to sort and play only the kanji in a certain school grade, JLPT level, KanKen level or up to a certain RTK index.

After that, I got the idea of extending this to phonetics, which probably wouldn't be as popular, but the idea got into my head and I had to finish it... For example: you are given a phonetic (且), a jukugo hint (_ 父), a reading (そふ) and a meaning (grandfather). From the randomly generated elements, you have to find one (or two) to build the kanji. So in this case you need to find 礻 to make 祖 = 祖父. (古), (禁 _), (きんこ), (imprisonment): you need to find 金 and 囗 to make 錮 = 禁錮. Pretty logical, right? So that's what I did, and there are 284 kanji with single-reading and absolute phonetics that you can "play with."

Later on I expanded it further to kanji with mixed-reading phonetics, as a sort of "hard(er) mode." For example:
直, ショク、チ, _ 物, しょくぶつ, plant: you need to find 木 to make 植 = 植物
直, ショク、チ, 価 _, かち, value: you need to find 亻 to make 値 = 価値
Same gameplay, basically... So by including the kanji with mixed-reading phonetics, I could add another 114 kanji to play with. Unfortunately my brain wouldn't give me a break and instead gave me another idea I had to finish: expanding this to jukugo.
For example, you are given a hint jukugo, (_ 明) or (文 _) [it's random], or no hint in "hard" mode (_ _), plus a reading (ぶんめい) and a meaning (civilization). In the first case you need to find 文 to make 文明, in the second case 明, and in hard mode both. I also made it possible to sort and play only the jukugo in a certain JLPT level. It has 3712 jukugo to play with.

Then I expanded it even further: instead of finding the missing kanji, why not build it by finding the right elements, like with the kunyomi words and phonetics? For example:
_ 弟 or 兄 _, きょうだい, siblings: you need to find 口 and 儿 to make 兄 = 兄弟, or 丷 and ? to make 弟
今 _, こんしゅう, this week: you need to find 辶 and 周 to make 週 = 今週, or 辶, 冂 and ? in "hard" mode
It has 3588 jukugo to play with, and again, you can sort and play only the jukugo in a certain JLPT level.

But anyway, this is something entirely different from the OP, and for the time being it will be on hold, because I almost had a heart attack when I learnt how much programmers charge for making an Android/iOS game/app, lol!

Inquiry on feasibility/usefulness of my project - tashippy - 2014-04-21
This is really great. I hope we can make it happen.

Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-22
Okay, I just added something extra. I found a free 4 MB ryakuji font, but I extracted only 118 common ryakuji characters from it (by comparing various online lists) and recompiled the font. So now it's 26 kB, and I can, for example, render the ryakuji form of 機 as text and make it copy/paste-able instead of an image. Pretty useless I suppose, but what the heck.
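The element-building game toshiromiballza describes earlier in the thread mostly comes down to two operations. The first shuffles the solution's elements in with some random distractors. The second checks whether the elements the player picked rebuild the target kanji. Here is a minimal Python sketch, assuming decompositions are stored per difficulty as lists of elements. The four entries reuse examples from the post (型 = 刑 + 土, 府 = 广 + 亻 + 寸, ...), while the data layout and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical puzzle data: kanji -> {difficulty: elements of the solution}.
# The entries reuse examples from the post; the layout is an assumption.
PUZZLES = {
    "型": {"easy": ["刑", "土"]},
    "位": {"easy": ["亻", "立"]},
    "府": {"hard": ["广", "亻", "寸"]},
    "治": {"hard": ["氵", "厶", "口"]},
}


def check_answer(target, difficulty, picked):
    """True if the picked elements, in any order, rebuild the target kanji."""
    solution = PUZZLES.get(target, {}).get(difficulty)
    if solution is None:
        return False
    # Counter compares multisets: order is ignored, but repeats (e.g. 木 + 木) count.
    return Counter(picked) == Counter(solution)


def board(target, difficulty, pool, decoy_count, rng):
    """Shuffle the solution elements together with `decoy_count` distractors from `pool`."""
    solution = PUZZLES[target][difficulty]
    decoys = rng.sample(sorted(set(pool) - set(solution)), decoy_count)
    tiles = list(solution) + decoys
    rng.shuffle(tiles)
    return tiles
```

Comparing multisets rather than lists means the player can tap the elements in any order. Passing in a `random.Random` instance makes a given board reproducible, which helps when testing puzzles.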
Inquiry on feasibility/usefulness of my project - TsugiAshi - 2014-04-24
Just speaking about seeing stroke order in general, proper stroke order is something I've been interested in for both kana and kanji. When I was going through kana, I used a book with a step-by-step sequence showing how to write each basic kana, even with arrows and numbers to indicate where you start and finish. But the one thing I couldn't find for the life of me, for the longest time, was how to properly write the dakuten and handakuten (I think that's what they're called) marks for the extended sounds. Now I know it sounds silly to be nit-picky over something so minor, but to my mind, if I'm going to learn how to write something, I'd rather learn how to write it properly the first time through. And that thought process has carried over to kanji. So for kanji, what I'm personally interested in is the same step-by-step process, with numbered marks and arrows, that I had for kana.

Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-27
Question: Do you think such a layout for the multi-element look-up UI would be okay? Anything I should change/add? Maybe mush the "non-radicals" together with the "middle/other" radicals proper, or keep them separate?

Inquiry on feasibility/usefulness of my project - ikore - 2014-04-27
toshiromiballza Wrote: Question:
For an interface that complicated you will need someone skilled to do the information architecture for you, or you risk having a very complete database that isn't user-friendly at all. Sadly it's outside my ability to help out with that, but it's something important to keep in mind. I hope this gets realized.
Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-27
ikore Wrote: For an interface that complicated you will need someone skilled to do the information architecture for you
Well yeah, I'll have to hire a web developer to do everything web/code-related, because I can't do any of that myself. I can just tell him how I've visualized it and he'll do the magic, hopefully. I made that image in Word and Photoshop, heh. I was just wondering whether it would be more useful/user-friendly to group the "non-radical" elements with the proper radicals in the "middle/other" section, or keep them separate. Probably the former.
Edit: decided to merge them together.

Inquiry on feasibility/usefulness of my project - Zarxrax - 2014-04-27
Wow, this looks really awesome. I'd certainly be willing to put a little bit towards it.

Inquiry on feasibility/usefulness of my project - NinKenDo - 2014-04-27
This is amazing. I'd pay to see this happen. Question though. If you were to crowdfund this, what could the rewards possibly be? Most people see crowdfunding as pretty much a way of preordering the available backer rewards, so I wonder what you could possibly come up with as rewards, as those tend to make or break a campaign.

Inquiry on feasibility/usefulness of my project - juniperpansy - 2014-04-27
Curious, instead of paying someone to write the web app, have you thought about just getting a couple of volunteers to make it? I do Android development (not web stuff, unfortunately), but it looks like what you want done is pretty simple. You have everything in a database. You just need a user interface and search functionality, right? Basically just a user interface for a dictionary? I think it should take less than a week to make. Actually, if it is that simple, I would be happy to make an Android app for free to do this.

Inquiry on feasibility/usefulness of my project - Zgarbas - 2014-04-28
I would so pay to see this happen.
Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-28
NinKenDo Wrote: Question though. If you were to crowdfund this, what could the rewards possibly be? Most people see crowdfunding as pretty much a way of preordering the available backer rewards, so I wonder what you could possibly come up with as rewards, as those tend to make or break a campaign.
Yes, that is what bothers me, and why I still haven't launched a campaign. There really isn't much I can give out as rewards, except maybe a page on the finished website thanking the backers... After all, this wouldn't be crowdfunding for a paid product that backers then receive for free as a reward, because the finished website would be free for everyone anyway... If you've got any ideas, I'd like to hear them.

juniperpansy Wrote: Curious, instead of paying someone to write the web app, have you thought about just getting a couple of volunteers to make it?
That is tricky because:
a) it's hard to find volunteers to do such a thing professionally without monetary compensation
b) I'm a little reluctant to share the database with complete strangers on the internet who might run off with it, or leave backdoors in the code for later
c) I'd much rather meet the person in person (heh), have a drink with him/her and shake on it
As I said, if the crowdfunding doesn't work, I'll pay for it myself; it'll just take longer for the website to go live, because I'm not in a position to spend ~1,000€ on this just yet.

juniperpansy Wrote: I do Android development (not web stuff, unfortunately), but it looks like what you want done is pretty simple. You have everything in a database. You just need a user interface and search functionality, right? Basically just a user interface for a dictionary? I think it should take less than a week to make.
Yes, pretty much all that's missing is the code logic behind it all, and the design.
I'm pretty sure the user-specific settings, the custom lists and the exporting-to-Anki thing would be the hardest to code (and design); the rest would pretty much be querying the database.

juniperpansy Wrote: Actually, if it is that simple, I would be happy to make an Android app for free to do this.

You mean an Android app of the website? I haven't thought of that before. Maybe that's something I'll take into consideration once the website is done. With all the images taken into account, it'd be 170+ MB, though. So maybe it would be better to make it require an internet connection... But anyway, I'll ponder this later on. In the meantime, have you read this post and the game/database I have in mind? How hard would that be, and would you consider doing that for free?

Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-28

NinKenDo Wrote: what could the rewards possibly be?

Okay, I think I've thought of some reasonable "awards":

Quote: 4 €: Supporter:

Something else? The contribution options would be in euros, but set to roughly match $5, $10, $15, $25, $50 and $100. I've also just posted my inquiry on Reddit.

Inquiry on feasibility/usefulness of my project - riogray - 2014-04-28

This is open source too, have a look. They send you some small something for the higher tiers, which probably costs very little compared to how much the backers contribute. Otherwise, search Kickstarter for "open source"; there are a lot of results, and surely you can find some good ideas there.

Inquiry on feasibility/usefulness of my project - Sauzer - 2014-04-28

Just want to add another voice to the chorus of "I will throw money at this." It's always great to see a database resource that doesn't turn out to be just a repackage of Jim Breen's data.
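The user-specific display settings mentioned in this thread (hiding homonyms, extra images or etymology links) are the kind of thing that needs extra tables beyond the kanji data itself. As a rough sketch only, assuming SQLite through Python's built-in `sqlite3` module and hypothetical table, column and setting names (none of them come from the actual database), per-user toggles could be stored as one row per overridden setting, with defaults filled in when reading:

```python
import sqlite3

# Hypothetical display toggles; the names are illustrative, not from the real database.
DEFAULTS = {
    "show_homonyms": True,
    "show_extra_images": True,
    "show_etymology_links": True,
}

SCHEMA = """
CREATE TABLE IF NOT EXISTS user_settings (
    user_id INTEGER NOT NULL,
    setting TEXT    NOT NULL,
    enabled INTEGER NOT NULL CHECK (enabled IN (0, 1)),
    PRIMARY KEY (user_id, setting)
)
"""


def set_setting(conn: sqlite3.Connection, user_id: int, setting: str, enabled: bool) -> None:
    """Store (or overwrite) one user's override for a known setting."""
    if setting not in DEFAULTS:
        raise KeyError(f"unknown setting: {setting}")
    conn.execute(
        "INSERT OR REPLACE INTO user_settings (user_id, setting, enabled) VALUES (?, ?, ?)",
        (user_id, setting, int(enabled)),
    )


def get_settings(conn: sqlite3.Connection, user_id: int) -> dict[str, bool]:
    """Return the defaults with this user's stored overrides applied."""
    settings = dict(DEFAULTS)
    rows = conn.execute(
        "SELECT setting, enabled FROM user_settings WHERE user_id = ?", (user_id,)
    )
    for setting, enabled in rows:
        settings[setting] = bool(enabled)
    return settings


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    set_setting(conn, user_id=1, setting="show_homonyms", enabled=False)
    print(get_settings(conn, 1))  # homonyms off, everything else at its default
    print(get_settings(conn, 2))  # a user with no overrides gets the defaults
```

Storing only the overrides means a new toggle can be added to `DEFAULTS` later without touching existing rows.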
Inquiry on feasibility/usefulness of my project - headphone_child - 2014-04-28

Ambitious project; hopefully this happens. I'm a web developer and I'd be able to work on this if I'm available at the time you're ready to hire someone, but I'm on the east coast of the USA, so I'm guessing we wouldn't be able to meet in person. Let me know if you're interested anyway, though. I think it'd be a plus to have this developed by someone with some knowledge of the Japanese language.

juniperpansy Wrote: Curious, instead of paying someone to write the web app, have you thought about just getting a couple of volunteers to make it?

If the project was actually that small, I'd throw it together for free, but this isn't a simple, week-long project. Or if it was done in a week, it'd likely be bare-bones and missing features like these:

toshiromiballza Wrote: Since the database has so many features now, I thought it would be best to make it completely customizable/user-configurable. Don't want to see homonyms? Turn them off. Don't want to see any images other than the stroke order image? Turn them off. Don't want to see etymology links pointing to KanjiNetworks.com or ChineseEtymology.org? Turn them off. I also want to implement customizable personal kanji/vocabulary lists which would be exportable to Anki. For example, you could save the kanji you come across and all their information that you want (radical, radical variant, phonetic, strokes, etc.) into your personal list (and include notes) and later export and review them in Anki. You could do the same with words, jukugo, etc., and include the pitch accent and particle information.

And it depends on how the data is currently stored. Which DBMS? How many tables? Is it 3NF? If not, the tables could need redesigning too. And additional tables are needed for storing user settings as described above.

toshiromiballza Wrote: juniperpansy Wrote: Curious, instead of paying someone to write the web app, have you thought about just getting a couple of volunteers to make it? That is tricky because:

I agree with this assessment. Too many risks; you want to keep your data safe, and you want a quality product.

toshiromiballza Wrote: You mean an Android app of the website? I haven't thought of that before. Maybe that's something I'll take into consideration once the website is done. With all the images taken into account, it'd be 170+ MB, though. So maybe it would be better to make it require an internet connection... But anyway, I'll ponder this later on.

An app for the website itself shouldn't be necessary. The website just needs to adopt a responsive design, and then it'll be easy to use on all phones and tablets. What could be nice is an app completely separate from the website, where you download all the dictionary data onto your device when installing the app, so that you can use the app without an internet connection (à la JED for Android). I'd probably use an app like that.

toshiromiballza Wrote: Okay, I think I've thought of some reasonable "awards":

The acknowledgements, advertising, and name-picking ones look good, but I'm not sure about the other two. A forum where anyone can pay to become a moderator... I'd be worried about the trustworthiness of such moderators. And of course, not everyone would consider being a moderator an award, since it's more of a responsibility/obligation on their part. Maybe it'd be better to just have contributors of all amounts denoted as such in the forums.

But mainly, the really risky one is "Be a voice in the process": you have to be very careful about causing feature creep with this, and it could easily increase the cost of development. Of course this project isn't on the same scale as that Double Fine project, but the same concept applies.
It depends on the billing style of the person you hire (hourly vs. flat rate), but if you want to make significant changes after things have already been developed, it'll cost extra in either case. If you give contributors too much say, the budget could get out of control, along with the scale of the project. But if you don't give them enough say, then they won't get their money's worth from this "award". If you want to allow input from contributors at all, I think it would be best to have all of it done during the initial requirements analysis phase, before any development has even started.

Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-28

headphone_child Wrote: I'm a web developer and I'd be able to work on this if I'm available at the time you're ready to hire someone, but I'm on the east coast of the USA, so I'm guessing we wouldn't be able to meet in person. Let me know if you're interested anyway, though. I think it'd be a plus to have this developed by someone with some knowledge of the Japanese language.

That would indeed be a plus, and it would save me the time of having to instruct the developer on what this or that is, where this or that should be, etc. I can already imagine the endless phone conversations and additional meetings I'd have to have with an otherwise clueless developer... How much would you say you'd charge for such a project (the flat rate)? I imagine prices are way higher in the US than where I am (Slovenia).

headphone_child Wrote: And it depends on how the data is currently stored. Which DBMS? How many tables? Is it 3NF? If not, the tables could need redesigning too. And additional tables are needed for storing user settings as described above.

Initially I went with MySQL (because it was the only SQL database I was somewhat familiar with). I soon realized it's not adequate and doesn't even support many of the CJK characters my database consists of...
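The MySQL trouble described here is most likely (an assumption; the post doesn't say which character set was used) MySQL's legacy `utf8` character set, which stores at most 3 bytes per character. Characters outside Unicode's Basic Multilingual Plane, such as those in CJK Unified Ideographs Extension B (U+20000 to U+2A6DF), need 4 bytes in UTF-8 and can only be stored in MySQL columns declared as `utf8mb4`; SQLite's text storage has no such limit. A small Python sketch that lists which characters in a string fall into that 4-byte range:

```python
def needs_four_bytes(ch: str) -> bool:
    """True if `ch` lies outside the Basic Multilingual Plane (4 bytes in UTF-8)."""
    return ord(ch) > 0xFFFF


def non_bmp_chars(text: str) -> list[str]:
    """Distinct characters in `text`, in order of first appearance, that a
    3-byte-per-character encoding such as MySQL's legacy `utf8` cannot store."""
    found: list[str] = []
    for ch in text:
        if needs_four_bytes(ch) and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    # U+20B9F and U+29E3D are both in CJK Unified Ideographs Extension B.
    sample = "漢字\U00020B9F\U00029E3D"
    for ch in non_bmp_chars(sample):
        print(f"U+{ord(ch):X} needs {len(ch.encode('utf-8'))} bytes in UTF-8")
```

Running a check like this over every text column gives a quick idea of how many entries a 3-byte encoding would reject.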
So I googled around and came across SQLite; it was perfect and lightweight, and I think it really is the best option for the job (not sure about all the user-related stuff, though). There are 23 tables, but I'm sure I could merge some together (e.g. there are two tables for 'shinjitai' and 'kyuujitai'; perhaps storing that information in the 'variants' table would be a better idea, though I'm not sure if it really matters speed-wise?). No idea what 1NF, 2NF and 3NF are, even after reading up on them, lol.

Also, some of the entries in the tables would require additional Python/PHP magic before being output to the user. For example, here is what the 'particles' field looks like: "が(59%),に(30%),を(11%)". Before this is output, Python/PHP should "explode" the entry to separate the actual particles from the brackets, percentages and commas, so that clicking on a particle would load the appropriate sentences by querying the 'sentences' table for that particle+word combination.

headphone_child Wrote: What could be nice is an app completely separate from the website, where you download all the dictionary data onto your device when installing the app, so that you can use the app without an internet connection (à la JED for Android). I'd probably use an app like that.

I suppose, but with so many features, how do you display everything on the small screens of mobile devices, or even tablets? This would require a lot of scrolling or opening different tabs just to get to the part you wanted to see... I'm not entirely sure the app idea is feasible. And remember, this in no way replaces EDICT as a vocabulary dictionary: it has 12,000+ entries compared to EDICT's 200,000+. So sure, it's a great kanji resource with cool extra bits for all the "official" words and common jukugo, etc., but a lot of people would be disappointed after searching for, say, "こんにちは" or some obscure jukugo and getting no results.
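The "explode" step for the 'particles' field can be sketched in Python (one of the two languages mentioned). This sketch assumes every value follows the comma-separated `particle(NN%)` pattern of the example given; the `sentences` table layout (`word`, `particle`, `sentence` columns) and the sample rows are hypothetical, since the real schema isn't described:

```python
import re
import sqlite3

# One "particle(NN%)" item, e.g. "が(59%)"; surrounding whitespace is tolerated.
ITEM_RE = re.compile(r"\s*([^\s,()]+)\((\d+)%\)\s*")


def parse_particles(field: str) -> list[tuple[str, int]]:
    """Split a value like "が(59%),に(30%),を(11%)" into (particle, percent) pairs."""
    pairs = []
    for item in field.split(","):
        match = ITEM_RE.fullmatch(item)
        if match is None:
            raise ValueError(f"unexpected particle item: {item!r}")
        pairs.append((match.group(1), int(match.group(2))))
    return pairs


def sentences_for(conn: sqlite3.Connection, word: str, particle: str) -> list[str]:
    """Look up example sentences for one word+particle combination.

    Assumes a hypothetical table sentences(word TEXT, particle TEXT, sentence TEXT).
    """
    rows = conn.execute(
        "SELECT sentence FROM sentences WHERE word = ? AND particle = ?",
        (word, particle),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(parse_particles("が(59%),に(30%),を(11%)"))

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentences (word TEXT, particle TEXT, sentence TEXT)")
    conn.executemany(
        "INSERT INTO sentences VALUES (?, ?, ?)",
        [("ある", "が", "時間がある。"), ("ある", "に", "駅の近くにある。")],
    )
    # Each parsed particle would become a clickable link that runs this query.
    print(sentences_for(conn, "ある", "が"))
```

Keeping the percentages as integers makes it easy to sort the particles or draw them as bars, and the `ValueError` makes any rows that deviate from the pattern easy to find.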
I mean, I could append the rest of EDICT to the database, but those entries would have no extra features; they would just be as-is. I don't think it's worth it...

headphone_child Wrote: The acknowledgements, advertising, and name-picking ones look good, but I'm not sure about the other two. A forum where anyone can pay to become a moderator... I'd be worried about the trustworthiness of such moderators. And of course, not everyone would consider being a moderator an award, since it's more of a responsibility/obligation on their part.

That's a good call on the moderator thing; I'll scratch that off the list.

headphone_child Wrote: But mainly, the really risky one is "Be a voice in the process": you have to be very careful about causing feature creep with this, and it could easily increase the cost of development.

Hm, well, I think I've covered all the possible features myself already, and there really isn't anything to add! Well, somebody on Reddit did mention it would be nice to have pinyin (and Korean) included, so I guess I'll throw that in too... But I was referring more to the design itself. I mean, it would be better for people to see and comment on the design first, so appropriate changes can be made based on user input, instead of finishing the website and then having people complain that this should be changed or that this is ugly... Although, if somebody has some great ideas and I can include them, why not. In any case, I think access to the beta (or just preview screenshots) is a valid "award." Thanks for your long input, I appreciate it!

Inquiry on feasibility/usefulness of my project - juniperpansy - 2014-04-28

toshiromiballza Wrote: b) I'm a little reluctant to share the database with complete strangers on the internet who might take off with the database on their own, or leave certain backdoors in the code for later on

b) is a non-issue. Put a software license (or whatever it would be) on your database. If anybody steals it, it would be obvious.
It is such a niche project that there is unlikely to be any financial incentive to steal it. Making it open source (hosted on SourceForge or GitHub) could easily address the above issues.

toshiromiballza Wrote: Yes, pretty much all that's missing is the code logic behind it all, and the design. I'm pretty sure the user-specific settings, the custom lists and the exporting-to-Anki thing would be the hardest to code (and design); the rest would pretty much be querying the database.

Exporting to Anki is the hardest thing to code? On Android that can be coded in less than an hour... Seriously, what you have written above is simple to code. The only thing I see as possibly being an issue is making the user interface look pretty. I'm not too worried, though.

toshiromiballza Wrote: You mean an Android app of the website? ... So maybe it would be better to make it require an internet connection...

No, everything would be stored on the phone itself. Requiring internet connectivity for a dictionary-type app to work is terrible for the end user.

toshiromiballza Wrote: In the meantime, have you read this post and the game/database I have in mind? How hard would that be, and would you consider doing that for free?

I have very little graphics programming experience, so I'm not sure about the game part. I would worry about the basic functionality first. Even if I am unable to do the game part, it would be very simple for someone to add it in later.

Whatever I would do would be free and open source. Take what headphone_child says with a grain of salt. He is looking for a paid job.

headphone_child Wrote: If the project was actually that small, I'd throw it together for free, but this isn't a simple, week-long project.

1000 euro = 1 week at 40 hours × 25 euro/hour. Pretty standard payment rate. So in other words, your idea of getting a quality product is to pay a developer sub-par wages for 'weeks' of work?

headphone_child Wrote: I agree with this assessment.
Too many risks; you want to keep your data safe,

If the DB is on your device, all you need to do is root/jailbreak it to get the DB...

Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-28

juniperpansy Wrote: b) is a non-issue. Put a software license (or whatever it would be) on your database. If anybody steals it, it would be obvious. It is such a niche project that there is unlikely to be any financial incentive to steal it. Making it open source (hosted on SourceForge or GitHub) could easily address the above issues.

I really don't want to put any restrictive licenses on the database. At the moment nothing is licensed, except for everything that comes from Jim Breen, and after I release it, sometime after the website goes live, it will be licensed under a CC license so anyone can use it without permission. But until then, I really don't want some random person to just take it and start a copycat project before mine even goes online.

juniperpansy Wrote: Exporting to Anki is the hardest thing to code?

No, no... Not the exporting thing, but the logic behind the personal customizable lists and the user-specific settings. I mean, compared to the rest of the website, which mostly just queries the database.

juniperpansy Wrote: Requiring internet connectivity for a dictionary-type app to work is terrible for the end user.

I agree with that. But I'm not entirely sure this deserves a mobile app, or whether it would actually make sense, as laid out in my previous post. So for now, the focus is on the website.

juniperpansy Wrote: I have very little graphics programming experience, so I'm not sure about the game part. I would worry about the basic functionality first.

There are zero graphics involved in the game. It'd essentially be a "word game," like a crossword puzzle or some such.
I actually found an open-source Android game that could (hopefully) be used as the base: http://code.google.com/p/lexic/

juniperpansy Wrote: Take what headphone_child says with a grain of salt.

Well, I do consider this to be a paid job, heh.

juniperpansy Wrote: So in other words, your idea of getting a quality product is to pay a developer sub-par wages for 'weeks' of work?

Those might be sub-par wages in the US, but they're not in Slovenia! So if I do end up hiring someone knowledgeable in both Japanese and web development who happens to be from the US, I'd probably have to increase the campaign funding goal... Unless, as you say, someone is willing to work for sub-par wages.
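On the "export to Anki" part of the personal lists discussed above: Anki can import UTF-8 plain-text files with one note per line and fields separated by tabs, and fields may contain HTML. A minimal Python sketch, assuming a hypothetical `ListEntry` record whose field names are illustrative only; the caller picks which fields go into the export, mirroring the idea of saving only the information you want:

```python
from dataclasses import dataclass


@dataclass
class ListEntry:
    """One saved kanji in a personal list (hypothetical fields)."""
    kanji: str
    meaning: str
    radical: str
    strokes: int
    notes: str = ""


def clean_field(value: object) -> str:
    """Make a value safe for one tab-separated field.

    Tabs would start a new field, so they become spaces; newlines would start
    a new note, so they become <br> (Anki fields can hold HTML).
    """
    text = str(value).replace("\t", " ")
    return text.replace("\r\n", "\n").replace("\n", "<br>")


def to_anki_tsv(entries: list[ListEntry], fields: list[str]) -> str:
    """One line per entry, containing only the chosen fields in the given order."""
    lines = ["\t".join(clean_field(getattr(e, f)) for f in fields) for e in entries]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    my_list = [
        ListEntry("語", "word; language", "言", 14, "seen in 日本語"),
        ListEntry("読", "read", "言", 14, "line one\nline two"),
    ]
    # Save this text with encoding="utf-8" when writing it to a file for import.
    print(to_anki_tsv(my_list, ["kanji", "meaning", "notes"]), end="")
```

Passing the field list explicitly keeps the column order stable, so it can be matched to the note type's fields in Anki's import dialog.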
Inquiry on feasibility/usefulness of my project - ikore - 2014-04-28

toshiromiballza Wrote: Those might be sub-par wages in the US, but they're not in Slovenia!

Even then, 1000 euro wouldn't cut it; that would maybe be just enough to pay for a very bare-bones prototype. This is not a small project: something as complex as this needs to be thought out very well at every stage. If you don't, you're more than likely to end up with a sub-par end result. I know it's cheaper in Slovenia, but this is easily a few months of full-time work (which would put it more in the range of 3000-6000 euro).

Inquiry on feasibility/usefulness of my project - toshiromiballza - 2014-04-28

ikore Wrote: but this is easily a few months of full-time work (which would put it more in the range of 3000-6000 euro).

Wow, seriously? I didn't think it would cost that much, or take that long... I was imagining this being done in two weeks, tops... Maybe I should learn to code and make a nice living off of that, lol. Will a website in Python cost more than one done in PHP, by the way? I've read some good articles about how Python is being used more and more for web development, whereas PHP is "getting outdated."