kanji koohii FORUM
smart.fm corpus - Printable Version




smart.fm corpus - avparker - 2009-09-09

mafried Wrote:Suspended facts have the 'Suspended' tag, right? In any case you can ask Damien about this. For bold text, you'll just have to insert the HTML around the word in the field.
They did in old versions, but this was changed in a recent version of Anki.
Now the card (or maybe the fact, I'm not sure) has a flag which marks them as suspended.
You can verify this by using the "Suspended" filter in the "Browse Items" dialog: the facts it lists don't have a suspended tag.

I've briefly looked at some of the Anki code; there are some query helper methods that let you search using syntactic sugar. The "Browse Items" dialog searches for "is:suspended".


smart.fm corpus - dawhite - 2009-09-09

Yeah, I'm looking myself and it seems it's suspended if the priority is -3. I've even gotten it to find all of the suspended cards.

However, now the exporter is being funny and not respecting limitCardIDs....

Also, before that, it didn't know where the backup directory was on Windows, and I had to manually change the library source code.

This is hell.


smart.fm corpus - avparker - 2009-09-09

dawhite Wrote:Also, before that, it didn't know where the backup directory was on Windows, and I had to manually change the library source code.
In the "Preferences->Saving" dialog, there is a link "Open backup folder", which will open the folder in Windows Explorer.


smart.fm corpus - dawhite - 2009-09-09

IN YOUR FACE, ANKI!

I had to change the library twice, but goddamnit I got it. Now, to do the actual work...


smart.fm corpus - dawhite - 2009-09-09

avparker Wrote:
dawhite Wrote:Also, before that, it didn't know where the backup directory was on Windows, and I had to manually change the library source code.
In the "Preferences->Saving" dialog, there is a link "Open backup folder", which will open the folder in Windows Explorer.
Yeah, well, I knew where it was. But the library didn't know where it was. It was actually using forward slashes on Windows, and it was missing a folder... I have no idea why the distributed compiled version works.
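
A hedged illustration of the path problem described above: building the backup path with the standard library's path helpers instead of hard-coded forward slashes. The folder names are made up for illustration and are not Anki's actual layout. `ntpath` (the Windows flavour of `os.path`) is used so the Windows behaviour shows on any OS.

```python
import ntpath  # Windows flavour of os.path, importable on every platform


def backup_dir(base, *parts):
    """Join path components with Windows separators and normalise them."""
    # normpath also converts any stray forward slashes to backslashes
    return ntpath.normpath(ntpath.join(base, *parts))


# Hypothetical locations, for illustration only
joined = backup_dir("C:\\Users\\me", ".anki", "backups")
fixed = backup_dir("C:/Users/me/.anki", "backups")
```

Both calls produce the same backslash-separated path, so a caller no longer depends on which separator was typed into the source.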


smart.fm corpus - bombpersons - 2009-09-10

From what I've seen in the Anki source code, priority == -3 means the card is suspended. Have a look at the suspendcards and unsuspendcards functions in deck.py.
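
A minimal sketch of that check, assuming the deck is an SQLite file with a `cards` table that has a `priority` column. That schema is an assumption here, so the example builds a stand-in table in memory rather than opening a real deck. The helper name `suspended_card_ids` is hypothetical.

```python
import sqlite3

SUSPENDED = -3  # priority value the thread reports for suspended cards


def suspended_card_ids(conn):
    """Return the ids of all cards whose priority marks them as suspended."""
    rows = conn.execute(
        "SELECT id FROM cards WHERE priority = ? ORDER BY id", (SUSPENDED,)
    )
    return [row[0] for row in rows]


# Stand-in table: the real deck schema is assumed, not verified
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cards (id INTEGER PRIMARY KEY, question TEXT, priority INTEGER)"
)
conn.executemany(
    "INSERT INTO cards VALUES (?, ?, ?)",
    [(1, "食べる", 2), (2, "飲む", -3), (3, "走る", -3)],
)
ids = suspended_card_ids(conn)
```

Going through the library's own suspend and unsuspend functions is probably safer than editing the column directly, since they may update other bookkeeping as well.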


smart.fm corpus - bombpersons - 2009-09-10

How did you manage to install the Python bindings for MeCab? Whenever I try, it gives me a list out of range error...


smart.fm corpus - dawhite - 2009-09-10

Yup, the priority thing works. That was a good catch.

See my post "Installing the MeCab Python Binding" for just how pissed off MeCab makes me =P.

Also, my sample script is missing a colon in the for loop. Whoops.

AND, finally, I've just realized that probably drilling straight vocab isn't the way to go anyway. So... yeah, let's all live our lives?

The stuff you've already done is going to be so incredibly helpful, though. I can barely describe how much of my hero you are.


smart.fm corpus - ruiner - 2009-09-10

bombpersons Wrote:
ruiner Wrote:
bombpersons Wrote:Uploaded. Contains both scripts.

http://www.massmirror.com/c296db18447745c2c53175d411d3f2b8.html
Awesome! I actually don't have a list offhand but I'll add some cards to one in a bit and test it out. You're well on your way to Programmer-God status. Just to be sure: 'newline' just means what it says, it's not some kind of computer formatting jargon, right?
Nope, newline just means have each word on a separate line.

Also make sure you downloaded the latest version. I was lazy and forgot to test it the last time, and it wasn't properly searching the question side. I've fixed it now.

http://massmirror.com/b5564ce5910930f72cf2351d45746b40.html
So back on topic, should I keep and upload my list of words when I'm done? It won't be a complete list of words from an episode/film/etc because I'm only doing this for words I don't know, but if it becomes a collective, centralized list of say, all the words from Death Note anime, then it'd just be a matter of each user going to it and culling it according to their preferences. The difficulty of the latter aside, at least this way we can have new templates set up for new orders to complete Core 6000.

So it could be RTKO2001/smart.fm in the KO2001 order, then move on to C6K and do those sentences according to themes/tropes/favourite shows?

After finishing the basic grammar/RTK Lite/C2k in the kanji-focused way mentioned above, then one does C6k, focusing on doing it in an order based on vocabulary relevant to subs2srs shows. They DL a centralized c6k deck w/ all cards suspended, then grab a list: 'The Vocabulary of Death Note' to run your script, unsuspending the relevant cards, and studying it before watching the show (or in my case before studying it again as a subs2srs video deck and focusing on viewing comprehension in SRS bite-sized pieces).

If that's the case, perhaps making the script standalone/have a GUI or something would be best.
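
The workflow above (a newline-separated word list used to pick which suspended cards to bring back) can be sketched roughly as follows. This is not the uploaded script: `load_words` and `cards_to_unsuspend` are hypothetical names, and cards are modelled as plain `(id, question, answer)` tuples instead of real Anki objects.

```python
def load_words(text):
    """Split a newline-separated word list, dropping blanks and duplicates."""
    seen = []
    for line in text.splitlines():
        word = line.strip()
        if word and word not in seen:
            seen.append(word)
    return seen


def cards_to_unsuspend(cards, words):
    """Return ids of cards whose question or answer contains any listed word."""
    return [
        card_id
        for card_id, question, answer in cards
        if any(w in question or w in answer for w in words)
    ]


# Made-up sample data standing in for a word list and a suspended deck
word_list = "死神\nノート\n\n死神\n"
cards = [
    (1, "死神はリンゴが好きだ。", "Shinigami like apples."),
    (2, "彼は学生です。", "He is a student."),
    (3, "ノートに名前を書く。", "Write a name in the notebook."),
]
words = load_words(word_list)
matches = cards_to_unsuspend(cards, words)
```

Plain substring matching like this only finds a word in the exact form it appears in the list, so conjugated verbs and adjectives will be missed.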


smart.fm corpus - bombpersons - 2009-09-10

ruiner Wrote:
bombpersons Wrote:
ruiner Wrote:Awesome! I actually don't have a list offhand but I'll add some cards to one in a bit and test it out. You're well on your way to Programmer-God status. Just to be sure: 'newline' just means what it says, it's not some kind of computer formatting jargon, right?
Nope, newline just means have each word on a separate line.

Also make sure you downloaded the latest version. I was lazy and forgot to test it the last time, and it wasn't properly searching the question side. I've fixed it now.

http://massmirror.com/b5564ce5910930f72cf2351d45746b40.html
So back on topic, should I keep and upload my list of words when I'm done? It won't be a complete list of words from an episode/film/etc because I'm only doing this for words I don't know, but if it becomes a collective, centralized list of say, all the words from Death Note anime, then it'd just be a matter of each user going to it and culling it according to their preferences. The difficulty of the latter aside, at least this way we can have new templates set up for new orders to complete Core 6000.

So it could be RTKO2001/smart.fm in the KO2001 order, then move on to C6K and do those sentences according to themes/tropes/favourite shows?

After finishing the basic grammar/RTK Lite/C2k in the kanji-focused way mentioned above, then one does C6k, focusing on doing it in an order based on vocabulary relevant to subs2srs shows. They DL a centralized c6k deck w/ all cards suspended, then grab a list: 'The Vocabulary of Death Note' to run your script, unsuspending the relevant cards, and studying it before watching the show (or in my case before studying it again as a subs2srs video deck and focusing on viewing comprehension in SRS bite-sized pieces).

If that's the case, perhaps making the script standalone/have a GUI or something would be best.
Nice, that sounds like a great idea! I might get round to writing a GUI sometime (Or if someone else wants to, feel free to do so).


smart.fm corpus - ruiner - 2009-09-10

There ought to be a website or something that lets you use it as a front-end.

Still working on the conjugation issue sans mecab. I'm thinking if there's a way to look for the kanji/compound in conjunction with the English definition/translation, that could work. Or I guess I ought to make sure the list has the C6k versions I'm using.

For 'strays' that aren't in C6k I'm just adding single words to Anki, using the Breen/Japanesepod101 audio dictionary for audio. *After* checking the KO2001 deck w/ audio that I don't have because that would be wrong.


smart.fm corpus - dawhite - 2009-09-10

I still don't get quite what you guys are talking about but it sounds much more badass than what I was trying to make yesterday =P.

Just a word of warning, though -- if you guys are trying to mine sentences based on words (which I think you are), you're really going to have to get Mecab working. I wish you luck.
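
For what MeCab buys you here, a rough sketch: it splits a sentence into tokens and reports each token's dictionary form, which is what matching conjugated words against a list needs. The parser below assumes MeCab's default output with an IPAdic-style dictionary (base form in the seventh feature field); other dictionaries lay the fields out differently. `base_forms` and `analyse` are hypothetical helper names, and the sample text is hand-written to that format rather than captured from a real run.

```python
def base_forms(mecab_output):
    """Extract dictionary forms from MeCab's default (IPAdic-style) output."""
    forms = []
    for line in mecab_output.splitlines():
        if line == "EOS" or "\t" not in line:
            continue
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        # IPAdic puts the base form in the 7th field; "*" means unknown
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        forms.append(base)
    return forms


def analyse(text):
    """Run MeCab on text; needs the MeCab Python binding installed."""
    import MeCab  # imported lazily so the parser above works without it
    return base_forms(MeCab.Tagger().parse(text))


# Hand-written sample in the assumed format for 食べました
sample = (
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ\n"
    "まし\t助動詞,*,*,*,特殊・マス,連用形,ます,マシ,マシ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)
lemmas = base_forms(sample)
```

With the dictionary forms in hand, 食べました can be matched against a list entry of 食べる.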


smart.fm corpus - bombpersons - 2009-09-10

ruiner Wrote:There ought to be a website or something that lets you use it as a front-end.

Still working on the conjugation issue sans mecab. I'm thinking if there's a way to look for the kanji/compound in conjunction with the English definition/translation, that could work. Or I guess I ought to make sure the list has the C6k versions I'm using.

For 'strays' that aren't in C6k I'm just adding single words to Anki, using the Breen/Japanesepod101 audio dictionary for audio. *After* checking the KO2001 deck w/ audio that I don't have because that would be wrong.
Ah yes, I'm working on this too! If we use Django (a Python web framework), I'm sure it'll be pretty easy to add an Anki export function etc.

I have a basic site up. You can add sentences and look at them (with media), though it still needs users, search, etc. I can upload the code to GitHub if anyone wants to help out. I'm not too handy with Django, so the more help the merrier =D


smart.fm corpus - mafried - 2009-09-10

ruiner Wrote:Still working on the conjugation issue sans mecab. I'm thinking if there's a way to look for the kanji/compound in conjunction with the English definition/translation, that could work.
There's a really simple API designed to do just that.

It's called mecab.


smart.fm corpus - ruiner - 2009-09-10

mafried Wrote:It's called mecab.
How bout I mecab you upside the head, foo.


smart.fm corpus - dawhite - 2009-09-20

So did anyone ever get Mecab working? Turns out that that whole thing I wanted originally would be incredibly useful after all...


smart.fm corpus - nest0r - 2010-11-16

So I tried to run unsuspend.py again on my new computer. It almost works, but I'm getting an error:

UnicodeEncodeError: 'charmap' codec can't encode characters in position etc.: character maps to <undefined>

This occurs whether I use -s with a list of terms or -f, and with both Japanese and English text.

Okay, I was able to do some conversions in UltraEdit to get past that stage, but now I'm getting a "'Session' object has no attribute 'clear'" error. I give up again; I'll wait for some magic tool to scan a deck and unsuspend cards containing a list of words. ;p
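
On the first error: the exact cause on that machine can't be known from the post, but a 'charmap' UnicodeEncodeError usually means Japanese text was encoded with a legacy Windows code page, for example when printing to the console or writing a file without naming an encoding. A small sketch of the failure and one common workaround, with cp1252 picked as an example code page and a temporary file name that is purely illustrative:

```python
import io
import os
import tempfile

text = u"日本語"

# A legacy Windows code page cannot represent Japanese characters
try:
    text.encode("cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True

# Writing with an explicit UTF-8 encoding avoids the platform default
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(text)
with io.open(path, "r", encoding="utf-8") as handle:
    round_trip = handle.read()
```

The same idea applies to reading the word list: opening it with an explicit `encoding="utf-8"` avoids depending on whatever code page the system uses by default.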