kanji koohii FORUM
smart.fm corpus - Printable Version


smart.fm corpus - ruiner - 2009-09-09

So, those of you who are savvy about this sort of thing: if you wanted to take the vocabulary from a show or movie and create a deck containing only the smart.fm sentences that use this vocabulary, how would you go about automating the process? Same with KO2001, but mostly smart.fm. The idea is to turn those sentences, with their simple, consistent grammar, into support for the stuff you're subs2srsing, so you get the vocab/readings out of the way with the smart.fm sentences and can focus on listening to/parsing the new context-rich sentences from actual media.

What I've been doing is making a list of new words in a given subs2srs deck, usually the ones w/ kanji, esp. kanji I don't know readings for. I search the iknow/ko2001 decks for sentences with each word that have a minimum of new information, then I study that sentence first.

This is good enough for me as an individual, but if I were to try and come up w/ something for posterity/mass use, seems like something different would be better.

You could say I'm looking for viable means to accomplish what we've been discussing since these resources were released--I know Nukemarine has been adamant about using smart.fm's Core 6000 as a supplementary vocab corpus for a while.


smart.fm corpus - bombpersons - 2009-09-09

So what you want, basically, is a way to export sentences containing certain words (from a list) out of a smart.fm deck? You could write a pretty simple Python script to do this.
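For illustration only, here is a minimal sketch of the kind of filter being described. It works on plain Python lists rather than a real deck, and the function name and sample sentences are made up for the example; it is not the script posted later in the thread.

```python
def sentences_with_words(sentences, words):
    """Return the sentences that contain at least one of the given words."""
    wanted = [w for w in words if w]  # ignore empty entries in the word list
    return [s for s in sentences if any(w in s for w in wanted)]


# Hypothetical sample data standing in for a smart.fm sentence deck.
sentences = ["彼は学生です。", "猫が好きです。", "学校へ行きます。"]
words = ["学生", "学校"]

matches = sentences_with_words(sentences, words)
print(matches)
```

The match is a plain substring test, so it knows nothing about word boundaries.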


smart.fm corpus - dawhite - 2009-09-09

I'd just like to second how sweet this would be. Also, as I mentioned yesterday (http://forum.koohii.com/showthread.php?tid=3964), a slight modification that let you go the other way (sentence to vocab) would also rock my socks numerous times.

I mean, this would clearly be exceptionally simple to do if you had sentence cards, vocab cards, and already-known vocab cards in dictionaries. I just don't know the API and I am too stressed trying to power through Tae Kim to figure it out. I would be glad to write the input dictionaries -> output dictionary part of the algorithm if that would help, though. Could someone maybe hack this together with a preexisting script they've written that loads and stores decks to dictionaries?

EDIT: If this script of yours incorporated Mecab, well, so much the better.


smart.fm corpus - bombpersons - 2009-09-09

Made a script to do what I think you wanted. Download here

It searches through the cards in an Anki deck, finds the cards whose question contains certain words (given in a file or as command line arguments), and exports them to a separate deck.

@dawhite:
Using mecab would be cool, but I have no idea how to use it =(

*Edit* Ah, stupid mistake, searching in the answer field rather than the question. I'll upload a fixed version.


smart.fm corpus - ruiner - 2009-09-09

I feel like it's definitely something really easy, I just lack the scripting skills.

I guess another way to do it would be to create a list of words at smart.fm from programs, and add the sentences from C6K, so you'd have a Death Note Episode 01 list of all the words from that episode in C6K sentences, then import them using the smart.fm import plugin in Anki?

Personally I already have the iKnow c6k deck suspended and unsuspend as I go along, so I'd rather be able to export such a list and then have a script that runs through and toggles unsuspend for the sentences in the Anki deck that contain those words (or compounds?).
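As a rough sketch of the "toggle unsuspend" idea, the snippet below runs against an in-memory SQLite table. The `cards` table and its `question` and `suspended` columns are a hypothetical schema invented for this example, not Anki's real deck layout, so treat it as an outline of the logic only.

```python
import sqlite3


def unsuspend_matching(conn, words):
    """Unsuspend every suspended card whose question contains one of the words."""
    changed = 0
    for word in words:
        cur = conn.execute(
            "UPDATE cards SET suspended = 0 "
            "WHERE suspended = 1 AND instr(question, ?) > 0",
            (word,),
        )
        changed += cur.rowcount  # rows actually updated for this word
    conn.commit()
    return changed


# Hypothetical schema and sample rows, NOT the real Anki deck format.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cards (id INTEGER PRIMARY KEY, question TEXT, suspended INTEGER)"
)
conn.executemany(
    "INSERT INTO cards (question, suspended) VALUES (?, 1)",
    [("彼は学生です。",), ("猫が好きです。",), ("学生は学校へ行きます。",)],
)

count = unsuspend_matching(conn, ["学生", "学校"])
still_suspended = [
    row[0] for row in conn.execute("SELECT question FROM cards WHERE suspended = 1")
]
print(count, still_suspended)
```

Because the update only touches cards that are still suspended, a card matched by two words is counted once.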


smart.fm corpus - ruiner - 2009-09-09

bombpersons Wrote:Made a script to do what I think you wanted. Download here

It searches through the cards in an anki deck and finds cards with answers that have certain words (determined in a file or by command line arguments) and then exports them to a separate deck.

@dawhite:
Using mecab would be cool, but I have no idea how to use it =(
Oh cool. When you say 'answer' do you mean the target words have to be on the Answer side? Because all my kanji for the C6k deck are on the Question side. I guess I can modify deck properties, though. Is there a way to modify the script so it just 'unsuspends' the cards in the target deck?

Mecab could be interesting--or something that can be used to parse sentences into words to create a list?

If we start making lists of words (where you can select them and export/DL them from a central point) then we can simplify the process further, DL it and run it through the script bombpersons wrote. Then I guess it'd be very cloudlike if there's also a shared .media folder (Core 6000 .media folder?) offline somewhere like dropbox/drop.io that Anki is linked to via deck properties Media URL form?


smart.fm corpus - bombpersons - 2009-09-09

Ah, sorry I meant to say question instead of answer. I finished an unsuspend script and am uploading now =)


smart.fm corpus - bombpersons - 2009-09-09

Uploaded. Contains both scripts.

http://massmirror.com/b5564ce5910930f72cf2351d45746b40.html

*Edit* Gah, sorry that didn't fix it either, uploading properly fixed versions..

*Edit2* Right, updated, now working properly!


smart.fm corpus - dawhite - 2009-09-09

Very cool, bombpersons. This is allllmost perfect for the sentences -> vocab thing. Do you know how to access whether or not cards are suspended via the API?

Because, if so, we could just hack this script slightly, like so: it goes through your vocab file and exports all of the questions as a text file. Then it uses that as the "search" file and goes through ONLY THE UNSUSPENDED cards in your "sentences" deck. Every time it finds a sentence containing one of the vocab words in the search file, it marks down the ID of that vocab word. Then it goes back to the vocab deck and unsuspends all of the vocab cards whose IDs had been marked.

We might get some false positives from words that are included in other words (that's where Mecab would come in), but I don't think there would be any false negatives and it would be something approximating fantastic.
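To make the false-positive point concrete, here is a small sketch using plain substring matching. The vocab list and sentence are invented sample data, and the function name is hypothetical.

```python
def vocab_in_sentences(vocab, sentences):
    """Return the vocab words that occur as a substring of any sentence."""
    return [w for w in vocab if any(w in s for s in sentences)]


# Invented sample data: the sentence really only uses the word 大学生.
vocab = ["学生", "大学", "大学生", "先生"]
sentences = ["私は大学生です。"]

found = vocab_in_sentences(vocab, sentences)
print(found)
```

Here 学生 and 大学 are reported only because they are contained in 大学生, which is the kind of false positive mentioned above. Nothing that actually appears in the sentence is missed.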

So I guess my question is... do you know how to access whether or not cards are suspended via the API?


smart.fm corpus - dawhite - 2009-09-09

Whoa, that was fast. Now the script is... three quarters of the way to being perfect for me. You, bombpersons, are a Python God.


smart.fm corpus - ruiner - 2009-09-09

bombpersons Wrote:Uploaded. Contains both scripts.

http://www.massmirror.com/c296db18447745c2c53175d411d3f2b8.html
Awesome! I actually don't have a list offhand but I'll add some cards to one in a bit and test it out. You're well on your way to Programmer-God status. Just to be sure: 'newline' just means what it says, it's not some kind of computer formatting jargon, right?


smart.fm corpus - bombpersons - 2009-09-09

ruiner Wrote:
bombpersons Wrote:Uploaded. Contains both scripts.

http://www.massmirror.com/c296db18447745c2c53175d411d3f2b8.html
Awesome! I actually don't have a list offhand but I'll add some cards to one in a bit and test it out. You're well on your way to Programmer-God status. Just to be sure: 'newline' just means what it says, it's not some kind of computer formatting jargon, right?
Nope, newline just means putting each word on a separate line.
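For anyone unsure what such a word file looks like to a script, this is a minimal sketch of reading one word per line. The helper name is hypothetical, and an in-memory string stands in for the file so the example is self-contained.

```python
import io


def read_word_list(lines):
    """Return the non-empty words from an iterable of lines, without duplicates."""
    words = []
    for line in lines:
        word = line.strip()  # drop the trailing newline and stray spaces
        if word and word not in words:
            words.append(word)
    return words


# Stand-in for a text file with one word per line (plus a blank line and a repeat).
sample = io.StringIO("学生\n学校\n\n学生\n")
words = read_word_list(sample)
print(words)
```

With a real file, something like `open("words.txt", encoding="utf-8")` could be passed in instead; the file name here is just a placeholder.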

Also make sure you downloaded the latest version. I was lazy and forgot to test it the last time, and it wasn't properly searching the question side. I've fixed it now.

http://massmirror.com/b5564ce5910930f72cf2351d45746b40.html


smart.fm corpus - dawhite - 2009-09-09

Yay for fixes.

Also, as relates to the auto-search-generation thing I mentioned above, I just realized it probably only makes sense to create a search file with your SUSPENDED vocab, the reason being that you'll never need to unsuspend vocab that's already unsuspended...


smart.fm corpus - bombpersons - 2009-09-09

dawhite Wrote:Yay for fixes.

Also, as relates to the auto-search-generation thing I mentioned above, I just realized it probably only makes sense to create a search file with your SUSPENDED vocab, the reason being that you'll never need to unsuspend vocab that's already unsuspended...
I added a suspend script as well so you can do that too =D

http://massmirror.com/b5564ce5910930f72cf2351d45746b40.html


smart.fm corpus - bombpersons - 2009-09-09

IceCream Wrote:I'm still looking for a way for anki to recognise the kanji in the smartfm sentences, if anyone knows how i can get it to do this, please let me know!
What do you mean? Can't search?


smart.fm corpus - dawhite - 2009-09-09

Er... there may be a chance I'm missing something here. I just took a look at suspend.py and it looks like what it does is suspend all terms that match the search terms. Is that right?

What I was trying to express before, in my characteristically unclear manner, was that I wanted to be able to do the following:

1 - Open your anki vocab deck
2 - Get a dictionary of the cards in that deck THAT ARE ALREADY SUSPENDED
3 - Export this to a text file
4 - Use this text file as the "search" file on your sentence deck. However, what gets output is not a list of the sentences that were matched, but rather a list of the (suspended) vocab words that matched sentences.
5 - Now unsuspend those vocab words in your vocab deck.

I'm guessing this would be a pretty easy mod. The only thing I am missing is how to tell if a card is suspended or not. Is there documentation for this somewhere, incidentally?


smart.fm corpus - bombpersons - 2009-09-09

Huh? I'm not following you. Do you mean to find all the vocab that are in suspended sentences?


smart.fm corpus - bombpersons - 2009-09-09

IceCream Wrote:no, i mean, like, you know the bit in anki (i think it's part of the japanese plugin) that gives you the kanji figures for your deck, like which ones you've seen and not seen yet? Well, when i dl smartfm sentences, the kanji in them aren't recognised in the kanji count, so i don't know how many i've actually learnt... i dunno if there's anything i can do to change it?
Mh.. I don't know... Probably something to do with the fields the plugin looks at.


smart.fm corpus - dawhite - 2009-09-09

Haha, sorry, OK. I'm sort of referring to other threads and old posts and all kinds of things so it's probably impossible to tell what I'm going for. So I will provide all the context, which is apparently important in this horrible language anyway.

So, what I would like to do is, given a sentence deck and a vocab deck, unsuspend all the words in the vocab deck that are in unsuspended sentences in the sentence deck.

So, for example, I have the Core 2000 deck and Nukemarine's Tae Kim deck. Let's suppose I've unsuspended the first ten sentences in the Tae Kim deck and all of the Core 2000 deck is suspended. What I'd like to do is unsuspend all of the words in the Core 2000 deck that are featured in one of those 10 unsuspended sentences in Tae Kim.

The algorithm I suggested above would be to just look at all the suspended cards in the Core 2000 deck and see if any of them are in one of the unsuspended sentences in the Tae Kim deck. If any of them are, then it unsuspends them in the Core 2000 deck.

The way I suggested doing it before just leverages what you've already done: it goes through the Core 2000 deck and finds all the suspended words (since the goal of this script is to unsuspend words, there's no need to look at words that are already unsuspended).

Then it searches for those words in your sentence deck simply by applying the script you wrote before with an input file generated from the list of suspended words in the vocab deck. The difference is that the OUTPUT of that search is not a list of the matching SENTENCES, but rather a list of the matching VOCAB words. So if, for example, gakusei (I don't know how to type kana yet so please do not hate me) is found to be in "Jimu ga gakusei", it would add "gakusei" to the output file.

Then THIS output file is used as the input file to the script you wrote, this time applied to your VOCAB deck and completely unmodified, so that it simply unsuspends those vocab words that were found to be in the unsuspended sentences in the sentence deck.

Does that clear things up or make it worse?
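The flow described in this post could be sketched roughly as follows. Cards are plain dictionaries with made-up `front` and `suspended` keys, and both function names are hypothetical; this does not use the Anki API or the scripts linked above, and the sample cards beyond the gakusei one are invented.

```python
def words_to_unsuspend(vocab_cards, sentence_cards):
    """Find suspended vocab words that occur in unsuspended sentences."""
    # Only suspended vocab needs checking; the rest is already active.
    candidates = [c["front"] for c in vocab_cards if c["suspended"]]
    # Only sentences currently being studied count as a match source.
    active = [c["front"] for c in sentence_cards if not c["suspended"]]
    return [w for w in candidates if any(w in s for s in active)]


def unsuspend(vocab_cards, words):
    """Clear the suspended flag on the vocab cards whose front is in words."""
    targets = set(words)
    for card in vocab_cards:
        if card["front"] in targets:
            card["suspended"] = False


# Invented sample decks, using romaji as in the post above.
vocab_cards = [
    {"front": "gakusei", "suspended": True},
    {"front": "sensei", "suspended": True},
    {"front": "neko", "suspended": False},
]
sentence_cards = [
    {"front": "Jimu ga gakusei", "suspended": False},
    {"front": "Arisu ga sensei", "suspended": True},
    {"front": "neko ga suki", "suspended": False},
]

matched = words_to_unsuspend(vocab_cards, sentence_cards)
unsuspend(vocab_cards, matched)
print(matched)
```

The intermediate list `matched` plays the role of the output file of vocab words, which is then fed to the unsuspend step.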


smart.fm corpus - ruiner - 2009-09-09

Ahem, okay, how do I uh, make it work? I installed Python ;p and I got as far as something about an sqlmodule that I don't have? Can't I push a button and make everything work?


smart.fm corpus - ruiner - 2009-09-09

What is a vocab deck?


smart.fm corpus - dawhite - 2009-09-09

It is a rare species of antelope.

Alternately, it can be an anki deck containing cards of the following format:
Front - word in kanji and kana
Back - reading and definition

You can import these from Smart.fm using the importer pretty easily


smart.fm corpus - bombpersons - 2009-09-09

Ah, I understand now =D It would be possible, but the only problem would be determining where words start and end. It would be very difficult for the script to find out what the separate words in the sentence are. I guess I could use mecab, but I have no clue how to use it =(
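As a crude stand-in for a real morphological analyzer, one option is greedy longest-match segmentation against the vocab list itself. To be clear, this is not MeCab and makes no attempt to handle inflection or unknown words properly; the function name and sample data are assumptions made for the sketch.

```python
def segment(sentence, vocab):
    """Split a sentence greedily, taking the longest known word at each position."""
    by_length = sorted(set(vocab), key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(sentence):
        for word in by_length:
            if word and sentence.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            # No known word starts here: emit the single character as-is.
            tokens.append(sentence[i])
            i += 1
    return tokens


# Invented sample vocab; 大学生 should win over its parts 大学 and 学生.
vocab = ["学生", "大学", "大学生", "私"]
tokens = segment("私は大学生です。", vocab)
known = [t for t in tokens if t in vocab]
print(tokens, known)
```

Because the longest entry is tried first, 大学生 is matched as one word and its shorter parts are not reported separately. A proper tokenizer would still be more reliable on real sentences.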


smart.fm corpus - ruiner - 2009-09-09

dawhite Wrote:It is a rare species of antelope.

Alternately, it can be an anki deck containing cards of the following format:
Front - word in kanji and kana
Back - reading and definition

You can import these from Smart.fm using the importer pretty easily
Ohh, your c2k deck is vocabulary only? Can't you just make a list of words you don't know when you're doing Tae Kim sentences and then tell the script bombpersons wrote to unsuspend the cards in the C2k deck that have those listed words? Not sure why you want to use Tae Kim as a guide for vocab anyway, best to just use grammar cards for grammar. Don't need to memorize the words in grammar sentences as long as you get the gist from the translation enough to understand the points.


smart.fm corpus - bombpersons - 2009-09-09

ruiner Wrote:Ahem, okay, how do I uh, make it work? I installed Python ;p and I got as far as something about an sqlmodule that I don't have? Can't I push a button and make everything work?
You need to download and install SQLAlchemy from http://www.sqlalchemy.org/download.html (this is what Anki uses to access the SQLite Anki decks).

If you have any trouble, I can guide you through it if you like.