Joined: Sep 2008
Posts: 1,674
Thanks:
1
Is there any way we could get some sort of program where you enter a particular word or grammar point you want to study, run a whole lot of text through it (say, from a web page or a PDF), and it extracts the sentences that match your criteria and compiles them into an Anki deck?
I'm guessing it could be done and implemented much more usefully than my primitive concept of it.
But what do you guys think?
Joined: Oct 2008
Posts: 890
Thanks:
0
Sounds like a good idea, especially if you have a large amount of text to go through. I would imagine it would be quite easy to write a Python script to do it. I'll have a go =D
The problem comes in separating the sentences though. Split only when 。 or ? or ! appears?
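A minimal sketch of that idea, assuming a sentence ends at 。, ? or ! (full-width or ASCII) or at a line break. The function names (`split_sentences`, `matching_sentences`, `to_tsv`) are made up for illustration. The output is plain tab-separated text, which Anki can import as one note per line. Closing quotes such as 」 after the punctuation are not handled here.

```python
import re

# A sentence is a run of non-terminator characters followed by any
# terminators (。 plus full-width and ASCII ? and !). Newlines also split.
SENTENCE = re.compile(r"[^。？！?!\n]+[。？！?!]*")

def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    sentences = []
    for match in SENTENCE.finditer(text):
        sentence = match.group().strip()
        if sentence:
            sentences.append(sentence)
    return sentences

def matching_sentences(text, keyword):
    """Return only the sentences that contain the keyword."""
    return [s for s in split_sentences(text) if keyword in s]

def to_tsv(sentences, keyword):
    """Build tab-separated lines: sentence in field 1, keyword in field 2."""
    # Tabs inside a sentence would break the field layout, so replace them.
    return "\n".join(f"{s.replace(chr(9), ' ')}\t{keyword}" for s in sentences)

if __name__ == "__main__":
    sample = "今日は暑い。猫が好きですか？犬も好き！"
    print(to_tsv(matching_sentences(sample, "好き"), "好き"))
```

This only does plain substring matching, so it will not catch conjugated forms of a word. That is where a parser like MeCab would come in.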
Joined: Aug 2008
Posts: 3,289
Thanks:
0
Isn't this something MeCab can do?
Joined: Oct 2008
Posts: 890
Thanks:
0
What exactly is MeCab? It has Python bindings, but I can't seem to find out what it does or find any documentation.
Joined: Aug 2008
Posts: 3,289
Thanks:
0
MeCab is a Japanese parser (a morphological analyzer: it splits text into words and tags them). Anki, for example, uses it to generate readings.
Joined: Oct 2007
Posts: 4,582
Thanks:
0
Maybe creating a searchable online database of the combined subs2srs decks would be the best path--not necessarily multimedia itself, just the text for the initial search, then users can use the results to create/download the media files alongside the desired cards. Something like that. I'm not sure precisely how these corpus tools work, but between mecab and parts-of-speech tagging and the other metadata possibilities... It could even be collaboratively annotated, using Microformats and the Firefox Operator addon? Or whatever. Or, going back to previous conversations, would it be best to try and fit this into smart.fm's lists/API stuff?
Also, the future: video cards in Anki, created via subs2srs? ;p Or stick an SRS algorithm onto a videoplayer playlist.... Eating all that rice made with my neuro fuzzy has made my logic fuzzy too.
Speaking of annotation--I haven't yet delved into the possibilities of turning Flickr images and YouTube videos into picture 'dictionaries', as it were. Or just using the ones already made by .jp natives. Wonder if you can audio-annotate as well. Or perhaps incorporating Omnisio or something into Anki images could work? I guess Omnisio is video. But you know, whatever is used for image tagging on Flickr.
Edited: 2009-08-16, 3:30 pm
Joined: Jan 2008
Posts: 131
Thanks:
0
This was an idea from some time ago... Say you have 200+ Japanese movies with 200+ OCRed Japanese subtitle files.
1. You write the Japanese words you like in a text file.
2. A program runs through the media folder, following the instructions in the text file, and extracts the video clips that match the chosen words.
3. The result is a lot of clips from different movies, showing the words in different contexts.
You know, Spielberg called me the other day and suggested this to me XD
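The search half of those steps could be sketched roughly like this, assuming the subtitles are stored as SRT text. The post does not say which format they are in, so that is an assumption, and `find_cues` is a made-up name. The sketch only finds the matching cues and their timestamps. Actually cutting the video at those timestamps would need a separate tool and is not shown.

```python
import re

# One SRT cue: an index line, a "start --> end" line, then the subtitle text
# up to the next blank line or the end of the file.
CUE = re.compile(
    r"\d+\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})[^\n]*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)

def find_cues(srt_text, words):
    """Return (start, end, text) for every cue containing any of the words."""
    hits = []
    for start, end, body in CUE.findall(srt_text.replace("\r\n", "\n")):
        text = " ".join(body.split())  # collapse multi-line cues to one line
        if any(word in text for word in words):
            hits.append((start, end, text))
    return hits

if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\n猫が好きです。\n\n"
        "2\n00:00:04,000 --> 00:00:06,500\n今日は暑い。\n"
    )
    for start, end, text in find_cues(sample, ["猫"]):
        print(start, end, text)
```

To cover a whole media folder, you would read the word list from the text file, loop over every subtitle file, and keep the file name next to each hit so you know which movie to cut from.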
Joined: Oct 2008
Posts: 890
Thanks:
0
You can set the media URL so that when you are reviewing online, it fetches the media from a separate server. Quite useful if you have Dropbox. You can just put your Anki deck and media in the public folder, paste the public URL in as the media URL, and review your deck with media anywhere =)
Joined: Jan 2008
Posts: 131
Thanks:
0
Maybe the most useful thing to do is collect all the Anki decks with media from all the members of the site (those who want to participate, of course!) and then make one massive deck of 50,000+ sentences... Even if not all the media is there, it would be phenomenal °_° for learning.
It's only an idea though. A deck is kind of a personal thing for the person who made it... hmm, I don't know.