Joined: Jan 2008
Posts: 419
I'm a bit (OK, very) rusty on my Python, and I was wondering if someone knows of software or has some code that can cut a text into sentences.
You know: you would point it at the source file, provide one or several dividers such as 」 or ?!。, enter the name of the destination file, and it would create a text file with all the sentences of the text.
Thanks in advance.
Edited: 2010-03-05, 6:21 am
Joined: Feb 2008
Posts: 1,322
I'm not too experienced in Python, but it seems like what you're talking about is just reading the input, giving it a custom delimiter to split on, and then writing out a text file with each sentence on a separate line?
Sounds pretty simple to make, but I'm not too good at Python.
I'm going to go see what I can whip up, but if anyone responds quicker than me with something better, feel free! I can't guarantee anything I come up with will be too wonderful.
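For reference, here is a minimal Python 3 sketch of the tool described in the first post. The function names are hypothetical, it assumes the input file is saved as UTF-8, and it also treats a line break as the end of a sentence (an assumption about how the text is laid out). Each divider stays attached to the sentence it ends, so a closing 。」 stays together.

```python
import re
import sys


def split_sentences(text, dividers="。？！?!」"):
    """Return the sentences in text, each keeping the divider(s) that end it."""
    div = "".join(re.escape(d) for d in dividers)
    # A sentence is a run of characters that are neither dividers nor newlines,
    # followed by any dividers (so "。」" stays together). A newline also ends a
    # sentence, which is an assumption about how the text is laid out.
    pattern = re.compile("[^" + div + r"\n]+[" + div + "]*")
    sentences = []
    for match in pattern.finditer(text):
        sentence = match.group().strip()
        if sentence:  # skip runs that were only spaces
            sentences.append(sentence)
    return sentences


def split_file(source, destination, dividers="。？！?!」"):
    """Read a UTF-8 source file and write one sentence per line to destination."""
    with open(source, encoding="utf-8") as f:
        sentences = split_sentences(f.read(), dividers)
    with open(destination, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write(sentence + "\n")
    return len(sentences)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python split_sentences.py input.txt output.txt
    print(split_file(sys.argv[1], sys.argv[2]), "sentences written")
```

Matching the sentences with finditer, rather than splitting on the dividers, is what lets the dividers survive in the output.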
Joined: Feb 2008
Posts: 1,322
lol, it seems he's right, except that you'll probably have to look online for how to do it. The help doesn't tell you a whole lot...
Joined: Jan 2008
Posts: 419
I know it's some basic stuff and I'm utterly ashamed of even THINKING about asking that... But I haven't done any Python for about 8 months (shame on me again: the first rule of programming is that you should code something every day...), and right now I'm trying to save as much time as possible, as I have 3 big projects to carry out while keeping up my Japanese learning in the process. Those few lines of code would dramatically improve my flow.
Edited: 2010-03-05, 8:30 am
Joined: Apr 2009
Posts: 723
Sounds like that sort of functionality could create a sort of text2srs program similar to subs2srs. If it could pull off filtering like subs2srs can (only sentences with kanji, or above a certain length, or only those containing a certain word, or no sentences containing a certain word) you could use it to split up an entire txt file (like the ones nest0r didn't post) into individual cards for Anki. That would be amazing...or am I just getting overexcited?
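Filters like these are easy to bolt on once the text is split. Below is a hypothetical sketch: the parameter names are invented here and do not come from subs2srs, and "has kanji" is approximated as "contains a character from the CJK Unified Ideographs block (U+4E00 to U+9FFF)".

```python
import re

# CJK Unified Ideographs block; a rough "contains kanji" test (assumption).
KANJI = re.compile("[\u4e00-\u9fff]")


def keep_sentence(sentence, min_length=0, must_contain=None,
                  must_not_contain=None, require_kanji=False):
    """Return True if the sentence passes every filter that is switched on."""
    if require_kanji and not KANJI.search(sentence):
        return False
    if len(sentence) < min_length:
        return False
    if must_contain and must_contain not in sentence:
        return False
    if must_not_contain and must_not_contain in sentence:
        return False
    return True


sentences = ["はい。", "今日は雨が降っている。", "猫が好きです。"]
print([s for s in sentences if keep_sentence(s, min_length=5, require_kanji=True)])
# ['今日は雨が降っている。', '猫が好きです。']
```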
Edited: 2010-03-05, 7:38 pm
Joined: Aug 2006
Posts: 1,022
In Python you could use the split function from the regular expression module (re):
import re
result = re.split(u'[...]', string_to_split)
where ... stands for the characters you want to split on, listed inside the square brackets.
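A concrete run of that re.split call, with 。？！」 filled in as example dividers. In Python 3 plain string literals are already Unicode, so the u prefix is optional. Note that re.split throws the dividers away; matching the sentences with re.findall keeps them.

```python
import re

text = "「こんにちは。」元気？はい、元気です！"

# re.split removes the dividers and leaves empty strings where two dividers
# touch (here 。 and 」) or where the text ends with one, so filter those out.
parts = [p for p in re.split("[。？！」]", text) if p]
print(parts)  # ['「こんにちは', '元気', 'はい、元気です']

# To keep each divider with its sentence, match the sentences instead of
# splitting on the dividers.
kept = re.findall("[^。？！」]+[。？！」]*", text)
print(kept)  # ['「こんにちは。」', '元気？', 'はい、元気です！']
```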
Joined: Apr 2009
Posts: 723
I managed to get text2srs working (I had to re-save my text file as UTF-8, and then it worked fine). It's pretty cool actually: it basically turns a raw .txt file into an output file that Anki can import. Is that basically what you wanted, ghinzdra?
Hmmm...I wonder...if I were to do this on all of those files I didn't get from nest0r...that would end up being one kickass corpus deck...
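The re-save step can also be scripted. This is a sketch under an assumption: the original encoding of the file is not stated, so cp932 (the Windows flavour of Shift_JIS, common for Japanese .txt files) is only a guess; Python also ships "shift_jis" and "euc_jp" codecs for other files. The function names are hypothetical.

```python
from pathlib import Path


def to_utf8_bytes(raw, source_encoding="cp932"):
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    # Raises UnicodeDecodeError if the guessed source encoding is wrong.
    return raw.decode(source_encoding).encode("utf-8")


def resave_as_utf8(source, destination, source_encoding="cp932"):
    """Write a UTF-8 copy of source to destination, leaving the original untouched."""
    raw = Path(source).read_bytes()
    Path(destination).write_bytes(to_utf8_bytes(raw, source_encoding))
```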
Edited: 2010-03-06, 1:26 am
Joined: Jan 2008
Posts: 419
Well, I would prefer a text file so I can check the result and modify it if necessary in a spreadsheet. I'm not much of an advocate of editing in Anki; it turns out to be a real pain in the ass.
But yeah, that's it.
Could you give me a tutorial for text2srs so I can test it? If I run it in an IDLE module I get the "No input file specified! Use the -i option" error message. How can I specify a text file, since the Python module closes altogether?
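For the spreadsheet route, a tab-separated file works for both sides, since spreadsheet programs open it directly and Anki's text importer accepts tab-separated fields. A hypothetical sketch; the two-column layout (sentence, then source file name) is an assumption, not text2srs's output format.

```python
import csv
import io


def format_tsv(sentences, source_name=""):
    """Return tab-separated text: one row per sentence, then its source name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for sentence in sentences:
        writer.writerow([sentence, source_name])
    return buffer.getvalue()


def write_tsv(sentences, destination, source_name=""):
    """Save the rows as UTF-8 so the Japanese text survives the round trip."""
    with open(destination, "w", encoding="utf-8", newline="") as f:
        f.write(format_tsv(sentences, source_name))
```

The csv module quotes a field only if it contains a tab, a double quote, or a line break, so ordinary Japanese sentences are written as-is.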
Joined: Apr 2009
Posts: 723
I'm on a Mac and typed this into the terminal:
python /Users/David/Downloads/text2srs.py -i /U/David/Desktop/zero_no_tsukaima.txt
and it wrote the results to a file called output.txt, which I then imported into Anki. I had to make sure the input file was saved as UTF-8 first, though.
I don't know Python at all and I don't know what an IDLE module is, so apart from the fact that I typed the above command and it worked, I can't really help you, sorry.
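The IDLE error comes from the script expecting its input file as a command-line option, which running it straight from IDLE does not supply; typing the command in a terminal, as above, does. As a hedged illustration only (this is not text2srs's actual source), this is a common way for a Python script to declare such an option with argparse. The -o option and its output.txt default are assumptions chosen to match the behaviour described in the post.

```python
import argparse


def parse_args(argv=None):
    """Parse -i/-o options; argv=None means 'use the real command line'."""
    parser = argparse.ArgumentParser(description="Split a text file into sentences.")
    parser.add_argument("-i", "--input", required=True,
                        help="UTF-8 text file to read")
    parser.add_argument("-o", "--output", default="output.txt",
                        help="file to write (default: output.txt)")
    return parser.parse_args(argv)
```

Called with no -i, argparse prints an error and exits, which is the same situation as the message seen in IDLE; from a terminal you would run something like python script.py -i novel.txt.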