kanji koohii FORUM
Cut a text into sentences (python code or software) - Printable Version



Cut a text into sentences (python code or software) - ghinzdra - 2010-03-05

I'm a bit (OK, very) rusty on my Python, and I was wondering if someone knows of software, or has some code, that can cut a text into sentences.
You know: you would point it at the source file, provide one or several dividers such as 」 or ?!。, input the name of the destination file, and it would create a text file with all the sentences of the text.

thanks in advance


Cut a text into sentences (python code or software) - Asriel - 2010-03-05

I'm not too experienced in python, but it seems like what you're talking about is just getting the input, giving it a custom delimiter to parse by, and then outputting a text file with each sentence in a separate line?

Sounds pretty simple to make, but I'm not too good in Python.
I'm going to go see what I can whip up, but if anyone responds quicker than me with something better, feel free! I can't guarantee anything I come up with will be too wonderful.


Cut a text into sentences (python code or software) - trusmis - 2010-03-05

It looks like you are looking for the "cut" utility that every Unix (and Linux) system has had for 20 years.
Just install Linux, or Bash on Windows, or something like that, and type "cut --help".


Cut a text into sentences (python code or software) - Asriel - 2010-03-05

lol it seems he's right, except that you'll probably have to look online for how to do it. The help doesn't tell you a whole lot...


Cut a text into sentences (python code or software) - ghinzdra - 2010-03-05

I know it's some basic stuff and I'm utterly ashamed of even THINKING about asking that... But I haven't done any Python for about 8 months (shame on me again: the first rule of programming is that you should code something every day...) and right now I'm trying to save as much time as possible, as I have 3 big projects to carry out while keeping up my Japanese learning in the process. Those few lines of code would dramatically improve my flow.


Cut a text into sentences (python code or software) - blackmacros - 2010-03-05

Sounds like that sort of functionality could create a sort of text2srs program similar to subs2srs. If it could pull off filtering like subs2srs can (only sentences with kanji, or above a certain length, or only those containing a certain word, or no sentences containing a certain word) you could use it to split up an entire txt file (like the ones nest0r didn't post) into individual cards for Anki. That would be amazing...or am I just getting overexcited?
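
For illustration only, here is a rough sketch of what that kind of filtering could look like in Python 3. It is not how subs2srs or text2srs actually work; the function name filter_sentences and all of its parameters are made up for this example, and "kanji" is approximated as the main CJK Unified Ideographs block.

```python
import re

# Assumption: "kanji" means a character in the main CJK Unified Ideographs block.
KANJI = re.compile(r"[\u4e00-\u9fff]")


def filter_sentences(sentences, require_kanji=False, min_length=0,
                     must_contain=None, must_not_contain=None):
    """Return only the sentences that pass every enabled filter (hypothetical helper)."""
    kept = []
    for sentence in sentences:
        if require_kanji and not KANJI.search(sentence):
            continue  # no kanji in this sentence
        if len(sentence) < min_length:
            continue  # too short
        if must_contain and must_contain not in sentence:
            continue  # required word is missing
        if must_not_contain and must_not_contain in sentence:
            continue  # banned word is present
        kept.append(sentence)
    return kept
```

Each filter is off by default, so something like filter_sentences(sentences, require_kanji=True, min_length=5) would keep only kanji-containing sentences of at least five characters.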


Cut a text into sentences (python code or software) - nest0r - 2010-03-05

Did someone say text2srs? http://forum.koohii.com/showthread.php?pid=70023#pid70023

I definitely think that if we had didn't have don't know what you're talking about raw Japanese texts and different ways to manipulate them, that'd not be uncoolly cool.

Related: smart.fm corpus? http://forum.koohii.com/showthread.php?tid=3959 - This thread came to mind back when I first read mezbup's text2srs post, I think. Kind of, or maybe it was another thread. Back when bombpersons was active, we had some good programming ideas. Or I fantasized and they came up with cool scripts I didn't know how to run.


Cut a text into sentences (python code or software) - blackmacros - 2010-03-05

I checked out text2srs but I couldn't get it to work with Zero no Tsukaima. I kept getting an error related to Unicode and UTF8 whenever I tried to execute the program.


Cut a text into sentences (python code or software) - vosmiura - 2010-03-05

In Python you could use the 'split' function from the regular expression module.

import re
result = re.split(u'[...]', string_to_split)

Where ... are the characters you want to split by.
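
A slightly fuller sketch of the source-file-to-destination-file idea, assuming Python 3 and UTF-8 files. The names split_sentences and split_file and the default delimiter set are just assumptions for this example. It uses re.findall rather than re.split so that the punctuation stays attached to each sentence, and it treats a closing bracket right after the punctuation as part of the same sentence.

```python
import re

DELIMITERS = "。！？!?"  # assumed sentence-ending characters; change to taste
CLOSERS = "」』"          # closing quotes kept with the sentence they end


def split_sentences(text, delimiters=DELIMITERS, closers=CLOSERS):
    """Split text into sentences, keeping the ending punctuation."""
    d = re.escape(delimiters)
    closer_part = "[{}]*".format(re.escape(closers)) if closers else ""
    # A run of non-delimiters, then either delimiter(s) plus closers, or end of line.
    pattern = re.compile("[^{d}]+(?:[{d}]+{c}|$)".format(d=d, c=closer_part))
    sentences = []
    for line in text.splitlines():  # sentences never span a line break
        sentences.extend(s.strip() for s in pattern.findall(line) if s.strip())
    return sentences


def split_file(src_path, dst_path, encoding="utf-8"):
    """Write one sentence per line to dst_path; return the sentence count."""
    with open(src_path, encoding=encoding) as src:
        sentences = split_sentences(src.read())
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write("\n".join(sentences) + "\n")
    return len(sentences)
```

Calling split_file("input.txt", "output.txt") with your own file names would then give one sentence per line. To split on 」 as well, add it to the delimiters argument.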


Cut a text into sentences (python code or software) - Asriel - 2010-03-05

nest0r Wrote:...Related: smart.fm corpus?...
Currently working on a -somewhat- user friendly way to use the smart.fm API from the command line. So far I've just got 4 functions working: add and delete for vocab and sentences.

But it shouldn't be hard at all to add all the other functions in the API as well. They're all incredibly similar.

I'm currently just uploading the Kore series up to smart.fm using KO2001 order. This is all I have so far:
http://smart.fm/goals/252522

But creating items, creating sentences, it's all very simple to do. (ie. smart.fm lists of subs2srs decks, etc)


Cut a text into sentences (python code or software) - nest0r - 2010-03-05

Asriel Wrote:...Currently working on a -somewhat- user friendly way to use the smart.fm API from the command line...
That's cool. Even though that particular 'smart.fm corpus' was about something else, I'm definitely interested in making it easy to upload Anki stuff to smart.fm, using that site as a kind of hub in a more expansive way than its system of themed lists and suchlike. This was something we were talking about from the first page of the subs2srs thread, but it never really took off, mostly I think because we didn't have the same amount of resources as we do now...

Also: http://forum.koohii.com/showthread.php?pid=66682#pid66682 - Bombpersons set up a site or something, but I don't suppose it took off. Found it: http://forum.koohii.com/showthread.php?tid=4016 (This was a way, methinks, of dealing with copyright issues... )


Cut a text into sentences (python code or software) - Asriel - 2010-03-05

Well, the whole subs2srs corpus thing was just an idea...not necessarily what this had to be used for.

Basically all I'm doing is taking the things that are useful from http://developer.smart.fm/docs and making a C program (because for some reason, I couldn't get libcurl to work with Java) to implement them.
What it'll be like is something along the lines of:

smartfm username:password api_key file.txt [OPTIONS]

I'm not sure how the regulations are, but either one USER can make 5,000 calls a day, or one DEVELOPER can make 5,000 calls a day. So api_key may/may not be necessary. Hopefully not, because then users can use it without having to sign up as a developer...

Once that's done, shouldn't be too hard to make a GUI...or essentially, ANY smart.fm program.


Cut a text into sentences (python code or software) - blackmacros - 2010-03-06

I managed to get text2srs working (I had to resave my text file as UTF8 and then it worked fine). It's pretty cool actually, it basically turns a raw txt file into an Anki-importable output file. Is that basically what you wanted ghinzdra?

Hmmm...I wonder...if I were to do this on all of those files I didn't get from nest0r...that would end up being one kickass corpus deck...


Cut a text into sentences (python code or software) - ghinzdra - 2010-03-06

Well, I would prefer a text file, so I can check the result and modify it in a spreadsheet if necessary. I'm not much of an advocate of modifying things in Anki. It turns out to be a real pain in the ass.
But yeah, it's that.

Could you give me a tutorial for text2srs so I can test it? If I run it in IDLE I get a "No input file specified! Use the -i option" error message. How can I specify a text file, since it closes the Python module altogether?


Cut a text into sentences (python code or software) - blackmacros - 2010-03-06

I'm on a Mac and typed this into the terminal:

python /Users/David/Downloads/text2srs.py -i /U/David/Desktop/zero_no_tsukaima.txt


and it output the results to a file called output.txt which I then imported into Anki. I had to make sure the input file was saved as UTF8 first, though.
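
For reference, the resave-as-UTF8 step could probably also be scripted. This is only a minimal Python 3 sketch: the function name convert_to_utf8 is made up, and it assumes the original file is in Shift_JIS, which is only a guess (it is common for Japanese text files, but the actual encoding of any given file may differ).

```python
def convert_to_utf8(src_path, dst_path, source_encoding="shift_jis"):
    """Re-encode a text file as UTF-8 (hypothetical helper).

    source_encoding is an assumption: pass the real encoding of your file.
    A wrong guess raises UnicodeDecodeError instead of writing garbage.
    """
    with open(src_path, encoding=source_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```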

I don't know Python at all and I don't know what an IDLE module is, so apart from the fact that I typed the above command and it worked, I can't really help you, sorry.