kanji koohii FORUM
separating vocab fields in a TXT document, any faster ways?




separating vocab fields in a TXT document, any faster ways? - Tolerence91 - 2011-04-02

I've been using Japanese on the iPod for a long time now, and I've been sending the vocab words to my computer as txt documents. I have been tabbing between each field to separate them so that the file imports into Anki. Is there an easier way to do this? Is importing the document into a spreadsheet somehow easier, or is it the same for separating fields? Maybe highlighting rows and columns?


separating vocab fields in a TXT document, any faster ways? - SheekuAltair - 2011-04-02

I'm not sure what you mean. But when you open a "txt" file with Excel 2007, it gives you options to separate columns by tabs, spaces, commas, numbers or whatever you like. I organize it and then save it again as a txt file.


separating vocab fields in a TXT document, any faster ways? - Asriel - 2011-04-02

I'm not sure what you mean by "separating fields." Do you mean it's exported as such:
漢字,ひらがな,definition
but you want it to be tabs instead of commas? If so, you could just do a find/replace if your text editor does that.
Otherwise, importing into a spreadsheet editor and then exporting as a tab-delimited file would be a lot easier than doing it manually.


separating vocab fields in a TXT document, any faster ways? - Tolerence91 - 2011-04-02

I meant fields like kanji, reading, and meaning. This is an example
連盟(れんめい)league, federation, union, alliance
I usually tab them to look like this
連盟 (れんめい) league, federation, union, alliance
This is so Anki differentiates between the 3 fields and imports them correctly. It takes a long time, and I can't seem to get my OpenOffice to import it into a spreadsheet.


separating vocab fields in a TXT document, any faster ways? - Asriel - 2011-04-02

OK, I get a list like this:
理性的(りせいてき)
rational

大概(たいがい)
in general, almost all; mainly, mostly, most likely; moderately, suitably

Note the line breaks after the word and the definitions. You could do a search for ")\n" and replace with ")" to get rid of that (find out what your newline character is. Mine was \r).
But from your example it looks like you've got that solved. Also replace "\n\n" with "\n" to not have empty lines.
Now save it as whatever.csv

Open it up in OpenOffice, and under Separator Options, it will probably be marked "Separated By"
Under that, you check "Other" and put () in the box (that's the Japanese parentheses)
The preview should give you a spreadsheet with the 3 columns kanji/reading/meaning. Then you can export it however you need to.

Note that this only works for words that have kanji. I'd suggest a separate "Hiragana Words" wordlist so you can import those separately and not have to deal with individual cases in the big spreadsheet.
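If you'd rather script the "get rid of the line breaks" step than hunt for the right newline character in an editor, here is a minimal Python sketch. The function name `join_entries` is made up for illustration. Like the find/replace, it assumes a definition never ends with a closing parenthesis, and it accepts both ASCII and full-width parentheses since I can't tell which one your export uses.

```python
def join_entries(text: str) -> list[str]:
    """Return one string per vocab entry, with the definition joined
    onto the line that holds the word and its reading."""
    entries = []
    # splitlines() copes with \n, \r\n and \r, so the newline style
    # of the exported file does not matter here.
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop the empty lines between entries
        # A previous line ending in ")" is a word waiting for its definition.
        if entries and entries[-1].endswith((")", "）")):
            entries[-1] += line
        else:
            entries.append(line)
    return entries


sample = "理性的(りせいてき)\r\nrational\r\n\r\n大概(たいがい)\r\nin general, almost all\r\n"
for entry in join_entries(sample):
    print(entry)
```

Entries that are already on one line pass through unchanged, so it should be safe to run on a file that mixes both layouts.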


separating vocab fields in a TXT document, any faster ways? - SheekuAltair - 2011-04-02

I'd import that from a txt file (important) to Excel and then check "spaces" as separator.

連盟(れんめい)league, federation, union, alliance

it would look like this

連盟 (れんめい) league, federation, union, alliance
cell1 cell2 cell3 cell4 cell5 cell6

Then I'd merge cells 3 to 6 into cell 3 (column C), for example by copying them into another program (Notepad, Word) and pasting the text back into column C. The point is that cells 3-6 end up as a single cell.

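Or just do this instead: choose an empty cell in the same row, in this case cell 7 (G1), and enter the formula =C1&" "&D1&" "&E1&" "&F1 (G1 itself must not appear in the formula, since that is the cell holding it). After copying the result over cell 3 as a value and deleting cells 4-6, the row has 3 columns and looks like this: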

連盟 (れんめい) league, federation, union, alliance
cell1 cell2 cell3
You can do the same for the rest of the rows. The easiest way is to drag the formula cell down, and Excel will automatically adjust the formula for each row (instead of you pasting it in multiple times).
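For anyone doing this outside Excel: the split-then-merge dance can be avoided by splitting on only the first two spaces, so the definition never gets broken up. A rough Python sketch, assuming each line already looks like the spaced example above (`split_fields` is a name I made up):

```python
def split_fields(line: str) -> list[str]:
    """Split a line into at most 3 fields: word, reading, definition."""
    # maxsplit=2 stops after the second run of whitespace, so the spaces
    # inside the definition are left alone and nothing needs re-merging.
    return line.split(maxsplit=2)


line = "連盟 (れんめい) league, federation, union, alliance"
fields = split_fields(line)
print(len(fields))
print("\t".join(fields))  # tab-separated, ready for import
```

This only works if the word and the reading contain no spaces themselves.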


separating vocab fields in a TXT document, any faster ways? - nest0r - 2011-04-02

Looks like you could just do a find/replace on the ( and ) w/ a tab marker like ^t (or w/ a backslash instead of ^, i.e. \t), or a semicolon replacement, right?

In Notepad++ (free) this would be Ctrl+H and then Replace All? (Oh and if it's not on the same line like Asriel pointed out, add in the newline symbol.)
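The same replacement can be sketched in Python if an editor isn't handy. One difference from a blind Replace All: this version only touches the first pair of parentheses on a line, so a definition that contains its own parentheses is not split into extra fields. The names `ENTRY` and `to_tab_separated` are hypothetical, and accepting both ASCII and full-width parentheses is an assumption about the export format.

```python
import re

# word, opening parenthesis, reading, closing parenthesis, definition.
# Both ASCII () and full-width （） are accepted (an assumption; check
# which one your own file actually contains).
ENTRY = re.compile(r"^([^(（]+)[(（]([^)）]*)[)）]\s*(.*)$")


def to_tab_separated(line: str) -> str:
    """Turn 'word(reading)definition' into 'word<TAB>reading<TAB>definition'."""
    line = line.strip()
    match = ENTRY.match(line)
    if match is None:
        return line  # no reading in parentheses: leave the line as it is
    word, reading, meaning = (part.strip() for part in match.groups())
    return "\t".join((word, reading, meaning))


print(to_tab_separated("連盟(れんめい)league, federation, union, alliance"))
```

Lines without any parentheses (kana-only words) are returned unchanged and still need separate handling.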


separating vocab fields in a TXT document, any faster ways? - Asriel - 2011-04-02

nest0r Wrote: Looks like you could just do a find/replace on the ( and ) w/ a tab marker like ^t (or w/ a backslash instead of ^, i.e. \t), or a semicolon replacement, right?
D'oh! That makes so much sense. Does the same thing, except faster and straight-to-the-point.

1. Get everything on one line
2. Do what nest0r says

and you win :)

Doing hiragana lists will be slightly different, but shouldn't be too bad. Just replace the newline (excluding double newlines = new phrase) with a tab or whatever
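A sketch of that idea in Python, assuming the kana-only list has the word on one line, the definition on the next, and a blank line between entries. The function name and the second sample word are made up for illustration.

```python
import re


def kana_entries_to_tsv(text: str) -> str:
    """Blank lines separate entries; a single newline inside an entry
    becomes a tab."""
    # Normalise \r\n and \r to \n so one pattern covers every file.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    # A double newline (possibly with stray spaces) marks a new entry.
    blocks = re.split(r"\n\s*\n", text)
    rows = ["\t".join(part.strip() for part in block.split("\n"))
            for block in blocks]
    return "\n".join(rows)


# The second entry is invented sample data, not from anyone's word list.
sample = "チンピラ\nyoung hoodlum, punk\n\nすごい\namazing\n"
print(kana_entries_to_tsv(sample))
```

This gives two fields per row, so the kana list would need its own import settings (or an empty kanji field added) to match a 3-field deck.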


separating vocab fields in a TXT document, any faster ways? - Tolerence91 - 2011-04-03

So basically what I do is stick with a TXT file and use Ctrl+H to replace ( and ) with ^t to create a tab automatically, if I follow nest0r's method?

@asriel what do you mean everything on one line? for what purpose?

I apologize, I've never dealt with spreadsheets in my life, so everything about them has been problematic, from finding out what format will display the characters correctly to what cells are or what ^t and w/ mean @_@


separating vocab fields in a TXT document, any faster ways? - Asriel - 2011-04-03

Well, in your example, you have:
連盟(れんめい)league, federation, union, alliance

But if I did this on my machine, it'd end up like:
連盟(れんめい)
league, federation, union, alliance

See how the definition is on the 2nd line? As long as you're doing kanji words, you can replace ")\n" with just ")" in order to remove the line breaks between the hiragana and definition.*

But yeah, once you have each kanji/hiragana/definition on its own line, you can just replace all the ( and ) with tabs** and have a file you should be able to import.

*I don't know what your newline or tab characters are represented by. My newline is \r, but I've had files that use \n. You can probably just copy/paste the part where it actually goes from one line to another. You won't be able to see anything selected, but I've gotten it to work successfully before.
**Same goes for tab. Could be \t, ^t, I don't know. You can also copy/paste tabs as well.
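If you want to find out which newline a file really uses instead of guessing, counting the raw bytes is one way. A small Python sketch, assuming the file is saved as UTF-8 (the counts would be misleading for UTF-16); `describe_newlines` is a made-up name.

```python
def describe_newlines(data: bytes) -> dict[str, int]:
    """Count each newline style in the raw bytes of a file."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                    # \r\n, typical for Windows
        "CR": data.count(b"\r") - crlf,  # lone \r
        "LF": data.count(b"\n") - crlf,  # lone \n
    }


sample = "連盟(れんめい)\r\nleague, federation\r\n".encode("utf-8")
print(describe_newlines(sample))
```

For a real file you would read it in binary mode, e.g. `describe_newlines(open("vocab.txt", "rb").read())`, where `vocab.txt` stands in for whatever your export is called. In Python source the tab character is written `\t`.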


separating vocab fields in a TXT document, any faster ways? - Tolerence91 - 2011-04-03

Thanks guys! I've taken advice from everyone.
@asriel Understood, thanks a lot for the explanations. My only problem now: I've gone ahead and created the tabs for each field using nest0r's method, but the import isn't working right. After I finish, it looks like this:

初対面 しょたいめん first meeting,first interview with

実際 じっさい practicality, practical; reality,actuality, actual conditions; bhutakoti (limit of reality)

初対面 しょたいめん first meeting, first interview with

` チンピラ young hoodlum, small-time yakuza, delinquent boy, delinquent girl, hooligan, punk
A lot of the cards are missing fields for some reason. I've made sure there was a tab, and out of personal preference I put a "`" in place of the kanji on cards that have none.


separating vocab fields in a TXT document, any faster ways? - Asriel - 2011-04-03

Hmm...Well it seems you've got the spreadsheet-creation working just fine.

The only thing I can think of off the top of my head is that the number of fields in your Anki deck doesn't match the 3 fields you put in your spreadsheet. Or is Anki looking to split on commas?

Unfortunately, right now I can't think of what the problem actually is, but as far as I can see, the spreadsheet is OK...
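One way to narrow it down would be to count the tab-separated fields on every line before importing. This is only a diagnostic sketch in Python, not a claim about what the actual cause is; `find_bad_rows` is a made-up name and the default of 3 fields comes from the kanji/reading/meaning layout discussed above.

```python
def find_bad_rows(text: str, expected_fields: int = 3):
    """Return (line number, field count, line) for every non-empty line
    whose number of tab-separated fields differs from expected_fields."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines between entries are not rows
        fields = line.split("\t")
        if len(fields) != expected_fields:
            problems.append((number, len(fields), line))
    return problems


sample = (
    "初対面\tしょたいめん\tfirst meeting, first interview with\n"
    "\n"
    "実際 じっさい practicality, practical\n"  # spaces, not tabs
)
for number, count, line in find_bad_rows(sample):
    print(number, count, line)
```

Lines that show up here either have spaces where a tab was intended or carry an extra tab somewhere in the definition.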