Little Charo bilingual scripts - Printable Version
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Group study (http://forum.koohii.com/forum-15.html)
+--- Thread: Little Charo bilingual scripts (/thread-7326.html)
Little Charo bilingual scripts - Zarxrax - 2011-02-18

April 17, 2015 Update: The latest spreadsheet is located here: https://docs.google.com/spreadsheets/d/1ySDULrBBxXFlzOOgoMw3BbERiYW5lhXSdFqWP_0Yixc/edit?usp=sharing
I will be posting a new chapter on it every week until complete. Original post follows below.

After much effort, I was finally able to write a program to extract the scripts from the game Little Charo for Nintendo DS. For those unfamiliar with the game, it's sort of like a bilingual visual novel designed to help teach English to Japanese kids. The story is quite long and fairly interesting (so far the playthrough is taking me about 1 hour per chapter, and there are about 30 chapters). But the most important aspect of it is that the language is very basic. If you have been searching for some native material at an "easy" level, then this game is for you! The sheer amount of text in the game probably makes it good for intermediate learners as well. There are a lot of good sentences here for putting into Anki.

However, the script is currently split up into 375 different parts, and there is no telling what each part actually contains. So, in order for all of this data to become useful, it needs to be organized and matched to the corresponding chapters from the game, and finally added to the spreadsheet. Any help with this is very much appreciated!

Scripts are here: http://dl.dropbox.com/u/10652649/Charo.zip

The vast majority of files seem to contain text related to gameplay events, and are not terribly important.
File 1 = some key words that are used in the game
File 2 = Multiple-choice English answers
File 86 = Chapter 1 main script part 1
File 90 = Chapter 1 main script part 2
File 93 = Chapter 2 main script
Files 247-248 = Chapter 1 cut-scenes
Files 249-251 = Chapter 2 cut-scenes
Files 314-339 = Japanese explanation of English phrases + many example sentences

Spreadsheets containing the completed scripts can be found here: https://docs.google.com/spreadsheets/d/1ySDULrBBxXFlzOOgoMw3BbERiYW5lhXSdFqWP_0Yixc/edit?usp=sharing

I will be adding sentences to Anki and studying them 1 chapter at a time, so it might take me a long time to add spreadsheets for the entire game. If I'm too slow for you, feel free to go ahead and process the further chapters yourself ^_^ Full instructions can be found in my next post.

Little Charo bilingual scripts - furrykef - 2011-02-19

I've concatenated the text files, which might make it easier to weed out all the junk data and focus on the text. (The concatenated files are here.) I did this using the following bash script:

Code:
#!/bin/bash

There are some large chunks of data that have small, well-hidden bits of English (or Japanese), though, so you have to be REALLY careful before just deleting huge blobs of data.

For the English text, a script can probably be written that filters out junk data. For the Japanese text, this probably isn't so easy. But doing it just on the English text could still help us identify which bits of text are particularly interesting.

(EDIT: I've edited the script so that there are blank lines around the filenames in the output files now. I've updated the upload to reflect this as well.)

Little Charo bilingual scripts - Zarxrax - 2011-02-19

I'm not sure if it's good to have them all joined together. It could make searching through the whole thing simpler, but then it's all just 1 giant stream of text, so it could be harder to manually work through it.
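Only the first line of furrykef's bash script appears in the post above. As a rough stand-in, and not the original code, the concatenation it describes (every file joined into one, with the filename set off by blank lines) might look like the Python sketch below. The folder layout and the `*.txt` pattern are assumptions, since the actual file names in Charo.zip are not given here, and the function works on raw bytes because the text encoding of the extracted files is not stated.

```python
from pathlib import Path


def concatenate(folder, pattern="*.txt"):
    """Join every matching file in `folder` into one bytes object.

    Each file's contents are preceded by its filename, with blank lines
    around the name. `pattern` is a hypothetical default; adjust it to
    whatever the extracted files are actually called.
    """
    parts = []
    # sorted() orders names as text, so "10.txt" comes before "2.txt";
    # zero-pad the names first if numeric order matters.
    for path in sorted(Path(folder).glob(pattern)):
        data = path.read_bytes()  # bytes, so no encoding is assumed
        parts.append(b"\n" + path.name.encode("utf-8") + b"\n\n" + data + b"\n")
    return b"".join(parts)
```

Writing the result out is then a single call such as `Path("all.bin").write_bytes(concatenate("Charo"))`, where both names are placeholders.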
I was looking at the files in Notepad++ for Windows, and it displays the names of the character codes for a lot of the "junk" characters. For instance, null characters appear at line breaks, so it would probably be good to convert those to actual line breaks, which would make the text more readable.

(EDIT: some info moved up to the first post)

Directions for adding a new chapter to the spreadsheet:
I'm using Notepad++ for Windows, which can make the whole process quite simple.
- First you need to actually find the files that you need. So for example, if you want to get the script for chapter 2 of the game, first load up the game, go to chapter 2, and look at some of the text. Then search the files for some of the text that you read in the game. This should show you which file contains the script for that chapter.
- There will be lots of crap at the beginning of the files. Just delete it.
- The files contain line breaks that should be replaced by spaces, so do a search and replace over the files to replace them. (In Notepad++, search for \n.) In the EN file, replace the line breaks with a normal space, but in the JP file use a full-width space. This space keeps words and sentences from running together.
- Next, the scripts contain null characters where the actual line breaks should be. Using Notepad++, you can do a search and replace, replacing \0 with \n.
- Now, the scripts probably have some junk characters in them, which you should manually go through and remove. (These correspond to the green highlighted text in the game.)
- Finally you can select the entire text that remains and paste it into a spreadsheet! Japanese in 1 column, English in the other! Make sure it lines up. If not, you might need to do some adjusting.
- Each chapter of the game might be spread across more than one file, so be sure to get it all. In particular, cutscenes are in separate files.
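For anyone who would rather script the two search-and-replace steps than do them in Notepad++, here is a minimal Python sketch of the same idea. It assumes the file has already been read and decoded into a string (the encoding of the extracted files is not stated in this thread, so that part is left out), and the function names `clean_script` and `to_tsv` are made up for this example. Removing the leading junk and the stray junk characters still has to be done by hand.

```python
import csv
import io


def clean_script(raw, japanese=False):
    """Turn one decoded script into a list of lines.

    In-file line breaks become a space (full-width U+3000 for Japanese),
    and NUL characters, which mark the real line breaks, split the text.
    """
    filler = "\u3000" if japanese else " "
    # Treat Windows line endings like plain "\n" (assumption: they may occur).
    text = raw.replace("\r\n", "\n").replace("\n", filler)
    # Empty entries are kept on purpose so JP and EN stay easier to line up.
    return [line.strip(filler) for line in text.split("\0")]


def to_tsv(jp_lines, en_lines):
    """Pair the lines as tab-separated rows: Japanese first, English second."""
    if len(jp_lines) != len(en_lines):
        raise ValueError("JP and EN line counts differ; adjust by hand first")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(jp_lines, en_lines))
    return buf.getvalue()
```

The tab-separated text can be pasted into a spreadsheet, and the length check is only a crude version of the "make sure it lines up" step: equal counts do not guarantee that each pair actually matches.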
Little Charo bilingual scripts - Zarxrax - 2011-02-20

I have completed processing of chapter 1 and posted it to a spreadsheet (linked from the first post). I won't do any other chapters until I finish studying sentences from chapter 1. If anyone wants to go ahead, just follow the directions in my previous post.

Little Charo bilingual scripts - caivano - 2011-02-20

I guess you might know this already, but this is on Japanese TV too.

Little Charo bilingual scripts - Zarxrax - 2011-02-20

Yeah, I tried downloading an episode, but it mostly just seemed like a bunch of people talking, along with short cartoon clips. It didn't seem nearly as useful as the game.