Script to steal the audio from the news from TBS
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Script to steal the audio from the news from TBS (/thread-3115.html)

Script to steal the audio from the news from TBS - mentat_kgs - 2009-05-22
Hi, I've done a cute script to download the videos from TBS, along with the text, and put them in the current folder. I'm using it on Ubuntu, but it will probably work on any platform where ruby and mencoder run (including Windows and Mac). You only need ruby, hpricot and mencoder installed.
Download it from: http://www.inf.ufsc.br/~emilio/japanese/aranha.rb
Example of use:
$ ruby aranha.rb
It will download the last 15 articles from TBS that have video, convert the videos to mp3, and put everything in the same folder together with each article's text.
###### ONLY for Windows users: ########
You can download the mplayer package (which contains mencoder) from this link: http://sourceforge.net/project/showfiles.php?group_id=205275&package_id=248631&release_id=683443
Then you must set MENCODER_PATH inside the script to wherever you installed mencoder (notice the double \\ instead of a single \ in pathnames). For example:
### BEGIN CONFIGURATION ###
MAX_FILES = 15
MENCODER_PATH = "c:\\my\\mencoder\\folder\\mencoder.exe"
HOST = "news.tbs.co.jp"
### END CONFIGURATION ###

Script to steal the audio from the news from TBS - Codexus - 2009-05-22
I've tested it on my Ubuntu. I just had to install the necessary packages and it worked without problems! Great script! Thanks!

Script to steal the audio from the news from TBS - vengeorgeb - 2009-05-22
mentat_kgs Wrote: Hi I've done a cute script to download the videos from TBS, along with the text and put them in the current folder.
Hey mentat, with no other intention than to educate: if you have some time, could you go over this code and explain it in detail? I think everyone, programmers and non-programmers alike, would appreciate your personal insight.

Script to steal the audio from the news from TBS - denus - 2009-05-22
Oh, wow, this is brilliant! Thanks a lot.

Script to steal the audio from the news from TBS - ahibba - 2009-05-22
Users of Ubuntu and Linux are lucky.
We don't have that luxury!

Script to steal the audio from the news from TBS - nac_est - 2009-05-22
いただきま〜す

Script to steal the audio from the news from TBS - Tobberoth - 2009-05-22
Good stuff. Might I ask why you put it all in a begin block though? You're not using that block at all.

Script to steal the audio from the news from TBS - Tobberoth - 2009-05-22
jorgebucaran Wrote: [mentat_kgs Wrote: Hi I've done a cute script to download the videos from TBS, along with the text and put them in the current folder.] Hey mentat, with no other intention than to educate, if you have some time, go over explaining this code in detail, I think everyone, programmers and non-programmers would appreciate your personal insight.
If you know Ruby, his code is very clear and quite easy to understand. Instead of reading an explanation of his code, you should probably learn the basics of Ruby and read it yourself. I think you would learn a lot from it. Ruby is so easy to read that it's pretty much self-documenting.

Script to steal the audio from the news from TBS - kazelee - 2009-05-22
Interesting title, though you're not actually 'stealing' it.

Script to steal the audio from the news from TBS - sethg - 2009-05-23
Oh my 神... this is amazing! I could kiss you!! Thanks so much.
On a side note, paste and save the ruby into a document in Ubuntu and it should color code it for even easier reading.
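
A note on the "double \\" remark in the first post's Windows instructions: the doubling is ordinary Ruby string escaping, nothing specific to the script. The sketch below only illustrates that rule. The constant names are made up for the example and do not come from aranha.rb.

```ruby
# Illustration only: these constant names are hypothetical, not from aranha.rb.

# Double-quoted string: each "\\" in the source becomes ONE backslash.
DOUBLE_QUOTED = "c:\\my\\mencoder\\folder\\mencoder.exe"

# Single-quoted string: a lone backslash before a letter is kept as-is.
SINGLE_QUOTED = 'c:\my\mencoder\folder\mencoder.exe'

# Double-quoted with single backslashes: "\f" turns into a form feed and
# "\m" collapses to a plain "m", so the path is silently corrupted.
BROKEN = "c:\my\mencoder\folder\mencoder.exe"

puts DOUBLE_QUOTED                   # the path, with single backslashes
puts DOUBLE_QUOTED == SINGLE_QUOTED  # true
puts BROKEN.include?("\\")           # false: no backslash survived
```

So either the doubled form from the configuration example or a single-quoted string should give mencoder's real path.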

Script to steal the audio from the news from TBS - sethg - 2009-05-23
After downloading, though, when opening in Ubuntu's default text app, Text Editor, I get this:
Quote: $B!!5v2D$J$/309q?M$H@\?($7$?$H$7$F5/AJ$5$l$?%_%c%s%^!<$NL1<g2=1?F0;XF3<T%"%&%s!&%5%s!&%9!<!&%A!<$5$s$,#2#2F|!"K!Dn$G=i$a$FH/8@$N5!2q$rM?$($i$l!"!V$I$N$h$&$J:a$bHH$7$F$$$J$$!W$HL5:a$r<gD%$7$^$7$?!#(B
However, if I Open With > Emacs22 (Client), it comes out as normal Japanese... in... an ugly editor. Any way I can just view it normally with Text Editor?

Script to steal the audio from the news from TBS - Codexus - 2009-05-23
sethg, why not simply open them with Firefox? It recognizes the character encoding automatically, and that way you get to use rikai-chan.

Script to steal the audio from the news from TBS - Tobberoth - 2009-05-23
sethg Wrote: After downloading, though, when opening in Ubuntu's default text app, Text Editor, I get this:
What is your locale? en_US.UTF-8?

Script to steal the audio from the news from TBS - mentat_kgs - 2009-05-23
@jorgebucaran Hey, I don't think this would be an easy task to do, but I'll tell you what the code does:
It opens the yomiuri front page.
It gets the first 15 links with a video icon.
It visits each link, finds where the video is, and copies the text.
Then it downloads each of the videos and extracts the mp3.
wget and mencoder are external programs. wget is a download manager and mencoder is the encoding twin of mplayer.
@Tobberoth I put in the begin block just to separate the dirty part of the code more clearly from the configuration variables.
Btw, I don't know what to do if your text file comes out ugly, but try opening the file with
$ LC_ALL=ja_JP.UTF-8 gedit dirty_file.txt

Script to steal the audio from the news from TBS - sethg - 2009-05-23
I've tried with Firefox, and changed it to every Japanese encoding, but nothing works :/
Tobberoth, I would assume it is, but I honestly don't know how to check that.
mentat_kgs, I tried your command, but it still showed up as the symbols I pasted above.
Awfully frustrating problem. I despise the Emacsen interface. I *could* go through and cut all the text from emacsen to Gedit and replace the file, but that just takes away from the beautiful simplicity of the script. Anybody have any more ideas?
Edit: Actually, I have, indeed, got it working in Firefox. I just had to set Auto-detect to Japanese. I don't know why this is working now... I ran through, literally, all 3 Japanese settings yesterday.
:: sigh :: Well, at least it's working, and with rikaichan for quick lookups, to boot
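
For readers following mentat_kgs's outline a few posts up, the "first 15 links with a video icon" step might look roughly like the sketch below. This is not the code from aranha.rb, which uses hpricot. The markup pattern (an `<a>` wrapping an `<img class="video">`) and the method name `video_links` are assumptions made purely for illustration, and a regular expression stands in for a real HTML parser to keep the example dependency-free.

```ruby
MAX_FILES = 15 # same name as the setting in the script's configuration block

# ASSUMPTION: hypothetical markup. An article counts as "having video" when
# its <a> element contains an <img> tag with class="video".
# Returns the href of the first `limit` such links, in page order.
def video_links(html, limit = MAX_FILES)
  html.scan(/<a\s+href="([^"]+)"[^>]*>(.*?)<\/a>/m)       # [[href, body], ...]
      .select { |_href, body| body =~ /<img[^>]*class="video"/ }
      .map(&:first)
      .first(limit)
end
```

Each returned link would then be visited in turn to locate the video and copy the article text, as the outline describes.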

Script to steal the audio from the news from TBS - Tobberoth - 2009-05-23
Checking your locale is easy. Just open a terminal and run "locale". As long as it ends in .UTF-8, simply copying into gedit should work...

Script to steal the audio from the news from TBS - denus - 2009-05-25
If you use gedit, you can open the file manually and select the encoding ISO-2022-JP; it should display correctly. A script someone showed me:
Code:
for i in *.txt; do f=$(basename "$i"); iconv -f ISO-2022-JP -t UTF-8 "$i" > "$f.utf8"; done
Edit: On that note, I just frigging wish Japan would adopt Unicode, the stubborn so-and-sos. >.<

Script to steal the audio from the news from TBS - mentat_kgs - 2009-05-25
Ok, I just fixed the text issue. The script was not tested on Windows, but it should work fine now.

Script to steal the audio from the news from TBS - sethg - 2009-05-25
mentat_kgs Wrote: Ok, I just fixed the text issue. The script was not tested on windows, but it should work fine now.
YAY! Totally works fine now. Thanks so much. NOW it is incredibly awesome.

Script to steal the audio from the news from TBS - sethg - 2009-05-25
denus Wrote: A script someone showed me:
Thanks for this as well. Converted all the ones I'd already downloaded, which saved me a lot of copying and pasting!
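
The iconv loop denus posted has a close Ruby counterpart, sketched below for anyone who would rather stay inside Ruby. It assumes Ruby 1.9 or later (where strings carry an encoding) and is not necessarily how mentat_kgs fixed aranha.rb. The method names `jis_to_utf8` and `convert_articles` are invented for this example.

```ruby
# Re-tag the raw bytes as ISO-2022-JP, then transcode them to UTF-8.
def jis_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_2022_JP).encode(Encoding::UTF_8)
end

# Hypothetical helper mirroring the shell loop: for every *.txt file in
# `dir`, write a UTF-8 copy named "<file>.utf8" and return the new paths.
def convert_articles(dir = ".")
  Dir.glob(File.join(dir, "*.txt")).sort.map do |path|
    out = "#{path}.utf8"
    File.binwrite(out, jis_to_utf8(File.binread(path)))
    out
  end
end
```

Calling `convert_articles` with no argument would process the current folder, like the shell version, and leaves the original files untouched.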

Script to steal the audio from the news from TBS - denus - 2009-05-25
Thanks so much, mentat. The script is indeed beyond awesome now. :3

Script to steal the audio from the news from TBS - Tobberoth - 2009-05-25
Wow, I had no idea wget was preinstalled on Windows... Not that saving files manually in Ruby is hard, but that's good to know; it's a useful tool.

Script to steal the audio from the news from TBS - mentat_kgs - 2009-05-25
It is not. I made ruby check the OS, so it will only download using wget on Linux.

Script to steal the audio from the news from TBS - Tobberoth - 2009-05-25
mentat_kgs Wrote: It is not. I made ruby check the OS so it will only download using wget on linuxes.
Ah yes, I see it now... why did you decide to use wget on Linux though? Is it faster? (I haven't tried it myself.)

Script to steal the audio from the news from TBS - mentat_kgs - 2009-05-25
Not faster, but it resumes downloads and checks if the file was already downloaded.
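
The OS check mentat_kgs mentions could be done along these lines. This is a minimal sketch under stated assumptions, not the script's actual code: the method names are hypothetical, the pure-Ruby fallback via Net::HTTP is one possible choice among several, and the only wget option used is `-c` (`--continue`), which asks wget to resume a partially downloaded file.

```ruby
require "rbconfig"
require "net/http"
require "uri"

# True when the host OS name reported by Ruby looks like Linux.
def linux?(host_os = RbConfig::CONFIG["host_os"])
  host_os.match?(/linux/i)
end

# Argument list for wget; "-c" resumes a partial download.
def wget_command(url)
  ["wget", "-c", url]
end

# Hypothetical dispatcher: wget on Linux, plain Ruby elsewhere.
def download(url, host_os = RbConfig::CONFIG["host_os"])
  if linux?(host_os)
    system(*wget_command(url))
  else
    dest = File.basename(URI(url).path)
    return true if File.exist?(dest) # crude "already downloaded" check
    File.binwrite(dest, Net::HTTP.get(URI(url)))
    true
  end
end
```

The fallback branch only skips files that already exist. It cannot resume a partial download, which is the advantage of wget that mentat_kgs points out.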