Hi, I've written a small script that downloads the videos from TBS, along with the article text, and puts them in the current folder.
I'm using it on Ubuntu, but it should work on any platform where Ruby and mencoder run (including Windows and Mac).
You only need Ruby, hpricot and mencoder installed.
Download it from:
http://www.inf.ufsc.br/~emilio/japanese/aranha.rb
Example of use:
$ ruby aranha.rb
It downloads the latest 15 TBS articles that have video, converts the videos to mp3, and puts everything in the same folder together with each article's text.
###### ONLY for windows users: ########
You can download the mplayer package (which contains mencoder) from this link:
http://sourceforge.net/project/showfile … _id=683443
Then you must set MENCODER_PATH inside the script to wherever you installed mencoder (note the double \\ instead of a single \ in path names). For example:
### BEGIN CONFIGURATION ###
MAX_FILES = 15
MENCODER_PATH = "c:\\my\\mencoder\\folder\\mencoder.exe"
HOST = "news.tbs.co.jp"
### END CONFIGURATION ###
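To illustrate the double-backslash note, here is a small Ruby sketch of how the string literals behave. The path is the example one from the configuration above; the variable names are only for illustration and are not taken from aranha.rb.

```ruby
# Double-quoted literal: each \\ becomes one backslash in the actual string.
escaped = "c:\\my\\mencoder\\folder\\mencoder.exe"

# Single-quoted literal: a backslash before an ordinary letter is kept as-is,
# so this yields the same string without doubling.
single = 'c:\my\mencoder\folder\mencoder.exe'

# Double-quoted with single backslashes: \f is read as a form feed and the
# backslash in \m is dropped, so the path is silently corrupted.
broken = "c:\my\mencoder\folder\mencoder.exe"

puts escaped            # the path as Ruby actually stores it
puts escaped == single  # same string either way
puts broken == escaped  # not the same string
```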
Last edited by mentat_kgs (2009 May 25, 2:59 pm)
vengeorgeb
Member
Registered: 2008-12-22
Posts: 308
mentat_kgs wrote:
Hi I've done a cute script to download the videos from TBS, along with the text and put them in the current folder.
Hey mentat, purely for educational purposes: if you have some time, could you explain this code in detail? I think everyone, programmers and non-programmers alike, would appreciate your personal insight.
sethg
Member
From: m
Registered: 2008-11-07
Posts: 505
After downloading, though, when opening in Ubuntu's default text app, Text Editor, I get this:
$B!!5v2D$J$/309q?M$H@\?($7$?$H$7$F5/AJ$5$l$?%_%c%s%^!<$NL1<g2=1?F0;XF3<T%"%&%s!&%5%s!&%9!<!&%A!<$5$s$,#2#2F|!"K!Dn$G=i$a$FH/8@$N5!2q$rM?$($i$l!"!V$I$N$h$&$J:a$bHH$7$F$$$J$$!W$HL5:a$r<gD%$7$^$7$?!#(B
$B!!<+Bp$K?/F~$7$?%"%a%j%+?M$NCK@-$rGq$a$?$H$7$F9q2HKI8fK!0cH?$N:a$KLd$o$l$F$$$k%9!<!&%A!<$5$s$O#2#2F|$NK!Dn$G!"!V;d$O$I$N$h$&$J:a$bHH$7$F$$$J$$!#L5<B$G$9!W$HH]G'$7$^$7$?!#(B
$B!!$^$?!"?3M}$N$"$H!"%9!<!&%A!<$5$s$OJ[8n;N$KBP$7!"!VCK@-$,?/F~$G$-$?$N$O7YHw$K<jMn$A$,$"$C$?$+$i$@!#;d$O?MF;E*N)>l$+$iCK@-$rGq$a$?$@$1!W$H=R$Y$?$H$$$&$3$H$G$9!#(B
$B!!0lJ}!"%"%a%j%+?M$NCK@-$OK!Dn$G!V;d$,K,$l$?$N$O%9!<!&%A!<$5$s$,0E;&$5$l$kL4$r8+$?$+$i$@!W$H?/F~$7$?M}M3$r>Z8@$7$^$7$?!#(B
$B!!:[H=$OMh=50J9_$bB3$/M
j$G!"%9!<!&%A!<$5$s$OM-:a$H$5$l$l$P:GD9$G6X8G#5G/$N7:$K=h$5$l$k2DG=@-$,$"$j$^$9!#!J(B23$BF|(B01:33$B!K(B
However, if I Open With > Emacs22 (Client), it comes out as normal Japanese... in... an ugly editor
Any way I can just view it normally with Text Editor?
Last edited by sethg (2009 May 23, 12:44 am)
@jorgebucaran
Hey, I don't think this would be an easy task, but I'll explain what the code does:
It opens the TBS front page.
Gets the first 15 links with a video icon.
Visits each link, finds where the video is, and copies the text.
Then it downloads each of the videos and extracts the mp3.
wget and mencoder are external programs. wget is a download manager and mencoder is the encoding twin of mplayer.
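The steps above can be sketched roughly as follows. This is not the code from aranha.rb: the regex for spotting a video icon, the method names and the `http://` scheme are all hypothetical, and the mencoder flags are a common recipe for dumping audio to mp3, not necessarily what the script uses.

```ruby
require "uri"

MAX_FILES = 15
MENCODER_PATH = "mencoder"   # assumption: mencoder is on the PATH
HOST = "news.tbs.co.jp"

# Hypothetical page structure: a link counts as a "video link" when the
# anchor directly wraps an <img> whose src contains the word "video".
def video_article_links(html, max = MAX_FILES)
  html.scan(/<a\s+href="([^"]+)"[^>]*>\s*<img[^>]*src="[^"]*video[^"]*"/i)
      .flatten.uniq.first(max)
end

# Turn a link found on the page into a full URL on HOST.
def absolute_url(link)
  URI.join("http://#{HOST}/", link).to_s
end

# Argument list for fetching one video with wget.
def download_command(url, path)
  ["wget", "-q", "-O", path, url]
end

# Argument list for extracting the audio track as mp3 with mencoder.
def mp3_command(video_path, mp3_path)
  [MENCODER_PATH, video_path,
   "-of", "rawaudio", "-oac", "mp3lame", "-ovc", "copy",
   "-o", mp3_path]
end
```

Each command array can be run with `system(*cmd)`, which passes the arguments directly and avoids shell quoting problems with odd file names.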
@Toberoth
I added the BEGIN/END markers just to separate the configuration variables from the messy part of the code.
Btw, I don't know what to do if your text file shows up garbled, but try opening the file with
$ LC_ALL=ja_JP.UTF-8 gedit dirty_file.txt
Last edited by mentat_kgs (2009 May 23, 7:27 am)
sethg
Member
From: m
Registered: 2008-11-07
Posts: 505
I've tried with Firefox, and changed it to every Japanese encoding, but nothing works.
Tobberoth, I would assume it is, but I honestly don't know how to check that.
mentat_kgs, I tried your code, but it still showed up as the symbols I pasted above.
Awfully frustrating problem. I despise the Emacsen interface. I *could* go through and cut all the text from emacsen to Gedit and replace the file, but that just takes away from the beautiful simplicity of the script.
Anybody have any more ideas?
Edit: Actually, I have, indeed, got it working in Firefox. I just had to set Auto-detect to Japanese. I don't know why this is working now... I ran through, literally, all 3 Japanese settings yesterday. :: sigh :: Well, at least it's working, and with rikaichan for quick lookups, to boot.
Last edited by sethg (2009 May 23, 11:29 am)
denus
Member
Registered: 2009-02-01
Posts: 22
If you use gedit, you can open the file manually and select the ISO-2022-JP encoding; it should then display correctly.
A script someone showed me:
can convert all those text files into Unicode, so it should display fine thereafter. Back up beforehand.
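The script itself is not reproduced above. As a rough idea of what such a conversion could look like in Ruby, here is a minimal sketch, assuming the files really are ISO-2022-JP (as the gedit tip suggests). The method names and the `.bak` backup suffix are made up for this example.

```ruby
# Decode a raw ISO-2022-JP byte string and return it as UTF-8.
def jis_to_utf8(bytes)
  bytes.dup.force_encoding("ISO-2022-JP").encode("UTF-8")
end

# Rewrite one file in place as UTF-8, keeping the original bytes
# next to it in "<name>.bak".
def convert_to_utf8!(path)
  raw = File.binread(path)
  File.binwrite(path + ".bak", raw)
  File.write(path, jis_to_utf8(raw), mode: "w:UTF-8")
end
```

To convert every text file in the current folder, something like `Dir.glob("*.txt").each { |f| convert_to_utf8!(f) }` would do. A file that is not actually ISO-2022-JP may raise an encoding error or come out wrong, which is why the backup is kept.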
Edit: On that note, I just frigging wish Japan would adopt Unicode, the stubborn so-and-sos. >.<
Last edited by denus (2009 May 25, 9:56 am)