Automatic audio extraction from NHK News Easy? - Printable Version

+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Automatic audio extraction from NHK News Easy? (/thread-12343.html)

Automatic audio extraction from NHK News Easy? - JackBS - 2014-11-21

Do you know of any way to automatically extract the audio from the five or so daily articles at NHK News Easy? http://www3.nhk.or.jp/news/easy

I can do it manually with programs like TubeMaster++ to get the mp3 files, but I'm looking for more of an automated daily process. Thanks in advance!

Automatic audio extraction from NHK News Easy? - Inny Jan - 2014-11-21

I use something that I brewed myself - sorry, but I don't have time to provide any kind of support, so it's just you and this code:

Code: #!/usr/bin/env python

I've found some time now to record the steps to make use of the above script.

1. Save the python code to the dl.py file

If you don't care about tagging the mp3 files, then
2. Remove lines 118-138 (otherwise things get even messier...)

In Google Chrome (I believe that other browsers allow for a similar workflow as well)
1. Open "http://www3.nhk.or.jp/news/easy/index.html"
2. Select "11月26日(水)"
3. Right-click > Inspect Element on "「黒人を撃った警察官を訴えない」アメリカ中で抗議"

In Developer Tools:
1. Locate '<ul class="newslisteven">'
2. Right-click > Copy '<ul class="newslisteven">'
3. Save the Clipboard to the newslisteven.html file (this file has to be in the same directory as the dl.py script)

Command box part:
1. Open the command window
2. Change the current directory to the location where dl.py and newslisteven.html are
3. Execute 'python -u dl.py'

After execution of the script finishes you should see .txt and .mp3 files for 11月26日(水).

Automatic audio extraction from NHK News Easy? - JackBS - 2014-11-22

Thank you! I'll find out how to use this.

Automatic audio extraction from NHK News Easy? - RawToast - 2014-11-26

If you're after more content, I believe there is an archive of NHK News Easy articles with separate audio mp3s in buonaparte's resources thread. http://forum.koohii.com/showthread.php?tid=6840

Automatic audio extraction from NHK News Easy?
- JackBS - 2014-11-30

RawToast Wrote: If you're after more content, I believe there is an archive of NHK News Easy articles with separate audio mp3s in buonaparte's resources thread.

Thank you! I've practiced already with a good chunk of Buonaparte's archives, but I'm incredibly "hard of hearing" and so I'd like to try to get this to work to get more free new content every day.

Inny Jan Wrote: I've found some time now to record the steps to make use of the above script...

That is very kind of you, thank you very much! I confess I've battled for days to get it to work, and I know you're busy, so the following question is for anyone who might be able to spot the error, be it another programmer or a layman like me who nevertheless got it to work.

In the python-code-saving step, I removed lines 118 to 138 and saved the resulting code as "dl.py" in the folder "C:\Users\Jack\Desktop\python":

Code: #!/usr/bin/env python

Code: <ul class="newslisteven"><li><span class="newstitle"><a href="./k10013485691000/k10013485691000.html">「<ruby>黒人<rt>こくじん</rt></ruby>を<ruby>撃<rt>う</rt></ruby>った<ruby>警察官<rt>けいさつかん</rt></ruby>を<ruby>訴<rt>うった</rt></ruby>えない」アメリカ<ruby>中<rt>じゅう</rt></ruby>で<ruby>抗議<rt>こうぎ</rt></ruby></a></span><span class="date">[11月26日 17時00分]</span><span class="sound">音声</span><span class="movie">動画</span></li><li class="even"><span class="newstitle"><a href="./k10013485081000/k10013485081000.html"><ruby>中国<rt>ちゅうごく</rt></ruby>の<ruby>香港<rt>ほんこん</rt></ruby> <ruby>抗議<rt>こうぎ</rt></ruby>する<ruby>人<rt>ひと</rt></ruby>たちのバリケードを<ruby>片<rt>かた</rt></ruby>づける</a></span><span class="date">[11月26日 17時00分]</span><span class="sound">音声</span><span class="movie">動画</span></li><li><span class="newstitle"><a href="./k10013466411000/k10013466411000.html">タカタのエアバッグ <ruby>自動車会社<rt>じどうしゃがいしゃ</rt></ruby>は<ruby>早<rt>はや</rt></ruby>く<ruby>修理<rt>しゅうり</rt></ruby>して</a></span><span class="date">[11月26日 17時00分]</span><span class="sound">音声</span><span class="movie">動画</span></li><li class="even"><span
class="newstitle"><a href="./k10013453111000/k10013453111000.html"><ruby>世界<rt>せかい</rt></ruby>の<ruby>人身売買<rt>じんしんばいばい</rt></ruby>のうち33%は<ruby>子<rt>こ</rt></ruby>ども</a></span><span class="date">[11月26日 11時30分]</span><span class="sound">音声</span><span class="movie">動画</span></li><li><span class="newstitle"><a href="./k10013454921000/k10013454921000.html"><ruby>鎌倉<rt>かまくら</rt></ruby>の<ruby>寺<rt>てら</rt></ruby>で<ruby>夜<rt>よる</rt></ruby>の<ruby>紅葉<rt>こうよう</rt></ruby>を<ruby>光<rt>ひかり</rt></ruby>で<ruby>照<rt>て</rt></ruby>らす</a></span><span class="date">[11月26日 11時30分]</span><span class="sound">音声</span><span class="movie">動画</span></li></ul>

If anyone has any hints as to where I might be making a mistake, I'll be very much in your debt!

Automatic audio extraction from NHK News Easy? - balloonguy - 2014-11-30

In the same folder as dl.py and newslisteven.html, create a file called run.bat containing:

Code: dl.py

Automatic audio extraction from NHK News Easy? - Inny Jan - 2014-11-30

@JackBS What you are doing sounds ok. I followed the same steps as you described and also ran into some problems. I'm at work at the moment so can't look at the issue in detail, but when I get home I will see what happens. Interestingly enough, right now I'm getting a timeout at line 79:

(filename, headers) = urllib.urlretrieve(url)

when url is: http://www3.nhk.or.jp/news/easy/k10013454921000/k10013454921000.html

They might have implemented testing for "User Agent" (they were not doing that so far...) and if that's the case then the script will have to be updated. It's also possible that my problem is related to how our network is configured...

(BTW, are you executing "python -u dl.py" within the DOS box? I mean, you need to type that command on the command line and the DOS box should not disappear. Also, the current directory must be where the "dl.py" and "newslisteven.html" are located.
balloonguy's "run.bat" can be helpful, although if the issue is with the "User Agent" it's unlikely that it will fix your problem).

Edit: Righty-oh, it's all good in my place. If balloonguy's "run.bat" has issues then you can always put "python dl.py" there. Good luck.

Automatic audio extraction from NHK News Easy? - JackBS - 2014-12-08

Thank you for your help last week! I've been learning about the command window and Python scripts. (I was not correctly following the very last part of the instructions.) With the help of my son, I think I've gotten closer this week, but unfortunately I'm not there yet! I'll be thankful if anyone can find the error, but I'll continue trying to get this to work in any case.

The relevant files are in this directory:
C:\Users\Jack\Desktop\python\dl.py
C:\Users\Jack\Desktop\python\newslisteven.html
C:\Users\Jack\Desktop\python\run.bat

The contents of the files are exactly as posted above. We've tried "run.bat" containing "dl.py" and then another one containing "python dl.py". Our attempts have been as follows:

1. According to the instructions...
Code: C:\Users\Jack>
Code: C:\Users\Jack>
Code: C:\Users\Jack>
Code: C:\Users\Jack>

Automatic audio extraction from NHK News Easy? - Inny Jan - 2014-12-08

Your problems can be summarised as:
1. python.exe is not in your PATH
2. Python compatibility issues

To solve problem 1. you can:
1. Modify your system wide PATH, or
2. Modify your run.bat file, so it reads:
Code: PATH=%PATH%;C:\Python34
3. Modify your run.bat file, so it reads:
Code: C:\Python34\python dl.py

To solve problem 2. you can refer to this article ("Print Is A Function") and replace all print statements (like: print title) with print function calls (print(title)) as described in the article. Alternatively, you could just remove the lines with print statements (they are there just to indicate progress of downloading.)
After you've done the above you should be able to put and execute run.bat in any directory on your disk (even with your mouse if you wish ...).
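
For readers hitting the same Python 2 versus Python 3 problem, here is a minimal sketch of the print change described in this post. It assumes the script's progress lines look like the `print title` example quoted above; the function name `report_progress` and the sample title are hypothetical, since the full dl.py is not reproduced in this thread.

```python
def report_progress(title):
    """Print one progress line using the function form of print.

    Python 2 statement form:  print title
    Python 3 function form:   print(title)
    With a single argument, print(title) is accepted by both versions.
    """
    print(title)
    return title


# Hypothetical title, used only to show the call.
report_progress("example article title")
```

Note that only the call syntax changes; the indentation of each converted line has to stay exactly as it was, or Python will report indentation errors.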

Automatic audio extraction from NHK News Easy? - balloonguy - 2014-12-09

Actually, to solve problem 2 you need to install Python 2; you currently have Python 3. However, to make it easier, and I hope Inny Jan doesn't mind, I've turned the script into an executable so that you don't need to install python or deal with the command prompt. It can be downloaded at http://dropcanvas.com/b5d3q. Once downloaded, place newslisteven.html in the same directory as dl.exe and just double click dl.exe. The downside is that there is no indication of progress; the files just appear.

Automatic audio extraction from NHK News Easy? - JackBS - 2014-12-10

The exe file worked like a charm! (Before that I had managed to fix the print statements, but then some indentation errors started coming up!) Thank you very much to both of you!

For the sake of other learners, mostly those unfamiliar with coding, here is then the procedure pared down to its essentials:

1. Go to NHK Easy and go to the calendar section. Choose a day, highlight the title of the first article, right-click on it and select Inspect Element.
2. In the Inspector window, locate a nearby line called <ul class="newslisteven">, right-click on it and select Copy Outer HTML. Paste this code in a text editor and save it as "newslisteven.html".
3. Download balloonguy's exe version of Inny Jan's script, place both the html and the exe in the same directory, and double-click the exe file.

This will help immensely. Thank you again for your time and patience!

Automatic audio extraction from NHK News Easy? - Inny Jan - 2014-12-10

Great to hear it worked out for you!
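
To illustrate what the saved newslisteven.html from step 2 contains, here is a rough Python 3 sketch that pulls the article links out of that fragment and turns them into full article addresses. It is not Inny Jan's script, which is not reproduced in this thread; the names `ArticleLinkParser`, `article_urls` and `build_request` are hypothetical, and the base URL is assumed from the article address quoted earlier. The sketch stops at the article pages, because the thread does not say where the mp3 files live. The `build_request` helper only reflects Inny Jan's unconfirmed guess that the site might check the "User Agent".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request

# Assumption: base address taken from the article URL quoted in the thread.
BASE_URL = "http://www3.nhk.or.jp/news/easy/"


class ArticleLinkParser(HTMLParser):
    """Collect the href of every <a> tag in the saved list fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def article_urls(fragment, base_url=BASE_URL):
    """Return absolute article URLs for the relative links in the fragment."""
    parser = ArticleLinkParser()
    parser.feed(fragment)
    return [urljoin(base_url, href) for href in parser.hrefs]


def build_request(url, user_agent="Mozilla/5.0"):
    """Build (but do not send) a request carrying an explicit User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})
```

To try it, read the saved file (assuming it was saved as UTF-8) and pass its text to `article_urls`; each returned address should have the same shape as the article URL quoted in Inny Jan's post of 2014-11-30.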