
Netflix Japanese content

#76
(2017-01-25, 8:33 pm)Zarxrax Wrote: Hmmm, this is weird. My Google credit is only around $20 now, and I haven't used it since the other day, when it last told me I had about $160 left. That is a serious delay in stats reporting or something.
In any case, it looks like I will not be able to OCR any more subs using this method.
Wow! So it burnt through $280 credit. それは高すぎる!

I could be mistaken, but did the Google API also count the timestamps during the OCR process?
#77
(2017-01-25, 8:41 pm)eslang Wrote:
(2017-01-25, 8:33 pm)Zarxrax Wrote: Hmmm, this is weird. My Google credit is only around $20 now, and I haven't used it since the other day, when it last told me I had about $160 left. That is a serious delay in stats reporting or something.
In any case, it looks like I will not be able to OCR any more subs using this method.
Wow! So it burnt through $280 credit. それは高すぎる!

I could be mistaken, but did the Google API also count the timestamps during the OCR process?

Not sure exactly what you are asking about the timestamps, but only the subtitle images were sent over.
#78
zx573 Wrote:I don't really watch dramas so I'm good I guess. Unless there's a good (and not cheaply made) horror-themed drama I should know about, then you'd have my interest.
I only watched some Japanese horror dramas in the late 90s and early 2000s; recently I have mostly watched Japanese horror movies rather than dramas.

Here are some Japanese horror drama titles, taken from the Wikipedia category ホラードラマ:
学校の怪談 (1994・KTV)
リング (1995・FUJI) SPECIAL
らせん (1999・FUJI)
催眠 (2000・TBS)
怪談百物語 (2002・FUJI)
放送禁止 (2003・FUJI)
着信アリ (2005・ASAHI)
地獄少女 (2006・NTV)
百鬼夜行抄 (2007・NTV)
妖怪人間ベム (2011・NTV)
死幣-DEATH CASH- (2016・TBS)


I think the last two dramas, 妖怪人間ベム and 死幣-DEATH CASH-, already have Japanese subtitles captured by JP-subbers.

Zarxrax Wrote:Not sure exactly what you are asking about the timestamps, but only the subtitle images were sent over.
I see... If it is only the subtitle images, then the Google API is terribly expensive. I don't think I will use it.
Edited: 2017-01-25, 8:54 pm
#79
Suddenly I have $160 credit again... so I guess I could still OCR some more subs. Maybe Google is confused.
#80
Zarxrax Wrote:Suddenly I have $160 credit again... so I guess I could still OCR some more subs. Maybe Google is confused.
Maybe Googleさん read this thread and our comments.

Anyway, once the Google credit runs out, and if there is time and interest to explore the OCR script further...
I'm wondering whether zx573's script could send the image files to OneNote for OCR instead of the Google API.

I remember this from RawrPk's post in another thread:
RawrPk Wrote:Then I thought to myself, "why don't I add the pages into OneNote which has OCR capabilities and I can extract the text from it?".
I am guessing that Microsoft OneNote is the latest/upgraded version of MODI (Microsoft Office Document Imaging), which was previously bundled with the Microsoft Office 2003/2007 packages.

Subtitle Edit and Idxsubocr both use MODI as the OCR tool, but Idxsubocr (a simple Chinese GUI app) is not written to recognize vertically positioned text, because most Chinese subtitles are presented in horizontal lines.

I tried out this "Autosub Tool" (a utility for automatic speech recognition and subtitle generation) on my "modern" computer last year. The results were not that good for dramas, average for documentary programs, and pretty good for NHK news announcers.
http://forum.koohii.com/thread-14237-pos...#pid240518

wibr from the http://www.chinese-forums.com Wrote:The speech-to-text is done by https://cloud.google.com/speech/ and the API key is hardcoded in the script, so I wonder who will pay for the usage? It's only free for up to 60min per month, after that it's 1.5$ per hour.
#81
Zarxrax,

Could you please do Star Trek Into the Darkness? Thank you.
#82
(2017-01-27, 8:59 am)supermancampus Wrote: Zarxrax,

Could you please do Star Trek Into the Darkness? Thank you.

This is not currently available on USA Netflix. If you can provide me with DVD subs (sub/idx) I could OCR those.
#83
(2017-01-22, 7:12 pm)Zarxrax Wrote: Alright, I've got the subs all OCR'ed and ready to go.

Netflix SRT pack
Contains:
  • Atelier (Underwear)
  • Good Morning Call season 1
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Terrace House - Boys and Girls in the City
  • Mischievous Kiss ~ Love in Tokyo (Itazura na Kiss)
  • Mischievous Kiss 2 ~ Love in Tokyo (Itazura na Kiss 2)
Also, Tiger and Dragon subs per Nukemarine's request.

All Japanese subs were OCRed and may contain mistakes. All or most of the Japanese subs are closed captions rather than true subtitles. Subtitle Edit can be used to automatically strip out sounds and character names (Tools > Remove text for hearing impaired).

English subtitles are included in the Netflix pack. The English subtitles were ripped as text, and contain no mistakes.

eslang has kindly proofread a few episodes to correct some of the OCR mistakes:
Terrace House 01
Midnight Diner 08

This is great, thanks. But I wonder about the English translations. For example, here is item #4 from episode 1 of Terrace House:

共同生活する様子を

ただただ記録したものです

The English translation is given as:

living together, and we observe

how they interact with each other.

But doesn't the Japanese mean something like "It is a record of just how we live together" (nothing about "observing")?

UPDATE: I went back and watched the show with the English subtitles on, and I see the English version given is "we observe how they live together." So I guess the English translations are "free" translations rather than literal.
Edited: 2017-01-28, 3:09 am
#84
(2017-01-24, 11:26 am)gaiaslastlaugh Wrote: FYI, the wife (a major Netflix addict) just saw that TERRACE HOUSE: ALOHA STATE is now available on Netflix US. Have at it, everyone!!

Too much English and American attempts at Japanese. I think the other one is much more useful.

(2017-01-23, 9:54 am)johndoe2015 Wrote:
(2017-01-22, 7:12 pm)Zarxrax Wrote: Alright, I've got the subs all OCR'ed and ready to go.

Netflix SRT pack
Contains:
  • Atelier (Underwear)
  • Good Morning Call season 1
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Terrace House - Boys and Girls in the City
  • Mischievous Kiss ~ Love in Tokyo (Itazura na Kiss)
  • Mischievous Kiss 2 ~ Love in Tokyo (Itazura na Kiss 2)
Also, Tiger and Dragon subs per Nukemarine's request.

All Japanese subs were OCRed and may contain mistakes. All or most of the Japanese subs are closed captions rather than true subtitles. Subtitle Edit can be used to automatically strip out sounds and character names (Tools > Remove text for hearing impaired).

English subtitles are included in the Netflix pack. The English subtitles were ripped as text, and contain no mistakes.

eslang has kindly proofread a few episodes to correct some of the OCR mistakes:
Terrace House 01
Midnight Diner 08


I'm finished proactively ripping subs, but I'll be glad to take requests (for now) if someone has some other shows that they want subs for.

This is so great. Thank you, Zarxrax.

(2017-01-22, 11:06 pm)eslang Wrote: Thank you very much, Zarxrax. That is really awesome!

Just curious, how much $$ did it burn through with the OCR of subtitles in the Netflix Pack and Tiger and Dragon using the Google Cloud Platform credit system?

@juniperpansy
@johndoe2015

Here are the two episodes from Terrace House:
Proofread and Edited
Terrace House episode 25
http://pastebin.com/GwrgUstH

Terrace House episode 26
http://pastebin.com/hZkHLccJ

Next, I'll proofread and edit Hibana episodes 1 and 2; they should be up here by the end of this week.

Eslang, thank you. This is so helpful.

I am having a hard time loading ep25 into subs2srs. I'm getting an invalid time code error. Aegisub won't recognize it either. Is there anything I can do?
Edited: 2017-01-29, 12:17 am
#85
(2017-01-29, 12:09 am)johndoe2015 Wrote: I am having a hard time loading ep25 into subs2srs. I'm getting an invalid time code error. Aegisub won't recognize it either. Is there anything I can do?

Try opening it in Subtitle Edit and then re-saving it. It looks like the problem might be periods where commas are expected. Subtitle Edit is a lot more "accepting" of little things like that than some other programs.
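
If the problem really is periods where commas are expected, a short script could normalize the timing lines without a round trip through Subtitle Edit. Here is a minimal sketch in Python, assuming standard SRT timing lines of the form HH:MM:SS,mmm --> HH:MM:SS,mmm. The function name periods_to_commas is made up for illustration and is not part of any tool mentioned in this thread.

```python
import re

# Matches HH:MM:SS.mmm, i.e. a period used as the millisecond separator.
_PERIOD_TS = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{1,3})")


def periods_to_commas(srt_text: str) -> str:
    """Replace '.' with ',' inside timestamps, but only on timing lines."""
    fixed = []
    for line in srt_text.splitlines():
        if "-->" in line:  # only touch timing lines, never the dialogue
            line = _PERIOD_TS.sub(r"\1,\2", line)
        fixed.append(line)
    return "\n".join(fixed)
```

Restricting the substitution to lines containing "-->" keeps periods in the dialogue text untouched.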
#86
(2017-01-29, 10:01 am)Zarxrax Wrote:
(2017-01-29, 12:09 am)johndoe2015 Wrote: I am having a hard time loading ep25 into subs2srs. I'm getting an invalid time code error. Aegisub won't recognize it either. Is there anything I can do?

Try opening it in Subtitle Edit and then re-saving it. It looks like the problem might be periods where commas are expected. Subtitle Edit is a lot more "accepting" of little things like that than some other programs.

Thanks. I'll try that. I installed a trial of CrossOver on my Mac, so we'll see how it goes.
Edited: 2017-01-29, 6:04 pm
#87
I generated a word frequency report for Terrace House: http://pastebin.com/xgHNytec
Might be useful for anyone who wants to try learning some of the more common words before watching.
#88
(2017-01-22, 7:12 pm)Zarxrax Wrote: Alright, I've got the subs all OCR'ed and ready to go.

Netflix SRT pack
Contains:
  • Atelier (Underwear)
  • Good Morning Call season 1
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Terrace House - Boys and Girls in the City
  • Mischievous Kiss ~ Love in Tokyo (Itazura na Kiss)
  • Mischievous Kiss 2 ~ Love in Tokyo (Itazura na Kiss 2)
Also, Tiger and Dragon subs per Nukemarine's request.

All Japanese subs were OCRed and may contain mistakes. All or most of the Japanese subs are closed captions rather than true subtitles. Subtitle Edit can be used to automatically strip out sounds and character names (Tools > Remove text for hearing impaired).

English subtitles are included in the Netflix pack. The English subtitles were ripped as text, and contain no mistakes.

eslang has kindly proofread a few episodes to correct some of the OCR mistakes:
Terrace House 01
Terrace House episode 25
Terrace House episode 26
Midnight Diner 08
Hibana episode 1
Hibana episode 2

Late to this party, but I just wanted to thank you and everyone who contributed to this. I watched this rendition of Terrace House as a guilty pleasure, and indeed the current one in Hawaii as well, but with English subs because, to be quite honest, they're very good (free, but to my reading very accurate in terms of capturing nuance). But I've been debating whether to rewatch either with J-subs, and having this just seals the deal. As others like the OP commented, the show really is great for catching up on how "the young kids" are speaking, and the 解説 parts with YOU, Trindle, etc. will be great material for getting better at お笑い programs. Also looking forward to Hibana and Midnight Diner!
#89
Hello all, new here of course. This was quite a bit of info that helped me rip subs from Japanese Netflix on Android as well. Thank you!

EDIT: In case no one has tried it: extracting the subs, editing them, and repacking them in the .nfs format breaks the subs.
Edited: 2017-03-02, 4:58 pm
#90
(2017-01-29, 10:01 am)Zarxrax Wrote:
(2017-01-29, 12:09 am)johndoe2015 Wrote: I am having a hard time loading ep25 into subs2srs. I'm getting an invalid time code error. Aegisub won't recognize it either. Is there anything I can do?

Try opening it in Subtitle Edit and then re-saving it. It looks like the problem might be periods where commas are expected. Subtitle Edit is a lot more "accepting" of little things like that than some other programs.

So I think the invalid time code error is caused by timing lines like the following:

00:00:02,933 --> 00:00:04

The parser expects a comma after the 04, followed by three more digits.

Re-saving it through Subtitle Edit does enable it to be used with subs2srs, but it basically deletes the lines that had the problem. I'm guessing there's no way to preserve them without manually going through the file in a text editor and fixing those problems?
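
To see how many lines are affected before fixing anything by hand, the malformed timing lines could be listed with a few lines of Python. This is only a sketch under the assumption that a valid timing line is exactly HH:MM:SS,mmm --> HH:MM:SS,mmm; the function name find_bad_timings is hypothetical.

```python
import re

# A well-formed SRT timestamp: HH:MM:SS,mmm
_TS = r"\d{2}:\d{2}:\d{2},\d{3}"
_GOOD_TIMING = re.compile(rf"^{_TS} --> {_TS}\s*$")


def find_bad_timings(srt_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every timing line that is malformed."""
    bad = []
    for number, line in enumerate(srt_text.splitlines(), start=1):
        # Timing lines are the ones containing the '-->' arrow.
        if "-->" in line and not _GOOD_TIMING.match(line):
            bad.append((number, line))
    return bad
```

Running it over the text of a subtitle file returns the line numbers to look at, so the file does not have to be scanned by eye.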
#91
(2017-03-28, 7:59 am)elhnad Wrote:
(2017-01-29, 10:01 am)Zarxrax Wrote:
(2017-01-29, 12:09 am)johndoe2015 Wrote: I am having a hard time loading ep25 into subs2srs. I'm getting an invalid time code error. Aegisub won't recognize it either. Is there anything I can do?

Try opening it in Subtitle Edit and then re-saving it. It looks like the problem might be periods where commas are expected. Subtitle Edit is a lot more "accepting" of little things like that than some other programs.

So I think the invalid time code error is caused by timing lines like the following:

00:00:02,933 --> 00:00:04

The parser expects a comma after the 04, followed by three more digits.

Re-saving it through Subtitle Edit does enable it to be used with subs2srs, but it basically deletes the lines that had the problem. I'm guessing there's no way to preserve them without manually going through the file in a text editor and fixing those problems?

Hmm, it ought to be possible to write a script or use regular expressions to make the correction. I was planning to convert some more series soon, and I can try taking a look at that too.

Edit: Actually, um... sorry, where exactly are you seeing this error? I'm looking through Terrace House 25 and other files, and I'm not seeing this issue.
Edited: 2017-03-28, 5:29 pm
#92
Every Mischievous Kiss Japanese sub gave me the error in Aegisub and subs2srs.
#93
(2017-03-28, 8:19 pm)elhnad Wrote: Every Mischievous Kiss Japanese sub gave me the error in Aegisub and subs2srs.

I believe I have a solution to the bad timestamps.
I'll reupload all of the subtitles with the corrections in a week or two.
In the meantime, it can be corrected using a tool that supports search and replace with regular expressions, such as Notepad++.

Find: (\d\d:\d\d:\d\d,\d?\d?)(\s)
Replace: \10\2
Each pass appends one zero to any timestamp whose milliseconds have fewer than three digits, so run this 2-3 times to fill in the missing zeros.

Find: (\d\d:\d\d:\d\d)(\s)
Replace: \1,000\2
This will fix lines that are missing the milliseconds altogether.
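
For anyone who would rather script it, the same two corrections can be sketched in Python as a single pass. This is not the script used for the re-upload, just an illustration under the assumption that every timing line contains "-->"; the function name pad_milliseconds is made up. Note that in Python's re.sub a replacement string of \10 would be read as group 10, so the sketch uses a replacement function instead.

```python
import re

# HH:MM:SS, optionally followed by ',' and whatever millisecond digits exist.
_TS = re.compile(r"(\d{2}:\d{2}:\d{2})(?:,(\d*))?")


def pad_milliseconds(srt_text: str) -> str:
    """Pad every timestamp on a timing line out to the HH:MM:SS,mmm form."""

    def fix(match: re.Match) -> str:
        millis = match.group(2) or ""  # None when ',mmm' is missing entirely
        # Append zeros on the right, like the Notepad++ replacements above.
        return f"{match.group(1)},{millis.ljust(3, '0')}"

    lines = []
    for line in srt_text.splitlines():
        if "-->" in line:  # leave index and dialogue lines alone
            line = _TS.sub(fix, line)
        lines.append(line)
    return "\n".join(lines)
```

Because the padding happens in one step, there is no need to run it several times, and timestamps that already have three millisecond digits are left unchanged.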
#94
I have updated my subtitle pack with some new shows and fixed timestamps for all the old ones.

Download from here (Mega) or here (MediaFire)

Contents:
  • Atelier (Underwear)
  • Good Morning Call
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Mischievous Kiss (Itazura na Kiss)
  • Mischievous Kiss 2 (Itazura na Kiss 2)
  • My Little Lover (Minami Kun No Koibito)
  • Terrace House: Boys and Girls in the City
  • Terrace House: Aloha State (Parts 1+2)
  • Pee Wee's Big Holiday (Japanese subs for English language movie)
  • Stranger Things (Japanese subs for English language series)
I have also written up a guide explaining how to rip subtitles yourself: http://www.nihongonobaka.com/extracting-...m-netflix/

Edit: Still didn't catch some of the messed up timestamps. Fixed version is up now.
Edited: 2017-04-04, 9:02 pm
#95
(2017-04-04, 5:14 pm)Zarxrax Wrote: I have updated my subtitle pack with some new shows and fixed timestamps for all the old ones.

Download from here (Mega) or here (MediaFire)

Contents:
  • Atelier (Underwear)
  • Good Morning Call
  • Hibana (Spark)
  • Midnight Diner: Tokyo Stories
  • Mischievous Kiss (Itazura na Kiss)
  • Mischievous Kiss 2 (Itazura na Kiss 2)
  • My Little Lover (Minami Kun No Koibito)
  • Terrace House: Boys and Girls in the City
  • Terrace House: Aloha State (Parts 1+2)
  • Pee Wee's Big Holiday (Japanese subs for English language movie)
  • Stranger Things (Japanese subs for English language series)
I have also written up a guide explaining how to rip subtitles yourself: http://www.nihongonobaka.com/extracting-...m-netflix/

Awesome work as always. 

Shame that the subtitles don't match the spoken dialogue in Stranger Things. Hell, that's been a major complaint of mine for years, and it's not unique to Netflix. Still, it would be interesting to attempt creating a subs2srs deck from that even with that limitation. With Memrise being multiple choice, it would be a test where you hear something and have to pick the text that somewhat matches the meaning of the audio. Maybe something to test out later.
#96
(2017-04-04, 8:32 pm)Nukemarine Wrote: Awesome work as always. 

Shame that the subtitles don't match the spoken dialogue in Stranger Things. Hell, that's been a major complaint of mine for years, and it's not unique to Netflix. Still, it would be interesting to attempt creating a subs2srs deck from that even with that limitation. With Memrise being multiple choice, it would be a test where you hear something and have to pick the text that somewhat matches the meaning of the audio. Maybe something to test out later.

Wow, that's a really interesting idea for a listening proficiency test!
#97
(2017-04-04, 8:32 pm)Nukemarine Wrote: Awesome work as always. 

Shame that the subtitles don't match the spoken dialogue in Stranger Things. Hell, that's been a major complaint of mine for years, and it's not unique to Netflix. Still, it would be interesting to attempt creating a subs2srs deck from that even with that limitation. With Memrise being multiple choice, it would be a test where you hear something and have to pick the text that somewhat matches the meaning of the audio. Maybe something to test out later.

This is unfortunately true of all the originally-English Netflix productions. The sub and dub translations are clearly done independently across the board. Originally-Japanese titles simply use the script for subtitling, so other than the occasional odd edit (different politeness level or whatnot), it's pretty nearly word for word.

YMMV, but I find it's easier to just watch without subtitles. I can rewind and turn subtitles on if I get really lost. (So far I've watched Stranger Things, Daredevil, and Jessica Jones in Japanese, and the first couple of episodes of a couple of other titles. I really like the Marvel stories, but it is a bit weird watching them in Japanese because they are such very American storylines.)

Anyway, trying to read along with something that isn't what's being said gets really disorienting. If I'm mostly reading and hit kanji I don't know, I try to listen for it but the word isn't there or if it is the sentence is structured differently so it's offset quite a bit; and if I'm mostly listening and hit a word I don't recognize, I look down to a mess of characters that have nothing to do with what was just said. For me at least, this kind of subtitle makes it harder rather than easier to follow the dialogue because I'm either trying to understand two different lines simultaneously, or I'm ignoring one line in favor of the other and the one I'm not focusing on is just a distraction.
#98
My reading ability is better than my listening, so sometimes I watch Japanese subtitles with English dialogue. The English is automatic for me, so my brain doesn't have to spend any effort thinking about it at all, and so I'm free to focus on trying to read as fast as possible. The English can help me figure out some of the Japanese words that I wouldn't get otherwise.