
Rules for when the sokuon (little tsu) is used?

#1
I need to fix a bug on the Kanji Koohi website, and I'm trying to figure out which approach to take.

The sokuon aka "little tsu" aka っ (as opposed to つ) is not always shown properly when it is part of an On reading.

The reason is that my PHP script splits the words into individual parts with their readings. On the flashcards' example words, the individual pronunciations are then put back together.

It works for kun'yomi, as in 三つ > みっつ, and it can highlight the kanji reading within みっつ.

However, for on'yomi there is a bug: 雑誌 (ざっし) "journal; magazine" shows up as ざつ

I could change my script, but I could also perhaps restore the sokuon even if it is not stored in the database.

TL;DR: Is there a consistent rule that can be implemented programmatically to replace つ with っ just by parsing the full reading? In other words, are there exceptions to how the sokuon behaves?
Edited: 2016-09-06, 9:06 am
#2
I don't think you can formulate a rule that's universal. You've got 雑誌 and 颯爽, but you've also got 殺人. You've got 結婚 and 結構, but you've also got 吸血蝙蝠.

In other words, the さつ/ざつ+さ行 might get a sokuon or it might voice the next mora.

けつ+か行 is much closer to a rule, but large compounds where two on-yomi words are made into a larger one do not always follow this rule.

I can't think of examples immediately but I think there are also cases where sokuon is inserted between two か行 mora where there's no つ in the original reading, and cases where mora other than つ are made into sokuon.
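
To make the problem concrete, here is a minimal sketch of the obvious rule. It is written in Python for illustration, not the site's PHP, and `naive_join` and `TRIGGERS` are made-up names. It assumes readings are stored per part in hiragana and only handles a final つ before an unvoiced か/さ/た行 kana, so it ignores the other cases mentioned above.

```python
# Unvoiced kana of the ka, sa and ta rows.
# Assumption: these are the only kana treated as sokuon triggers here.
TRIGGERS = set("かきくけこさしすせそたちつてと")


def naive_join(readings):
    """Join per-part readings, turning a final つ into っ before a trigger kana."""
    out = []
    for i, reading in enumerate(readings):
        # First kana of the following part, or "" for the last part.
        nxt = readings[i + 1][:1] if i + 1 < len(readings) else ""
        if len(reading) > 1 and reading.endswith("つ") and nxt in TRIGGERS:
            reading = reading[:-1] + "っ"
        out.append(reading)
    return "".join(out)


print(naive_join(["ざつ", "し"]))                # ざっし (correct)
print(naive_join(["さつ", "じん"]))              # さつじん (correct, じ is voiced)
print(naive_join(["きゅう", "けつ", "こうもり"]))  # きゅうけっこうもり (wrong)
```

The last line is the 吸血蝙蝠 case: the rule inserts a sokuon where the real reading is きゅうけつこうもり, so a rule like this needs an exception list at the very least.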
Edited: 2016-09-06, 9:32 am
#3
Thanks. There are indeed at least two entries in JMdict with けつこう (no sokuon).

That answers my question. I'll have to edit my old scripts that break down kanji compounds so that they store both the base reading and the pronunciation as it is in a given compound, which will handle any exceptions.
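
Roughly what I have in mind, as a sketch only. It is in Python rather than the PHP the site uses, and `Part`, `base_reading`, `surface_reading` and `full_reading` are hypothetical names, not the actual database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One piece of a compound (hypothetical structure, for illustration)."""
    kanji: str
    base_reading: str     # dictionary reading of the kanji, e.g. ざつ
    surface_reading: str  # reading as pronounced in this compound, e.g. ざっ


def full_reading(parts):
    """Rebuild the word's reading from the stored surface readings."""
    return "".join(p.surface_reading for p in parts)


zasshi = [Part("雑", "ざつ", "ざっ"), Part("誌", "し", "し")]
print(full_reading(zasshi))  # ざっし
```

Since the surface reading is stored as-is, nothing has to be inferred when the parts are put back together, and the base reading is still there for looking up the kanji.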

What is a か行 mora ?
#4
The ka row (ka ki ku ke ko). He's probably thinking of words like 客観的.
#5
Workaround: global-replace sokuon markers with custom markup (like ^), and then just swap sokuon back in at the end of the process?
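
Something like this, as a rough sketch in Python (not the site's PHP). `protect`, `restore` and `PLACEHOLDER` are made-up names, and it assumes the placeholder character never occurs in a reading.

```python
PLACEHOLDER = "^"  # assumption: never appears in a stored reading


def protect(reading):
    """Swap each sokuon for the placeholder before the reading is split."""
    return reading.replace("っ", PLACEHOLDER)


def restore(reading):
    """Swap the placeholder back to a sokuon after the parts are rejoined."""
    return reading.replace(PLACEHOLDER, "っ")


print(protect("ざっし"))        # ざ^し
print(restore("ざ^" + "し"))    # ざっし
```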
#6
No, I was looking for a way to infer the sokuon based on some reliable rules, for example to infer ざっし from ざつ and し. You can easily think of some basic patterns, but there are always going to be exceptions.
Edited: 2016-09-13, 10:39 am