Well, as embarrassing as it is, you're right: I had 32-bit Python. I'd had it for a while and for some reason assumed it was the 64-bit version, but you actually get 32-bit if you take the default download. I had to browse the site for the 64-bit installer specifically.
Everything compiled and installed (apart from the build not finding vcvarsall.bat).
For those who might try to replicate this later: if the instructions say to get Visual Studio 2013, then get exactly that version. In 2015, for example, they rearranged where all the files are, so you'd have to figure out all the new paths and settings yourself.
Anyway, MeCab now works for me on the command line, but it doesn't handle strings passed in from Python correctly. I ran the test app that has the hardcoded string "太郎はこの本を二郎を見た女性に渡した。" and the output I get with the Shift-JIS dictionary is:
Code:
螟 名詞,一般,*,*,*,*,*
ェ 名詞,一般,*,*,*,*,*
驛弱 名詞,一般,*,*,*,*,*
・ 記号,一般,*,*,*,*,*
縺薙 名詞,固有名詞,組織,*,*,*,*
・ 記号,一般,*,*,*,*,*
譛 名詞,固有名詞,組織,*,*,*,*
ャ 名詞,一般,*,*,*,*,*
繧剃 名詞,一般,*,*,*,*,*
コ 名詞,一般,*,*,*,*,*
碁 名詞,一般,*,*,*,*,碁,ゴ,ゴ
ヮ 名詞,一般,*,*,*,*,*
繧定 名詞,一般,*,*,*,*,*
ヲ 名詞,一般,*,*,*,*,*
九 名詞,数,*,*,*,*,九,キュウ,キュー
◆ 記号,一般,*,*,*,*,◆,◆,◆
螂 名詞,固有名詞,組織,*,*,*,*
ウ 名詞,一般,*,*,*,*,*
諤 名詞,一般,*,*,*,*,*
ァ 名詞,一般,*,*,*,*,*
縺 名詞,一般,*,*,*,*,*
ォ 名詞,一般,*,*,*,*,*
貂 名詞,一般,*,*,*,*,貂,テン,テン
。 名詞,サ変接続,*,*,*,*,*
縺励 名詞,一般,*,*,*,*,*
◆ 記号,一般,*,*,*,*,◆,◆,◆
縲 名詞,固有名詞,組織,*,*,*,*
・記号,一般,*,*,*,*,*
EOS
With the UTF-8 dictionary it's even worse:
Code:
螟ェ驛・蜷崎ゥ・蝗コ譛牙錐隧・莠コ蜷・蜷・*,*,螟ェ驛・繧ソ繝ュ繧ヲ,繧ソ繝ュ繝シ
縺ッ 蜉ゥ隧・菫ょ勧隧・*,*,*,*,縺ッ,繝・繝ッ
縺薙・ 騾」菴楢ゥ・*,*,*,*,*,縺薙・,繧ウ繝・繧ウ繝・
譛ャ 蜷崎ゥ・荳€闊ャ,*,*,*,*,譛ャ,繝帙Φ,繝帙Φ
繧・蜉ゥ隧・譬シ蜉ゥ隧・荳€闊ャ,*,*,*,繧・繝イ,繝イ
莠・蜷崎ゥ・謨ー,*,*,*,*,莠・繝・繝・
驛・蜷崎ゥ・荳€闊ャ,*,*,*,*,驛・繝ュ繧ヲ,繝ュ繝シ
繧・蜉ゥ隧・譬シ蜉ゥ隧・荳€闊ャ,*,*,*,繧・繝イ,繝イ
隕・蜍戊ゥ・閾ェ遶・*,*,荳€谿オ,騾」逕ィ蠖「,隕九k,繝・繝・
縺・蜉ゥ蜍戊ゥ・*,*,*,迚ケ谿翫・繧ソ,蝓コ譛ャ蠖「,縺・繧ソ,繧ソ
螂ウ諤ァ 蜷崎ゥ・荳€闊ャ,*,*,*,*,螂ウ諤ァ,繧ク繝ァ繧サ繧、,繧ク繝ァ繧サ繧、
縺ォ 蜉ゥ隧・譬シ蜉ゥ隧・荳€闊ャ,*,*,*,縺ォ,繝・繝・
貂。縺・蜍戊ゥ・閾ェ遶・*,*,莠疲ョオ繝サ繧オ陦・騾」逕ィ蠖「,貂。縺・繝ッ繧ソ繧キ,繝ッ繧ソ繧キ
縺・蜉ゥ蜍戊ゥ・*,*,*,迚ケ谿翫・繧ソ,蝓コ譛ャ蠖「,縺・繧ソ,繧ソ
縲・險伜捷,蜿・轤ケ,*,*,*,*,縲・縲・縲・
EOS
Not just the input but even the output is garbled. My Windows is currently set to the Japanese locale.
When I saved test.py with Shift-JIS encoding, everything printed correctly (with the Shift-JIS dictionary). Is it simply that the Windows cmd console doesn't support UTF-8?
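To illustrate the kind of mismatch being asked about, here is a minimal sketch, assuming Python 3 (where `str` is Unicode) and assuming the console decodes bytes as cp932, the usual code page for a Japanese-locale Windows. It only shows what happens when UTF-8 bytes are read as Shift-JIS; it does not call MeCab, and the variable names are made up for the example.

```python
# The test sentence from the post.
text = "太郎はこの本を二郎を見た女性に渡した。"

# The bytes as they would sit in a UTF-8 source file,
# or as a UTF-8 dictionary would emit them.
utf8_bytes = text.encode("utf-8")

# Assumption: the console interprets incoming bytes as cp932 (Shift-JIS).
# errors="replace" substitutes a placeholder for byte sequences that are
# not valid cp932, instead of raising an exception.
garbled = utf8_bytes.decode("cp932", errors="replace")

# Using the same codec on both sides round-trips cleanly, which matches
# the observation that a Shift-JIS test.py plus a Shift-JIS dictionary
# printed correctly.
sjis_round_trip = text.encode("cp932").decode("cp932")

print(garbled != text)          # True: the text is no longer readable
print(sjis_round_trip == text)  # True: matching encodings preserve it
```

If that is what is going on here, the garbling in the UTF-8 case would be a display problem only, since the bytes themselves are untouched until something decodes them with the wrong codec.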
EDIT: Let me ask a more specific question: if I don't care about what's printed to stdout and just want to check which part of speech certain words in the input text are, should I be fine using UTF-8 string literals in the Python source, together with UTF-8 input files and the UTF-8 MeCab dictionary?
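A rough sketch of that workflow, assuming Python 3 and assuming MeCab's default output layout of one token per line (surface form, a tab, then comma-separated features, ending with `EOS`). The sample text below is what the first two lines of the UTF-8 output above appear to decode to, so treat it as illustrative. The helper names `pos_tags`, `save_utf8` and `load_utf8` are hypothetical, and in real use the text would come from the tagger rather than a constant.

```python
import os
import tempfile

# Illustrative MeCab-style output, already decoded as UTF-8 (an assumption;
# real output would come from the tagger, not from a hardcoded string).
SAMPLE_OUTPUT = (
    "太郎\t名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "EOS\n"
)


def pos_tags(mecab_output):
    """Return (surface, part_of_speech) pairs from MeCab-style output text."""
    pairs = []
    for line in mecab_output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, features = line.partition("\t")
        # The first feature field is the top-level part of speech.
        pairs.append((surface, features.split(",")[0]))
    return pairs


def save_utf8(path, text):
    """Write text with an explicit UTF-8 encoding, ignoring the locale."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_utf8(path):
    """Read text with an explicit UTF-8 encoding, ignoring the locale."""
    with open(path, encoding="utf-8") as f:
        return f.read()


tags = pos_tags(SAMPLE_OUTPUT)

# Write the result to a UTF-8 file rather than printing it, so the
# console code page never gets involved.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tags.txt")
    save_utf8(path, "\n".join(f"{s}\t{p}" for s, p in tags))
    reloaded = load_utf8(path)
```

Passing `encoding="utf-8"` explicitly matters on a Japanese-locale Windows, where `open()` would otherwise fall back to the locale's code page.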
P.S. I implore the Japanese to get their encoding anarchy under control.