Wiktionary talk:Japanese transliteration

Concerns about compound words

After poking about some, I happily stumbled across this article just last night. By and large, the proposed romanization policy looks pretty good to me, but I have a couple of concerns I'd like to bring up for discussion, with the aim of possibly changing this draft proposal. I'll go through my concerns one by one below.

=== Strict rules ===

Every word in Japanese is to be expressed as a single romanized "word", with no spaces or hyphens.
A word, in this context, is a single sentence component, including compound nouns, and most particles (助詞). Conjugated forms are counted as a single word, as are verbs consisting of する affixed to a noun. The following all count as a single word, and appropriate romanization is given in parentheses:
- しりとり (shiritori)
- 現代的 (gendaiteki)
- 県庁所在地 (kenchōshozaichi)
- が (ga)
- 勉強する (benkyōsuru)

Most of this looks good, but for the treatment of compounds, which raises big red flags for me, particularly the two red example items above. 県庁所在地 is basically 県庁 kenchō (prefectural hall) + 所在地 shozaichi (location), both distinct morphemes in and of themselves. Likewise, 勉強する is basically 勉強 benkyō (study [noun]) + する suru (to do), both also distinct morphemes in and of themselves. Considering the former example as a single compound word would be similar to treating any noun + "location" combination in English as a single word, a patently untenable concept on the one hand, and an unnecessary increase in the amount of work required of us at Wiktionary on the other. Likewise, the latter example would be similar to treating any noun + "do" combination in English as a single word, also untenable and an unneeded increase in work and complexity. I made the same point over a year ago over at w:Talk:Agglutinative language#Number of Japanese irregular verbs. Never mind the sheer number of noun + suru combinations...

The purple item above (現代的, gendaiteki) contradicts an example further down the page:

=== Relaxed rules ===

If a compound word can be broken into discrete parts, especially when one of the parts is a common prefix or suffix, use a hyphen to separate the parts.
- 明治時代 (meiji-jidai)
- 理想的 (risō-teki)
- 岐阜市 (gifu-shi)
- 不可算名詞 (fu-kasan-meishi)

Here we have 理想的 risō-teki. This hyphenated usage makes it more clear that we have 理想 risō (ideal [noun]) + 的 -teki ([adjectival / adverbial suffix]). Given that -teki is in fact a suffix, and does not have any stand-alone usage, the hyphenated form would seem to be ideal (no pun intended here).

The two red example items in this latter group are again compounds that should probably be separated. 時代 jidai, for instance, is a word unto itself, that means "age" or "era" or "period". So rather than Meiji-jidai, we should probably write Meiji jidai. Fu-kasan-meishi is a combination of an affix and a compound, with 不 fu- (un-) + 可算 kasan (countable) + 名詞 meishi (noun), and as such should probably also be broken down differently, either as fukasan meishi, or as fu-kasan meishi. We should probably use the latter version, for consistency with the risō-teki example.

My aims here are 1) clarity, and 2) reducing our workload and the corpus of unique words. I think these changes in the proposal would do well on both counts. I'd like any feedback people are willing to give, so by all means please respond. Thank you, Eiríkr Útlendi | Tala við mig 23:40, 28 April 2006 (UTC)
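A rough sketch of the spacing proposed in the comment above, in Python: free-standing words are separated by spaces, while bound prefixes and suffixes are attached with a hyphen. The function name `join_parts` and the `kind` labels are hypothetical and not part of the draft policy, and the segmentation into parts is assumed to be supplied by hand.

```python
def join_parts(parts):
    """Join (romaji, kind) pairs, where kind is "word", "prefix", or "suffix".

    Assumed rule (from the comment above, not from the draft policy):
    bound affixes attach with a hyphen, free-standing words get a space.
    """
    out = ""
    prev_kind = None
    for romaji, kind in parts:
        if prev_kind is None:
            sep = ""          # first part: nothing to separate from
        elif prev_kind == "prefix" or kind == "suffix":
            sep = "-"         # an affix is bound to its neighbour
        else:
            sep = " "         # two free-standing words
        out += sep + romaji
        prev_kind = kind
    return out


print(join_parts([("Meiji", "word"), ("jidai", "word")]))
print(join_parts([("fu", "prefix"), ("kasan", "word"), ("meishi", "word")]))
print(join_parts([("risō", "word"), ("teki", "suffix")]))
```

With these inputs the three calls give Meiji jidai, fu-kasan meishi, and risō-teki, matching the forms suggested above.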

What you say makes sense to me, though I have mixed feelings about it. There are many ways to approach these "terms", and each has their justifications. I think using the strict rules would be problematic in the cases you give, because the hyphens help to show where the terms can in some sense be "broken up" (I wish German had hyphens for this purpose). However, I'm not sure I agree that larger compound terms, like 明治時代 should be spaced apart. The difference between compound words and multiple adjacent words is not easy to discern in Japanese, since spaces are usually not used between words in writing. But just because something can serve as a distinct morpheme doesn't mean it is detached from the other part of the term. Writing 明治時代 as meiji jidai seems odd to me, though there is some precedent for it. But I think it's useful to use hyphens to show the compoundness of the term, making it more distinct from two attached words, e.g., 明治の時代. After all, what do you do when you have 明治時代的 (not the best example, I know), where a suffix is now attached to the whole compound and not just the latter segment of it. meiji jidai-teki? That seems wrong.

I feel especially strongly with regard to +する, which gets special treatment in every Japanese dictionary I've seen. Probably this is in part because you can't attach する to just any noun (をする is closer to the English do + noun, but of course it's hard to draw any perfect parallels between such disparate languages). Writing benkyō suru indicates, IMO, too much separation between the two pieces, since the term is usually considered a verb, and not some compound verb or verb phrase. After all, most +する verbs are translated into English as simple verbs (e.g., to study, to travel, to practice). The ones that are more like the English to do homework, for example, are more likely to be をする in Japanese, as far as I remember seeing.

Jun-Dai 20:01, 22 June 2006 (UTC)

I'm late to the party, but I'll leave my opinion on the topic anyway. I believe compound words like 明治時代 should be connected by a hyphen, to show it's one word. This is because those are not just two words put one next to the other, but phonetically they are one word. Their accent as individual words would be 明治 (Méiji) and 時代 (jidai), but as a compound they become 明治時代 (Mëiji-jìdai), one word, one accent.

Affixes and suffixes, on the other hand, don't need a hyphen, since their very nature is to be bound to another word and couldn't exist in isolation, so 理想的 (risōteki), 明治時代的 (Mëiji-jïdaiteki), etc.

As for 勉強する, I guess either would work, but I think it would be better to be able to differentiate (1) 勉強する (benkyō-suru) as "one word" from (2) 勉強する (benkyō suru) as an abbreviation of 勉強をする (benkyō wo suru), i.e. the difference between the following two sentences:

一人で英語を勉強するのもたいせつですが…
hitóri de eigo wo benkyō-surù no mo taisetsu désu ga...
一人でTOEICの勉強するのもありだと思います。
hitóri de tôikku no benkyō surù no mo ári da to omoimásu.

Sartma (talk) 14:42, 20 September 2021 (UTC)

Since a related matter came up in a discussion at User_talk:Robert_Ullmann and I just discovered this page, I'm going to pipe in here, too :-) It's only about the use of spaces in places where words are linked. In such places it would be convenient if most of the time links consisted of single space-separated words. I'll count "-" as a kind of space here. It would likewise be convenient if most of the exceptions would be sequences of such words. In other words, things like must are best, nice one second best and eat ing (note the links) worst. So 勉強する would best be rendered as べんきょうする, 明治時代 as めいじ　じだい (unless you create 明治時代, but it's not idiomatic) and 理想的 as りそうてき. (excuse my use of kana - I'm ignoring non-space aspects of romanization, and whether each space in the kana means a " " or "-" is also a side issue)

I'd space the examples this way:

しりとり => しりとり (idiomatic)
現代的 => げんだいてき (Form of the adjective 現代的だ. There's no simple rule to decide whether N的だ adjective exists for a noun N or not)
県庁所在地 => 県庁　所在地 (both idiomatic, combination is not)
勉強する => 勉強する (verb derived from 勉強; same reason as for 現代的だ)
行く、行って、行きます、行けば、行かない ==> いく、いっ　て、いき　ます、いけ　ば、いか　ない (split to grammatical atoms; combinations may or may not be idiomatic or useful to include, I'll avoid that issue for now)
僕は純苔です => ぼく　は　じゅんだい　です (each should have an entry, combinations obviously not)
サチより日本語が上手だ => サチ　より　にほんご　が　じょうずだ (same reason, and -語 words need entries due to lack of a general rule - there's no アメリカ語 or イギリス語 for instance, but there are 米語 and 英語)

Using the above and rendering spaces as "-" when the term after it is used as a suffix (i.e. requires a prefix in the used sense), you'd get romanizations such as "it-te-ki-masu" (or if you want suffix entries to start with "-", then "it -te -ki -masu"), "sō-na-n-desu-yo-ne" and "watashi-wa iroirona rōmaji-no tsukai-kata-o mi-te-ki-ta." (possibly combining parts if they're deemed idiomatic, but at least all these entries should exist - personally I think that adding combinations will never work because of combinatorial explosion, but that's a separate matter)

If one doesn't want to link to elements whose only relevant part would be a description of Japanese grammar, then one can omit spaces from those parts. E.g. "itte-kimasu", "watashi-wa iroirona rōmaji-no tsukai-kata-o mite-kita.", or if you think the explanation of (say) "masu" is clear enough, "itte-ki-masu". And if you want to link each grammatical element, then you can't avoid "it-te-ki-masu". The drawback is that if you make links such as "watashi-mo yarasareteita", then someone will want to add the "word" "yarasareteita" and a thousand others. -- Coffee2theorems 02:36, 8 May 2007 (UTC)Reply

On second thought, while links such as it te ki masu are somewhat confusing (you don't see link boundaries), using hyphens or spaces between the parts suggests that there's a pause between them, which there isn't. Of course it's ugly too, but that's generally bad as the only reason for rejecting something :-) But if you go solely by pauses, you get spacing like the Japanese tend to use on the rare occasions they use spaces, i.e. わたしも　いく (watashi mo iku). Surely there must be some rules which produce a fair result and can be described in a couple of short sentences... -- Coffee2theorems 02:53, 11 May 2007 (UTC)

Hepburn romanization of long "i"

At school we used Hepburn romanization, and I have seen it in textbooks. For some reason, the macron was used for a, u, e and o only. We always wrote long i as "ii". Seeing the examples on this page, I now understand why. The visual distinction between "i" and "ī" is minuscule because the macron replaces the dot rather than appearing over it.

I suggest for the purpose of visual clarity that we adopt traditional Hepburn style on this point and write "ii" rather than "ī". Raichu2 03:05, 22 December 2006 (UTC)Reply

PS I've come home and I happen to be using my son's old 2nd hand CRT monitor instead of my brand new LCD screen at work, and honestly, I can't even see the difference between i and i-macron! I personally recommend that this policy needs to be rectified.

I have no issues with the other macrons, but I am also unhappy about ī. This goes against the various established romanization systems. For example, look at Niigata. What is to be gained by spelling it as "Nīgata"? I strongly suggest using "ii" in place of "ī".

I discovered this page via a Wikipedia link when someone tried to use these romanization rules to justify spelling reform. This is original work. Please conform to common romanization systems and / or clearly specify the scope of this. Even if it is to only be used on Wiktionary, people will continue to quote from here and there will be no end to the romanization problems. 210.138.88.178 00:06, 19 April 2007 (UTC)Reply

I also agree with the above comments. This proposal does not seem to be very active anymore. The last edit on main page is dated October 24th, 2006. Excluding the last two unresponded to comments, the talk page has been silent since December 22nd, 2006. Multiple comments have now been made regarding long "i"s. It goes against common practice and without any further discussion or opposition, there seems to be a consensus. I will make appropriate changes. If at a later time someone wishes to discuss it further, please feel free to discuss it here. 122.18.155.87 12:55, 23 April 2007 (UTC)Reply

Don't go assuming a consensus without involving the major contributors. And: on the wikt we have many very quiet talk pages; when issues are brought up it is essential to mention them on WT:BP especially if the discussion is elsewhere. Finally, editing a policy or policy subpage when you can't even be bothered to log in will probably just get it reverted. Robert Ullmann 13:06, 23 April 2007 (UTC)Reply

There are currently five comments. Four of those comments make similar statements against ī. The last comment, yours, does not add anything (support or opposition) to the topic. At the moment, there is not any opposition. And weeks later there are still no comments by the major contributors. Nor was there discussion in the first place to establish the original policy. Checking the history, it was arbitrarily decided upon by a single editor. Finally, don't go assuming that I "can't even be bothered to log in", as I do not have an account to log in to. Feel free to check the IP (Japan). 122.29.162.129 09:28, 8 May 2007 (UTC)

Since you seem to be unaware of the fact, you can create an account on Wiktionary and then log in to it. Perhaps if you logged in and contributed in some other way than trying to order around the whole community... Cynewulf 14:34, 8 May 2007 (UTC)Reply

I am aware of the fact, thank you. I have several Wikipedia accounts (for multiple languages). It is enough to manage all of those without adding yet another one. I have specifically chosen not to create one. Exactly how have I "ordered around the whole community"? With the exception of your comment and Robert Ullmann's (which do not support or oppose the topic at hand), all comments make statements in opposition to "ī". Do you have anything specific to add to the topic? As originally stated (not ordered), "If at a later time someone wishes to discuss it further, please feel free to discuss it here." There is still no opposition. 122.29.162.129 17:21, 8 May 2007 (UTC)

The correct romanisation for the long "i" is as follows: use "ī" only when the Japanese kana spelling is イー or いー, that is, when ー is used. いい and イイ are never romanised as "ī" but as "ii". I think the reason is that いい is a common ending for i-adjectives. My source is my own study of standard Japanese textbooks, including those for JLPT preparation; I haven't seen any respectable reference book or dictionary using "ī" for いい. --Anatoli ^{(обсудить)} 02:19, 12 December 2011 (UTC)

Occasionally えい and エイ are also romanised as "ē" to reflect the pronunciation but this is incorrect. The correct romanisation is "ei". "ē" is only used again as with the long "i" when the elongation symbol ー is used, e.g. えー and エー. By the way, User:Ryulong reverted my last edit and removed my macron symbols (ā, ē, ī, ō, ū). I DO agree with "The combination いい is never indicated by ī always ii". The paragraphs were not mine, I only added the symbols to make romanisation easier. We need to add them, so that people could copy/paste. --Anatoli ^{(обсудить)} 02:29, 12 December 2011 (UTC)Reply

Actually, ā, ē, and ī are only used when the ー is involved. Someone on Wikipedia has discovered that the more recent versions of the dictionary have revised this "always use a macron for long vowels" usage. Only O and U get macrons for their long vowels, and I have been fixing this across the project.—Ryūlóng (竜龍) 03:14, 12 December 2011 (UTC)

Please provide the source and don't change anything at Wiktionary before there is an agreement. ラーメン is romanised as "rāmen", not "raamen", メーデー is romanised as "mēdē", not "meidei" or "meedee" and ビートルズ is romanised as "Bītoruzu", not "Biitoruzu". --Anatoli ^{(обсудить)} 03:34, 12 December 2011 (UTC)Reply

Also, your sentence "The long vowels あ, い, and え are rare in native words, but they are never indicated by macrons and always doubling the vowel as done in the kana form." is not clear. Macrons are used, even though the words are not native Japanese. What about お婆さん,　お母さん,　ああ? They are native Japanese words and romanised with "ā". Non-native Japanese with the elongation symbol ー all use macrons. --Anatoli ^{(обсудить)} 03:42, 12 December 2011 (UTC)Reply

You are completely misinterpreting everything I have done. I have never said that words with ー do not get macrons. All I have been doing is trying to correct a major error when it comes to the ああ, いい, and ええ forms. The latest version of the Hepburn dictionary is available through Google and they have "obaa-san" (page 468), "okaa" (475), and "nesan" (448). "Ee" is on 82 and "Aa" is on page 1. There is no use of ā or ē outside of words that have ー in them.—Ryūlóng (竜龍) 11:05, 12 December 2011 (UTC)

No, I'm not misinterpreting. We have been using the traditional Hepburn system, not the revised one; in the traditional system お婆さん, お母さん, and ああ are all romanised with "ā" and お姉さん is "onēsan", not "oneesan". Words where a-a and e-e belong to different roots are obviously the exception. No problem with "ii" when it's not indicated by an elongation mark. Please stop changing or rather forcing the rules, otherwise I will have to revert all your edits. --Anatoli ^{(обсудить)} 12:53, 12 December 2011 (UTC)

The "traditional Hepburn" system is out of date and has several errors in it that are mainly the four words that you have brought up and that I tried to fix until you reverted everything I've done. The Hepburn system from 1903 is the best set up there is out there, and those words should be romanized with "aa", not "ā", and "ee" or "e", but not "ē".—Ryūlóng (竜龍) 20:18, 12 December 2011 (UTC)Reply

I have to explain again to you that changes are possible but not before the changes are discussed and agreed upon. To name a few User:Haplology, User:Eirikr, User:TAKASUGI_Shinji are heavily involved in Japanese editing and they have discussed the romanisation specifically, favouring the methods I have explained to you. I have given you the discussion link on your talk page or you can approach the users for clarification yourself. --Anatoli ^{(обсудить)} 21:45, 12 December 2011 (UTC)Reply

@Atitarev, Eirikr, Haplology, Wyang I’d like to revisit this problem. I think we here in Wiktionary now have an agreement on the romanization of お母さん, お姉さん, and お父さん: okāsan, onēsan, and otōsan respectively. As for お兄さん, we currently romanize it as onīsan, which seems to me consistent with other long vowels, but the Wikipedia article Hepburn romanization clearly opposes it and recommends oniisan instead. The specification ANSI Z39.11-1972 (the only available document I have found is a Japanese translation), already withdrawn, specifically states that いい is ii while イー is ī. I know that in practice they almost always use ii like Niigata probably because it is difficult to distinguish i and ī. How should we romanize お兄さん, 小さい, 詩歌 then? — TAKASUGI Shinji (talk) 01:00, 5 March 2015 (UTC)Reply

I prefer to continue to use "ī", despite the Wikipedia rules (we disagree with them on various transliterations, not just Japanese, e.g. Arabic), taking exceptions into account, but I'll wait for other responses. --Anatoli T. ^{(обсудить}/^вклад) 01:13, 5 March 2015 (UTC)

Your comment (that i and ī might be visually difficult to distinguish) is the first rational argument I've run across for treating long い differently than other doubled vowels.

That said, I still think the general rule is a good one: long vowels are romanized with the macron, unless the two halves of the long vowel occur across a morpheme or inflection boundary. So the long い in お兄さん or 新潟 would be rendered as ī, since the long vowel is part of single morpheme, while the long い in 新しい or 言います or 委員 would be rendered as ii, since the two halves belong to different inflectionary or morphemic units. Likewise, the single word いい would be ii, since the second い is part of the inflection, and thus functionally separate and distinct from the first い.

Words that have been traditionally romanized a certain way, such as placenames, would have an English entry (such as Niigata) and a separate romanized Japanese entry (such as Nīgata (Nīgata)).

For what it's worth, our system is based on Hepburn, but isn't Hepburn exactly. Much as modified Hepburn changed the rules of traditional Hepburn to make it more regular, the system we have been using so far at EN WT is essentially a further modification of modified Hepburn, smoothing out the remaining odd inconsistency in the handling of long い.

That's my understanding, anyway. Does that make sense to you? Is that acceptable? ‑‑ Eiríkr Útlendi │ Tala við mig 01:30, 5 March 2015 (UTC)Reply
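The rule described in the comment above (macron inside a morpheme, doubled letters across a morpheme or inflection boundary) can be sketched in Python. This is only an illustration under stated assumptions: the input is a hand-segmented list of morphemes in doubled-vowel romaji, the name `apply_macrons` is hypothetical, and treating every morpheme-internal "ou" as long ō is a simplification.

```python
# Assumed mapping from doubled-vowel spellings to macron vowels.
# "ou" -> "ō" is a simplification that only holds inside one morpheme.
MACRON = {"aa": "ā", "ii": "ī", "uu": "ū", "ee": "ē", "oo": "ō", "ou": "ō"}


def apply_macrons(morphemes):
    """Contract long vowels within each morpheme, never across a boundary."""
    pieces = []
    for morpheme in morphemes:
        result = ""
        i = 0
        while i < len(morpheme):
            pair = morpheme[i:i + 2]
            if pair in MACRON:
                result += MACRON[pair]   # long vowel inside one morpheme
                i += 2
            else:
                result += morpheme[i]
                i += 1
        pieces.append(result)
    # Joining afterwards keeps vowels from different morphemes separate.
    return "".join(pieces)


print(apply_macrons(["o", "nii", "san"]))    # お兄さん
print(apply_macrons(["atarashi", "i"]))      # 新しい
print(apply_macrons(["i", "in"]))            # 委員
print(apply_macrons(["i", "i"]))             # いい
```

Under these assumptions the four calls give onīsan, atarashii, iin, and ii. All of the judgment lies in the segmentation, which the sketch does not attempt.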

Thank you for describing again, I didn't repeat myself, since I described it to user Ryulong above. I totally agree with Eirikr and this is also my understanding of what our current practice is and should be. --Anatoli T. ^{(обсудить}/^вклад) 01:48, 5 March 2015 (UTC)Reply

OK, we have agreed to continue using ī. — TAKASUGI Shinji (talk) 15:22, 7 March 2015 (UTC)

Diaeresis

Greetings. I am concerned about the guidelines concerning diaeresis usage. What is the rationale for this? After more than a decade of living in Japan, translation (J-E) work, and linguistic study, this is one of the oddest proposals that I have come across. None of the major romanization systems (Hepburn, Kunrei-s(h)iki, and Nihon-s(h)iki), or even the minor ones, use the diaeresis. A few examples given include: mizuümi, ōümi, and goön. There are no advantages to these over "mizuumi", "ōumi", or "goon". If the reason is to differentiate them from long vowels, the lack of a macron over the said vowel already clarifies that. This is quite disturbing. 122.18.198.80 13:47, 18 April 2007 (UTC)

Same as above, there is no discussion. Without discussion or opposition, I will make the suggested changes. If at a later time there are new comments, please make them here. 122.18.155.87 13:00, 23 April 2007 (UTC)Reply

No, it's a policy page; if there is no discussion it means the relevent people or concerns have not been addressed. Create an account, log in, and put a notice on WT:BP that changes are being considered, pointing here. And I'm inclined to agree with you, but we need to see what other people think. Robert Ullmann 11:36, 8 May 2007 (UTC)Reply

Thank you, but I will decline to create an account. That is my prerogative. There is no need to put a notice on WT:BP. Comments were and are being made directly on the page involved. It is not hidden in any way. Those interested in this page should be watching it. Just for reference, as this is Wiktionary, the word is spelled relevant. 122.29.162.129 17:29, 8 May 2007 (UTC)

No diacritics

Here's a suggestion: Avoid using any diacritics (i.e. the "funny marks" on top of letters). Diacritics are by and large ignored except by a select few people, and for good reasons.

Most people don't know what the line on top of ō means, much less the meaning of the dots in ōümi. ī is (as others here have noted) even worse, as you hardly even notice the macron. These marks do not communicate any useful information to most people, and may even mislead. Because diacritics are nearly meaningless in most people's minds, they shouldn't be used to convey any significant meaning. The difference will just be lost.

The "funny marks" are commonly ignored in English, because they don't change anything - "café" and "cafe" are the same. But tōi (遠い) and toi (問い), nori (海苔) and nōri (脳裏), hato (鳩) and hāto (ハート) are totally different. Japanese already has more than enough homonyms (certainly too many for comfortable reading of rōmaji instead of kanji!), there's no need to create any more of them.

The macrons are also difficult to type, so people almost never do. Having Wiktionary romaji be easy to type would allow readers to use the romanizations elsewhere. With diacritics they can't (nobody will bother), so they will likely mangle the romaji so that it works for them. Which usually means dropping the diacritics and losing important information.

According to Wikipedia, modified Hepburn and JSL use doubling of vowels instead of macrons. This would be a good choice, e.g. "tookyoo", "joozu", "tooi", "noori", "haato". Modified Hepburn also differs from Hepburn in that ん is written as n with a macron, but that should be avoided for exactly the same reasons. Romanizing 文様(もんよう) as "mon'yoo" and 女(おんな) as simply "onna" is best.

Additionally, I think there shouldn't be any exceptions for customary transliterations. Irregularities do not help anyone, they're just confusing. If a transliteration has become customary in English, then it is an English word, i.e. a loan like Tokyo vs. mere transliteration Tōkyō. One would use the latter in Japanese text romanized in a particular way and the former in an English translation of the same. -- Coffee2theorems 20:47, 8 May 2007 (UTC)Reply
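To make the point about lost information concrete, here is a small Python sketch comparing two ways a reader might get rid of macrons: simply dropping them, or replacing them with doubled vowels as suggested above. The function names `strip_diacritics` and `double_vowels` are made up for this illustration; only the standard `unicodedata` module is used.

```python
import unicodedata

# Assumed mapping for the doubled-vowel style suggested in this comment.
DOUBLED = {"ā": "aa", "ī": "ii", "ū": "uu", "ē": "ee", "ō": "oo"}


def strip_diacritics(text):
    """Drop all combining marks, which is what careless copying tends to do."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def double_vowels(text):
    """Replace macron vowels with doubled vowels, keeping the length contrast."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(DOUBLED.get(c, c) for c in composed)


# Word pairs taken from the comment above: long-vowel word vs. short-vowel word.
for long_form, short_form in [("tōi", "toi"), ("nōri", "nori"), ("hāto", "hato")]:
    collides = strip_diacritics(long_form) == short_form
    print(long_form, "->", strip_diacritics(long_form), "| collides:", collides,
          "| doubled:", double_vowels(long_form))
```

Stripping turns each long-vowel word into its short-vowel counterpart, while doubling keeps the two apart (tooi vs. toi, and so on).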

One more idea: It would probably be clearest to consistently follow pronunciation, like JSL according to Wikipedia does. I'm not saying that JSL itself should be used, as e.g. "tati" instead of "tachi" would certainly be mispronounced by an uninformed reader.

What I mean is (and this works for any romanization system) that one would take the kana of a piece of text, rewrite it in kana so that it follows the pronunciation as closely as possible using the useful stand-alone sounds for each kana (i.e. う is /u/, never /o/, and ゐ would be /wi/, not /i/ which is useless as there's already い), and then render that in romaji. This way the reader of romaji wouldn't have to know about different kana orthographies, which anyone who is not familiar with kana in the first place is not going to be aware of.

Examples: 読(よ)まう -> よもお ->　yomoo, 言(い)う -> ゆう -> yuu, 言(い)おう -> いおお -> ioo, 私(わたし)は -> わたしわ -> watashi wa, 思(おも)ひ出(で) -> おもいで -> omoide, 用(もち)ゐる -> もちいる -> mochiiru, 思(おも)う -> おもう -> omou, 大海(おおうみ) -> おおうみ -> ooumi, しよつちゆう -> しょっちゅう　-> shocchuu, 続(つづ)く -> つずく -> tsuzuku. -- Coffee2theorems 16:02, 9 May 2007 (UTC)Reply
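The second half of this procedure (pronunciation-based kana to romaji) can be sketched in Python. This is a toy under stated assumptions: the table `KANA` covers only the kana needed for the examples above, the name `kana_to_romaji` is hypothetical, and the first step (respelling historical or irregular kana by pronunciation) is assumed to have been done by hand.

```python
# Deliberately tiny table: just enough kana for the examples in this thread.
KANA = {
    "あ": "a", "い": "i", "う": "u", "お": "o",
    "く": "ku", "し": "shi", "ず": "zu", "た": "ta", "ち": "chi",
    "つ": "tsu", "で": "de", "み": "mi", "も": "mo", "ゆ": "yu",
    "よ": "yo", "る": "ru", "わ": "wa",
    "しょ": "sho", "ちゅ": "chu",   # digraphs with small ょ / ゅ
    " ": " ",                        # keep word spacing as given
}


def kana_to_romaji(kana):
    """Romanize pronunciation-based kana, writing long vowels as doubled letters."""
    out = []
    double_next = False
    i = 0
    while i < len(kana):
        if kana[i] == "っ":          # small tsu doubles the next consonant
            double_next = True
            i += 1
            continue
        pair = kana[i:i + 2]
        key = pair if pair in KANA else kana[i]   # prefer a digraph if present
        syllable = KANA[key]
        if double_next:
            syllable = syllable[0] + syllable
            double_next = False
        out.append(syllable)
        i += len(key)
    return "".join(out)


for word in ["よもお", "ゆう", "わたし わ", "おもいで", "もちいる", "しょっちゅう", "つずく"]:
    print(word, "->", kana_to_romaji(word))
```

Because each kana always maps to the same letters, the reader of the romaji never needs to know which kana orthography the source text used.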

No offense intended, but when you say "Diacritics are by and large ignored except by a select few people", you really do not know what you are talking about. General newspapers and magazines will often drop diacritics due to their editorial policies, but professionally translated texts and especially academic reports and studies almost always use diacritics. In the past, much of the world was limited by incompatible character sets and encodings. This often forced people to omit macrons (or sometimes replace them with available diacritics). In the last decade with the move to Unicode, diacritic usage has been soaring. Why? Because, as you have noted, it really is necessary in distinguishing between similar lexical terms, and further because it simply is "more correct".

Sure, there are other alternatives such as ad hoc spelling systems. The problem with these is that they are non-standard and will only confuse even more readers. Those who knew the old system will need to learn a new one. Look at the word 東京. Any reader, even if unfamiliar with macrons, can read and understand Tōkyō. However, if you begin to spell it as Tookyoo, it will surely confuse some. Some people will surely try to pronounce "oo" as [u:]. The same issue would affect your other examples as well: "joozu" (*[zy:zu]), "tooi" (*[tu:i]), "noori" (*[nu:ri]), where * means incorrect.

I do not like irregularities either. While I recognize that "Tokyo" is common in English, "Tōkyō" is far from uncommon. Many professionals regularly use "Tōkyō". Guess what the Tōkyō train station (東京駅) calls itself in English text: "Tōkyō".

The present system is not perfect. As noted above, "ü" and "ī" are non-standard and created here. The rest, though, really is the way that Japanese romanization works. It is the reality and the standard.

I would love more than anything else to change the English spelling system to more accurately reflect current pronunciation. However, that too will cause an unending amount of confusion. More importantly, it's not my decision. At least with Japanese, with a few minutes of study one should be able to for the most part pronounce all of the Japanese sounds simply by looking at these romanizations. It's really not so bad.

I translate a lot of historical Japanese text into English. The standard is to romanize the word as it would appear in modern Japanese. In specific historical linguistic contexts exceptions may be made, but these are rare and for specialists.

I do not really have a strong opinion on romanizing the particles は, へ, and を or the sound づ. Probably writing them as "wa", "e", "o", and "zu" would benefit the majority. Those who know better really do not need the romanization in the first place.

Get rid of the non-standard ü and ī. But leave the rest of the diacritics as they are. They are crucial. If you are concerned with pronunciation, then I suggest that you use IPA. After all, spelling and pronunciation are two separate concepts, even in Japanese. 210.138.88.178 08:44, 14 May 2007 (UTC)Reply

That many "professionally translated texts and especially academic reports and studies" use macrons does not mean that a random sample of people will contain more than a small minority who understand macrons. Even Wikipedia uses macrons, and what do you think is the average lifetime of a macron once it leaves a Wikipedia article? I bet it's very short, close to 0 seconds. While that works for Wikipedia, you can't just remove macrons of a random Japanese word not used in English and still be understood. A dictionary contains a lot more foreign words than do professional translations, academic reports and studies, or encyclopedias.

I do not understand the relevance. 210.138.88.178 04:59, 17 May 2007 (UTC)

Unicode is not universally adopted. It is not usually used in much of Europe or Japan for communications. Someone who uses macrons in an e-mail to a Japanese person may well get results like the ones Google Books gives for book titles ("Nenp�o seijigaku: Nihon Seiji Gakkai nenpō."). As far as I know, two of the most used encodings in Japan, Shift-JIS and ISO-2022-JP, don't support macrons. As the difference between using ō and oo is really rather minor, I think practicality is more important.

Again, this is not relevant. ISO-2022-JP and Shift-JIS are legacy encodings. ISO-2022-JP is still popular in Japanese e-mail. However, neither is used on Wiktionary. And Unicode certainly is universally adopted and has been for well more than a decade. Windows 2000, XP, and Vista are completely Unicode. When working in legacy character sets, characters are round-tripped through Unicode. That is why it is actually slower to work with legacy encodings in modern operating systems. Windows 95, 98, and ME had very limited Unicode functionality. However, that was supplemented by functionality in IE which could handle Unicode to a fairly high degree (certainly enough for macrons). The more important fact to mention is that these OSes are officially obsolete. Even Microsoft will not support them anymore. Linux was one of the very first adopters of Unicode. Apple's OSX has some amazing Unicode support. However, none of this is really very important, because this discussion is about Wiktionary, which is entirely Unicode, whether you like it or not. 210.138.88.178 04:59, 17 May 2007 (UTC)
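The encoding question raised in this exchange can be checked with a short Python sketch. It asks whether a string survives encoding in UTF-8 and in the two legacy encodings named above. The helper name `can_encode` is made up for this illustration, and the result reflects Python's own codec tables for `shift_jis` and `iso2022_jp`, which may differ from what a particular mail client does.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Compare a macron spelling with a doubled-vowel spelling of the same word.
for enc in ("utf-8", "shift_jis", "iso2022_jp"):
    print(enc, "| nenpō:", can_encode("nenpō", enc),
          "| nenpoo:", can_encode("nenpoo", enc))
```

In Python's codecs the macron form encodes only in UTF-8, while the doubled-vowel form encodes in all three, which is consistent with both positions above: macrons are safe on a Unicode site, and fragile once text leaves it for a legacy channel.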

The diacritics aren't crucial if you use another method for indicating vowel length. What's crucial is not to mix up different words. I doubt shuuchuu is difficult to understand to anyone who knows the shūchū style romanization (i.e. almost everyone who knows macrons), as even that uses double vowels for many cases. According to Wikipedia it's not ad hoc either, but modified Hepburn. In the Beer parlour discussion A-cai mentioned that it is used by at least one book, Learn Japanese: New College Text, →ISBN. As modifications go, it's rather minor, and has the advantage that it's easy to type anywhere and will not be bastardized either by unknowledgeable users or computer systems. Those who write scholarly articles know how to re-macronize the words if they need to. -- Coffee2theorems 12:01, 14 May 2007 (UTC)Reply

I agree that they are not crucial if you use another method. Your choice of "ū" vs. "uu" is interesting. Most English speakers will generally be able to pronounce it. However, as I mentioned before, that argument breaks down with "oo". Almost all English readers will try to pronounce /ō/ as [u:]. Yes, Wikipedia Hepburn is slightly modified. However, that is the present system being used in professionally edited texts. Why are you concerned if people "bastardize" the spellings? For Wiktionary, entries can be edited and re-edited. Outside of Wiktionary people will do as they want. That should not be of any concern to us here. 210.138.88.178 04:59, 17 May 2007 (UTC)

I'm simply concerned about whether people will be able to communicate successfully. Is your concern not the same? If not, why do you care whether anyone pronounces "oo" correctly - after all, "outside of Wiktionary people will do as they want and it should not be of any concern to us"? People do not communicate solely by speech, they also do so by writing, and that often happens by computer these days. A dictionary is not just for reading, it is for copying from - when one looks a word up in a dictionary, one usually intends to use that word somewhere, and with a computer dictionary the use will often be on the very same computer.

Macrons are a reality in romanizing Japanese. Individual blogs and amateur homepages aside, the publishing world strongly embraces them. Just go to a library if you are not aware of these facts. As I have said, this discussion is entirely internal to Wiktionary. However, as you say, "A dictionary is [... also] for copying from". True. And what happens when one copies a non-standard spelling? Confusion and general disorder. That is precisely why we cannot deviate from the common standards and re-invent our own romanization system. (Note: the diaeresis needs to go for the same reason.) 210.138.88.178 00:57, 18 May 2007 (UTC)

"uu" etc. instead of "ū" will hardly cause much confusion, as anyone who understands the latter is likely to understand the former. With "ū" the copied form will most often be "u", which is worse, as it loses information. And as I said a couple of times before, it's not an original idea, but from modified Hepburn. -- Coffee2theorems 13:39, 18 May 2007 (UTC)Reply

If you copy ū and it turns into u, then you have a software problem. Wiktionary is entirely Unicode. 210.138.88.178 07:01, 23 May 2007 (UTC)

Most people can't type macrons, and they are often broken by computer systems which still use legacy encodings. Even Wikipedia says that "The adoption of Unicode in e-mail has been very slow. Some East-Asian text is still encoded in a local encoding such as Shift-JIS, and some devices, such as cell phones, still cannot handle Unicode data correctly." With pronunciation problems, at least you have immediate feedback and help from the person you're talking to. People are generally not able to deal with computer problems equally well. -- Coffee2theorems 10:53, 17 May 2007 (UTC)

Gmail and (the new) Hotmail, two of the most popular web-based e-mail clients, default to UTF-8. Outlook and Thunderbird, two of the most popular software-based e-mail clients, prompt to switch to Unicode when characters outside of the current character set are used. Cell phones are a little slow in adoption, but more and more are supporting it. For those that do not, it is their loss and a reason to support modern standards from more than a decade ago. If Wiktionary stoops to the lowest common denominator, then we all lose. 210.138.88.178 00:57, 18 May 2007 (UTC)

It's a reality that there are still many places where Unicode is not used. -- Coffee2theorems 13:39, 18 May 2007 (UTC)

Really? Then they must not be able to access Wiktionary which is entirely Unicode. By the way, limiting the character set to ASCII is also not enough. There are many legacy character sets that do not even include all of the A-Z a-z characters. One of many examples is RADIX-50 [1]. There are no lower case letters. SHALL WE USE ONLY UPPER CASE LETTERS HERE ON WIKTIONARY? Of course not. We do not restrict well established, proper typesetting. 210.138.88.178 07:01, 23 May 2007 (UTC)

I now think it is best to first convert kana to modern kana orthography and then use the normal romanization method. I still think we should use "oo" etc. as in modified Hepburn to avoid the problems with diacritics, though. That is, use Hepburn as is otherwise, but replace every ō with oo, every ū with uu, etc. -- Coffee2theorems 18:32, 16 May 2007 (UTC)
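The replacement proposed above (ō to oo, ū to uu, and so on) amounts to a simple character mapping. The following is only a sketch of that proposal, not an adopted Wiktionary convention; the function name `demacronize` and the mapping table are assumptions made for illustration.

```python
import unicodedata

# Hypothetical mapping for the proposal: each macron vowel becomes a doubled vowel.
MACRON_TO_DOUBLE = str.maketrans({
    "ā": "aa", "ī": "ii", "ū": "uu", "ē": "ee", "ō": "oo",
    "Ā": "Aa", "Ī": "Ii", "Ū": "Uu", "Ē": "Ee", "Ō": "Oo",
})


def demacronize(romaji: str) -> str:
    """Replace macron vowels with doubled vowels (ō -> oo, ū -> uu, ...)."""
    # Normalize first so that "o" + combining macron is treated like precomposed "ō".
    composed = unicodedata.normalize("NFC", romaji)
    return composed.translate(MACRON_TO_DOUBLE)


if __name__ == "__main__":
    for word in ("gakkō", "shūchū", "tōi"):
        print(word, "->", demacronize(word))
```

Note that the mapping is one-way: going back from "oo" to "ō" is not mechanical, since a doubled vowel can also span a morpheme boundary.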

I'm opposed for the reasons given above. Until the rest of the world adopts such a system, we should not either. 210.138.88.178 04:59, 17 May 2007 (UTC)

Do you mean only the diacritics part? I don't know how old orthography is romanized normally, so I tried to come up with something reasonable. Simple conversion to modern orthography seemed best, e.g. 学校(がくかう) -> がっこう -> gakkō, instead of 学校(がくかう) -> gakukau. As for the "rest of the world" part, most of the world already uses macronless forms on the computer, and with the popularity argument we'd probably end up with waapuro style. Hepburn without diacritics (ō->oo style) is a compromise between practicality and academic credibility. -- Coffee2theorems 10:53, 17 May 2007 (UTC)

Yes, only the diacritics part. They are necessary and cannot be dropped. Unless you have a special purpose, I do not have a problem with historical 学校(がくかう) being romanized as gakkō. That is how we do it when producing English translations (romanization is usually only for poetry and names though). ō -> oo is simply unacceptable. 210.138.88.178 00:57, 18 May 2007 (UTC)

I still fail to understand why you consider it unacceptable, as it's a small, isolated, easy to understand modification for a pragmatic reason. Especially as you said "I do not really have a strong opinion on romanizing the particles は, へ, and を or the sound づ." How else would you romanize them except wa, e, o, zu, and why would that be any more acceptable than romanizing ō as oo? -- Coffee2theorems 13:39, 18 May 2007 (UTC)

As I've already stated, I am fine with wa, e, o, and zu. 210.138.88.178 07:01, 23 May 2007 (UTC)

The part that I still do not understand is why one would find gakkō problematic from a viewing/encoding perspective, but not 学校. If one's computer can display 学校, it should also be able to display gakkō without difficulty. Similarly, if one can figure out how to input 学校, then gakkō should be a piece of cake. Also, waapuro, as the word implies, is mainly seen on computers (because it is quick and dirty, not because it is a recognized standard). Most textbooks and dictionaries which are aimed at foreign language students still use standard Hepburn (gakkō). -- A-cai 11:30, 17 May 2007 (UTC)

The point is mostly that the romaji should be less problematic than 学校 from an encoding/inputting perspective. Most encodings which don't support 学校 don't support gakkō either. Also, if you find 学校 difficult to input, gakkō won't be much easier (which is precisely why waapuro and similar are so widespread on the net). Neither of these problems exists with "gakkoo". It also isn't true that wherever 学校 is supported, gakkō is too. This applies to much of Japan, where such encodings are still widespread. -- Coffee2theorems 14:05, 17 May 2007 (UTC)

Wiktionary is Unicode, so neither 学校 nor gakkō is a problem. waapuro style is often used by language students, some of whom have special needs such as discussing aspects of morphology etc. In special contexts, there is nothing wrong with unique romanization methods. However, as a whole, that is not how the professional industry romanizes Japanese. Besides, waapuro style would be "gakkou", not "gakkoo". If you are so concerned with legacy encodings, then support circumflexes. This is fairly old-style, but they are in legacy code pages without need for Unicode. It is fairly unprofessional and generally frowned upon, but it is better than removing diacritics entirely. 210.138.88.178 00:57, 18 May 2007 (UTC)

Circumflexes are a bit better as more encodings support them and more users know how to type them, but I don't think they're supported by Shift-JIS and ISO-2022-JP either, and still most people don't know how to type them. -- Coffee2theorems 13:39, 18 May 2007 (UTC)

More legacy encodings. The 90s are over. Wiktionary is entirely Unicode as is most of the world. Yes, there is still lag, but it is less and less all of the time. Look at the legacy RADIX-50 [2] character set. It does not have any lower case letters. How will you support users on such systems? Do you suggest that we write in all capital letters? There are many, many other legacy encodings each with their strengths and drawbacks. We cannot support them all, and the common denominator is extremely small, certainly not enough even for your proposal. Like Wiktionary, the world has moved on from all of that nonsense to Unicode. 210.138.88.178 07:01, 23 May 2007 (UTC)

It looks like this discussion is repeating itself and all the arguments have already been made. The points in favor of using macrons are widespread use in print and standards, generally less incorrect pronunciation in the case of ō, and that professionals won't frown when they see them. The points against using macrons are that most people can't type them, they are still corrupted by many computer systems, and many will incorrectly think e.g. tōi and toi are two different spellings of the same word just like café and cafe.

1) People can and do type them. Wiktionary could make it easier like Wikipedia does by including various diacritics at the bottom of the page while editing. 2) They are corrupted only in obsolete legacy encodings. So are, for example, all lower case letters in RADIX-50. We cannot support them all. 3) Readers will think whatever they wish. Some will understand and some will not. As a reference, we need to be accurate and professional with our edits. 210.138.88.178 07:01, 23 May 2007 (UTC)

I'm still in favor of not using macrons, but it could be argued that as macrons are widespread in print, most who look up Japanese words on Wiktionary will have to deal with macrons elsewhere anyway, and thus using them is at least not too harmful, just fails to be helpful. I also see that I'm the only one both here and at Beer parlour to prefer the "uu" form, so it may be that I'm mistaken in my assessment. In any case, this is a minor point - the world or even Wiktionary won't collapse even if you decide it by a coin toss - so in the absence of new arguments and opinions I'm inclined to let this one wait. -- Coffee2theorems 13:39, 18 May 2007 (UTC)

Your assessment is not wrong. However, I do not share the same opinion as I have tried to explain above. I have been working professionally in Japanese linguistics for many years now, and macrons are expected. Personally, I do not have anything against your suggestions, and I can understand many different forms of romanization. However, that is not how the world is. You do not like the current romanization system. Fine. I do not like the English spelling system. I cannot change it and must accept it as is. These are the realities of the world. Just go with the flow. I too would like to let this issue drop. 210.138.88.178 07:01, 23 May 2007 (UTC)

I'm in favour of the traditional Hepburn and using macrons. The input on this should be given to regular Japanese editors. I am normally involved in translation into Japanese, less in creation/editing of Japanese entries. --Anatoli (обсудить) 04:26, 12 December 2011 (UTC)

English academia, including the Library of Congress, utilizes the modified/revised system which eliminates the macrons for the long a, e, and i vowels, and also eliminates the syllabic N turning into an M in front of certain consonants. --Ryūlóng (竜龍) 11:35, 12 December 2011 (UTC)

Transliteration of Individual Kanji

I've noticed that the transliterations of kanji which form a separate word with additional hiragana, such as adjectives, adverbs, and verbs, give the entire word as the transliteration of that kanji. For example 小 is rendered ちいさい (chiisai), when the function of the kanji is only the ちい in ちいさい. A more accurate rendering would be ちい（さい） (chii[sai]), or something similar. The same goes for adverbs (大 as おお（いに）, ō[ini]) and verbs (与 as あた（える）, ata[eru]). I know this looks a tad clumsy, but I think either this or something like it is needed for accuracy. --Hikui87 17:18, 26 May 2007 (UTC)

This isn't really about transliteration (to romaji), as the problem is the same with kana. The romaji just follows the kana. Better places for discussing this would be Wiktionary talk:About Japanese and Wiktionary:Beer_parlour.

The usual way to indicate the okurigana boundary is, in my experience, e.g. ちい・さい, though some dictionaries do as Wiktionary does now and don't indicate it at all (ちいさい). Not indicating it is the easy way, avoiding any choices. It's not too pretty though. Indicating only the standard okurigana (or the most common where there is no standard) would be one way which I think would be an improvement. Part of the work could even be done by a bot if Template:ja-okurigana is used in more entries to indicate the standard okurigana. -- Coffee2theorems 20:42, 26 May 2007 (UTC)
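To make the bot idea concrete, here is a minimal sketch of marking the boundary when the full reading and the standard okurigana are already known. The function `mark_okurigana` and its parameters are hypothetical; how Template:ja-okurigana actually stores its data is not assumed here.

```python
def mark_okurigana(reading: str, okurigana: str, separator: str = "・") -> str:
    """Insert a separator between the kanji's reading and its okurigana.

    reading is the full kana reading (e.g. ちいさい) and okurigana is the
    trailing kana written outside the kanji (e.g. さい).
    """
    if not okurigana:
        # Nothing trails the kanji, so there is no boundary to mark.
        return reading
    if not reading.endswith(okurigana) or reading == okurigana:
        raise ValueError(f"{okurigana!r} is not a proper suffix of {reading!r}")
    stem = reading[: -len(okurigana)]
    return f"{stem}{separator}{okurigana}"


if __name__ == "__main__":
    print(mark_okurigana("ちいさい", "さい"))
    print(mark_okurigana("あたえる", "える"))
```

The same boundary could then be carried over to the romaji, since the romaji just follows the kana.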

ん = n or m before other labial consonants (b/m/p)

The Wikipedia article on [Hepburn romanization] says:

In traditional Hepburn, ん is written as m before other labial consonants, i.e. b, m, and p. In revised Hepburn, the rendering m before labial consonants is not used, being replaced with n.

This policy page suggests the use of Hepburn without specifying which variety. It does, however, suggest later that the Hepburn romanization charts in the above article should be used, and in said charts the use of 'm' for ん is not shown. So, if it is agreed that 朝日新聞の群馬版は中途半端 should be Asahi Shinbun no Gunma-ban wa chūto-hanpa, not Asahi Shimbun no Gumma-ban wa chūto-hampa, would it not be a good idea to mention this specifically in the text? Of course, there is the additional problem that e.g. 朝日新聞 uses Shimbun, not Shinbun, for the transliteration of its own name. --Ozaru 11:25, 14 June 2008 (UTC)
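The difference between the two varieties is mechanical enough to sketch. The snippet below converts revised-style spellings (n everywhere) into traditional-style spellings (m before b, m, and p); the function name `to_traditional_n` is an assumption, and the sketch ignores cases where a hyphen or space separates ん from the labial.

```python
import re

# An "n" directly followed by b, m, or p can only be syllabic ん in Hepburn,
# because the な-row syllables are always "n" + vowel.
_N_BEFORE_LABIAL = re.compile(r"[nN](?=[bmpBMP])")


def to_traditional_n(romaji: str) -> str:
    """Rewrite ん as m before b, m, and p, as traditional Hepburn does."""
    def swap(match: re.Match) -> str:
        # Preserve the case of the original letter.
        return "M" if match.group().isupper() else "m"

    return _N_BEFORE_LABIAL.sub(swap, romaji)


if __name__ == "__main__":
    print(to_traditional_n("Asahi Shinbun no Gunma-ban wa chūto-hanpa"))
```

Going the other way is not safe in general, since an m before another m may also belong to an ordinary ま-row syllable only after ん, but proper names such as Shimbun would need to be exempted by hand in either direction.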

ん can be n, m (before b, m, and p) and ng (before k and g). Personally, I think it's more useful for a dictionary to show how a word is actually pronounced, even if approximate. I took one class years ago using revised Hepburn, and it was just confusing for people, because they would end up mispronouncing words (e.g., en-pitsu instead of em-pitsu). Why not use n, m, and ŋ?
Ulmanor (talk) 19:01, 6 August 2012 (UTC)

True or false?: Japanese is the only language for which Wiktionary provides no accurate pronunciation guide. It seems like the current system is designed to help one write Japanese, not speak it, and ignores m-, b-, p-, ng- preconsonant pronunciations. How does this make any sense for a dictionary? Why not have IPA for Japanese words?
Ulmanor (talk) 07:34, 10 January 2013 (UTC)

Wiktionarians are not paid to do the job. You can't complain that someone hasn't volunteered to do it or hasn't done it for every entry. Be our guest and add IPA to Japanese entries. There are many thousands of Japanese entries waiting.
Transliteration is not changed arbitrarily but by agreement. I personally oppose replacing "n" with "m" (before b, m, and p) and "ng" (before k and g). A basic familiarity with Japanese phonology will cover that, and we are using a standard transliteration, familiar to the majority of users, learners, academics and dictionary publishers. --Anatoli (обсудить/вклад) 08:14, 10 January 2013 (UTC)

But most people looking up words on Wiktionary (i.e. anyone who doesn't write Japanese) don't have a basic familiarity with Japanese phonology. I'm not complaining; I'm just trying to understand how this came to be. I guess I'll start adding IPA… Ulmanor (talk) 22:23, 10 January 2013 (UTC)

While changing n to m makes sense, I agree with not doing it. My own impression is that not changing n to m is the most common style. I would add pronunciation sections myself but I don't know how to use IPA well enough and I have my hands full already. Especially in cases with words with "np" etc. the entry isn't complete without a pronunciation section, I admit. Anyone is more than welcome to add pronunciation information, and I hope someone will. --Haplology (talk) 06:04, 11 January 2013 (UTC)

Diacritics (2)

New thread; previous discussion #No diacritics was mostly a dialog between Coffee2theorems and 210.138.88.178

To summarize the discussion, there appear to be two issues:

- reading: Can computers display macrons? Can readers understand them?
- writing: Can people enter macrons?

The points raised were (summarized with my slant):

- display: all newer computers can display macrons fine, but many older ones (particularly in East Asia) cannot. Supporting legacy encodings is a pain (particularly in East Asia).
- understanding: complete novices to Japanese may disregard or be confused by macrons, but there is no way to display おう (ō/ou/oo) so that it is intuitive to English-speakers, as ō is unfamiliar, and oo is pronounced as /u/, enPR: o͞o. Anyone who has studied any Japanese romanization at all is completely comfortable with macrons, and they are used pervasively.
- writing: Macrons are a pain to type; they can be accommodated by entry templates.

Note also that pronunciation guides are not generally given for Japanese words, since the writing is largely phonetic. (However, for various features or dialects one would need to do this.)

Based on this, I would suggest:

display: Legacy problem. Will resolve in time.
understanding: Coffee2theorems points out that novice users, who are part of our mandate, will be confused [by macrons] – and indeed I think will be confused by any romanization. One solution is to add a link to help (“how to read the romanization/Hepburn”), as {{IPA}} and {{enPR}} do. This can easily be done by adding suitable code to Category:Japanese inflection templates ({{ja-noun}} etc.). We could imitate w:Template:Nihongo, which adds a superscript “?” help link to Japanese words, using the code:

<span class="t_nihongo_help"><sup>[[Help:Japanese|<span class="t_nihongo_icon" style="color:#00e;font:bold 80% sans-serif;text-decoration:none;padding:0 .1em;">?</span>]]</sup></span>

input: This is part of a more general discussion – how to access non-Latin scripts, particularly if the standard romanization uses diacritics? (What if you want to look up a Sanskrit word? Vietnamese?) The two solutions that come to mind are:

- Have (per-language) input templates, so people don’t need to configure input methods per language.
- Have standard diacritic-free romanizations (e.g., just drop the diacritics), and have these be soft/hard redirects to the proper romanization.
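For the second option, "just drop the diacritics" can be sketched with Unicode decomposition. This is only an illustration of how a diacritic-free redirect title might be derived; the function name `strip_diacritics` is made up, and nothing here reflects how MediaWiki redirects are actually created.

```python
import unicodedata


def strip_diacritics(title: str) -> str:
    """Return title with combining marks removed (gakkō -> gakko)."""
    # NFD splits a precomposed letter such as "ō" into "o" + combining macron.
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains so the result is in a canonical form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for title in ("gakkō", "tōi", "Tiếng Việt"):
        print(title, "->", strip_diacritics(title))
```

Two caveats: distinct words can collapse to the same title (tōi and toi both become toi), so such a page would often need to be a soft redirect listing several targets, and letters that have no Unicode decomposition, such as Vietnamese đ, pass through unchanged.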

So to summarize:

- understanding: Should we write Help:Japanese and add links to it to Category:Japanese inflection templates, as per above code? (Or just link to existing Wikipedia page? Probably worth writing a short pronunciation page at least.)
- input: Shall we have a discussion at WT:BP about non-Latin scripts?

Nbarth (email) (talk) 23:53, 14 June 2008 (UTC)

In fact, there is such a template – {{Nihongo}} (presumably from Wikipedia) – but it is little-used.

Nbarth (email) (talk) 20:15, 29 June 2008 (UTC)

Several errors fixed

There were several errors on this page when it comes to the long vowels.

- There is never a diaeresis used in Hepburn romanization. The correct way to differentiate between the long u in 十 and the u-u in 湖 is simply by writing one as ū and the other as uu.
- The vowel i never takes a macron as a long vowel unless it is in a loanword and the chōonpu is used. A and e rarely get macrons. The only time they do is when they arise from the words 母 (kā), 婆 (bā), or 姉/姐 (nē).

--Ryūlóng (竜龍) 01:38, 12 December 2011 (UTC)

Update: ā and ē do not exist in the Hepburn system unless they are accompanied by the ー. --Ryūlóng (竜龍) 11:36, 12 December 2011 (UTC)

@Ryulong --

Just so it's clear, I thought I'd mention that the Wiktionary:About Japanese/Transliteration page is currently only a draft, and is not official policy inasmuch as such exists -- from my months here being more actively involved, I get the impression that WT runs more on consensus among the users for any given language, and that at least some of our official policy pages for specific languages may be used more as general guidelines. That said, if and when we achieve strong consensus about the content of the Wiktionary:About Japanese/Transliteration page, it may be folded into the Wiktionary:About Japanese page.

That said, your statement about ā and ē is at odds with the content at w:Hepburn_romanization#Long_vowels, which clearly shows that modified Hepburn does indeed use ā and ē in cases without the ー, such as for おかあさん or おねえさん.

And after saying that, I must point out also that the romanization system we use here at WT may differ from Hepburn, if enough of the active contributors for Japanese so decide. The Wiktionary:About Japanese/Transliteration page is intended as a draft description of the romanization system to be used here at Wiktionary, and not necessarily as a description of Hepburn, or Kunrei, or any other system. -- Hope this clarifies, Eiríkr Útlendi │ Tala við mig 19:03, 16 December 2011 (UTC)