Template talk:ja-pron

From Wiktionary, the free dictionary

Initial discussion


Discussion moved from Talk:精神分裂病.


Having done Module:ko-pron, I'd like to start the work on Module:ja-pron now. Wyang (talk) 03:19, 16 April 2014 (UTC)Reply
@Eirikr. Might as well move this discussion to a more visible place. ;) --Anatoli (обсудить/вклад) 03:23, 16 April 2014 (UTC)Reply
  • @Eirikr OK. I'll think of a different location. If you have a way to contact Haplology, please let him know we need him here! Take it easy and come back when you can.
  • @Wyang. I'm happy to do the testing and using the future module but I'd need to brush up my IPA for Japanese. We'll have to rely on your skills and knowledge again. :) がんばってね!--Anatoli (обсудить/вклад) 06:17, 16 April 2014 (UTC)Reply


Thanks. I've done a crude version at Template:ja-pron and Module:ja-pron. Currently,

{{ja-pron|せいしん ぶんれつ びょう|acc=h|y=on}}

generates

Lua error in Module:parameters at line 858: Parameter "y" is not used by this template.

Any suggestions? Wyang (talk) 07:52, 16 April 2014 (UTC)Reply

Well done! No suggestions yet but some documentation would be helpful, specifically on parameters and types of accents. It's for 標準語, isn't it, or is it for any variety? Should the template link to/mention the variety name? --Anatoli (обсудить/вклад) 08:01, 16 April 2014 (UTC)
Can we also have Accent: 0, Accent: 1, etc. next to accent names? --Anatoli (обсудить/вклад) 08:14, 16 April 2014 (UTC)
It would refer to standard Japanese, although I don't know where that information should be placed. How do the accent numbers correspond to accent types 'h,o,a,n'? Japanese pitch accent isn't very helpful. Do they refer to the accented morae? (in which case, nakadaka could get more than one number, correct?) Wyang (talk) 13:10, 16 April 2014 (UTC)Reply
Letters must be the best way, then. Since we don't know if a given transcription is for the standard Japanese, I'll drop this request as well. They can be marked in brackets for any variety, potentially. Do you think we should have a default IPA info, when the pitch is unknown? Unfortunately, I don't have enough sources for the Japanese accents, nothing online, only some old Japanese-Russian dictionaries with accents. --Anatoli (обсудить/вклад) 22:02, 16 April 2014 (UTC)Reply


  • Looks good. I tweaked the module to replace ɽ with ɾ̠. The former is a retroflex tap, not used in Japanese, while the latter is more generally accepted as the /r/ tap preceding an /a/ sound.
@Atitarev you're talking about the numbers used in some dictionaries to indicate the number of the syllable right after which the downstep in pitch occurs, yes? Or something else? If you mean the downstep syllable, calling it accent isn't quite correct. If you mean something more like dialect, maybe some other less ambiguous term could be used.
@Atitarev, Wyang {{ja-accent-common}} refers to 標準語 as defined by Tokyo standard pronunciation and the NHK pronunciation guidelines for broadcasters. I haven't seen any resources that give pitch information for any other dialects, but I would be quite happy to include those, provided we can find such resources.
With that in mind, is there any easy way to make the module, um, modular :), to allow for pluggable pitch sub-modules or functions? I haven't gone through the code at all really, I just made that one change to swap out ɽ.
Also, @Wyang the pitch drop on long vowels looks plug-ugly at カリフラワー, at least on my machines -- the grave accent (`) that's supposed to show the lower pitch shows up too far to the right, so that it's not even over the vowel at all, appearing instead as a stray mark over the closing square bracket.
Lastly, combos like 'çʲj' don't quite work -- this should just be 'çj' instead, with just the main palatal glide of /j/.
Cheers folks, thank you for your help with this! ‑‑ Eiríkr Útlendi │ Tala við mig 22:03, 16 April 2014 (UTC)Reply
  • Oh, and to clarify, when I say I haven't seen any resources that give pitch information for any other dialects, I mean on a word-by-word basis. Shibatani and others do discuss the broad trends of pitch patterns, but the only lexicographical information about pitch that I've seen in actual dictionaries and the like has been for Tokyo dialect. ‑‑ Eiríkr Útlendi │ Tala við mig 22:05, 16 April 2014 (UTC)Reply
  • Oo, also, じ should be rendered as d͡ʑi, not (d͡)ʑʲi (not really clear what the parens are doing there, and no need for the small "j"). Cf. 御御籤.
For compatibility (and legibility) purposes, {{ja-pron}} should support yomi as a synonym param for y.
And, how / where do we put the reference footnote? If I put it right after the call to {{ja-pron}}, the footnote shows up on the IPA line -- which isn't correct, since I'm using the reference for the pitch accent, not the IPA. See 御御籤 again for an example.
Thank you again! ‑‑ Eiríkr Útlendi │ Tala við mig 22:27, 16 April 2014 (UTC)Reply
@Wyang Re: "nakadaka could get more than one number, correct". Does that mean that "n" may not be sufficient? I'll dig up my old dictionaries/textbooks and check if there is a straightforward mapping between numbers and letters (pitch accent names). --Anatoli (обсудить/вклад) 22:32, 16 April 2014 (UTC)
Pitch and names and numbers:
  • Heiban (平板, “flat”): Pitch rises after first syllable, falls gradually thereafter. Pitch number 0 -- no downstep.
  • Atamadaka (頭高, “high head”): First syllable takes high pitch, downstep immediately thereafter. Pitch number 1 -- downstep after first mora.
  • Nakadaka (中高, “high middle”): Pitch rises after first syllable, downstep after some number of morae. Can only apply to terms with at least 3 morae. Pitch number varies, must be at least 2, and less than the total number of morae in the term -- downstep after mora indicated by number.
  • Odaka (尾高, “high tail”): Pitch rises after first syllable, downstep after last mora. Pitch number varies, must be the number of the last mora in the term -- downstep after mora indicated by number. For odaka terms, the downstep is actually heard on the following particle.
Hope that clarifies! ‑‑ Eiríkr Útlendi │ Tala við mig 23:45, 16 April 2014 (UTC)Reply
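
The mapping between pitch numbers and accent names listed above can be sketched in a few lines. This is only an illustration of the description in this thread, not the code of Module:ja-pron; the function name `accent_type` and its arguments are hypothetical.

```python
def accent_type(downstep, mora_count):
    """Name the Tokyo pitch-accent type for a term.

    downstep   -- number of the mora after which the pitch drops (0 = no drop)
    mora_count -- total number of morae in the term
    Both names are illustrative assumptions, not template parameters.
    """
    if downstep < 0 or downstep > mora_count:
        raise ValueError("downstep must lie between 0 and the mora count")
    if downstep == 0:
        return "heiban"      # no downstep
    if downstep == 1:
        return "atamadaka"   # downstep right after the first mora
    if downstep == mora_count:
        return "odaka"       # downstep after the last mora
    return "nakadaka"        # downstep somewhere in the middle


for number, length in [(0, 4), (1, 3), (2, 2), (3, 5)]:
    print(number, length, accent_type(number, length))
```

Only nakadaka needs the number to be recoverable; the other three types follow from the name and the length of the term.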
Thanks. You have just confirmed that for Nakadaka there are variants. That means that just using "n" won't produce 100% correct pitch accent. Users may want to know when the downstep starts on long words, right? Please clarify. --Anatoli (обсудить/вклад) 00:08, 17 April 2014 (UTC)Reply

Thanks.

  1. 'replace ɽ with ɾ̠' - great
  2. 'the accent fall on long vowels looks plug-ugly'
    Um... not sure how to solve this. Unicode only has combined grave-macron for e (ḕ) and o (ṑ). It is caused by the font formatting <tt>kàrífúráwā̀</tt> (kàrífúráwā̀). cf. normal unformatted 'kàrífúráwā̀'. We could decompose it into single vowels, eg. kàrífúráwàà, or use either no formatting or some other font (which I don't know). For now I have decomposed it into single vowels and it now looks like Navajo.
  3. Palatalisation. It is currently consistently marked. Questions are:
    1. Should it be consistently marked? eg. mi -> mʲi, ki -> kʲi, çj -> çʲj. I have removed this now.
    2. Should it be marked by default after ɕ, t͡ɕ and d͡ʑ, or when those are followed by non-'ij', or not at all? This version of 統合失調症 used the second option. I have changed it to match that.
  4. About '(d͡)ʑ'... Japanese phonology says d͡ʑ ~ ʑ and d͡z ~ z are in free variation for romaji 'j' and 'z', respectively. Hence the notation there... Should these be written as 'd͡ʑ' and 'd͡z' regardless of the environments they are in? I have converted them to 'd͡ʑ' and 'd͡z' for now.
  5. I have added |yomi, |accent, |accent2, |acc_loc, |accent_loc (these are 'Tokyo' by default), |acc_ref, |accent_ref ('DJR' by default), |acc2_loc, |accent2_loc ('Tokyo' by default), |acc2_ref, |accent2_ref ('DJR' if |acc2 exists).

Anatoli: The single-letter accent types in that template mainly match {{ja-accent-common}}, except 'nakadaka' which needs further specifying. Thus |acc=o (o), |acc=a (a), |acc=h (h), |acc=2,3 (n), |acc=2,2 (n). I'm not sure how the accent numbers correspond to this. Maybe they are positions of accented morae? Wyang (talk) 00:20, 17 April 2014 (UTC)Reply

  • @Wyang re: single-letter accent types, see above about Pitch and names and numbers.  :) Theoretically, it should be possible to specify h, a, or o without needing any number. Only nakadaka would require a number to be able to figure out where the downstep happens. As such, one should ideally be able to specify nakadaka using nX, where X is the number, or by using X alone.
For that matter, it might make sense to allow accent types to also be specified by number alone, where 0 or 1 would be heiban or atamadaka respectively, and any greater value would wind up as odaka or nakadaka, depending on how many morae are in the term.
  • Re: d͡ʑ ~ ʑ and d͡z ~ z, d͡z happens, but is rarer. Likewise, ʑ happens, but is rarer. This is mostly an issue of geographical variations in dialect. For NHK purposes (i.e. one of the closest things to a standard pronunciation), my understanding is that romaji "j" == [d͡ʑ], and romaji "z" == [z]. This gets complicated, but it might make sense in the longer term to add a param to allow for specifying this variation, since I think it might sometimes be contrastive and / or emphasized in certain careful speech.
Similarly, whether or not certain /i/ or /u/ sounds are unvoiced should also be specifiable. One word is つき in hiragana, and is usually [t͡sɯ̥ᵝki] as I hear it. Meanwhile, 付き as in "about, regarding" is also つき in hiragana, and is usually [t͡sɯᵝki] as I hear it. So it's not really possible to tell just from the kana spelling whether a given /i/ or /u/ is unvoiced.
How is non-initial /g/ handled? Are both g/ŋ produced? Please demonstrate on ありがとう.
To me, it seems most Japanese who start learning foreign languages late, have difficulty pronouncing /ʑ/, even if they make an effort. :) --Anatoli (обсудить/вклад) 01:11, 17 April 2014 (UTC)Reply

Thanks.

  • For arigatou:
  • IPA(key): [a̠ɾʲiɡa̠to̞ː]
  • d͡z: changed to 'z'.
  • For vowel devoicing: There was a rule in the module, which devoices vowels between voiceless consonants, and then only keeps the first when two devoiced vowels occur in adjacent morae. I have removed that rule and added a |dev= parameter. Please see Template:ja-pron/documentation.
  • I have added |acc_note, |accent_note, |acc2_note, |accent2_note, which are placed at the end of the accent line.
  • Accent types '0' and '1' treated as 'h' and 'a'. If not single characters 'hao01', then remove 'o'. If resulting string is equal to the length of text, then 'o'. If not, then 'n'. eg. |acc=0 (h), |acc=1 (a), |acc=h (h), |acc=2 for 2-morae word (o), |acc=3 for 3-morae word (n3), |acc=o for 5-morae word (o).
  • I have added an accent reference template so that |acc_ref=NHK etc. can now call the reference template. (Template:ja-pron/documentation)
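
For readers wondering what the removed automatic devoicing rule amounted to, here is a rough sketch based only on the one-sentence description above. It is not the module's former code. The mora representation as (onset, vowel) pairs, the set of voiceless onsets, and the restriction to /i/ and /u/ are all assumptions made for the example.

```python
# Assumed romanised onsets that count as voiceless (illustrative only).
VOICELESS = {"k", "s", "sh", "t", "ch", "ts", "h", "f", "p"}
# Assumption: only the high vowels are candidates for devoicing.
HIGH_VOWELS = {"i", "u"}


def devoiced_positions(morae):
    """Return 1-based mora numbers whose vowel the rule would devoice.

    morae is a list of (onset, vowel) pairs, e.g. ("sh", "u").
    """
    candidates = []
    for index in range(len(morae) - 1):
        onset, vowel = morae[index]
        next_onset = morae[index + 1][0]
        # A vowel between two voiceless consonants is a candidate.
        if vowel in HIGH_VOWELS and onset in VOICELESS and next_onset in VOICELESS:
            candidates.append(index + 1)
    kept = []
    for position in candidates:
        # When two candidates sit in adjacent morae, keep only the first.
        if kept and position - kept[-1] == 1:
            continue
        kept.append(position)
    return kept


print(devoiced_positions([("sh", "u"), ("k", "u"), ("d", "a"), ("", "i")]))
```

A rule of this kind cannot tell apart words with identical kana but different devoicing, which is the reason given in this thread for replacing it with an explicit |dev= parameter.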

How about now? Wyang (talk) 02:10, 17 April 2014 (UTC)Reply

More thoughts :) --
  • There are sometimes more than just two pitch accent patterns. The most I can recall running into is three, but I suppose it's possible that a handful of terms might even have four.
  • I'm changing the description for the dev param -- the number should really be described as the mora number, as some syllabic analyses would give incorrect results. For instance, かんした could be analyzed as having two syllables (sounding like /kan.ɕta/ in casual speech), but four morae, and the devoiced mora is the third one.
  • For references, I think it's best to have the default be nothing. There are terms where Daijirin doesn't include any pitch accent, and I've misplaced my NHK pronunciation dictionary (probably in a box in storage), but I work with native speakers and sometimes crib from them. In these cases, I deliberately don't list any reference, since there isn't really any -- but I think the pitch information is important enough to include, until such time as I can find a real reference to add.
Also, by "You can also do it the conventional way, |acc_ref=[1]" -- do you mean that it's possible to add the call to {{R:Daijisen}}, etc., directly as the acc_ref param value?
Thanks again, again! :D ‑‑ Eiríkr Útlendi │ Tala við mig 19:23, 17 April 2014 (UTC)Reply


  • I have added |acc3, |acc4 and |acc5, since there might be occasions where non-Tokyo accent patterns would like to be specified too.
  • 'dev': I think I might have described it inaccurately... As the module analyses it, the 'dev' parameter is the position of the devoiced syllable in the kana string. eg. hyakushou should have |dev=3 (not 2), and だいこん やくしゃ should have |dev=6 (spaces are not counted). This is inconsistent with the format of the accent parameter, but I think it is easier to specify and easier for the module to handle.
  • 'ref': Oops, I forgot to put nowiki tags around it. It should read |acc_ref=<ref name="NHK">{{R:NHK Hatsuon}}</ref>.
  • 'default ref': I have removed the default reference, so that there is no reference listed when the parameter is unspecified.
  • 'dehijacking the talkpage': I agree... Hence it is here now.

Thanks. Wyang (talk) 22:01, 17 April 2014 (UTC)Reply



Long vowel oddity spotted at パーソナルコンピューター


Just created this entry, and noticed that the ピュー got romanized oddly in the romaji-with-tone-marks bit in the pronunciation section:

[pàásónárúkóńpyúーtàà]

I'm about to log off for the night. If you have time, could someone look at the module and see what's going on there?

Cheers, ‑‑ Eiríkr Útlendi │ Tala við mig 04:53, 20 April 2014 (UTC)Reply

Sorry for the delay. Fixed now. Wyang (talk) 08:02, 22 April 2014 (UTC)Reply

Pitch on moraic /n/ > ん


I was reformatting 日本 and noticed that the downstep that occurs on the final ん isn't being indicated in the romanized version with tone marks. For instance, {{ja-pron}} is giving [nìhón] and [nìppón], when it should be outputting [nìhóǹ] and [nìppóǹ] instead.

For that matter, even if there were no downstep, the template should still show tone marks for moraic /n/. Could that be fixed? ‑‑ Eiríkr Útlendi │ Tala við mig 05:59, 21 April 2014 (UTC)Reply

Thanks, I think it's fixed now. Wyang (talk) 08:02, 22 April 2014 (UTC)Reply
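
As an illustration of the expected output discussed in this section, the sketch below assigns a high or low pitch to every mora, including moraic ん, and writes it as an acute or grave accent on the romanisation. It is not the module's code. The function names and the pre-split romaji input are assumptions, and the sokuon (っ) is not handled.

```python
import unicodedata

GRAVE = "\u0300"  # combining grave accent, used here for low pitch
ACUTE = "\u0301"  # combining acute accent, used here for high pitch


def pitch_pattern(mora_count, downstep):
    """Return a list of 'H'/'L', one per mora, for a Tokyo-style pattern."""
    pattern = []
    for position in range(1, mora_count + 1):
        if downstep == 0:
            high = position > 1              # heiban: low first mora, then high
        elif downstep == 1:
            high = position == 1             # atamadaka: only the first mora is high
        else:
            high = 1 < position <= downstep  # high from mora 2 up to the downstep
        pattern.append("H" if high else "L")
    return pattern


def mark_tones(morae, downstep):
    """Attach tone marks to romanised morae such as ['ni', 'ho', 'n']."""
    marked = []
    for mora, level in zip(morae, pitch_pattern(len(morae), downstep)):
        mark = ACUTE if level == "H" else GRAVE
        # Put the mark on the first vowel, or on the last letter for moraic n.
        vowel_at = next((i for i, ch in enumerate(mora) if ch in "aiueo"), len(mora) - 1)
        marked.append(mora[: vowel_at + 1] + mark + mora[vowel_at + 1 :])
    return unicodedata.normalize("NFC", "".join(marked))


print(mark_tones(["ni", "ho", "n"], 2))
```

With the downstep after the second mora, the final ん receives the low mark, which is the [nìhóǹ] form requested above.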

Displaying numbers next to pitch accent names


@Wyang It would be good to add numbers next to pitch accent names, similar to how some paper and online dictionaries mark accents, e.g. 現在 on Weblio. It would make it easier to cross-reference Wiktionary accent names to those numbers. User:Eirikr seems to agree. Do you think it's a good idea? --Anatoli (обсудить/вклад) 01:45, 20 June 2014 (UTC)

Added now. Wyang (talk) 02:22, 20 June 2014 (UTC)Reply
Thank you. I've added [ ]. Is that OK? From what I've seen so far, either a superscript number is used or a number in square brackets. --Anatoli (обсудить/вклад) 03:01, 20 June 2014 (UTC)Reply
Yes, please prettify anything. :) Wyang (talk) 03:58, 20 June 2014 (UTC)Reply

宿題


@Wyang, @Eirikr

It didn't work on 宿題 (しゅくだい), the しゅ part. --Anatoli (обсудить/вклад) 02:19, 2 July 2014 (UTC)Reply

Why is |dev=1? I thought vowel devoicing only occurs interconsonantally. Wyang (talk) 06:48, 2 July 2014 (UTC)Reply
It's between two devoiced consonants ɕ and k. Same with しかし (working fine) and 少し (adding now). NHK even uses a similar notation to ours for devoiced vowels. --Anatoli (обсудить/вклад) 07:20, 2 July 2014 (UTC)Reply
It should be |dev=2 instead. |dev= is the position of kana with devoiced vowel in the input kana string. Wyang (talk) 07:31, 2 July 2014 (UTC)Reply
Oh, thank you. Silly me. :) --Anatoli (обсудить/вклад) 07:37, 2 July 2014 (UTC)Reply
  • Just saw this thread again. Wyang, when kana compounds like しゅ are devoiced, the whole thing should be marked as devoiced, like しゅ. Marking it as し makes it look like [ɕiɯ̥ᵝ] or some such oddness, when what we want to indicate instead is [ɕɯ̥ᵝ] or [ɕʲɯ̥ᵝ].. ‑‑ Eiríkr Útlendi │ Tala við mig 01:40, 5 March 2015 (UTC)Reply
@Eirikr I didn't notice this, thanks. You're right. I'm also inviting you to join Wiktionary:Beer_parlour/2015/February#Simplification_of_topic_categories_adding, which may affect Japanese categories, hopefully for the better, if implemented. --Anatoli T. (обсудить/вклад) 02:00, 5 March 2015 (UTC)Reply

dev2, dev3?


@Wyang Frank, can there be more dev's, please, as in 蛋白質 to get [tã̠mpa̠kɯ̥ᵝɕit͡sɯ̥ᵝ], e.g. ...|dev=4|dev2=6...? --Anatoli T. (обсудить/вклад) 03:22, 28 January 2015 (UTC)Reply

OK, second devil added. I want to rewrite its code... so that the devils can be written as たんぱ(く)し(つ), avoiding the need for |dev11=. Keep it like this for now, I will change the format if I ever get around to doing that... Wyang (talk) 03:43, 28 January 2015 (UTC)Reply
Ah, thanks. I guess all uses will need to be updated? --Anatoli T. (обсудить/вклад) 03:50, 28 January 2015 (UTC)Reply
That's not a big problem when done semi-automatically. We've managed to do all the Chinese format changes... Wyang (talk) 03:53, 28 January 2015 (UTC)Reply
You're a genius. :) --Anatoli T. (обсудить/вклад) 04:18, 28 January 2015 (UTC)
@Wyang Hi Frank, I'm back in Melbourne after three weeks in France (also a bit of Belgium). I'm eager to see the change, as 少し and しかし also need to be fixed. :) --Anatoli T. (обсудить/вклад) 01:20, 5 March 2015 (UTC)
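
The inline notation floated in this section, たんぱ(く)し(つ), could be parsed along the following lines. This is a sketch of the idea only, not anything implemented in the module. The function name is hypothetical, and ASCII parentheses are assumed.

```python
def parse_devoicing_markup(marked):
    """Split a kana string with (parenthesised) devoiced kana.

    Returns the plain kana string and the 1-based positions of the
    kana that were inside parentheses.
    """
    plain = []
    positions = []
    inside = False
    for ch in marked:
        if ch == "(":
            inside = True
        elif ch == ")":
            inside = False
        else:
            plain.append(ch)
            if inside:
                positions.append(len(plain))
    return "".join(plain), positions


print(parse_devoicing_markup("たんぱ(く)し(つ)"))
```

For this input the positions come out as 4 and 6, matching the |dev=4|dev2=6 values requested at the top of the section, so existing uses could in principle be converted mechanically.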

Bug with sutegana at the beginning of a term


{{ja-pron|ふぁふぃとぅふぇふぉふぁ|acc=1}}
{{ja-pron|ふぃとぅふぇふぉふぃ|acc=1}}
{{ja-pron|とぅふぇふぉふぃ|acc=1}}
"fúァ" and "fúィ" probably aren't desirable —umbreon126 07:37, 7 March 2015 (UTC)

'tis okay now —umbreon126 05:11, 27 March 2015 (UTC)Reply
Thanks, User:Wyang! --Anatoli T. (обсудить/вклад) 05:19, 27 March 2015 (UTC)Reply
No worries. :) Wyang (talk) 20:58, 27 March 2015 (UTC)Reply

Delimiting vowels


For example, on 女王 (applies to the readings じょおう [2] and にょおう [2]), there is a need to delimit the vowels for the IPA to render properly, but when this is done using . as is done in ja-noun, the . is printed in the kana and it also messes up the accent because the dot gets counted as a kana. Nibiko (talk) 08:25, 4 June 2015 (UTC)Reply

Fixed by Kc kennylau! Thank you so much, Kc kennylau! <3 Nibiko (talk) 04:36, 25 April 2016 (UTC)Reply

Twofold long vowels


Different to my above-mentioned concern, I noticed that on 蓊鬱 おううつ is represented as òóótsú just before the section where it says Heiban. I would expect it to be òóútsú. Nibiko (talk) 03:12, 24 August 2015 (UTC)Reply

[çiβa̠kɯ̥ᵝɕʲa̠]


It gives the pronunciation [çiβa̠kɯ̥ᵝɕʲa̠] for 被爆者. I don’t know where this [β] comes from. The intervocalic /ɡ/ is often realized as a fricative but the phoneme /b/ doesn’t change at least in my pronunciation. In addition, [ɕʲ] is redundant because [ɕ] is already palatalized. — TAKASUGI Shinji (talk) 12:27, 9 September 2015 (UTC)Reply

@Wyang. --Anatoli T. (обсудить/вклад) 00:03, 14 July 2016 (UTC)Reply
Thanks. [β] is from Japanese phonology#Weakening - should the rule be removed? Removed palatalisation of [ɕʑ]. Wyang (talk) 00:11, 14 July 2016 (UTC)Reply
I think we should remove the rule of β. — TAKASUGI Shinji (talk) 06:11, 14 July 2016 (UTC)Reply
Ok no problem. Removed. Thanks! Wyang (talk) 09:01, 14 July 2016 (UTC)Reply
Thank YOU! — TAKASUGI Shinji (talk) 00:53, 15 July 2016 (UTC)Reply

Sorting


Can a sort parameter be added to the template? —britannic124 (talk) 16:28, 13 July 2016 (UTC)Reply

  • Since this template already uses a kana-ized string as its primary input, a sort key shouldn't be necessary.
I do see that the underlying module is not changing katakana to hiragana for sorting purposes, but this should be fixed in the module itself, so that the sort key is correctly and automatically derived from the data that the module is already using. @Wyang, is that something you could do? If not, could you ping someone who could? ‑‑ Eiríkr Útlendi │Tala við mig 18:14, 13 July 2016 (UTC)Reply
Sure, I added sortkeys to the IPA and audio categories. It's a bit of an ugly hack though, since those templates do not seem to support |sort=. Wyang (talk) 22:20, 13 July 2016 (UTC)Reply
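
The katakana-to-hiragana step mentioned above is simple to sketch, because the two blocks are laid out in parallel in Unicode. The following is an illustration, not the module's Lua code; the function name is hypothetical.

```python
def katakana_to_hiragana(text):
    """Map katakana ァ..ヶ (U+30A1..U+30F6) onto the matching hiragana.

    The hiragana block sits exactly 0x60 code points below the katakana
    block. Other characters, including the long-vowel mark ー, are left
    unchanged.
    """
    converted = []
    for ch in text:
        code_point = ord(ch)
        if 0x30A1 <= code_point <= 0x30F6:
            converted.append(chr(code_point - 0x60))
        else:
            converted.append(ch)
    return "".join(converted)


print(katakana_to_hiragana("カリフラワー"))
```

A sort key derived this way lets katakana and hiragana entries collate together without a separate |sort= parameter.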

ん (‘n’) before approximants


Shouldn’t “n” before “w” be represented as [ɰ̃ᵝ], like in “denwa” [dẽ̞ɰ̃ᵝɰᵝa̠]? (Or at least [dẽ̞ɴɰᵝa̠]?) And “n” before “y” as [j̃], like in “shin’ya” [ɕĩj̃ja̠]? —britannic124 (talk) 18:02, 2 September 2016 (UTC)

Recent change to Module:ja-pron


Hi @Eirikr! Just letting you know that there was a change to Module:ja-pron recently by User:Nardog. I'm not qualified to comment on the IPA changes, but I know you are definitely. :) Wyang (talk) 09:17, 17 May 2017 (UTC)Reply

dev2


Since the Japanese attention category has no organisation, I'm leaving my concern here. It would be good if you could override the value of the dev parameter for a certain accent, as this would allow exceptions to be expressed in a single use of the template. See 増幅器 and 屹度. Nibiko (talk) 13:02, 19 June 2017 (UTC)

Co-occurring pitch accents


ja-pron does not currently support co-occurring pitch accents. If a term has multiple pitch accents divided across words, then see 因果応報 and 一期一会 for the current way to format them. Nibiko (talk) 02:29, 29 June 2017 (UTC)Reply

Error in display of devoiced vowels


{{ja-pron|だいこん やくしゃ|acc=5|acc_ref=DJR,NHK|dev=6|y=o}}

currently yields (refs removed)

[dàíkóń yáꜜkùshà] should be [dàíkóń yáꜜkùshà], to match the hiragana (and because apparently only u and i can be devoiced). Something must be making the function think there's an extra vowel or mora somewhere before the second word. — Eru·tuon 18:42, 20 August 2017 (UTC)Reply

(I think that part of {{ja-pron}} is somehow generally problematic. (麻婆豆腐 [màbóódóꜜòfù], 野馬 [nóꜜòmà]) —suzukaze (tc) 09:14, 21 August 2017 (UTC))Reply

Distinguishing [oɯ] from [oː] and [ei] from [eː]


Though relatively rare, [oɯ] and [oː] and [ei] and [eː] do contrast in Japanese, as in ō 王 ('king') vs. ou 追う ('to pursue'), mei-sha 名車 ('great car') vs. me-i-sha 目医者 ('eye doctor'). As far as the IPA is concerned this isn't much of a problem since e.g. {{ja-pron|o.u}} yields [o̞ɯ̟ᵝ], but this workaround wouldn't work as soon as |acc= is introduced. They should be supported in some way or another. Nardog (talk) 08:48, 28 August 2017 (UTC)

Or rather, maybe all instances of [oː] and [eː] should be represented by おー and えー instead of おう and えい, so that おう and えい would always stand for [oɯ] and [ei], as in せーしん ぶんれつ びょー instead of せいしん ぶんれつ びょう. This would make it clearer too because おう and えい being restricted to [oː] and [eː] is inherently ambiguous since, orthographically, they could always represent either [oː]/[eː] or [oɯ]/[ei]. Nardog (talk) 08:54, 28 August 2017 (UTC)Reply
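
To make the proposal concrete, the sketch below rewrites a hiragana string into the suggested notation: おう and えい become おー and えー unless a "." delimiter separates the two kana, in which case the dot is dropped and the vowels are kept distinct. This illustrates the idea only and is not how the module works. The function name and the row sets are assumptions, and katakana input is not handled.

```python
# Hiragana whose vowel is /o/ or /e/ (small ょ included for yōon such as びょ).
O_ROW = set("おこそとのほもよろごぞどぼぽょ")
E_ROW = set("えけせてねへめれげぜでべぺ")


def mark_long_vowels(kana):
    """Write long /o/ and /e/ with ー; a '.' keeps the two vowels separate."""
    out = []
    previous = ""
    for ch in kana:
        if ch == ".":
            previous = "."  # delimiter: blocks the long-vowel reading
            continue
        if (ch == "う" and previous in O_ROW) or (ch == "い" and previous in E_ROW):
            out.append("ー")
        else:
            out.append(ch)
        previous = ch
    return "".join(out)


print(mark_long_vowels("せいしん ぶんれつ びょう"))
print(mark_long_vowels("お.う"))
```

Under this convention a verb such as 追う would be entered as お.う and stay おう, while 王 entered as おう would come out as おー.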

は/へ vs. わ/え


I also noticed the particles は and へ would still be shown as は and へ in the kana representation even though they are pronounced [ɰa] and [e], not [ha] and [he]. But I believe, since kana are in and of themselves phonetic symbols, in pronunciation illustrations they should be the phonetic わ and え instead. (By extension one could argue を should be お too, but since there's no pronunciation variation in を and some speakers do preserve [ɰo] for を, I don't support changing it as strongly as for the other two.) Nardog (talk) 11:31, 28 August 2017 (UTC)

IPA module


Hey @Wyang, is there a reason this template doesn't call format_IPA_full in Module:IPA? I ask primarily because this module is adding non-entry pages to Category:Japanese terms with IPA pronunciation. Thanks! —JohnC5

@JohnC5 If I remember correctly, it was because the IPA module does not allow sortkey categorisation. Wyang (talk) 05:26, 28 September 2017 (UTC)Reply
@Wyang: How about now? ;P —JohnC5
@JohnC5 All right! Changed. Wyang (talk) 05:45, 28 September 2017 (UTC)Reply
@Wyang: Thanks! That fixed it. —JohnC5 06:01, 28 September 2017 (UTC)Reply

日向

Discussion moved from User talk:Wyang#Japanese pronunciation oddity.

Hello Wyang, long time no write.  :)

I was cleaning up the 日向 entry, and found that the pronunciation given at 日向#Etymology_3 is a little weird. It's showing up as [çɨᵝːɡa̠], when it should be something more like [çjɯːɡa̠]. I don't suppose you could have a look at the module code? ‑‑ Eiríkr Útlendi │Tala við mig 05:47, 15 September 2017 (UTC)Reply

(@Nardog —suzukaze (tc) 05:48, 15 September 2017 (UTC))
@Eirikr: It's perfectly accurate. See w:Japanese phonology. /u/ [ɯ̟ᵝ] becomes centralized to [ɨᵝ] after /j/, and /hj/ is [ç]. Vance (2008) had [çj], but this was criticized by Akamatsu. Nardog (talk) 05:55, 15 September 2017 (UTC)Reply
Hello Eirikr! Long time no write. I will hand the mic now to ... Wyang (talk) 06:08, 15 September 2017 (UTC)Reply
(after edit conflict... :) )
@Nardog:, there is a definite glide in the ひゅ sound as pronounced by speakers, which is not represented anywhere in [çɨᵝ]. The IPA is thus misleading.
Akamatsu seems to make the argument that there is no /j/ glide anywhere after /ç/, which is frankly baffling to me, as this does not match my experience at all. It leads me to wonder if he's describing a dialect, or if his home lect might be biasing his interpretation.
FWIW, I'm more interested in descriptively representing Japanese sounds, rather than hewing to any particular academic theory. ‑‑ Eiríkr Útlendi │Tala við mig 06:15, 15 September 2017 (UTC)Reply
@Eirikr: [ç] is a palatal fricative, i.e. [j̝̊], so it is only natural a [j]-like sound is heard during the transition from [ç] to [ɨᵝ] (that is why they're called "glides" in the first place). Hence [çj] is inherently redundant, unless some language somewhere contrasted [çV] and [çjV]. Nardog (talk) 06:40, 15 September 2017 (UTC)Reply
@Nardog: I'm familiar with palatal fricatives, but it seems I've spent too much time working on /phonemics/ and not [phonetics]. I'm happy to concede the point. ‑‑ Eiríkr Útlendi │Tala við mig 07:19, 15 September 2017 (UTC)Reply
FWIW, it would also be very inconsistent if it were [çj]. All consonants are palatalized before /i, j/ either phonetically ([kʲ], [ɡʲ], [mʲ]...) or phonologically ([ɕ], [tɕ], [(d)ʑ]...). So if /h/ became [çj] before /i, j/, or, even more capriciously, /hi/ became [çi] but /hj/ became [çj], that would be quite an exception. Nardog (talk) 06:37, 29 September 2017 (UTC)Reply

ざ・ず・ぜ・ぞ

Discussion moved from User talk:Wyang#More JA pronunciation questions.

I noticed that ざ・ず・ぜ・ぞ are now being rendered by the module with initial consonant [d͡z], indicating a harder onset than I hear around me. I also note that some dialects of Japanese distinguish between づ and ず, which would ostensibly be [d͡zʉ͍] and [zʉ͍].

Do you have any insight? ‑‑ Eiríkr Útlendi │Tala við mig 04:57, 28 September 2017 (UTC)Reply

I feel we discussed this at Template talk:ja-pron before when designing the template. Pinging @Nardog for his or her opinion. Wyang (talk) 05:20, 28 September 2017 (UTC)
It's in free variation, so we can't say for certain it's one or the other. The compromise the template currently adopts is to treat /z/, /zi–di/, /zu–du/, /zj–dj/ as affricates when word-initial or after /N/ and as sole fricatives when intervocalic.
Do the speakers around you at least pronounce it with the tip of the tongue at first in contact with the roof of the mouth? If so it's without a doubt an affricate, although it might not be as striking as cards in English. In fact most speakers of Standard Japanese can't (and don't realize they can't) pronounce English zoo or French genre properly without training, or grasp the difference between cars and cards.
Not only is the number of speakers who still make the distinction between /zu/ and /du/ very small (see the map at w:Yotsugana), but the distinction is not represented in orthography in many cases since the spelling reform of 1946. So we can't possibly integrate the pronunciation for speakers without the neutralization into the template. (Also note that even in words still spelled with ぢ or づ, speakers who have /zu, du/ and /zi, di/ neutralized might still pronounce them as [z]/[ʑ].) We can of course manually add non-neutralized pronunciations on the entries for relevant words, though.
See Labrune (2012:64–66) for more. Nardog (talk) 07:40, 28 September 2017 (UTC)Reply
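
The compromise described in this comment (affricate word-initially or after /N/, plain fricative between vowels) can be written as a small positional rule. The sketch below is illustrative, not the module's code. The phoneme-list input, the symbol "N" for the moraic nasal, and the plain "dz" spelling are assumptions made for the example.

```python
VOWELS = set("aiueo")


def realize_z(phonemes):
    """Replace /z/ by 'dz' or 'z' depending on what precedes it.

    phonemes is a list of romanised segments, with 'N' for the moraic nasal.
    """
    realized = []
    for index, segment in enumerate(phonemes):
        if segment != "z":
            realized.append(segment)
            continue
        previous = phonemes[index - 1] if index > 0 else None
        if previous in VOWELS:
            realized.append("z")   # intervocalic: fricative
        else:
            realized.append("dz")  # word-initial or after /N/: affricate
    return realized


print(realize_z(["z", "u"]))
print(realize_z(["ch", "i", "z", "u"]))
```

Since the two realizations are in free variation, any such rule is a transcription convention rather than a claim about every speaker.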
@Nardog: Re: yotsugana, thank you for the link, I couldn't think of the term earlier this evening. FWIW, one of my teachers years ago was from Kyushu and made the four-way distinction. Later, I lived in the Tōhoku, but in Morioka, which appears to be the pocket of yellow on the map; later on, I was in Tochigi, and later still in Tokyo. FWIW, I recall students in Tochigi deliberately overpronouncing づ in names to clarify the spelling, so at least in that rarified context, even Kantō speakers may make some distinction.
I'll keep my ears open at work over the next several days, and see if I can tease out the specifics of articulation by the native speakers around me (mostly Tokyo-ites, with some folks from Kyoto and elsewhere in Kansai). ‑‑ Eiríkr Útlendi │Tala við mig 16:19, 28 September 2017 (UTC)Reply
ご報告楽しみにしてます。;)
Here's a relevant quote from Vance (2008:85–86):

Typically, though not consistently, [dz] occurs at the beginning of a word or in the middle of a word immediately following a syllable-final consonant [i.e. /N/ or /Q/], and [z] occurs in the middle of a word immediately following a vowel. In short, [dz] and [z] are allophones of this /z/ phoneme. Most native speakers of Japanese are quite surprised to discover that there's actually a phonetic difference to worry about, but you'll hear it if you listen carefully to pronunciations of zu [dzɯ] 図 'diagram' and chizu [cɕizɯ] 地図 'map'.

He then goes on to cite the minimal pairs (traditionally spelled くづ) vs. (traditionally くず) and 記事 (traditionally きじ) vs. 生地 (traditionally きぢ), which used to be pronounced differently "until about 400 years ago" but are now both spelled with ず/じ and not distinguished by Tokyo Japanese speakers.
Interestingly, he diverges a little bit from Labrune in saying that, in "careful pronunciation", modern Tokyo Japanese speakers always realize j (じ, じゃ, じゅ, じょ) as [dʑ], as opposed to the "typical, though not consistent," production of z (ざ, ず, ぜ, ぞ) as [z] intervocalically and as [dz] otherwise.
If this account is supported by several other scholars, I'd be willing to change the template's current realization of じ, じゃ, じゅ, じょ, i.e. [ʑ] intervocalically and [dʑ] otherwise, to always [dʑ]. Nardog (talk) 06:16, 29 September 2017 (UTC)Reply
I support your proposition just as a native speaker. For me, /dʑ/ is the base phoneme and [ʑ] is a casual intervocalic allophone, just like the intervocalic allophone [ɣ] for the phoneme /ɡ/. — TAKASUGI Shinji (talk) 00:48, 10 March 2018 (UTC)Reply

Verb ending with "ou"


I know that verbs ending with "ou" such as 競う, 囲う and 惑う are pronounced not /o:/ but /ou/. Naggy Nagumo (talk) 08:10, 8 December 2017 (UTC)Reply

Sorry, I can distinguish them by writing like "まど.う". I am sorry for making noise. Naggy Nagumo (talk) 08:15, 8 December 2017 (UTC)Reply

Katakana for IPA


Something's going wrong at 首長国: there's a katakana in the IPA transcription. —Mahāgaja (formerly Angr) · talk 14:52, 9 March 2018 (UTC)Reply

And at ニュースキャスター too. —Mahāgaja (formerly Angr) · talk 14:55, 9 March 2018 (UTC)Reply
ニュースキャスター has a wrong parameter, I'll fix it. 首長国: When yōon like "しゅ" is devoiced, it seems to go wrong. --Naggy Nagumo (talk) 23:18, 9 March 2018 (UTC)
(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Suzukaze-c, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take, Dine2016): Anatoli T. (обсудить/вклад) 00:19, 10 March 2018 (UTC)Reply
I think the entire module needs to be overhauled (;・∀・) —suzukaze (tc) 00:33, 10 March 2018 (UTC)Reply
Just curious, why do you think so? It's mostly working fine, isn't it? --Anatoli T. (обсудить/вклад) 00:51, 10 March 2018 (UTC)Reply
It's working, but it seems fragile... —suzukaze (tc) 01:37, 10 March 2018 (UTC)Reply
Re dev: I think it would be much easier if the dev parameters are incorporated into the kana string (e.g. つ'くよみ). Wyang (talk) 00:54, 10 March 2018 (UTC)Reply
@Wyang, @Naggy Nagumo -- my sneaking hunch is that the handling of the devoicing parameter is screwy from the get-go. For reasons unknown to me (I haven't gone through the module codebase), dev uses a different count than acc. While acc is based on the actual mora count, dev appears to be based on the character count -- which will always diverge from the mora count for any term with yōon or other small non-moraic vowel kana (such as ファン or シェル, where the small ァ and ェ are not technically yōon as I've understood it). Since the string processing for dev is based on character count, it seems like the module can incorrectly split up phonographemes like ファ or しゅ, leaving the small kana dangling and unprocessed -- where it then appears in the final output string. ‑‑ Eiríkr Útlendi │Tala við mig 00:59, 10 March 2018 (UTC)Reply
@Eirikr There's the |devm= parameter that counts by mora, added by Kenny before, which is one bug (out of two) less buggy. I've fixed 首長国. Wyang (talk) 02:39, 10 March 2018 (UTC)Reply
@Wyang I think your changes have introduced a Lua error on 少し. Could you please take a look? —Internoob 06:41, 10 March 2018 (UTC)Reply
@Internoob Thank you, it has been fixed. Wyang (talk) 06:50, 10 March 2018 (UTC)Reply
By the way, is there a need to have the y/yomi parameter? It is already present in {{ja-kanjitab}}, and in a few cases it may depend on which spelling you choose as the main entry. --Dine2016 (talk) 14:10, 11 March 2018 (UTC)Reply
{{ja-kanjitab}} describes the spelling, in which case which yomi is in use is potentially useful information. {{ja-pron}} describes the pronunciation, in which case, again, which yomi is in use is potentially useful information. Given the current infrastructure, if we want to have yomi in both places, we need to add the value in both places -- so far as I understand it, the scope is limited such that one template invocation on a page cannot reference any of the parameters given to another template invocation.
Not sure what you mean by "in a few cases it may depend on which spelling you choose as the main entry". The yomi in either {{ja-kanjitab}} or {{ja-pron}} should match the headword for the relevant etymology section. Any given spelling with multiple readings should have a separate etymology section for each reading. ‑‑ Eiríkr Útlendi │Tala við mig 21:13, 13 March 2018 (UTC)Reply
"It may depend" might be referring to cases like 気まぐれ / 気紛れ, in which there is on'yomi/on'yomi+kun'yomi. —suzukaze (tc) 21:17, 13 March 2018 (UTC)Reply
For cases like that, which don't cleanly fit even into 湯桶読み or 重箱読み categories, and for cases of longer mixed-reading compounds, I find myself coming back to the need to revamp {{ja-kanjitab}} (at a minimum) to allow editors to specify yomi for each kanji, not just for the whole term. (I mean, allow specifying for the whole term where that fits, but also allow per-kanji yomi values where whole-term reading categories won't fit.) In fact, thinking it through now, I'd prefer to have *detailed* yomi information in {{ja-kanjitab}}, and leave it out of {{ja-pron}}.
Is this idea sensible? Would that work for others? ‑‑ Eiríkr Útlendi │Tala við mig 21:26, 13 March 2018 (UTC)Reply
There is Template_talk:ja-kanjitab#Feature_request:_jukujikun_readings and Module:User:Suzukaze-c/Hani-tab (although we are diverging from the original topic). —suzukaze (tc) 21:29, 13 March 2018 (UTC)Reply

@Naggy Nagumo, Atitarev, Suzukaze-c, Wyang, Eirikr, Internoob, Dine2016: Now at the kanji itself is appearing in the IPA transcription, even though the template specifies the hiragana as the first positional parameter. —Mahāgaja (formerly Angr) · talk 14:48, 21 March 2018 (UTC)Reply

I suspect that was caused by the second instance of {{ja-pron}}, {{ja-pron|a=もり.wav}}, which didn't specify any parameter except a= for the audio file. I've merged the two, and now things are displaying correctly. ‑‑ Eiríkr Útlendi │Tala við mig 19:05, 21 March 2018 (UTC)Reply

Atamadaka notation not accounting for long vowels in first syllable

In creating the page for 聖句 (せいく), I gave {{ja-pron}} the |acc=1| parameter, since it uses an atamadaka-gata pitch accent. However, it is showing séꜜèkù for the pitch, when it should be sééꜜkù because of the long vowel in the first syllable. Should a rule be added in the module where it would place the pitch fall differently between, say, せー and せい in the first parameter? (This exception would also occur when the first syllable ends with ん.) BlueCaper (talk) 17:05, 22 June 2018 (UTC)Reply

Each of せい, せー and せん is a single syllable but two morae. It should be séꜜèkù. --Naggy Nagumo (talk) 15:03, 6 July 2018 (UTC)Reply
Yes, as Naggy stated. Atamadaka means high pitch on the first mora, not the first syllable, so séꜜèkù is correct. ‑‑ Eiríkr Útlendi │Tala við mig 20:09, 6 July 2018 (UTC)Reply
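
The point made in this thread, that the accent number counts morae rather than syllables, can be sketched in code. The Python below is only an illustration and is not the logic of Module:ja-pron (which is written in Lua); the function name `pitch_pattern` and its H/L string output are assumptions made for this example. It treats small kana as part of the preceding mora and every other kana (including い, ん and ー) as a mora of its own, and it expects plain kana without spaces or periods.

```python
# Small kana that attach to the preceding kana instead of forming their own mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def pitch_pattern(kana: str, accent: int) -> str:
    """Return one H/L letter per mora for a Tokyo-type accent number.

    accent == 0 is heiban (no downstep); accent == n puts the downstep
    after the n-th mora. Hypothetical helper, not the module's own code.
    """
    mora_count = sum(1 for ch in kana if ch not in SMALL_KANA)
    pattern = []
    for position in range(1, mora_count + 1):
        if accent == 0:
            high = position > 1            # low first mora, high afterwards
        elif accent == 1:
            high = position == 1           # atamadaka: only the first mora is high
        else:
            high = 1 < position <= accent  # rise after mora 1, fall after mora n
        pattern.append("H" if high else "L")
    return "".join(pattern)


print(pitch_pattern("せいく", 1))  # HLL
```

With acc=1 the second mora い of せいく is already low, which matches the séꜜèkù output defended above.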

Verb 囲う "kako.u" incorrect IPA

On the entry for 囲う, using "かこ.う" the kana correctly renders as "かこう" but the IPA says "kàkóó". The second "o" should be a "u". — This unsigned comment was added by Aogaeru4 (talkcontribs).

Agreed. (Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Suzukaze-c, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone): . --Anatoli T. (обсудить/вклад) 03:11, 10 July 2018 (UTC)Reply
Note that the verb おもう (omou) with a similar ending is working fine. --Anatoli T. (обсудить/вклад) 03:16, 10 July 2018 (UTC)Reply
It works correctly only when acc=2 is specified. — TAKASUGI Shinji (talk) 03:27, 10 July 2018 (UTC)Reply
I think the significant feature is that there is no problem when the accent falls on the kana before the "u". (Aogaeru4 (talk) 03:30, 10 July 2018 (UTC))Reply
It should be fixed now. Wyang (talk) 03:43, 10 July 2018 (UTC)Reply
Yay! Thanks, Frank. --Anatoli T. (обсудить/вклад) 03:46, 10 July 2018 (UTC)Reply

yutōyomi vs k,o

Discussion moved to Template talk:ja-kanjitab

呉音 and 漢音

See the Wikipedia articles Go-on and Kan-on. Also compare the following Google hits:

The second one mostly shows our project, which is a bad sign: we don’t follow the mainstream spellings. — TAKASUGI Shinji (talk) 11:31, 20 May 2019 (UTC)Reply

  • FWIW, I note that Google doesn't differentiate between searches for [ kan-on] and searches for [ kan'on]. Regarding the convention of romanizing 呉音 as go-on with a hyphen, I think that's largely to differentiate from the regular English term goon. I see that the WP article at Go-on has had the same headword spelling since 2008, and no discussion on its talk page about the romanization.
By way of comparison, I just had a look in the index of Shibatani's The Languages of Japan, and I see that he used kan'on and go'on.
Considering that go-on and go'on appear to be more common in English-language texts as the romanization for 呉音, I think we should have entries at these spellings, in accordance with our overarching policy of being a descriptive dictionary. If a term is in demonstrable use, we should generally have an entry for it. This also improves discoverability. However, for purposes of our own usage in headings and text in Japanese entries, I don't consider Google usage patterns to be a strong indicator for how we should spell things: for Japanese terms written in the Latin alphabet, we use a variant of Hepburn, as described at Wiktionary:Japanese transliteration.
That said, it's good to discuss this from time to time and make sure we're in accordance at least amongst ourselves. ‑‑ Eiríkr Útlendi │Tala við mig 17:46, 23 May 2019 (UTC)Reply

Strange behavior at ハノイ

For some reason, the pronunciation at ハノイ is given as "[háꜜnòì]" in transcription (so far quite reasonable), but as "[ɰᵝa̠ no̞i]" in IPA, which seems to be "wanoi" rather than the "hanoi" I'd expect. What's happening here? MuDavid (talk) 08:11, 23 May 2019 (UTC)Reply

@MuDavid: Moved the discussion from Module talk:ja. Removing the space from the word fixes the transcription: {{ja-pron|ハノイ|acc=1|acc_ref=NHK}} gives [ha̠no̞i]. Perhaps ハ on its own is being interpreted as the particle は. — Eru·tuon 08:57, 23 May 2019 (UTC)Reply
Indeed it does, thank you! MuDavid (talk) 09:05, 23 May 2019 (UTC)Reply

char count or mora count for "dev"?

Since I changed the module, "dev" should be an index by mora, not by character. Previously dev was counted by character, so entries including yōon(拗音) will be incorrect. (e.g. 百姓 {{ja-pron|ひゃくしょう|dev=3}} will be incorrect ひゃく(しょ)ー, now it should be dev=2). If this change is bad, please revert or fix the module. If this change is okay, please fix entries. --Naggy Nagumo (talk) 00:29, 7 March 2020 (UTC)Reply

At most 182 entries may be related [1][2]. --Naggy Nagumo (talk) 00:37, 7 March 2020 (UTC)Reply

@Naggy Nagumo: Thank you! I'd long been puzzled by that. At some point there was a separate devm parameter to handle devoicing specifically by mora count -- but I've never understood why that was even created, since (as you note) one would never want to mark devoicing on any other basis. This might entail some clean-up, though, for entries with 拗音 (and possibly also 促音). Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 00:44, 7 March 2020 (UTC)Reply
@Mahagaja Fixed. —Naggy Nagumo (talk) 05:55, 3 July 2020 (UTC)Reply
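
To make the difference between the two counting schemes concrete, here is a small Python sketch. It is not taken from Module:ja-pron (which is Lua); `split_morae` and `char_index_to_mora_index` are hypothetical names, and the list of small kana is an assumption about which characters never start a mora. It shows how an old character-based dev index would map to a mora-based one.

```python
# Small kana that belong to the preceding kana (yōon and small vowels).
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_morae(kana: str) -> list[str]:
    """Group a kana string into morae, attaching small kana to the previous kana."""
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def char_index_to_mora_index(kana: str, char_index: int) -> int:
    """Convert a 1-based character index into the 1-based index of its mora."""
    end_of_mora = 0
    for mora_index, mora in enumerate(split_morae(kana), start=1):
        end_of_mora += len(mora)
        if char_index <= end_of_mora:
            return mora_index
    raise ValueError("char_index is past the end of the kana string")


print(split_morae("ひゃくしょう"))
print(char_index_to_mora_index("ひゃくしょう", 3))
```

For ひゃくしょう the third character is く, which is the second mora, so an entry that used dev=3 under character counting needs dev=2 under mora counting, as in the 百姓 example above.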

Deprecate yomi=/y= for this template and Kansai/Kyoto pitch accents

Since {{ja-kanjitab}} already covers the yomi= parameter, should we remove the yomi=/y= parameter from this template? That is, of course, if a bot can make the necessary changes.

In other news, I have been experimenting with ああ and 明日 regarding the Kansai/Kyoto accents after @Kyoww's edits and tried to unify it into one single template with the standard Tokyo accent, which proves to be disastrous:

In this example, the Kyoto accent should be Kōki instead of Atamadaka and parameters prefixed with acc3= were put in the first line or acc=/acc1=.


This one has two separate templates, better than nothing, the Kyoto accent still has to be labeled Kōki instead of Atamadaka.


This is the current one with the correct Kyoto pitch accent and not using the template as in the pronunciation under the Okinawan header. The IPA is somewhat the same as the last two though.

Any thoughts? Is there a need to update this template for the dialects? Remember that the dialectal pitch accents do not line up with the standard Tokyo accents. ~ POKéTalker 05:35, 7 March 2020 (UTC)Reply

@Poketalker: For the yomi or y parameters, it's probably enough to just remove or comment out the relevant code in the module. I'm less familiar with our Lua infrastructure, but for regular templates that don't use modules, any parameter in a template call that isn't handled by the template does ... nothing. It only appears in the wikicode, with no effect on the output.
If we wanted to be clever about it, we'd replace the handling code for yomi or y values to instead add the entry to a maintenance category, which we would then use to either manually or bot-wise go through and remove the parameters and arguments.
For the pronunciation details, I have some questions.
  • Is that specifically Kyōto accent, or is it more generally Kansai?
  • What is Kōki? Since this is supposed to be information for the reader of the template's output, in order to provide further detail required to understand that output, I'd recommend that we create that entry, and any other relevant entries, before building out their use in this template.
Good stuff! Cheers, ‑‑ Eiríkr Útlendi │Tala við mig 05:58, 7 March 2020 (UTC)Reply
@Eirikr: please check the Kansai_dialect#Pitch_accent link (which you might not have seen in my starting post); I also forgot to include Japanese_pitch_accent#Kyoto–Osaka_(Keihan_type) for your convenience. You're right that the accent mentioned is Kansai *in general*; I don't see any difference there. 高起 (kōki) literally means a “high-rising” accent in which the first mora has a high pitch and the next one(s) usually have high pitch(es). It contrasts with 低起 (teiki), literally “low-rising”, where the first mora has a low pitch. Neither term has an entry in Kotobank, though. ~ POKéTalker 06:55, 7 March 2020 (UTC)Reply
@Eirikr, @Poketalker: I wrote a bot script today to get rid of the yomi arguments from the {{ja-pron}} from entries in {{tracking/ja-pron/yomi}}. The source code is here, if you would be interested in reading it. I'm going to make a post in the Beer parlour – if you think the script is good enough, please lend me your support over there! Thanks, Kiril kovachev (talk) 15:25, 29 May 2023 (UTC)Reply
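
For readers wondering what such a clean-up involves, the following Python sketch removes y=/yomi= arguments from {{ja-pron}} calls in a piece of wikitext. It is an illustration only and is not the bot script mentioned above; the function name `strip_yomi` is hypothetical, and the regular expression assumes that the template call contains no nested templates or links with pipes.

```python
import re

# Matches {{ja-pron|...}} calls whose arguments contain no braces (no nesting).
JA_PRON_CALL = re.compile(r"\{\{ja-pron((?:\|[^{}|]*)*)\}\}")

# Parameter names to drop; taken from the discussion above.
DEPRECATED = ("y", "yomi")


def strip_yomi(wikitext: str) -> str:
    """Return wikitext with y=/yomi= removed from simple {{ja-pron}} calls."""

    def rewrite(match: re.Match) -> str:
        params = match.group(1).split("|")[1:]  # drop the empty piece before the first "|"
        kept = [p for p in params if p.split("=", 1)[0].strip() not in DEPRECATED]
        return "{{ja-pron" + "".join("|" + p for p in kept) + "}}"

    return JA_PRON_CALL.sub(rewrite, wikitext)


print(strip_yomi("{{ja-pron|せいしん ぶんれつ びょう|acc=h|y=on}}"))
```

Positional arguments and all other named parameters are passed through unchanged, so only the deprecated arguments disappear from the call.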

References

  1. Nakai, Yukihiko, editor (2002), 京阪系アクセント辞典 [A Dictionary of Tone on Words of the Keihan-type Dialects] (in Japanese), Tōkyō: Bensei
  2. NHK Broadcasting Culture Research Institute, editor (1998), NHK日本語発音アクセント辞典 [NHK Japanese Pronunciation Accent Dictionary] (in Japanese), Tokyo: NHK Publishing, Inc.
  3. Matsumura, Akira, editor (2006), 大辞林 [Daijirin] (in Japanese), Third edition, Tokyo: Sanseidō
  4. Kindaichi, Kyōsuke et al., editors (1997), 新明解国語辞典 [Shin Meikai Kokugo Jiten] (in Japanese), Fifth edition, Tokyo: Sanseidō

Odd output

I noticed at 石榴 (zakuro) that {{ja-pron}} is outputting initial /z-/ as [d͡z-] instead. I don't think this is correct; while that might be an allophone, I don't think it's the main form.

Meanwhile, at 砕氷船 (saihyōsen), I noticed that {{ja-pron}} is outputting /hjo/ as [ço̞], completely missing the palatal glide.

Does anyone have any insight into what's going on here? I'm pretty sure these both represent changes that have crept in some time over the past few months while I've been busy off-site. ‑‑ Eiríkr Útlendi │Tala við mig 06:02, 25 May 2020 (UTC)Reply

Utterance-initial /z/ is typically [dz]. See Vance (2008: 85), Labrune (2012: 64), Maekawa (2010). We talked about [ç] vs [çj] in #日向 above. As I said then, a palatal glide articulation is implied because the tongue inevitably passes through that position during the transition from a voiceless palatal fricative to the following vowel. /C/ before /i/ and /Cj/ are consistently represented as a single palatal(ized) consonant ([kʲ ɡʲ ɕ ʑ t͡ɕ d͡ʑ ɲ bʲ pʲ mʲ ɾʲ]), so it makes no sense to single out /h/ before /i/ and /hj/ and represent them as two segments. Nardog (talk) 13:04, 11 February 2022 (UTC)Reply

Katakana long mark

@Eirikr, Erutuon, Naggy Nagumo, Nardog, Suzukaze-c, TAKASUGI Shinji: the katakana long mark is causing errors, for example at キャッシュディスペンサー and several others in Category:IPA pronunciations with invalid IPA characters. —Mahāgaja · talk 18:59, 4 July 2020 (UTC)Reply

"dev" index on the pages was invalid. —Naggy Nagumo (talk) 23:16, 4 July 2020 (UTC)Reply

Hyphens

Do we want hyphens?

Currently:

{{ja-pron|あい-うえお|acc=3}}い-うえお
instead of い-うえお

Suzukaze-c (talk) 21:38, 19 September 2020 (UTC)Reply

Katakana for IPA again

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo): The template is not parsing しゅ correctly; it's putting a stray "ュ" into the IPA. See アラブ首長国連邦 for an example, though there are many others at CAT:IPA pronunciations with invalid IPA characters. —Mahāgaja · talk 20:41, 28 January 2022 (UTC)Reply

It may have something to do with this edit by @Rdoegcd. —Mahāgaja · talk 20:43, 28 January 2022 (UTC)Reply

Hi. I undid my edit. I was trying to fix an issue on 可視光線 where the devoicing diacritic was being added to the space between the words in the IPA, but I think my edit introduced another bug. --Rdoegcd (talk) 00:39, 29 January 2022 (UTC)Reply
Fixed the above, I think. However, the module still produces an unexpected result if for some reason you try to devoice っ, ん, or a mora before ん or if you enter characters besides kana, ー, spaces, and periods. Rdoegcd (talk) 18:53, 30 January 2022 (UTC)Reply

"Devoiced /a/"?

@Rdoegcd --

Re: diff and your edit comment, "fix bug where devoiced /a/ yields ḁ instead of a in romaji, try mw.log(p.accent('かきくけこ',0,'1,2,3,4,5'))".

Where is there ever a "devoiced /a/" in Japanese? In all I've read so far, Japanese only devoices /i/ and /u/, and only in specific situations. I've never heard (ha!) of any cases of devoiced /a/, /o/, or /e/.

Curious, ‑‑ Eiríkr Útlendi │Tala við mig 09:34, 25 March 2022 (UTC)Reply

I saw it on , where it says
Lua error in Module:parameters at line 858: Parameter "y" is not used by this template.
I have no idea if it is correct. Japanese_phonology#Devoicing also mentions that /o, a/ may be devoiced sometimes. Rdoegcd (talk) 16:02, 25 March 2022 (UTC)Reply
Not sure if it's appropriate to use this template for dialects other than standard/Tokyo Japanese. Most of them don't have the same two-accent system and the narrow transcription the module produces is bound to be way off. Nardog (talk) 11:59, 4 April 2022 (UTC)Reply
Looks like Poketalker (talkcontribs) added that pronunciation information in December 2018. Poketalker, do you have a reference for that, or more generally any further detail on Kagoshima pronunciation patterns and whether {{ja-pron}} really fits for this? I know that some dialects aren't moraic and are syllabic instead, which wouldn't fit {{ja-pron}} in its current implementation. ‑‑ Eiríkr Útlendi │Tala við mig 19:27, 4 April 2022 (UTC)Reply
Re: Japanese_phonology#Devoicing and the examples for /o/ and /a/, in Tokyo speech, that only happens in fast speech -- in which case, {{ja-pron}} is not the place to document that. Fast speech is a different beast, and I'm not comfortable at present trying to document that anywhere except maybe usage notes. — This unsigned comment was added by Eirikr (talkcontribs) at 19:29, 4 April 2022 (UTC).Reply
@Eirikr: that was a blind insertion from a StarLing entry years ago. There was no accent on the ta there, so I had to improvise. You could check Martin's book (JLTT) as referenced, but my impression at that time was that there was no way to de-accent that mora. I could try to look at a Japanese etymological database, but that takes a while. ~ POKéTalker 00:25, 5 April 2022 (UTC)Reply
@Poketalker: Thank you for the background. FWIW, I am doubtful of Starling's derivation -- Japanese cartographic traditions put the north on the left, similar to one possible derivation for English north, and not behind as Starling contends. Also, I thought I recalled Shibatani suggesting that Kansai / Kagoshima pitch accent patterns were actually the innovation, and that Kantō was more conservative, which would put a further kink in Starling's theory... I'll have to look through my local copy. Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 04:57, 12 April 2022 (UTC)Reply

Please add SMK7 (Template:R:Shinmeikai7)

Could somebody who has the right to edit Module:ja-pron please add ['SMK7'] = 'R:Shinmeikai7', in ref_template_name_data? Lugria (talk) 00:03, 16 September 2022 (UTC)Reply

I got the right, so I added it. Lugria (talk) 13:02, 17 September 2022 (UTC)Reply

[ŋ]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria):

I think it would be informative and useful if we modified {{ja-pron}} so that it produces [ŋ] where applicable. Given that the template outputs phonetic, not phonemic IPA, it is actually in a way "necessary" for us to use [ŋ] where applicable if we want to be correct. In the article , we use [ŋ] manually (i.e. using {{IPA}} directly) but almost none of the other entries (where it would be correct) use both [g] as well as [ŋ] (there's also entries like 天が下 (before I've changed it) where only [ŋ] was used). I could easily modify the module so that it produces both a [g] as well as a [ŋ] variant if given か゚ but then the task remains that か゚ needs to be added to potentially thousands of entries. Is there a better solution? What do others think? — Fytcha T | L | C 00:29, 29 November 2022 (UTC)Reply

Word-internal /ɡ/ as [ŋ] is a regressive feature, now consistently exhibited only by older Tokyo speakers.
  • Vance (2008: 214): "According to descriptions written around 1940, many Tokyo natives were consistent nasal speakers at that time, but the proportion of inconsistent speakers and consistent stop speakers was on the increase. More recent sociolinguistic studies indicate that the proportion of consistent stop speakers has in fact increased and that, at least in some Tokyo neighborhoods, the proportion of tokens pronounced with [ŋ]/[ŋʲ] correlates with the age of the speaker."
  • Labrune (2012: 78): "The study by Hibiya (1999) clearly demonstrates that there is a clear pattern of age stratification in the use of [ŋ], which drops off as age diminishes."
Nardog (talk) 17:44, 29 November 2022 (UTC)Reply
@Fytcha: What @Nardog wrote is correct. Only a very small part of elderly people still uses /ŋ/, most people either say /g/ or /ɣ/ (the latter might even be more prominent in Tokyo). — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 10:37, 1 December 2022 (UTC)Reply
Slashes enclose phonemes. We're talking about realizations. Nardog (talk) 22:40, 1 December 2022 (UTC)Reply
@Nardog: I meant realisations too, hence the verbs "use" and "say". Sometimes it's good to just be practical, see the big picture, and understand what people are trying to say, without being unnecessarily pedantic. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 23:16, 1 December 2022 (UTC)Reply
I know you did, "we" was inclusive. When I see as grievous an error as using slashes to enclose allophones, I correct it not just for the person making the error but for anybody who sees it. Nardog (talk) 23:22, 1 December 2022 (UTC)Reply
@Nardog: I understand that spirit. It's the same one that makes me call out pedantry when there is an excess of it. I don't do that just for the person being unnecessarily pedantic, but for anybody to know that it is ok to call it out. It's perfectly fine to point to the moon with a dirty finger. The enlightened ones will know what to look at. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 00:36, 2 December 2022 (UTC)Reply

|acc2_note not working properly if there is no |acc_note

The note intended for the second accent is displayed after the first accent, see 千本. Pinging @Rdoegcd. — Fytcha T | L | C 02:11, 29 November 2022 (UTC)Reply

@Fytcha Fixed. Rdoegcd (talk) 06:27, 29 November 2022 (UTC)Reply

|acc_note not rendering if there is no pitch specified

There are cases where a word might not have any pitch that we can specify, but we still want to add acc_note to clarify -- particularly for verb suffixes, where the pitch depends on the verb to which the suffix attaches. See the wikicode at なば (naba), for instance.

Could someone please update the template / module to allow acc_note (and acc2_note, etc.) even if there is no acc (or acc2, etc.)? ‑‑ Eiríkr Útlendi │Tala við mig 21:20, 27 December 2022 (UTC)Reply

d͡z/z and d͡ʑ/ʑ

The module currently produces [ɕĩn(d͡)ʑɨᵝ] for 真珠, which doesn't look right. I think it should be either [ɕĩɲ̟d͡ʑɨᵝ] (in the traditional description mentioned in w:Japanese phonology) or [ɕĩɰ̃ʑɨᵝ] (taking into account the Maekawa (2010) study and treating /N/ the same way as in 新種 [ɕĩɰ̃ɕɨᵝ]). Also, the d͡z/z and d͡ʑ/ʑ variations should hold before all vowels, not just before /i, u/, right? Rdoegcd (talk) 23:57, 1 April 2023 (UTC)Reply

[ɕĩɰ̃ʑɨᵝ] wouldn't be "taking into account the Maekawa (2010) study". He says /N/ "strongly enhanced [rate of affricate realization]" (p. 365). Nardog (talk) 12:53, 2 April 2023 (UTC)Reply
Removed it. The module already assumes the Tokyo-style yotsugana merger, which means [z] and [dz], and [ʑ] and [dʑ], are in complementary distribution. The addition seems to have been motivated by an incomplete understanding of the merger. Nardog (talk) 13:03, 2 April 2023 (UTC)Reply
@Rdoegcd, Nardog: Thank you for catching this mistake, my addition was wrong, sorry for that. However, is the module really correct as it is? My impression was that e.g. ずっと can (non-utterance-initially) be realized as either [d͡zɨt̚to̞] or [zɨt̚to̞] in free variation which is not currently reflected by the module. — Fytcha T | L | C 20:53, 14 September 2024 (UTC)Reply
Transcriptions usually represent utterances in isolation, otherwise we would have to list all the possibilities for /N/, etc. Nardog (talk) 20:57, 14 September 2024 (UTC)Reply
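
As a rough illustration of the complementary distribution described on this page ([z] between vowels, [dz] utterance-initially and after /N/, for an utterance in isolation), consider the Python sketch below. It is not the module's Lua code: the function name `realize_z`, the phoneme-list input and the symbol "N" for the moraic nasal are all assumptions made for the example, and the rule is simplified to "preceded by a vowel", since /z/ is always followed by a vowel.

```python
VOWELS = set("aiueo")
AFFRICATE = "d\u0361z"  # "d͡z", written with an escape so the tie bar is unambiguous


def realize_z(phonemes: list[str]) -> list[str]:
    """Replace /z/ with [z] after a vowel and with the affricate elsewhere.

    `phonemes` is a hypothetical broad transcription, one phoneme per item,
    with "N" standing for the moraic nasal.
    """
    realized = []
    for i, phoneme in enumerate(phonemes):
        if phoneme == "z":
            after_vowel = i > 0 and phonemes[i - 1] in VOWELS
            realized.append("z" if after_vowel else AFFRICATE)
        else:
            realized.append(phoneme)
    return realized


print("".join(realize_z(list("zakuro"))))
print("".join(realize_z(list("kaze"))))
```

Because the input is treated as a whole utterance in isolation, word-initial /z/ always comes out as the affricate here; variation in running speech is deliberately outside the scope of the sketch.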

Add a hidden category for unsourced pitch accents

Hi,

Could someone add a hidden maintenance category when no reference is given for a pitch accent? This would allow me to check them and add references.

Thanks. Thibaut (talk) 11:11, 15 May 2023 (UTC)Reply

@Thibaut120094 With regards to this, I regularly add pitch accents from JPDB.io because it's a very straightforward resource to access and means I don't need to open my NHK ebook and seek alphabetically for it; is there any other accepted online source that can be put in instead? Kiril kovachev (talk) 23:12, 28 May 2023 (UTC)Reply
JPDB.io uses AI to autogenerate pitch accent graphs [3][4], it might not be accurate.
It’s best to stick to authoritative Japanese dictionaries like the NHK dictionary or the Daijirin, paid digital versions are available on various platforms (for example on iOS and macOS). Thibaut (talk) 23:53, 28 May 2023 (UTC)Reply
I see, I apologise then if I may have polluted some entries with wrong pitch accents, although I believe the main ones I've been porting have been confirmed as opposed to AI-generated (there's a warning icon to signify if the accent is auto-generated). Anyway, this is perhaps a good wake-up call to simply make the effort to look it up, since I do already have a copy of NHK. Thanks for your advice, Kiril kovachev (talk) 13:55, 29 May 2023 (UTC)Reply
Also, I found [5] for looking up words through NHK, this way you don't need a paper or ebook copy but can just look it up online. Kiril kovachev (talk) 22:27, 6 June 2023 (UTC)Reply
@Thibaut120094 Here, I made it so they're now added to Special:WhatLinksHere/Template:tracking/ja-pron/unsourced_accent, but be warned there is a ton of them. Maybe there is a better way of doing this—in fact, I was wondering whether this should be a proper category, maybe which would appear under Category:Requests concerning Japanese, but unfortunately I don't know how to do that... Sorry that it took such a while when the change was really little in fact... Kiril kovachev (talk) 22:26, 6 June 2023 (UTC)Reply
Thank you very much! Thibaut (talk) 08:32, 12 June 2023 (UTC)Reply
No problem! Kiril kovachev (talk) 10:01, 25 June 2023 (UTC)Reply
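
The check behind such a tracking page can be sketched as follows. This Python is illustrative only and is not the change made to Module:ja-pron; the function name `has_unsourced_accent` is hypothetical, and it assumes that each accent parameter (acc, acc2, ...) is sourced by a matching acc_ref, acc2_ref, ... parameter and that the template call contains no nested templates.

```python
import re

# Matches {{ja-pron|...}} calls whose arguments contain no braces (no nesting).
JA_PRON_CALL = re.compile(r"\{\{ja-pron((?:\|[^{}|]*)*)\}\}")


def has_unsourced_accent(wikitext: str) -> bool:
    """Return True if any simple {{ja-pron}} call gives an accent without a reference."""
    for match in JA_PRON_CALL.finditer(wikitext):
        params = match.group(1).split("|")[1:]
        names = {p.split("=", 1)[0].strip() for p in params if "=" in p}
        for name in names:
            accent = re.fullmatch(r"acc(\d*)", name)  # acc, acc2, acc3, ...
            if accent and f"acc{accent.group(1)}_ref" not in names:
                return True
    return False


print(has_unsourced_accent("{{ja-pron|せいく|acc=1}}"))
print(has_unsourced_accent("{{ja-pron|ハノイ|acc=1|acc_ref=NHK}}"))
```

A call that only supplies an audio file or a note is not flagged, because the check looks at accent parameters alone.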

recent edit

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Dixtosa, Kc kennylau, Rua, Ruakh, ZxxZxxZ, Erutuon, Jberkel, JohnC5, Benwing2, RichardW57, Chuck Entz, Metaknowledge, SemperBlotto, Dixtosa, Rua, Benwing2, Dixtosa, The Editor's Apprentice): The edit was meant to be a minor edit; I was rushing and forgot to click the button. Chuterix (talk) 04:37, 3 June 2023 (UTC)Reply

also notifying @Theknightwho Chuterix (talk) 04:38, 3 June 2023 (UTC)Reply
I fixed the module error; that was also a minor edit.
Can someone get Shuri Naha to work? Chuterix (talk) 04:44, 3 June 2023 (UTC)Reply
Okinawan has its own ISO 639 code, you should just fork the template/module rather than extend them. Nardog (talk) 07:23, 3 June 2023 (UTC)Reply
@Nardog No - that is a really bad idea, because it causes a maintenance nightmare where modules for smaller sister languages effectively get neglected; we've had this issue with Okinawan already. We will do what we've done with the other Japanese modules, which is to adapt them to multiple languages. Honestly, I have no idea why you would suggest that. Theknightwho (talk) 17:41, 3 June 2023 (UTC)Reply
Agreed with @Theknightwho. I tried to do that, but I don't know Lua and got reverted by @Chuck Entz. Chuterix (talk) 17:46, 3 June 2023 (UTC)Reply
@Chuterix: I reverted you for two reasons: 1) There were errors that looked like they would affect all 42,000 transclusions and I don't know how to fix them- all I could do was revert to the state before the changes were made, and 2) Renaming "ja-pron" to just "pron" was a really bad idea. This isn't a general module that covers pronunciation in all languages, but one that covers specific writing systems. If someone ever does want to create a general module along those lines, the name would be already taken. It would be better to come up with a name that everyone can agree with before moving the module.
This wasn't a minor error: it broke all uses of the template (42,000 of them), and flooded CAT:E. Even after my revert fixed the errors, they were still flooding in there. I managed to purge a couple thousand entries before I fell asleep, but there were in excess of 7,000 more, and increasing by the minute. Meanwhile, there was a (probably) fixable module error in the Okinawan entry at that was not addressed because there was no way to tell it apart from the errors from your edits. I spotted it at the time, but never got around to telling @Theknightwho because I had this mess to deal with. Chuck Entz (talk) 18:20, 3 June 2023 (UTC)Reply
BTW: there's an error at 欲しゃい that you need to fix (I think you used the wrong name in the module). Chuck Entz (talk) 18:26, 3 June 2023 (UTC)Reply
@Chuck Entz I've fixed the error at : I merged Module:ryu-headword into Module:Jpan-headword (formerly Module:ja-headword, but now able to handle all Japonic languages). Module:ryu-headword had effectively been abandoned since it was forked back in 2019, and it looks like it was slightly more permissive with its parameters (i.e. it tried to guess which parameter was which based on the script, instead of just having fixed parameters). It's a feature that's gradually being phased out of East Asian templates, because it's a pointless waste of resources that causes lots of issues with edge cases where Latin letters are used in Jpan, Hani etc., such as Japanese NATO (NATO).
The error at 欲しゃい was caused by Chuterix forking Module:ja-see as Module:kzg-see, but without looking into the details I don't specifically know what the problem was. Much better to just wait until Module:ja-see has been converted into Module:Jpan-see so it can be done properly. Theknightwho (talk) 18:42, 3 June 2023 (UTC)Reply

Removed yomi code

Dear all, as per my recent bot action, all instances of the yomi parameter we deprecated a few months ago have been removed, so I got rid of the code that originally handled transcluding pages into the tracking template, as well as the straggling data structures used to match the parameters to the reading names. Kiril kovachev (talk) 09:59, 25 June 2023 (UTC)Reply

Custom accent

What about Kyoto, Kagoshima, Ryukyuan (for {{ryu-pron}}), possibly other accents?

I've documented the accent info here; there's no falling pitch accent, initial 2 morae accent, nor unaccent in more than two morae. I have suggested this on Discord, but both Theknightwho and Fish bowl (whom I contacted on WT) are unable to do this at the moment.

We can put this in a collapsible section if someone says "we should put the dialectal accent info somewhere else because we should only focus on "standard" (Tokyo) pronunciation". We're trying to make this like {{zh-pron}} in the future and any contributions are highly appreciated.

A sample from (the dead account of) @荒巻モロゾフ can be viewed at User:荒巻モロゾフ/draft.

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Mcph2): Chuterix (talk) 23:30, 1 January 2024 (UTC)Reply

Add DJR4


Someone who has a right to edit Module:ja-pron, please add ['DJR4'] = 'R:Daijirin4', in ref_template_name_data. Lugriaルグリア [会話貢献] 06:21, 13 March 2024 (UTC)Reply

Add NKD2


Can we add {{R:Nihon Kokugo Daijiten 2 Online}} as NKD2 to the reference? --TongcyDai (talk) 13:12, 13 June 2024 (UTC)Reply

Transcription of w and u


Pinging @Fish bowl @Eirikr @Poketalker @Lattermint @Chuterix @Nardog Currently, we transcribe w as [ɰᵝ] (voiced velar approximant with compression), u after non-palatalising consonants as [ɯ̟ᵝ] (close back unrounded vowel with compression and frontedness) and moraic n before vowels and semivowels as [ɰ̃] (nasalised voiced velar approximant). I suggest that we transcribe w as [β̞] (voiced bilabial approximant) and u as [ɯ̟] (the same, but without the compression).

  1. According to this paper, which did real-time MRI tests on speakers, [β̞] is a closer approximation to the actual sound produced for w. Quoting from the conclusion: "The results of the current study show that it is misleading to use the IPA symbol [w] for Japanese /w/. Note also that it is even more misleading to use [ɰ] instead, because there is little evidence to interpret Japanese /w/ as a velar consonant." (Emphasis mine.)
  2. It's not clear that [ɯ̟ᵝ] for u is more accurate than [ɯ̟], as the two appear to be in free variation. For instance, w:Japanese phonology#Vowels gives two citations for each, which all check out. I'm unconvinced that such a fine distinction is appropriate in a general phonetic transcription such as the one we aim to give (as opposed to transcriptions of idiolects, for example), so I see no compelling reason for us to use the busier transcription.
  3. Distinguishing between [ɰ̃], [ɰᵝ] and [ɯ̟ᵝ] is extremely difficult, bordering on impossible, particularly when more than one appears in the same transcription, and even more particularly when they appear next to each other. Some examples:
    1. 温和(おんわ) (onwa): [õ̞ɰ̃ɰᵝa̠]. New suggestion: [õ̞ɰ̃β̞a̠].
    2. ふわふわ (fuwafuwa): [ɸɯ̟ᵝɰᵝa̠ɸɯ̟ᵝɰᵝa̠]. New suggestion: [ɸɯ̟β̞a̠ɸɯ̟β̞a̠].
    3. The worst possible case, found on Japanese WP: デーモンウゥーム (dēmonwūmu): [de̞ːmõ̞ɰ̃ɰᵝɯ̟ᵝːmɯ̟ᵝ] (literally unreadable, unless you're a robot). New suggestion: [de̞ːmõ̞ɰ̃β̞ɯ̟ːmɯ̟].
  4. It's not ideal to use [ᵝ] for compression anyway, because it isn't actually part of the IPA, but an informal extension.

Theknightwho (talk) 07:27, 8 July 2024 (UTC)Reply
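A minimal sketch of the substitution proposed above, assuming the phonetic transcription is already available as a Unicode string. It is written in Python purely for illustration (the module itself is Lua), and the name `apply_proposal` is hypothetical rather than anything in Module:ja-pron.

```python
# Hypothetical helper illustrating the proposal; not code from Module:ja-pron.
OLD_W = "\u0270\u1d5d"        # ɰᵝ: velar approximant + compression mark
NEW_W = "\u03b2\u031e"        # β̞: bilabial approximant
OLD_U = "\u026f\u031f\u1d5d"  # ɯ̟ᵝ: fronted close back unrounded vowel + compression mark
NEW_U = "\u026f\u031f"        # ɯ̟: same vowel with the compression mark dropped


def apply_proposal(ipa: str) -> str:
    """Rewrite w and u in a phonetic transcription as suggested above.

    The moraic nasal ɰ̃ (U+0270 U+0303) is left untouched, because its
    tilde sits between ɰ and whatever character follows.
    """
    return ipa.replace(OLD_W, NEW_W).replace(OLD_U, NEW_U)


if __name__ == "__main__":
    # ふわ in the current notation, built from the constants above
    fuwa = "\u0278" + OLD_U + OLD_W + "a\u0320"
    print(apply_proposal(fuwa * 2))
```

Building the strings from explicit code points avoids any doubt about which combining characters are involved; a real implementation would presumably change the output rules directly instead of post-processing.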

Whatever is confirmed accurate, I seem to accept it. I don't do much on standard Japanese phonology. Chuterix (talk) 07:41, 8 July 2024 (UTC)Reply
Your suggestion of using [β̞] certainly seems sensible, especially given the results in the paper you linked, and the IPA examples listed above. ‑‑ Eiríkr Útlendi │Tala við mig 19:07, 10 July 2024 (UTC)Reply
I think it would also be sensible to adapt the moraic nasal based on surrounding phonology, too, since [ɰ̃] (no doubt aped from WP) is explicitly said to be broad transcription, which is at odds with the very narrow transcription we use for everything else. This has a very detailed analysis, among other papers. Theknightwho (talk) 00:22, 11 July 2024 (UTC)Reply
And Maekawa (2023), based on the same data as the paper you relied on for using [β̞]. Nardog (talk) 06:16, 11 July 2024 (UTC)Reply
@Nardog Thank you. Do you have any suggestions for how we should deal with (admittedly rare) cases such as クァ (kwa), グァ (gwa) and so on? I am sceptical that [kβ̞] is an accurate representation, but I can't really find anything that focuses on the phonetics beyond representing it as simply [kw] (or similar). It's usually only touched on briefly to note that it's a rare phenomenon that existed historically, and that it crops up in the occasional loanword, but the papers which go into more detail do so from the perspective of orthography. Theknightwho (talk) 12:25, 11 July 2024 (UTC)Reply
That's neither here nor there for me, given any velarization can be argued as coarticulation/gestural overlap with [k, ɡ]. I wouldn't make an exception for them, to reduce the amount of code to maintain. Nardog (talk) 16:48, 11 July 2024 (UTC)Reply
@Nardog I'm doing a major rewrite of the whole pronunciation code which should make it much easier to specify/keep track of complex interactions between phonemes, so I wouldn't worry about maintenance issues, as I have that in hand. I'm just keen to make sure we accurately represent things, even with edge-cases. Theknightwho (talk) 21:00, 11 July 2024 (UTC)Reply
But using [β̞] is more theoretically sound also. Velarization is expected in a transition from a velar occlusive to a continuant. That's not allophony, just coarticulation. Nardog (talk) 11:42, 12 July 2024 (UTC)Reply

Reference dictionary addition


Could we have the following dictionary available for accent references? Shin Meikai Nihongo Akusento Jiten, Second Edition (新明解日本語アクセント辞典第2版). It was published in 2014 by Sanseidō. Maidodo (talk) 03:03, 7 September 2024 (UTC)Reply

Confirmation of Convention for Indicating Long Vowels in Pronunciation Sections Using This Template (in kana)


Hello. What is our convention for long vowels in terms of kana-represented pronunciation? (Not to be confused with kana orthography = 現代仮名遣い)

The ordinary convention in Japanese dictionaries is to use the prolonged sound mark (ー); otherwise it is not possible to distinguish long vowels from (quasi-)double vowels, or even from morphemic boundaries. Note: generally, the kana-represented pronunciation is written using katakana, but I am using hiragana to align with Wiktionary.

  • Long a (following 現代仮名遣い section 1)

お母さん(おかあさん) pronounced かーさん >> Wiktionary pronunciation notation: おかさん

  • Long i (following 現代仮名遣い section 1)

お兄さん(おにいさん) pronounced にーさん >> Wiktionary pronunciation notation: おにさん

  • Long u (following 現代仮名遣い section 1)

夫婦(ふうふ) pronounced ふーふ >> Wiktionary pronunciation notation: identical

  • Long e (following 現代仮名遣い section 1)

お姉さん(おねえさん) pronounced ねーさん >> Wiktionary pronunciation notation: おねさん

  • Long e (following 現代仮名遣い section 2)

映画(えいが) pronounced えーが >> Wiktionary pronunciation notation: identical

(Note: it can also be pronounced えいが in low-speed speech, but we ignore that)

  • Long o (following 現代仮名遣い section 1)

お父さん(おとうさん) pronounced とーさん >> Wiktionary pronunciation notation: identical

  • Long o (following 現代仮名遣い section 2)

(こおり) pronounced こーり >> Wiktionary pronunciation notation: こ


I am not sure whether those differences are the result of a collegial choice, or whether there is room for improvement.

In my opinion, the usual Japanese conventions for kana-represented pronunciation make more sense, because they clearly show that this is not a double vowel. Maidodo (talk) 08:10, 7 September 2024 (UTC)Reply

@Maidodo Yes, we should probably do that. The current pronunciation module is in a bit of a sorry state, and needs some work. Theknightwho (talk) 14:26, 7 September 2024 (UTC)Reply
Thanks @Theknightwho ― I am glad to know that it was not intentional and that the idea is to have a kana-represented pronunciation using the same well-organized system of conventions as the paper dictionaries in Japan. I don't have experience in template programming but I am willing to offer my help. Maidodo (talk) 23:15, 7 September 2024 (UTC)Reply
@Maidodo Thanks - I appreciate it. I won't have time to get to this straight away, but I have been doing a large-scale rewrite of a lot of the core Japonic modules, and will make sure to get to it as part of that.
There are some further changes I think we should do (which are basically just extensions of your point above):
  1. If it differs from the standard kana spelling, we should give a phonemic respelling in the pronunciation section; e.g. 今日(こんにち) (konnichi wa) would have こんにちわ. We should probably label it "phonetic kana" (in a similar fashion to Korean entries, with their "phonetic hangul"). There is precedent for this in other languages, too (e.g. Russian интерне́т (internét) gives the phonetic respelling интэрнэ́т).
  2. At the moment, kana spellings only show up in pronunciation sections if at least one pitch accent has been given, but I think these respellings should always show up, since they're obviously relevant.
  3. Respellings should always be given in hiragana, for the sake of simplicity.
  4. We should replace the strange hybrid between romaji and IPA with phonemic IPA: e.g. [kòńníchí wáꜜ] would become /konnit͡ɕi waꜜ/. This seems to have come from notation used on the Japanese pitch accent article on Wikipedia, but whoever implemented it didn't seem to realise that it should never have used raw romaji (e.g. the ch digraph makes no sense in that context).
  5. Mora-by-mora pitch accent information should be given in the phonetic IPA transcription.
  6. I would prefer that we showed the slight upstep that occurs with certain pitch accents after the first mora, which Japanese pronunciation dictionaries do with a slight uptick in the line (e.g. [6]).
Here are a couple of mock-ups of what I'm envisioning. The logic behind the dark red is that it indicates differences/features which cannot be derived automatically from the kana spelling (even though they're highly regular in some cases):
Term: 今日(こんにち) (konnichi wa)
  Old:
    • (Tokyo) んにちは [kòńníchí wáꜜ] (Odaka – [5])
    • IPA(key): [kõ̞ɲ̟ːit͡ɕi β̞a̠]
  New:
    • (Tōkyō) IPA(key): /konnit͡ɕi waꜜ/, [kõ̞̀ɲ̟́ːít͡ɕí β̞á̠ꜜ]
      • Phonetic kana: んにち (Odaka – [5])
Term: 夫妻(ふさい) (fusai)
Also pinging (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, Shen233, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2, The Editor's Apprentice): Theknightwho (talk) 15:38, 12 September 2024 (UTC)Reply
Have to agree with the idea. Chuterix (talk) 16:01, 12 September 2024 (UTC)Reply
Some quick thoughts.
  • I like the phonetic kana and the red overbars.
    • It's not clear to me why the わ is bold and red for 今日は?
  • Properly accounting for multiple pitch patterns is another good improvement.
  • For the phonetics, I agree that we should get away from romaji and use IPA. That said, I'd recommend a slightly different approach.
    • Marking morae is important. We can use the simple period notation already used in IPA to split syllables.
    • To revisit, do we need both /loose/ and [strict] notations? If my (admittedly patchy) memory is correct, we did that in the past because we were worried that the accents to show pitch (á for high and à for low), when combined with the other diacritics used in the [strict] notation, wound up being too many diacritics crammed into one space, making it look too much like Z̷̜̓̆̐͗̚͘͝͠͝a̸̱̬̣͕͔̫̘̾̍̐ļ̴̭̩̞̯̲̘̜͎̌g̵̡̲̙͆̾͊͆̌͆̈́̌̓͘ớ̶̧͕̮͓̥͔͛͝.
    • For the konnichi wa example, as far as I'm accustomed to hearing it, the double "n" there isn't realized phonetically as [ɲ̟ː], but more like [ɲ̟.n], where the "n" in nichi is distinct from the velar nasal of the "n" in kon.
Must run. Happy to see this discussion taking place! ‑‑ Eiríkr Útlendi │Tala við mig 16:50, 12 September 2024 (UTC)Reply
@Eirikr The red is to indicate a difference from the canonical kana spelling; I used bold because you can’t really see the colour otherwise. I also agree with showing the morae with dots.
The / / and [ ] are phonemic and phonetic notation (respectively), not loose and strict, so they fundamentally convey different information. In some (rare) cases, there may be phonemic differences that are difficult to distinguish phonetically, like the honoo example we’ve talked about, but for the most part it’s useful to give both because they exist for different reasons, and make it clear what is phonemically relevant and what isn’t in an IPA format. Indeed, mora boundaries with dots are a purely phonemic phenomenon, and wouldn’t make sense in the phonetic transcription. Theknightwho (talk) 17:24, 12 September 2024 (UTC)Reply
Thank you for clarifying. About your last sentence, "mora boundaries with dots are a purely phonemic phenomenon", I'm not entirely sure I agree, considering that pitch changes are mora-bound. If we try to show pitch information in the phonetic transcription, we run into problems, such as when the pitch changes in the middle of an otherwise long vowel, as between the first and second "o" morae in 高校. How would we show both low pitch and high pitch on something like [o̞ː]? ‑‑ Eiríkr Útlendi │Tala við mig 19:21, 12 September 2024 (UTC)Reply
@Eirikr Yes, I did notice that problem actually. There are tone contour marks, which could work: [o̞᷅ː] has a low-rising contour (i.e. a shift from low to high during the vowel). There is also [o̞᷇ː] (high-falling), which makes sense for a downstep during the vowel. On second thoughts, I also think we should leave ꜜ out of the phonetic transcription, too, since it's redundant to the diacritics (and would make long vowels impossible to represent), with the one exception being Odaka, since the downstep occurs after the term (and there's no other way to represent that). (Edit: we could also represent them as falling ([ô̞ː]) or rising ([ǒ̞ː]), but that possibly oversimplifies what's actually happening during the vowel.) Theknightwho (talk) 19:47, 12 September 2024 (UTC)Reply
The work seems interesting. The topic is long but I support specifically the phonetic respellings suggested.
BTW, @Theknightwho, if you're using the Russian phonetic respelling интерне́т (intɛrnɛ́t) example, respelled as "интэрнэ́т", then it's important to add the non-automatic Wiktionary transliteration. It's like using "konnichi wa" vs "konnichi ha". Anatoli T. (обсудить/вклад) 00:15, 13 September 2024 (UTC)Reply
Hmm, I'm worried that tone contours might misrepresent what's happening in Japanese. Comparing a Japanese pitch downstep with a falling tone in Mandarin, or a Japanese pitch rise with a rising tone in Mandarin, for instance, these are clearly different phenomena. Japanese pitches are stepped in a way that I think tone contour marks don't quite convey: it's not that the pitch changes steadily through the course of the two morae, and rather each mora has a pitch. In cases where a long vowel sound straddles a pitch change, I think it's clearer to eschew the geminate notation as in [ǒː] and instead simply double the vowel and mark each with the appropriate pitch as in [òó]. ‑‑ Eiríkr Útlendi │Tala við mig 01:17, 13 September 2024 (UTC)Reply
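A minimal sketch of the "double the vowel and mark each mora" idea described above, as opposed to tone contour marks on a single long vowel. The function name `mark_pitch`, the H/L pattern string, and the romaji-like mora spellings are all assumptions made for illustration, and the code is Python rather than the module's Lua.

```python
# Hypothetical illustration; not the notation logic used by Module:ja-pron.
ACUTE = "\u0301"  # combining acute accent, used here for high pitch
GRAVE = "\u0300"  # combining grave accent, used here for low pitch


def mark_pitch(morae: list[str], pattern: str) -> str:
    """Attach one pitch diacritic to the last character of every mora.

    `pattern` holds one letter per mora: "H" for high, "L" for low.
    A long vowel is passed as two morae, so each half gets its own mark.
    """
    if len(morae) != len(pattern):
        raise ValueError("need exactly one pitch level per mora")
    marks = {"H": ACUTE, "L": GRAVE}
    return "".join(mora + marks[level] for mora, level in zip(morae, pattern))


if __name__ == "__main__":
    # 高校 split into four morae, with a rise between the first two "o" morae
    print(mark_pitch(["ko", "o", "ko", "o"], "LHHH"))
```

Because the mark goes on the last character of each mora, a moraic nasal written as its own mora receives a diacritic too, much as in the existing romaji-style line.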
@Theknightwho Very good directions, in my opinion. I am not an expert in terms of IPA, so I won't comment on it. That said, I support the idea of showing a phonemic representation. I can also probably contribute regarding kana-represented pronunciations.
As for the phonemic guide between //, as you know there is no established standard (especially because the Nihon-shiki rōmaji ignores the loanwords; in a world without loanwords, t and t͡s would be two allophones of the same consonant /t/). But, in reality, Japanese phonology has well integrated a couple of dozen moras specific to loanwords. For example, it seems well accepted that t and t͡s are phonemically distinct, since, for example, contemporary Japanese distinguishes ツ and トゥ, or even more undeniably, there is a distinction between t and t͡ɕ (パーティー; つち). I propose that we use the following vowels, semi-vowels, other consonants and special phonemes (= special moras). Could you share your thoughts?
  • Vowels: a i u e o
  • Semi-vowels: y w
  • Other cons.: k g s ɕ z d͡ʑ t t͡ɕ t͡s d n h f p b m r
  • special phonemes/moras:
    • long vowel: ː
    • hatsuon (撥音): ɴ
    • sokuon (促音): doubling the following consonant, or using ʔ, or a combination of both.
Remarks:
  • I would like to advocate for /koɴnit͡ɕiwaꜜ/ versus /konnit͡ɕiwaꜜ/. The hatsuon is a very special object in Japanese phonology, I think it is better to have a specific symbol.
  • v is not relevant, as it is an allophone of [b], from which many native speakers still cannot distinguish it. v is relevant for the transliteration of kana, but not for pronunciation, in my opinion.
  • Also, I think we can, as a first approach, ignore the consonant ŋ (鼻濁音) because it mostly appears in refined language and its phonemic relevance is debated/minor. But there is certainly room for discussion.
Also, regarding the pitch accent representation, I think that showing the fall in pitch is sufficient. I don't think we need to show any sign when the word follows the "flat" pattern 平板型. Maidodo (talk) 05:19, 13 September 2024 (UTC)Reply
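A minimal sketch of how the inventory proposed above could be applied to a phonetic-kana string, assuming one kana per mora and a deliberately tiny lookup table. The table contents, the function name `phonemic` and the choice of ʔ for a word-final sokuon are illustrative assumptions (Python, not the module's Lua); digraphs such as きゃ and pitch marking are left out.

```python
# Hypothetical, incomplete mora table covering only the examples below.
MORA_TO_PHONEMES = {
    "あ": "a", "お": "o", "か": "ka", "こ": "ko", "さ": "sa",
    "ち": "t͡ɕi", "に": "ni", "り": "ri", "わ": "wa",
}


def phonemic(kana: str) -> str:
    """Convert phonetic kana to a phonemic string between slashes."""
    out = []
    pending_sokuon = False
    for ch in kana:
        if ch == "ー":        # long vowel mark
            out.append("ː")
        elif ch == "ん":      # hatsuon gets its own symbol
            out.append("ɴ")
        elif ch == "っ":      # sokuon: resolved when the next mora is seen
            pending_sokuon = True
        else:
            segment = MORA_TO_PHONEMES[ch]
            if pending_sokuon:
                # Double the first symbol of the following mora
                # (only meaningful when that mora starts with a consonant).
                out.append(segment[0])
                pending_sokuon = False
            out.append(segment)
    if pending_sokuon:        # nothing follows the sokuon
        out.append("ʔ")
    return "/" + "".join(out) + "/"


if __name__ == "__main__":
    for word in ("こんにちわ", "かーさん", "あっさり", "あっ"):
        print(word, phonemic(word))
```

Keeping ん as a single dedicated symbol means no apostrophe is needed to separate it from a following vowel, which is the design point argued for in the remarks above.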
@Maidodo I agree with most of those, though I think we should (eventually) include bidakuon pronunciations as alternatives (although they would need to be clearly marked as such). I also think showing a flat Heiban pattern helps to clearly demarcate it from pronunciations where no pitch accent has been specified yet. Theknightwho (talk) 05:45, 13 September 2024 (UTC)Reply
@Theknightwho Thank you. Regarding ŋ (bidakuon), I think that the question is whether we see it as phonemically significant or not. The short answer seems to be yes, though. The minimal pair 大釜/大蝦蟇 is relatively well-known but, despite rules described for example in the Shinmeikai Nihongo Akusento Jiten, whether to use the bidakuon or not is almost never consensual among native speakers (also probably due to dialectal influences). That said, the Shinmeikai Nihongo Akusento Jiten generally made clear choices between normal 'g' and 'ŋ' (using a handakuten on the カ行 kana), based on phonological arguments. I was suggesting to ignore bidakuon in the phonemic guide for the sake of simplicity, but I don't have a strong argument. When such a standard dictionary shows the bidakuon as standard, one option could be to show both, as you mentioned.
Regarding the pitch accent, showing the flat accent pattern in kana only (for example using the good old convention of the upper line on the second and subsequent moras - while this is actually criticized by the latest version of the NHK dictionary) seems sufficient to demarcate it. I am not sure we need to have also some sort of mark in the /phonemic/ representation. But maybe I am too worried about the phonemic representation becoming too difficult to read. Maidodo (talk) 06:19, 13 September 2024 (UTC)Reply
@Maidodo Ah sorry - I misunderstood what you meant. Yes, I agree that we shouldn't show anything in the phonemic IPA for pitch accent if there's no downstep, as there's nothing we could actually show. Theknightwho (talk) 06:29, 13 September 2024 (UTC)Reply
No worries, my English often needs some improvement. Indeed, one might argue that heibangata exhibits a rise in pitch starting from the second mora, but this is not an accent per se (Note: it often does not occur when words are connected in sentences).
I am glad to see that we have convergent understanding and views on phonemic IPA.
As for the honoo (炎) example, /ꜜhonoo/, as far as the standard pronunciation is concerned, the dictionaries recommend a double o (as opposed to a long o), so, in theory, I suppose that the phonetic IPA of the /noo/ part should differ from the /noː/ as in 農場.
EDIT: or maybe it is because the IPA does repeat the same vowel to show the long vowels... Sorry, I am not knowledgeable about high-level phonetics...
I remain at your disposal for further discussion if necessary. Maidodo (talk) 08:40, 13 September 2024 (UTC)Reply
Sorry @Theknightwho – Actually, I was thinking about this question: is the apostrophe necessary in phonemic IPA? I don't think we need it if we use a distinctive sign for the special phoneme (mora) ん (hatsuon). Just to confirm your views. I am suggesting ɴ because (1) it is one of the sounds that ん can represent and (2) it resembles the /N/ which is a usual convention (but personally I don't like using an uppercase letter in the middle of a word – more an aesthetic point of view). Maidodo (talk) 08:45, 13 September 2024 (UTC)Reply
@Maidodo So we currently use the apostrophe for two different things:
  1. To disambiguate when it occurs before a vowel, e.g. 近縁(きんえん) (kin'en), as opposed to 記念(きねん) (kinen); when used like this, the apostrophe doesn't represent an independent phoneme, but merely clarifies the romaji, so shouldn't be represented in IPA. The vast majority of apostrophes are of this type. (Edit: there's also the extremely marginal case of ヴィーンヌィツャ (Vīnnwitsya), which needs a disambiguation apostrophe somewhere due to ヌィ (nwi) being romanised in the same way as ンウィ (nwi), but I'm not sure how we should deal with that.)
  2. To represent っ as /ʔ/. These are marginal, occurring mostly at the ends of interjections (e.g. あっ (a')), but sometimes in other places too (e.g. ハガッニャ (Haga'nya, Hagåtña)). These do need to be included, either as /ʔ/ or the archiphoneme /Q/.
As a side point, this means things start getting weird with the romanisation if you have a sequence like んっあ: it can't be n'a, as that already represents んあ, and n''a looks like it should represent んっっあ, so both choices are terrible. That being said, the only instance I could find of this in use is some song called 『んっあっあっ。』, so it's probably fine. If we wanted to be really strict about it, I guess we could use q instead of the apostrophe for this (e.g. あっ (aq), ハガッニャ (Hagaqnya) etc.). Theknightwho (talk) 10:56, 13 September 2024 (UTC)Reply
Thank you @Theknightwho
I thought the logic was simply to convert ン to a certain (unique) IPA sign (e.g. ɴ), but from what I understand from the examples you provided, it seems that the phonemic IPA is derived from the romanized word.
In my imagination, it was more like the below:
(1) Japanese kana spelling  こんにちは
(2) "Phonetic" kana spelling  こんにちわ  (the bold font shows the mora that precedes the fall in pitch; bars are better but I can't make them appear here)
(3) Phonemic IPA  /koɴnit͡ɕiwaꜜ/
(4) IPA (phonetic)
Going from (1) to (2) cannot be done automatically, because of the pitch accent information, which is of course impossible to deduce from (1), and also due to the cases where the phonetic kana is different from the orthography (while there are cases where the conversion is predictable, especially if we use dots to separate morphemes, such as 王・おう converted into オー and 追う・お.う converted into オウ). Going from (2) to (3) is automatic as far as I can imagine.
Sorry for not understanding the logic currently built in the wiki. Feel free to ignore this message if it is not contributing to the broader discussion. Maidodo (talk) 23:55, 13 September 2024 (UTC)Reply
@Maidodo Yes, your deduction is essentially correct. The transliteration module converts from canonical kana to phonetic kana using heuristics, which can be modified manually using . and so on, and then converts the result into romaji. The pronunciation module works by initially calling the transliteration module (using slightly modified logic to account for ei → ee and so on), and then converts the romaji result into IPA. So all-in-all, the current output is generated by the following steps: (1) kana → (2) phonetic kana → (3) transliteration → (4) phonetic IPA. The new version will be a bit different, as I'll just convert directly from phonetic kana into the relevant output (i.e. (1) kana → (2) phonetic kana → (3) transliteration / phonemic IPA / phonetic IPA), cutting out the mandatory conversion into romaji, as it reduces the chance of information loss due to ambiguity, which can arise in certain edge cases. Theknightwho (talk) 00:20, 14 September 2024 (UTC)Reply
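A minimal sketch of step (1) → (2) as discussed above, assuming just two heuristics (o-row kana + う and e-row kana + い become long vowels) and a "." that blocks the merge at a morpheme boundary. The function name `to_phonetic_kana` and the kana sets are hypothetical; the actual transliteration module is Lua and its heuristics are not reproduced here.

```python
# Hypothetical heuristic; the real module's rules are more extensive.
O_ROW = set("おこそとのほもよろごぞどぼぽ")  # kana ending in the vowel o
E_ROW = set("えけせてねへめれげぜでべぺ")    # kana ending in the vowel e


def to_phonetic_kana(kana: str) -> str:
    """Turn canonical kana into phonetic kana with ー for long vowels.

    A "." between two kana marks a morpheme boundary and prevents the
    merge, so おう becomes おー while お.う stays おう.
    """
    out = []
    for ch in kana:
        prev = out[-1] if out else ""
        if ch == "う" and prev in O_ROW:
            out.append("ー")
        elif ch == "い" and prev in E_ROW:
            out.append("ー")
        else:
            out.append(ch)
    # The boundary dots have done their job; drop them from the output.
    return "".join(out).replace(".", "")


if __name__ == "__main__":
    for word in ("おう", "お.う", "えいが", "おとうさん"):
        print(word, to_phonetic_kana(word))
```

Working from phonetic kana rather than romaji keeps the boundary information available until the last step, which is the ambiguity-avoiding property described above.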
@Theknightwho My thanks! Sounds very good. Maidodo (talk) 00:12, 17 September 2024 (UTC)Reply
@Eirikr, @Maidodo, @Theknightwho: Pitch accent in Japanese is a phenomenon that regards syllables, not moras. One important distinction here is between long vowels and double vowels (I don't agree with these labels, but that's what they use on Wikipedia, so I'll just go with them). In normal speed speech they are pronounced the same, but they have different phonemic structures. Long vowels belong to one syllable, double vowels are one syllable each:
  • (hónoo) → /hó.no.o/ [hó.noː] VS お脳 (onō) → /o.noo/ [o.noː]
Long vowels (or better, long syllables, including those ending in and -q) can only have the pitch downstep on their first (or only) vowel; double vowels (=two syllables), on the other hand, can have the pitch downstep on either vowel (being independent syllables). We should keep this well in mind when deciding how to indicate the pitch accent on long syllables.
On a side note: I hate the pitch downstep symbol (ꜜ), I'd much prefer using the high tone marker on the vowel of the accented syllable, being that really the only necessary phonemic piece of information you need to know to pronounce a Japanese word, while at the same time making transliterations and phonemic/phonetic transcriptions much easier to handle, much cleaner and easier to read. Here some more examples:
  • 覆おう (ooô) /o.o.óo/ [o̞ːó̞ː] (volitional of 覆う (oóu) /o.ó.u/ [o̞ó̞ɯ̟])
  • 鳳凰 (hōô) /hoo.óo/ [ho̞ːó̞ː]
  • 病院内 (byōínnai) /bjoo.íɴ.nai/ [bʲo̞ːĩ́ɴ.na̠i] (from 病院 (byōin) /bjoo.iɴ/ [bʲo̞ːĩɴ])
  • 経験者 (keikénsha) /kei.kéɴ.ɕa/ [ke̞ːkẽ̞́ɴɕa̠] (from 経験 (keiken) /kei.keɴ/ [ke̞ːkẽ̞ɴ])
  • 時計塔 (tokéitō) /to.kéi.too/ [to̞ké̞ːto̞ː] (from 時計 (tokei) /to.kei/ [to̞ke̞ː])
  • 里親 (satooya) /sa.to.o.ja/ [sa̠to̞ːja̠] VS 砂糖屋 (satōya) /sa.too.ja/ [sa̠to̞ːja̠]
etc.
And I would also get rid of all the diacritics on the vowels... Japanese has only 5 vowels and they are always pronounced the same, so why are we being so anal on exactly how high or low or whatever a vowel is? It's just unjustified visual noise. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:46, 16 September 2024 (UTC)Reply
@Eirikr, @Maidodo, @Theknightwho: We should also get rid of the Atamadaka, Odaka, Nakadaka + [number] nonsense. Apart from those categories being quite weird and poorly thought out to begin with (the redundancy of things like "Atamadaka [1]"! What else could it be? Lol), well-thought-out and well-written phonemic and phonetic transcriptions would be more than enough. Those categories used in Japanese traditional grammar books are completely unnecessary and only confusing to an English reader. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 09:19, 16 September 2024 (UTC)Reply
Hello @Sartma:
  • Re: pitch markers solely on the last high-pitched mora, I must disagree: that implies that all other morae are low pitch, which is incorrect. We cannot assume that all readers are familiar with Tokyo-standard broadcast Japanese pitch patterns on the one hand, and on the other, some of our entries include pitch information for other dialects, which do not all have the same pitch structure as Tokyo Japanese. See also the interesting table at ja:w:日本語の方言のアクセント#各方言の比較表, showing various patterns for two-mora nouns.
  • Re: the downstep symbol ꜜ, I personally find this helpful for clarifying; moreover, this is standard notation for marking downsteps. See also w:Japanese pitch accent, which makes ample use of the symbol. Without this, we cannot show the proper pitch pattern for odaka words like (ki) or (hana) or (tsubomi).
  • Re: diacritics on vowels, if the [brackets] transcription is intended to be phonetic, then it should be phonetic, and that involves annotating phonetic details like vowel height. In addition, we cannot structure our Japanese entries to cater to those readers who are already familiar with Japanese, so from a usability perspective, it would be inappropriate to provide a [phonetic transcription] that intentionally omits important information. What you describe with limited diacritics is the phonemic transcription, for which we use /slashes/, and you will note in @Theknightwho's example further above in the table that the /phonemic transcription/ is simpler and only has the pitch diacritics on the vowels.
  • Re: labels and numbers both, this was an intentional decision years ago to provide standardized and comprehensive pitch-pattern information as described in Japanese references. The terms atamadaka etc. are used in Japanese to describe the pitch contours of a given word, and indeed we see these used (with the suffix (-gata, type)) over in the JA WP article at ja:w:アクセント#共通語のアクセント. I agree that an English-language reader won't know these words to start with — but note that they are linked through to the relevant entries, and thus these labels exhibit good discoverability, and give readers more information that is useful for talking about Japanese pitch-accent patterns.
I hope that helps provide some background for the reasons for our Japanese pronunciation notation. ‑‑ Eiríkr Útlendi │Tala við mig 18:07, 16 September 2024 (UTC)Reply
Hi @Eirikr. To your points:
  • Re: pitch markers solely on the last high-pitched mora: I agree that most readers won't be familiar with how the Standard Japanese pitch accent works, so I'm ok with showing high/low pitch moræ individually in phonetic transcriptions. I don't particularly like that the first mora of an initial unaccented syllable is given as low, because it's simply wrong information. The height of the first mora in an initial unaccented syllable is determined by a phrasal pitch pattern, not a word one. So one says íkéń, but sònó íkéń wá. As a matter of fact, Japanese native speakers will always hear a rise in tone at the beginning of a non-initial accented word even if both moræ are pronounced with the same pitch. It's a well known phenomenon. I would definitely not show the pitch for each mora in the phonemic transcription. The only phonemically relevant information one needs to know how to pronounce a Standard Japanese word is 1) whether it is accented or not, and 2) if it is accented, what syllable the accent falls on. Any other information is not phonemically relevant and should not appear in a phonemic transcription. This is true for the vast majority of dialects, too, so I can't see many issues there either.
  • Re: the downstep symbol ꜜ: I meant that note as a mere personal preference, and limited to phonemic transcriptions; mainly because it's easier and clearer, especially if we decide to go for the ː symbol instead of duplicating the vowel ([ó̞ː] is clearer than [o̞ꜜː]). I'd personally always use double vowels, though (so called long vowels in Japanese are monotimbric diphthongs, more than long vowels), so if we want to go for ꜜ, I certainly wouldn't be opposed to it. Just to address the words you listed, I would transcribe them phonemically something like this: /kí/, /ha.ná/, /ʦu.bo.mi/. I don't see any issue with that, and it's much, much nicer than the downstep symbol.
  • Re: diacritics on vowels: This was also just a preference. The current phonetic transcriptions aren't really precise anyway, so I would get rid of the vowels diacritics too, but I'm fine if they stay.
  • Re: labels and numbers both: I find this completely nonsensical. We don't go around adding tronca, piana, sdrucciola or bisdrucciola for Italian, nor do we add ὀξύτονος (oxútonos), παροξύτονος (paroxútonos), προπαροξύτονος (proparoxútonos), περισπώμενος (perispṓmenos) or προπερισπώμενος (properispṓmenos) for Ancient Greek (despite those being much better and more precise labels than the Japanese ones, which are not even sufficient on their own but need a number to go with them - even when it's completely redundant, like for atamadaka and odaka - nor actually reflect how the Japanese pitch accent works, indicating moræ when the base unit is actually the syllable...). Why on earth are we doing this for Japanese only? — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 10:18, 18 September 2024 (UTC)Reply
@Eirikr: Thinking about it, it actually makes no sense at all to use the downstep symbol in a phonetic transcription. What would be the difference between [ɸɯ̟́ꜜsà̠ì] and [ɸɯ̟́sà̠ì], or [kõ̞̀ɲ̟́ːít͡ɕí β̞á̠ꜜ] and [kõ̞̀ɲ̟́ːít͡ɕí β̞á̠]? None! In the first case (fúsai) the use of acute and grave accents already shows the downstep, and in the second one (konnichiwá), there is, phonetically, no downstep at all! The phonetic realisation of an odakagata and a heibangata is identical!
By the way, why do we transliterate <konnichi wa> as two words? It's one word, it should be <konnichiwá>. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 13:24, 18 September 2024 (UTC)Reply
Just realised that the spelling <konnichi wa> is probably due to the pronunciation template not supporting overriding the transliteration, so a space is needed for the は to be transliterated as /wa/. I guess we will be able to correct this once we have a new template. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 14:40, 18 September 2024 (UTC)Reply
@Sartma Just on your point about παροξύτονος (paroxútonos) and so on - we definitely do use their direct translations ("paroxytone" etc.), so I think it's fine to use the transliterated names of the pitch accents, as they're reasonably common in Japanese learning materials. I agree we shouldn't have "尾高型 (Odaka – [5])" or whatever, though: too many Japanese entries are padded out with crap like "rendaku (連濁)", instead of "rendaku", and it just feels like certain editors think of Japanese as this super-special language that can never be properly translated, even though we don't (as you say) do this for any other language, and there are straightforward solutions that don't involve interspersing lots of kanji/kana in English text. Theknightwho (talk) 15:35, 18 September 2024 (UTC)Reply
@Theknightwho: Sorry, I should have specified. I meant here on Wiktionary, not in general. In Italian and Ancient Greek learning materials you also find the labels I listed above. You don't find them in a dictionary entry, though, because they are irrelevant. Those terms belong in a Wikipedia article about how modern Japanese scholars analysed and labelled their language, not here on Wiktionary. See for instance οἶκος (oîkos): nowhere do we write that it's properispomenon. It would make no sense, especially considering that other forms of the word might not be, like the genitive, which is a paroxytone.
The same is true for Japanese. 日本 (nihón) is nakadakagata [2], but when it appears as first element in a compound, like 日本政府 (nihon séifu), then it becomes heibangata. Telling people what the name of an accent pattern is, is irrelevant to the entry in question, and only creates confusion for no good reason.
I don't know what it is about Japanese that makes people think we can't treat it the same as any other language. Too much 日本人論? I don't know. But we need to start seeing sense and give people clear, relevant and useful information, not a bunch of cryptic, imprecise and confusing elements just for the sake of filling up space. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 20:54, 18 September 2024 (UTC)Reply
At least in phonemic IPA, showing the fall in pitch after an odaka word is useful. As for the rest, I don't fully understand all the assertions (I am only contributing up to the phonemic layer), but, in general, I believe that Wiktionary should try to find a good balance between standardization and the way Japanese is described in a native context, for the sake of Wiktionary being a practical tool for a large audience of learners and even native speakers. Maidodo (talk) 01:09, 19 September 2024 (UTC)Reply
@Maidodo: Yes, I agree. The phonemic transcriptions must show the accented syllable (=where the pitch downstep happens) in all accented words, odakagata included. But not even Japanese monolingual dictionaries tell you the technical name of the accent pattern for each entry. They either give you nothing (the vast majority of them) or give you the number of the accented mora (a tiny minority of them). I'm not against giving just the numbers, if we must; but it would still be redundant (ergo possibly confusing) information, if we show the accented mora in the phonemic transcription: if people can count, they'll be able to get the number by themselves, should they need it; if they cannot count, the number to them is useless anyway. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 06:02, 19 September 2024 (UTC)Reply
A draft of what I'd like to see:
Term | Old | New
今日は (konnichiwá)
  • (Tokyo) んにちは [kòńníchí wáꜜ] (Odaka – [5])
  • IPA(key): [kõ̞ɲ̟ːit͡ɕi β̞a̠]
  • (Tōkyō) IPA(key): /koɴ.ni.t͡ɕi.wá/, [kõ̞̀ɲ̟́ɲ̟́ít͡ɕíβ̞á̠]
    • Phonetic kana: ンニチワ
夫妻 (fúsai)
  • (Tōkyō) IPA(key): /fú.sai/, [ɸɯ̟́sà̠ì]
    • Phonetic kana: サイ
  • (Tōkyō) IPA(key): /fu.sái/, [ɸɨ̥̀sá̠ì]
    • Phonetic kana:
Cases like 夫妻 (fúsai) (where the accent shift to the second syllable is simply determined by the devoicing of the first one) should probably be automatised, since it's a regular phenomenon. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:31, 20 September 2024 (UTC)Reply
I like the devoicing mark for the devoiced kana (フ circled). I am not convinced that the dots between syllables are adding some value, since, at least from a phonemic standpoint, we always think in terms of moras. Maidodo (talk) 13:42, 20 September 2024 (UTC)Reply
@Sartma @Maidodo On a semi-related note, I've been meaning to suggest that we change the transliteration scheme so that っ is transliterated as "q" instead of an apostrophe in the rare cases when it doesn't double the following consonant (e.g. at the ends of words, or before vowels). There is no agreed-upon standard for what to do in those situations, so we have some scope to choose what we want to do, but I'd really like to get rid of the issue of ' being used for two completely different things, because it just leads to confusion. What do you think? Theknightwho (talk) 18:22, 20 September 2024 (UTC)Reply
@Maidodo: I guess we don't have to use the dots, but then how are we showing the phonemic difference I was pointing out above between the so called "long vowels" and "double vowels"? For instance:
  1. 運動 (undō) → /uɴ.doo/ (/oo/ = long vowel)
  2. 炎 (hónoo) → /hó.no.o/ (/-o.o/ = double vowel)
Phonetically, those two "long /o/" are pronounced the same [o̞ː], but phonemically they behave differently. If, say, we attach the suffix 〜会 (-kai), that requires the accent to fall on the preceding syllable, this is what we get:
  1. 運動会 (undôkai) → /uɴ.dóo.kai/
  2. 炎会 (honoókai) → /ho.no.ó.kai/
As you can see, it's clear that in (1) the "preceding syllable" is /doo/, while in (2) it's /o/.
I wouldn't know how else to make that clear in the phonemic transcription without using the dots. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 19:06, 20 September 2024 (UTC)Reply
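The syllable-based rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Module:ja-pron works: the function name `attach_preaccenting_suffix` is hypothetical, syllables are assumed to be separated by dots exactly as in the transcriptions above, and the accent is assumed to be written as an acute on the first vowel of the accented syllable.

```python
import unicodedata

ACUTE = "\u0301"          # combining acute accent
VOWELS = set("aiueo")


def attach_preaccenting_suffix(stem: str, suffix: str) -> str:
    """Attach a suffix that puts the accent on the syllable before it.

    `stem` is a dotted phonemic string such as "uɴ.doo" or "hó.no.o".
    Any accent already on the stem is dropped, because the suffix
    decides where the accent goes (assumption for this sketch).
    """
    # Decompose so that an existing acute becomes a separate code point.
    plain = unicodedata.normalize("NFD", stem).replace(ACUTE, "")
    syllables = plain.split(".")
    last = syllables[-1]
    # Mark the first vowel of the final stem syllable.
    for i, ch in enumerate(last):
        if ch in VOWELS:
            syllables[-1] = last[: i + 1] + ACUTE + last[i + 1:]
            break
    return unicodedata.normalize("NFC", ".".join(syllables + [suffix]))


print(attach_preaccenting_suffix("uɴ.doo", "kai"))
print(attach_preaccenting_suffix("hó.no.o", "kai"))
```

With the dots present, the two stems yield /uɴ.dóo.kai/ and /ho.no.ó.kai/ respectively. Without them, the function would have no way to tell whether the final /oo/ is one syllable or two, which is the point being made above.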
@Sartma @Theknightwho
Regarding long vowels (長音), I am more in favor of a unified phonemic sign, such as ː.
It would correspond one-to-one with the ー (chōonpu) used in the context of (phonemic) kana:
運動(うんどう) ウンドー /uɴdoː/
公(おおやけ) オーヤケ /oːjake/
炎(ほのお) ホノオ /hoꜜnoo/

As for the sokuon in unconventional places (very exceptional cases), I think that Theknightwho's suggestion is good (/q/ or /Q/). I wonder if using a more symbolic sign such as /ⓠ/ or /ʔ/ would also make sense, as it would mitigate the risk of confusion with a /k/ phoneme. Maidodo (talk) 00:42, 21 September 2024 (UTC)Reply
@Maidodo:
  • If we use ː, though, how do you envision indicating an accented long vowel? For instance, 鳳凰 (hōô): would it be /hoːoꜜː/? or /hoːoːꜜ/? I find both unclear and confusing, compared to /hoo.óo/ or even /hoo.oꜜo/. I don't remember who said that earlier in this thread, but I do agree that doubling vowels is the best choice for Japanese, considering the phonemic and phonetic importance of moræ in this language. One of the reasons I was saying I don't agree with the difference between "long vowels" and "double vowels" is that I consider all long vowels in JP as double vowels (or moraic diphthongs, whether monotimbric or not). The only difference is whether there is a syllable boundary or not, and that is purely phonemic information (hence why I'd show the syllabification with the dots).
  • As for what to use in the phonetic kana for long vowels, I'm not too bothered either way. The 長音 (chôon) sign (ー) works, too. How will we treat words like 経緯 (kêi) vs ケーキ (kêki), though? Will we use the 長音 (chôon) for both, 経緯 (ケーイ) and ケーキ, or will we keep the イ for 経緯 (ケイイ)?
  • As for the /q/ issue: I feel we shouldn't use /q/, mainly because it's a very technical transliteration no student ever sees in transliterated Japanese (you only find that in some Japanese linguistics books). I'm not even 100% sure that the <っ> used in written phonetic representations of various onomatopoeic sounds is actually always a /q/ or a /ʔ/... That being said, I don't really have a strong argument against it either. And the cases are so marginal anyway. Very often Japanese people themselves have no clue about how to pronounce it in katakana transliterations of foreign words like ハガッニャ (Haga'nya, “Hagåtña”), those being pretty much always a katakana transcription of the original alphabetic spelling rather than being based on how a Japanese person would actually pronounce it... Most people can't even pronounce much easier things like ヴェネツィア (venetsia), saying ベネチア (benechia) instead, so I wonder how they would actually say any of the examples given above by @Theknightwho... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 06:47, 21 September 2024 (UTC)Reply
@Sartma I meant using q instead of ' in transliteration for those edge-cases where we can’t double the next consonant, not the IPA transcriptions. It would only affect a small handful of words anyway, as the vast majority would use doubled-consonants. Theknightwho (talk) 12:47, 21 September 2024 (UTC)Reply
@Theknightwho: Yeah, I understood. I think it would be quite confusing to students, but I don't have a better solution... Part of me thinks we should probably use the exclamation mark (!), like:
It might be a bit weird, but I think it would do the trick... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 15:42, 21 September 2024 (UTC)Reply
@Sartma I like it for あっ (a!), but it looks like "l" in ハガッニャ (Haga!nya), and I think it'd be an issue when we're romanising any usage examples that have in them (I'm sure there are a few). Theknightwho (talk) 17:17, 21 September 2024 (UTC)Reply
@Theknightwho. True. What about making it superscript? あっ (a!)? Still weird, but closer to the ⟨'⟩:
「あっ!」 (a!!)
「あっ!」 (a'!)
Or maybe something like this:
「あっ!」 (a·!)
んっあっあっ (n·a·a·)
ハガッニャ (Haga·nya)
I don't know... As I said, I don't really have any better proposal. I just find the ⟨q⟩ confusing. People are going to wonder how they're supposed to read it. On the other hand, people who know enough Japanese will understand whatever sign we use. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 17:42, 21 September 2024 (UTC)Reply
@Sartma I quite like ·, since we just use a space for Japanese so there's no confusion. My preferred option is still q, but what are your thoughts @Eirikr? Theknightwho (talk) 17:52, 21 September 2024 (UTC)Reply
Also pinging @Fish bowl, Nardog, who may have thoughts on the transliteration issue; specifically: what we should use for っ in the places where we currently use an apostrophe, like あっ (a') and ハガッニャ (Haga'nya), as it's annoying/confusing to have the apostrophe serve two different functions in transliteration. Theknightwho (talk) 17:57, 21 September 2024 (UTC)Reply
Module_talk:ja#romanization_of_~っ Fish bowl (talk) 18:13, 21 September 2024 (UTC)Reply
@Theknightwho: with ⟨q⟩ it would look like this:
  • 「あっ!」 (aq!)
  • んっあっあっ (nqaqaq)
  • ハガッニャ (Hagaqnya)
Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 18:49, 21 September 2024 (UTC)Reply
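To compare the candidate markers side by side, here is a rough Python sketch of the edge-case rule being discussed. It is not the existing transliteration module: the function `join_morae` is hypothetical, the input is assumed to be a list of already-romanized morae with the sokuon left in place, and the assumption that っ cannot double a following ⟨n⟩ is inferred only from the ハガッニャ example above.

```python
SOKUON = {"っ", "ッ"}
# Assumption: before a vowel or before ⟨n⟩ there is nothing to double,
# so the fallback marker is used instead.
NO_GEMINATE = set("aiueon")


def join_morae(morae, marker="q"):
    """Join romanized morae, resolving each sokuon.

    A sokuon normally doubles the next consonant (with "tch" for "ch",
    as in Hepburn); at the end of the word, before a vowel, before "n"
    or before another sokuon it becomes `marker`.
    """
    out = []
    for i, mora in enumerate(morae):
        if mora not in SOKUON:
            out.append(mora)
            continue
        nxt = morae[i + 1] if i + 1 < len(morae) else ""
        if nxt == "" or nxt in SOKUON or nxt[0] in NO_GEMINATE:
            out.append(marker)
        elif nxt.startswith("ch"):
            out.append("t")
        else:
            out.append(nxt[0])
    return "".join(out)


for marker in ("q", "'", "·"):
    print(join_morae(["ha", "ga", "ッ", "nya"], marker=marker))
```

Swapping the `marker` argument is enough to preview any of the proposals (q, apostrophe, middle dot) on the same word, while ordinary cases such as がっこう are unaffected because they never reach the fallback branch.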
@Theknightwho: As for ヴィーンヌィツャ the apostrophe would go after the first ⟨n⟩: Vīn'nwitsya. ンウィ would be transliterated as n'wi. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 17:03, 21 September 2024 (UTC)Reply

Multiple accent phrases


@Theknightwho: You've written somewhere on this talk page that you're working on a rewrite of this module. One important feature that I'd like to request is the ability to specify accent phrases. This is especially important for 四字熟語s but also comes in handy for phrases.

Both my printed NHK accent dictionary and my digital copy denote the separation of accent phrases using the easy-to-type character ・, a convention I'd suggest we follow as well. To give an example, 粉骨砕身, which is often (I'd say even in the majority of cases) pronounced with two accent phrases, could be transcribed using {{ja-pron|ふんこつ・さいしん|acc=1-0|dev=4}} and {{ja-pron|ふんこつ さいしん|acc=0|dev=4}} (both would have to be present in the entry).

I don't know how to best combine these two pronunciations (with different numbers of accent phrases) into one invocation of {{ja-pron}}. One idea would be to allow writing pitch information into the kana like this: {{ja-pron|ふ\んこつ・さいしん|ふんこつ さいしん|dev=4}} (alternatively \ instead of \). Another would be to use Benwing's inline parameters like this: {{ja-pron|ふんこつ・さいしん<acc:1-0>|ふんこつ さいしん<acc:0>|dev=4}}

Also, please be aware that some Japanese editors currently use handwritten code because there's no good solution using {{ja-pron}} (完全無欠, 頭が固い, etc.). These could then all be converted to {{ja-pron}}. — Fytcha T | L | C 21:46, 14 September 2024 (UTC)Reply
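For illustration, the two parameter styles proposed above (a separate acc=1-0 value, or an inline `<acc:…>` modifier) could be parsed roughly as follows. This is a Python sketch, not the module's Lua code; the helper names `split_accent_phrases` and `parse_inline` are made up for the example, and the only assumption about the syntax is what the invocations above show: phrases separated by ・ and one accent number per phrase joined by hyphens.

```python
import re

BOUNDARY = "・"  # accent-phrase separator, as in the NHK dictionary
INLINE_ACC = re.compile(r"^(?P<kana>[^<]+)<acc:(?P<acc>[0-9-]+)>$")


def split_accent_phrases(kana: str, acc: str):
    """Pair each accent phrase with its accent number.

    "ふんこつ・さいしん" with acc "1-0" gives two (phrase, accent) pairs.
    """
    phrases = kana.split(BOUNDARY)
    accents = [int(a) for a in acc.split("-")]
    if len(phrases) != len(accents):
        raise ValueError(
            f"{len(phrases)} accent phrase(s) but {len(accents)} accent value(s)"
        )
    return list(zip(phrases, accents))


def parse_inline(arg: str):
    """Parse the inline form, e.g. 'ふんこつ・さいしん<acc:1-0>'."""
    m = INLINE_ACC.match(arg)
    if m is None:
        raise ValueError(f"no <acc:...> modifier in {arg!r}")
    return split_accent_phrases(m["kana"], m["acc"])


print(parse_inline("ふんこつ・さいしん<acc:1-0>"))
print(parse_inline("ふんこつ さいしん<acc:0>"))
```

The count check is the useful part: it would flag an entry where the number of ・-separated phrases and the number of accent values disagree, which is easy to get wrong when both pronunciations are listed in one invocation.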

@Fytcha Thanks - yes, I'd noticed the manually-added version in a few pronunciation sections, and I agree we should be incorporating them into the template.
If we use ・ for this, would that cause problems for terms which are spelled with it? Or, to put it a different way, can we guarantee that ・ will always mark the end of a pitch accent phrase? See Category:Japanese terms spelled with ・ for reference. Theknightwho (talk) 19:49, 15 September 2024 (UTC)Reply
@Theknightwho: Good catch, there are indeed entries that are spelled with ・ but pronounced without an accent phrase boundary (ティー・シャツ was the first one I saw when I opened the above category). I'm not really sure what the best solution would be in that case. We could make it so that there are multiple permissible characters to denote accent phrase boundaries (e.g. '・' and '、'), each of which may only be used if it doesn't appear in the page title. This is a slightly ugly solution, but I'm not sure we can come up with a single sensible character for pitch accent boundaries that is easy to type with the Japanese layout. — Fytcha T | L | C 20:49, 20 November 2024 (UTC)Reply
More examples of pages with this issue:
ビタミンB1
ビタミンB2
クロイツフェルト・ヤコブ病
マルクス・レーニン主義
ビタミンD2 Shlyst (talk) 22:17, 13 November 2024 (UTC)Reply
Another one: 骨粗鬆症 Shlyst (talk) 20:59, 23 November 2024 (UTC)Reply
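The fallback suggested above (several permissible boundary characters, each usable only when it does not occur in the page title) amounts to a very small check. The Python sketch below is only meant to make the idea concrete; `pick_boundary` and the candidate tuple are illustrative names, and the two candidates are just the ones mentioned in the discussion.

```python
CANDIDATES = ("・", "、")  # in order of preference


def pick_boundary(page_title: str, candidates=CANDIDATES) -> str:
    """Return the first candidate boundary character absent from the title."""
    for ch in candidates:
        if ch not in page_title:
            return ch
    raise ValueError("no usable accent-phrase boundary character for this title")


print(pick_boundary("粉骨砕身"))
print(pick_boundary("ティー・シャツ"))
```

For a title like ティー・シャツ the first candidate is rejected and the second is used, so a ・ in the kana argument could still be read as part of the spelling rather than as a phrase boundary.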