Module talk:cs-pronunciation
Move
Should be moved to Module:cs-IPA to match other languages' modules. —CodeCat 20:57, 16 February 2017 (UTC)
- Well, currently 23 modules in Category:Pronunciation modules use -pron, 11 use -IPA, 6 use -pronunciation, and 4 use -pronunc. So at the moment -IPA does not hold the majority. I do wish the modules were named systematically, though. — Eru·tuon 21:07, 16 February 2017 (UTC)
- Renaming this one would be a start. —CodeCat 21:08, 16 February 2017 (UTC)
- But why should it be -IPA? There has been no vote on this subject. — Eru·tuon 21:11, 16 February 2017 (UTC)
- @Erutuon Thanks for the module. Are you able to add a test cases module? It would be useful to track failed cases and check if everything is working before deploying it. I might add some. I agree with the move request. It was decided in a discussion but the moves haven't all completed. --Anatoli T. (обсудить/вклад) 21:19, 16 February 2017 (UTC)
- @Atitarev: Oh, it's been decided? Can you find the discussion? It should be mentioned and linked to on the page Category:Pronunciation modules, so that it will be implemented. Or maybe I can find it. — Eru·tuon 22:01, 16 February 2017 (UTC)
- Since this is a new module, you can move it without any impact or extra work. The test module is more important, though. --Anatoli T. (обсудить/вклад) 22:14, 16 February 2017 (UTC)
- There, I've created the testcases module. — Eru·tuon 22:15, 16 February 2017 (UTC)
- Thanks! Please also consider "phonetic respellings" for loanwords or for irregularly pronounced words. --Anatoli T. (обсудить/вклад) 22:36, 16 February 2017 (UTC)
- @Erutuon I have tried adding a multipart word but it crashed on the space: check_output("na shledanou", "na zɦlɛdanou̯"). Maybe better error handling is required; please also allow SoPs to be included in the tests, as well as "phonetic respellings" (above). --Anatoli T. (обсудить/вклад) 02:07, 17 February 2017 (UTC)
Phonetic respellings - feature request
@Erutuon Other modules, e.g. Russian, French, German, use phonetic respellings. E.g. "na shledanou" is pronounced as "na schledanou" [na sxlɛdanou̯] in Moravian but "na shledanou" [na zɦlɛdanou̯] in Bohemian. The Moravian one would require a phonetic respelling (e.g. ...|phon=na schledanou...). --Anatoli T. (обсудить/вклад) 03:28, 17 February 2017 (UTC)
- Just a little remark: in fact it is the other way, in Moravia it is sometimes pronounced [na zɦlɛdanou̯] and in Bohemia [na sxlɛdanou̯]. --Jan Kameníček (talk) 18:47, 17 February 2017 (UTC)
- That feature is already supported: you just enter the phonetic respelling in the first parameter. For instance, {{cs-IPA|na schledanou}} → IPA(key): [na sxlɛdanou̯]. It will work as long as the respelling does not attempt to override voicing assimilation or final devoicing. — Eru·tuon 05:35, 17 February 2017 (UTC)
- @Erutuon Thanks. Is this feature supported in the test module, so that it links to the right spelling and displays the respelling and uses the IPA from the latter? --Anatoli T. (обсудить/вклад) 05:48, 17 February 2017 (UTC)
- @Atitarev: Hmm, no. It would link to the respelled form. I could add another parameter for the entry name, but not sure what it should be called. Perhaps |term=. — Eru·tuon 06:02, 17 February 2017 (UTC)
- Yes, whatever works. The Russian, German and French modules do that. User:Benwing2 or User:Wyang could help with technical details. --Anatoli T. (обсудить/вклад) 06:09, 17 February 2017 (UTC)
Velar nasal
I think the phoneme ŋ needs to be added. For example with the word banka it produces {{#invoke:cs-pronunciation|example|banka}} [banka], while the correct pronunciation is [baŋka]. --Jan Kameníček (talk) 18:52, 17 February 2017 (UTC)
- @Jan.Kamenicek I've added a new test case in Module:cs-pronunciation/testcases. @Erutuon, also, FYI, @Dan Polansky. --Anatoli T. (обсудить/вклад) 22:46, 17 February 2017 (UTC)
- @Atitarev: Thanks! @Jan.Kamenicek: I've added a rule for assimilation of n before a velar.
- I wonder, are there any examples of n before a palatal (spelled ť or ď before most vowels, or spelled t or d before ě, i, or í)? I imagine it would assimilate, but maybe it would be written ň. — Eru·tuon 23:37, 17 February 2017 (UTC)
- @Erutuon: Yes, there are and they are pronounced [n], for example pindík being pronounced [pɪnɟiːk] or Mendělejev pronounced [mɛnɟɛlɛjɛf]. --Jan Kameníček (talk) 10:08, 18 February 2017 (UTC)
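The rule discussed in this thread can be illustrated with a minimal Python sketch. It is not the module's Lua code: the function name and the choice of triggering consonants (only k and ɡ) are assumptions made for illustration. As reported above, n stays [n] before the palatal ɟ.

```python
import re

def assimilate_n(ipa: str) -> str:
    """Turn n into the velar nasal before k or g (hypothetical helper).

    The trigger set is an assumption; both IPA ɡ (U+0261) and ASCII g
    are accepted. Palatals such as ɟ are deliberately not triggers.
    """
    return re.sub(r"n(?=[kɡg])", "ŋ", ipa)

print(assimilate_n("banka"))
print(assimilate_n("pɪnɟiːk"))
```

The lookahead leaves the following consonant untouched, so the rule can run before or after other substitutions on the same string.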
Stress
The module now adds a stress mark at the beginning of the transcription if the term has more than one syllable and does not contain a space. This is probably incomplete; the Wikipedia article says that there's also secondary stress. But the article doesn't give detailed rules for secondary stress; it says secondary stress is usually on every odd syllable, which implies that sometimes stress isn't on every odd syllable. So I'm leaving out secondary stress for now. — Eru·tuon 04:50, 18 February 2017 (UTC)
- The usage of secondary stress cannot be described by a simple algorithm. E. g. in 5-syllable words it is usually (but not always) on the 4th syllable, see [1]. It is more often used when speaking slowly and carefully or when reciting poetry, while it tends to disappear when speaking quickly. --Jan Kameníček (talk) 11:05, 18 February 2017 (UTC)
- Stress marks are now showing on monosyllables, such as the first example in the documentation, tři, or krk in Czech but not Slovak, and even the letter č. The function add_stress() at line 252 gets a syllable_count but doesn't use it. Should this be fixed? I don't dare touch the code myself. --Hiztegilari (talk) 14:46, 1 February 2024 (UTC)
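For reference, the intended behaviour (stress mark only on terms with more than one syllable and no space) could look roughly like the Python sketch below. It is not the module's code: the nucleus pattern and function names are assumptions, and only the add_stress name echoes the function mentioned above.

```python
import re

# A syllable nucleus in this sketch: a vowel not marked non-syllabic
# (U+032F), or r/l/m carrying the syllabic mark (U+0329). Assumed inventory.
NUCLEUS = re.compile(r"[aɛɪiou](?!\u032f)|[rlm]\u0329")

def count_syllables(ipa: str) -> int:
    return len(NUCLEUS.findall(ipa))

def add_stress(ipa: str) -> str:
    """Prefix primary stress only for single words of two or more syllables."""
    if " " in ipa or count_syllables(ipa) < 2:
        return ipa
    return "ˈ" + ipa

print(add_stress("lɪʃka"))
print(add_stress("kr\u0329k"))
```

The point of the sketch is simply that the syllable count has to be consulted before the mark is added; monosyllables such as krk and nucleus-less strings such as the letter č then come back unchanged.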
Categorization by number of syllables
Unlike {{IPA}}, {{cs-IPA}} does not currently categorize by the number of syllables. Maybe it should. --Dan Polansky (talk) 14:11, 18 February 2017 (UTC)
- @Dan Polansky I concur. --Lo Ximiendo (talk) 09:09, 19 December 2018 (UTC)
IPA for č and š
The discussion about incorporating ts and t͡s (or tʃ and t͡ʃ) into the module started at Wiktionary talk:About Czech#Czech pronunciation module, but it should probably be continued here, where more people might join. It had also been discussed at User_talk:Jan.Kamenicek#Czech.
Dan Polansky suggested using the syllable mark "." (e. g. /pot.ʃaːlɛk/ instead of /potʃaːlɛk/). I argued on my talk page that the mark "." is primarily used to show syllables and not pronunciation, and because it is not used in other multisyllable words, the reader does not expect it. Thus the reader might still be confused how to read e. g. /potʃiːt/. Using the syllable mark only sometimes is not a systematic attitude. I believe that we should either use the syllable mark to show the syllables with all multisyllable words, or not to use it at all and strictly distinguish t͡ʃ and tʃ (počít /pot͡ʃiːt/ vs. podšít /potʃiːt/) in Czech pronunciation instead. --Jan Kameníček (talk) 17:13, 18 February 2017 (UTC)
- I added heading "IPA for č and š".
- Using the syllable mark to show that something is not č or š is systematic in so far as it is done systematically. The rule is "indicate syllable separation only if confusion with č or š is possible". It is not as simple as the rule "always indicate syllable separation" but still simple enough. It seems to be a matter of taste. --Dan Polansky (talk) 17:18, 18 February 2017 (UTC)
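To make the tie-bar option concrete, here is a minimal Python sketch of the počít / podšít contrast. It is an illustration only, not the module's behaviour: the function name is hypothetical, and the devoicing of d before š is hard-coded for this one cluster rather than derived from a general assimilation rule.

```python
import unicodedata

TIE = "\u0361"  # combining double inverted breve (tie bar)

def transcribe_affricates(word: str) -> str:
    """Map č to a tied affricate and d+š to an untied stop + fricative."""
    word = unicodedata.normalize("NFC", word.lower())
    word = word.replace("č", "t" + TIE + "ʃ")  # single phoneme: tie bar
    word = word.replace("dš", "tʃ")            # two phonemes: no tie bar
    word = word.replace("š", "ʃ")
    return word.replace("í", "iː")

print(transcribe_affricates("počít"))
print(transcribe_affricates("podšít"))
```

With this approach the two words differ only by the tie bar, so no syllable mark is needed to keep them apart.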
Non-palatal t, d, n before i or í
I added |nopalat=1 to prevent t, d, and n from being transcribed as palatal before i or í in words such as identifikovat, but apparently there are words like antihrdina in which the prefix has a non-palatal and the root has a palatal. So perhaps a better solution would be to require the editor to enter a phonetic respelling with y and ý instead of i or í: i.e., entering {{cs-IPA|antyhrdina}} in order to get IPA(key): [ˈantɪɦr̩ɟɪna] with non-palatal t. Otherwise, a less elegant solution would be to add some symbol after t, d, or n, perhaps ', which would prevent the letter from being transcribed as palatal, and would then be removed from the IPA transcription. — Eru·tuon 20:04, 18 February 2017 (UTC)
- Using phonetic respelling looks good to me. (It is identifikovat.) --Dan Polansky (talk) 20:06, 18 February 2017 (UTC)
- Oops, an English-speaker slip-up. — Eru·tuon 20:12, 18 February 2017 (UTC)
- Besides that the opposite case can happen as well: the prefix has a palatal and the root a non-palatal, such as protifašistický. Phonetic respelling might be a solution. --Jan Kameníček (talk) 22:15, 18 February 2017 (UTC)
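The respelling approach works because i and y share a vowel quality while only i and í trigger palatalization. A minimal Python sketch of just that one rule follows; the function name is hypothetical, and everything else the module does (h to ɦ, syllabic r, stress) is deliberately left out.

```python
import re

PALATAL = {"t": "c", "d": "ɟ", "n": "ɲ"}
VOWEL_IPA = str.maketrans({"i": "ɪ", "y": "ɪ", "í": "iː", "ý": "iː"})

def apply_palatalization(respelling: str) -> str:
    """Palatalize t, d, n before i or í only, then map i/y to the same vowel."""
    out = re.sub(r"[tdn](?=[ií])", lambda m: PALATAL[m.group(0)], respelling)
    return out.translate(VOWEL_IPA)

print(apply_palatalization("antihrdina"))  # t is palatalized: not wanted here
print(apply_palatalization("antyhrdina"))  # respelling keeps t non-palatal
```

The same respelling trick covers the opposite case mentioned above (palatal prefix, non-palatal root), since each i or y in the input is decided independently.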
osm
There are two ways to pronounce the word osm (“eight”): either [osm] (with the syllable break [o.sm]) or [osum] ([o.sum]). The phoneme s usually does not change into z here (although it does in words like fašismus [faʃɪzmus]). The pronunciation rules of the Czech language are much less regular than is often said :-( --Jan Kameníček (talk) 01:30, 19 February 2017 (UTC)
The same applies to sedm, which is pronounced either [sedm] or [sedum]. --Jan Kameníček (talk) 01:34, 19 February 2017 (UTC)
- Hmm, I guess I'm going to have to figure out a way to make the module output multiple transcriptions for this word. — Eru·tuon 01:51, 19 February 2017 (UTC)
- @Erutuon IMO, you'd be better off providing multiple readings with respellings. The alternative readings are irregular. E.g. пятьдеся́т (pjatʹdesját). --Anatoli T. (обсудить/вклад) 02:04, 19 February 2017 (UTC)
- @Atitarev: Hmm, but in this case aren't the alternative pronunciations semi-regular (meaning that all words with syllabic m have an alternative pronunciation with um)? Not true for the alternative pronunciation of пятьдеся́т (pjatʹdesját). On the other hand, there are apparently few words with syllabic m, so it would not be very hard to add the alternative pronunciations manually. — Eru·tuon 02:09, 19 February 2017 (UTC)
- I suspect an additional vowel is inserted when there is a certain type of a consonant cluster in the final position, as in "osm" or "sedm". The optional sound could be inserted in brackets: [ˈsɛ.d(u)m]
- Since "-ismus" (and its inflections) is quite common, you may want to consider a small rule to convert "-ismus" to "-izmus". --Anatoli T. (обсудить/вклад) 02:17, 19 February 2017 (UTC)
- The basic rule is that s is pronounced [s], usually regardless of its surroundings. The exceptions are words borrowed from foreign languages (thus -ismus is pronounced [-izmus] including inflections, or krise is pronounced [krɪzɛ]), but this rule also has exceptions like agrese [agrɛsɛ]. The word kurs is pronounced with [s], but its inflections like kursu with [z]. Sometimes both pronunciations are possible, like diskuse ([dɪskusɛ] or [dɪskuzɛ]). For these reasons I suggest keeping the basic pronunciation [s] by default; exceptions would have to be respelled. Clear cases like -ismus can have a sub-rule. It would be good if alternative versions were also enabled (not only for s x z, but for exceptions in other cases as well, like plakát (both [plakaːt] and [plagaːt])). --Jan Kameníček (talk) 09:59, 19 February 2017 (UTC)
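If the um alternative turns out to be regular enough to automate, generating both readings from one respelling could be sketched as below. This is a Python illustration under stated assumptions, not the module's code: the function name is hypothetical, and the condition (final m after a consonant that is not a sonorant) is a guess at when m is syllabic.

```python
VOWELS = set("aáeéěiíoóuúůyý")
SONORANTS = set("rlmnňj")  # assumed not to make a following final m syllabic

def final_m_variants(respelling: str) -> list[str]:
    """Return the plain form plus an 'um' variant for a final syllabic m."""
    if len(respelling) >= 2 and respelling.endswith("m"):
        before = respelling[-2]
        if before not in VOWELS and before not in SONORANTS:
            return [respelling, respelling[:-1] + "um"]
    return [respelling]

print(final_m_variants("osm"))
print(final_m_variants("sedm"))
print(final_m_variants("film"))
```

Returning a list rather than a single string is the design point: the caller can transcribe each variant and display them side by side.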
Glottal stop
Glottal stop usually appears between a prefix ending with a vowel and a root beginning with a vowel, like in neobyčejný, or in compounds where the first part ends and the second part begins with a vowel, like in samoobsluha or jednooký. --Jan Kameníček (talk) 23:08, 24 February 2017 (UTC)
- If some words have vowels in hiatus that do not have a glottal stop between them, such cases with glottal stop must be distinguished by adding a hyphen to the respelling: {{cs-IPA|ne-obyčejný}}, {{cs-IPA|samo-obsluha}}. Otherwise, I can add a rule that adds a glottal stop between any two vowels. — Eru·tuon 00:08, 25 February 2017 (UTC)
- There are some words where there is no glottal stop between two vowels, like neon [nɛon]. On the other hand, rarely there can be a glottal stop where one would not expect it, like bystrouška [bɪstroʔuʃka]. What I forgot to mention is that sometimes there can be a glottal stop in a prefixed word or a compound even between a consonant and a vowel, e. g. trojúhelník can be pronounced either [trojuːɦɛlɲiːk] or [trojʔuːɦɛlɲiːk] (which is another example of why it would be good if the module enabled showing more pronunciation alternatives).
- It could be a good idea to place a character indicating the glottal stop into the respelling, but it would probably have to be something different from a hyphen, as hyphens are sometimes regular characters within some compound words, like česko-polský (“Czech-Polish”). --Jan Kameníček (talk) 16:18, 25 February 2017 (UTC)
- @Jan Kameníček: The question mark is used to represent the glottal stop in X-SAMPA. That would probably work, since question marks are not typically used in entry names on Wiktionary. — Eru·tuon 18:39, 25 February 2017 (UTC)
- Yes, sounds good. --Jan Kameníček (talk) 18:51, 25 February 2017 (UTC)
- @Jan.Kamenicek Do words like paranoik and/or transatlantický have glottal stops in them? The latter word is notated in SSJC as '[-s-a-ty-]', which I'm guessing indicates a glottal stop between the s and the a. paranoik similarly has '[-o-i-]' but I'm not sure if that just means the vowels are to be pronounced separately. Benwing2 (talk) 23:48, 15 May 2023 (UTC)
- Also words with anti- followed by a vowel like antioxidační, antiimigrační and antiamerický, do they have glottal stops or optional glottal stops (I'm guessing yes)? What about the prefix neo-, is there a glottal stop between the e and the o (I'm guessing no)? Benwing2 (talk) 00:11, 16 May 2023 (UTC)
- @Benwing2: As for paranoik, there should not be any glottal stop, as confirmed also in Akademický slovník cizích slov (1995), where the word is notated as [-noji-], which suggests the IPA transcription should be [paranojɪk]. As for various preposition + vowel words, they can be pronounced with a glottal stop like transatlantický [transʔatlantɪt͡skiː] or (in quicker or more casual speech) without it, i. e. [transatlantɪt͡skiː]. See also https://prirucka.ujc.cas.cz/?id=913 where they say that the glottal stop between a prefix and a vowel is recommended, but usually not compulsory. --Jan Kameníček (talk) 12:52, 20 May 2023 (UTC)
- @Jan.Kamenicek Thanks. In that case, do you think the module should automatically insert a /j/ between sequences of V + i/y? Currently it does not, but there are several words like naivní where it is manually inserted using respelling (in that word, it seems it's optional). Benwing2 (talk) 17:47, 20 May 2023 (UTC)
- @Benwing2: Well, difficult to say. If, then definitely with some exceptions:
- 1) V + i/y at the end of a word, like faraday [faradaj], Paraguay [paraɡuaɪ], koi [koj] (variety of carp). IMO no need to bother about combinations like "ii", "yy", "iy" or "yi", I cannot think of any at the end of a word.
- 2) V + i at the beginning of a word like aikido [ajkɪdo], eidetický [ɛjdɛtɪt͡skiː] or oidipovský [ˈojdɪpofskiː]. The only combination of V + y which I can think of is ayahuasca [ajavaska], so we do not have to bother about them. The same applies to "ii", "yy", "iy" or "yi" at the beginning of a word, they do not seem to exist.
- 3) prefixes ending with a V + i (not y), especially "ne-" (probably most frequent of these prefixes) like neidealizovat ([nɛʔɪdɛalɪzovat] or [nɛɪdɛalɪzovat]), but also "za-" like zaimponovat ([zaʔɪmponovat] or [zaɪmponovat]), "pře-" like přeinstalovat ([pr̝̊ɛʔɪnstalovat] or [pr̝̊ɛɪnstalovat]), "pro-" like proinvestovat ([proʔɪnvɛstovat] or [proɪnvɛstovat]), or "do-" like doizolovat ( [doʔɪzolovat] or [doɪzolovat]). I do not know any combination of a prefix + y.
- Other exceptions need to be respelled anyway, like doyen [doajɛn].
- As for naivní (and all related words like naivita, naiva, naivka, naivnost, naivně), I personally know only the pronunciation [-ajɪ-] and I do doubt that the alternative exists. --Jan Kameníček (talk) 19:15, 20 May 2023 (UTC)
- @Jan.Kamenicek Thank you. (1) and (2) are easy to implement; for (3) I need to maintain a list of prefixes, but that isn't an insurmountable issue (e.g. we do it already in the Russian pronunciation module). What are the vowel-ending prefixes in question? I assume ne-, na-, za-, pře-, pro-, do-, any others? One thing I could actually do is make the module automatically generate two outputs when it sees such a prefix followed by a vowel, one with a glottal stop, and the other with a hiatus. BTW in that case, naivní and related items would need respelling as najivní to avoid the glottal stop treatment. However, I think this might get too complicated; e.g. there are words like stereoizomerie that I assume should not have /j/, but stereo- is a pretty obscure prefix, and there are a ton of similar prefixes ending in -o. Benwing2 (talk) 20:06, 20 May 2023 (UTC)
- @Benwing2: I do not think we should apply it to all existing prefixes ending with a vowel, they are quite many, see Category:Czech prefixes. I think it is worth applying it only to some of the more frequent ones which are likely to be combined with words beginning with a vowel, and leaving the rest for respelling when needed. The prefixes I suggest are ne-, na-, za-, pře-, pro-, do-, bio-, elektro-, foto-, hydro-, makro-, mikro-, mimo-, neo-, pseudo- and also stereo-. If you think they are too many and decide not to implement them, then at least ne- should be. I think there are more cases of the prefix na- + i than non-prefixed words beginning nai-, so leaving naivní and alike for respelling seems more convenient to me.
- I also have to apologize for saying something that was not correct: Above I mentioned that I cannot think of any words ending -ii in Czech. However, I forgot about forms of feminine nouns like Alexandrie, sépie and many others, whose sg. dative, accusative and locative end with -ii, always pronounced [-ɪjɪ]. Besides that, their pl. genitive forms end ií, pronounced [-ɪjiː]. -Jan Kameníček (talk) 22:03, 20 May 2023 (UTC)
- One more question, are glottal stops always optional? One thing I could do is, if a respelling with a glottal stop is encountered, generate two pronunciations, one with a glottal stop and one without it (where "without it" means a hiatus if the glottal stop is preceded and followed by a vowel). Benwing2 (talk) 20:07, 20 May 2023 (UTC)
- There is a strong tradition to codify everything in the Czech language, including pronunciation, and a codified version is called "spisovný". As explained e. g. here, according to codified rules glottal stops are compulsory between non-syllable prepositions (just 4 in Czech: k, s, v and z) and words beginning with a vowel, but in reality they are sometimes not pronounced in quick careless speech. In all other cases that we discussed the glottal stop is recommended, but not compulsory, although it should be kept in rare cases where it makes a difference in the meaning: proudit (to flow) from proud (current) is pronounced [prou̯ɟɪt] while proudit (to smoke something thoroughly) from pro- + udit (to smoke st) should be pronounced [proʔuɟɪt]. However, I personally know also the variant of the second case [prouɟɪt], i.e. with hiatus instead of glottal stop, although not with diphthong. This is probably connected with my place of residence: I live in Moravia (eastern part of Czechia) where glottal stops are more frequently omitted in careless speech than in Bohemia (western part of Czechia). --Jan Kameníček (talk) 22:03, 20 May 2023 (UTC)
- @Jan.Kamenicek Sounds good, I need to think a bit more about what symbols to use but your suggestion of recognizing the above prefixes specially sounds fine and isn't hard to implement. I'm not sure about making a special case for nai-, that might be hard to remember. For glottal stops, it sounds like I might need to have two symbols, one indicating a compulsory glottal stop (maybe ?) and another indicating an optional glottal stop (maybe ??). Most of the time the latter shouldn't be needed due to the auto-handling of the above prefixes, but it will be needed with the more obscure prefixes. We will probably also need a way of indicating a hiatus without a glottal stop (. between vowels) and of explicitly indicating a diphthong au, eu in words like nauruský, neuritida with [au̯], [ɛu̯] (vs. naučit, neurčitek with /aʔu/, /ɛʔu/). For the latter, I'm thinking _ or +, so you'd maybe write na_uruský and ne_urityda (or just [a_u], [e_u,ty]). For ou, I think it's best to say that at least in dou- and prou- the ou defaults to being interpreted as a diphthong, since ou as a diphthong is so common. Finally I may change the glottal stop symbol from ? to an actual glottal stop symbol ʔ, unless you think that is too hard to type (on my Mac, it's easy to type with the ABC Extended keyboard layout, using Option-Shift-Dot; I'm not sure about Windows). Benwing2 (talk) 23:56, 20 May 2023 (UTC)
- Well, you asked about combinations V + i/y and so that was what I had in mind when I suggested the prefixes. Making it more general brings more problems to solve:
1) na- + V: On the one hand we have prefixed words like naaranžovat, naučný, naoko, and on the other hand non-prefixed words like nautika, nauzea, Naomi. It seems to me that the first ones prevail, so their transcription can be preferred and diphthongs in the latter group could be retyped.
2) ne- + V: There are really many words beginning neur- [nɛu̯r-] or neo- [nɛo-] and these should probably be implemented as default variants, while exceptions like neurvalý (ne- + urvalý) or neokázalý (ne- + okázalý) can be retyped. With the rest we can make it the opposite: the prefixed version as default, with a few exceptions like "neapolitánec" that can be retyped.
3) I agree that prou- and dou- in the beginning should be transcribed as [prou̯-] and [dou̯-] by default, as the combinations of prefixes pro-/do- + u are much less frequent. With other vowels it should be OK to consider pro- and do- to be prefixes.
The rest of the suggested prefixes should be OK with all vowels. --Jan Kameníček (talk) 01:31, 21 May 2023 (UTC)
- @Benwing2: Now, after the scope was broadened to all prefix+V combinations (not only V+i/y), I have had a look at it once more, and suggest adding some more prefixes: auto-, deseti-, devíti-, dvou-, euro-, kontra-, mezi-, mnoho-, osmi-, pěti-, polo-, pra-, proti-, pře-, při-, samo-, sebe-, sedmi-, sou-, spolu-, šesti-, tele-, termo-, tří-, ultra-, vele-, vše-, vy-, zne-, znovu-. None of these should be problematic in any way. What is problematic are most of the prefixes ending with a consonant, because they are difficult to distinguish automatically, like nadúrovňový (nad- + úrovňový, with an optional glottal stop) and nadaný (na- + daný, without a glottal stop). The only one which seems unproblematic to me is sub- (e.g. in subatomární and others), so it might be added too. --Jan Kameníček (talk) 14:16, 21 May 2023 (UTC)
- BTW since you say "careless speech" for variants without glottal stop, I assume those are nonstandard and shouldn't be listed? (Or at least, not by default, and if listed, they need a qualifier saying something like "nonstandard, fast speech".) Benwing2 (talk) 23:59, 20 May 2023 (UTC)
- Really non-standard is only 1) omitting glottal stop in combinations of non-syllable prepositions with words beginning with a vowel and 2) omitting glottal stop where it may cause confusion (like proudit which means something different when pronounced [proʔuɟɪt] than when pronounced [prou̯ɟɪt]). In all other cases it is only optional, though recommended. In reality, the quicker people speak, the less they use it, but some people do not use it when speaking slowly either. --Jan Kameníček (talk) 01:31, 21 May 2023 (UTC)
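The prefix-based default discussed in this thread could be sketched in Python roughly as follows. This is only an illustration, not the module's Lua code: the function and constant names are hypothetical, the prefix tuple is a small subset of the lists proposed above, and the exception list encodes only the prou-, dou-, neo- and neur- defaults mentioned above.

```python
VOWELS = set("aáeéěiíoóuúůyý")
PREFIXES = ("ne", "na", "za", "pro", "do")   # subset, for illustration
NO_SPLIT = ("prou", "dou", "neo", "neur")    # default to the unprefixed reading

def glottal_variants(word: str) -> list[str]:
    """Return [with glottal stop, with hiatus] for prefix + vowel, else [word]."""
    if word.startswith(NO_SPLIT):
        return [word]
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = word[len(prefix):]
            if rest and rest[0] in VOWELS:
                return [prefix + "ʔ" + rest, word]
    return [word]

print(glottal_variants("zaimponovat"))
print(glottal_variants("proudit"))
print(glottal_variants("neon"))
```

Words that the defaults get wrong (for example pro- + udit, or naivní) would still need a manual respelling.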
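The respelling symbols proposed in this section (? for a compulsory glottal stop, ?? for an optional one) can be expanded into alternative readings along these lines. This is a Python sketch with a hypothetical function name; it works on the respelling only and leaves the actual IPA conversion aside.

```python
from itertools import product

def expand_glottal_symbols(respelling: str) -> list[str]:
    """Expand '??' into with/without variants and '?' into a glottal stop."""
    parts = respelling.split("??")
    variants = []
    # Each '??' boundary is realized either as a glottal stop or as nothing.
    for choice in product(("ʔ", ""), repeat=len(parts) - 1):
        pieces = [parts[0]]
        for separator, part in zip(choice, parts[1:]):
            pieces.append(separator + part)
        variants.append("".join(pieces).replace("?", "ʔ"))
    return variants

print(expand_glottal_symbols("samo??obsluha"))
print(expand_glottal_symbols("pro?udit"))
```

A respelling with no symbols comes back as a one-element list, so callers can treat every input the same way.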
Multi-word expressions
It can be quite tricky to place the stress correctly in multi-word expressions, as the stress is not always placed in the same position as when the words are pronounced separately. For example, one-syllable prepositions before a noun usually take over the stress from the first syllable of the noun, such as in na shledanou, which can be pronounced either [ˈnasxlɛdanou̯] or [ˈnazɦlɛdanou̯] (I have corrected the pronunciation section of this entry). However, I think there may be exceptions; I am really not sure specifically about expressions with the glottal stop like nad Ohří (I personally feel the stress on the first syllable of Ohří, although the rule says it should be on nad). --Jan Kameníček (talk) 20:26, 25 February 2017 (UTC)
Syllable categorization
When writing for example {{IPA|/ˈlɪʃka/|lang=cs}}, the entry is categorized according to the number of syllables, which this module does not do. Could this feature be implemented too? --Jan Kameníček (talk) 15:56, 1 March 2017 (UTC)
- @Jan Kameníček: Categorization should already be done automatically by Module:IPA, which is used by this module (and in turn uses a function from Module:syllables). I'll see if I can figure out why it isn't working. — Eru·tuon 00:31, 2 March 2017 (UTC)
- Oh, the reason is that the transcription is phonetic, and Module:IPA only adds categories for phonemic transcriptions. — Eru·tuon 00:33, 2 March 2017 (UTC)
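The behaviour described above (categories only for phonemic transcriptions in slashes, none for phonetic ones in brackets) can be pictured with this Python sketch. It does not reproduce Module:IPA: the function name, the nucleus pattern and the category name format are all assumptions for illustration.

```python
import re

# Assumed nucleus inventory: vowels not marked non-syllabic, or syllabic r/l/m.
NUCLEUS = re.compile(r"[aɛɪiou](?!\u032f)|[rlm]\u0329")

def syllable_category(transcription: str, lang_name: str = "Czech"):
    """Return a category name for /phonemic/ input, None for [phonetic] input."""
    if not (transcription.startswith("/") and transcription.endswith("/")):
        return None
    count = len(NUCLEUS.findall(transcription))
    return f"{lang_name} {count}-syllable words" if count else None

print(syllable_category("/ˈlɪʃka/"))
print(syllable_category("[ˈlɪʃka]"))
```

Under this reading, a module that outputs only bracketed transcriptions would have to count syllables and add the category itself.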
Voiced consonants in multi-word expressions
@Erutuon: The template correctly generates the pronunciation of words like chladný (IPA(key): [ˈxladniː]) when they stand alone, but it is wrong when the word is part of a phrase, like chladný jako led (IPA(key): [xlatniː jako lɛt]). Could it be fixed, please? --Jan Kameníček (talk) 18:33, 11 February 2018 (UTC)
- @Jan.Kamenicek: Fixed! — Eru·tuon 21:10, 11 February 2018 (UTC)
- Perfect, thanks a lot. --Jan Kameníček (talk) 21:27, 11 February 2018 (UTC)
@Erutuon: I have also noticed that the template devoices voiced prepositions, which is OK if the following word starts with a voiceless consonant, like v práci (IPA(key): [ˈfpraːt͡sɪ]), but should not happen if the following word starts with a voiced consonant, like v rukávě (IPA(key): [f rukaːvjɛ], correctly IPA(key): [ˈvrukaːvjɛ]). It can be worked around if the expression is respelled as one word, but it would probably be better if it could be fixed, because such a situation may appear quite often. --Jan Kameníček (talk) 19:31, 12 February 2018 (UTC)
- @Jan.Kamenicek: Okay, that is fixed, though I don't know if my solution will always be accurate. — Eru·tuon 20:36, 12 February 2018 (UTC)
- Thanks a lot! I use the template quite often (as it is really useful and timesaving!) so if I find something wrong, I will let you know. --Jan Kameníček (talk) 20:40, 12 February 2018 (UTC)
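The behaviour asked for in this thread can be sketched roughly as follows. This is only an illustration in Python, not the module's actual code (which is not quoted on this page): the function name and the letter tables are hypothetical and deliberately simplified. It assumes that a one-consonant preposition (v, z, s, k) is joined to the next word and takes the voicing of that word's first obstruent, that sonorants and v do not trigger any change, and it ignores vowel-initial words and stress marks.

```python
# Hypothetical, simplified tables working on spelling, not on the full phoneme set.
DEVOICED = {"v": "f", "z": "s", "s": "s", "k": "k"}
VOICED = {"v": "v", "z": "z", "s": "z", "k": "g"}

# First letters that trigger assimilation. "c" also covers the digraph "ch".
VOICELESS_TRIGGERS = set("ptťkcčsšf")
VOICED_TRIGGERS = set("bdďgzžh")
# Sonorants (m, n, ň, l, r, j), ř, v and vowels are assumed to trigger nothing here.


def attach_preposition(prep: str, word: str) -> str:
    """Join a one-consonant preposition to the following word,
    adjusting its voicing to the word's first obstruent."""
    first = word[0].lower()
    if first in VOICELESS_TRIGGERS:
        return DEVOICED[prep] + word
    if first in VOICED_TRIGGERS:
        return VOICED[prep] + word
    return prep + word  # no trigger: the preposition keeps its own voicing


print(attach_preposition("v", "práci"))    # fpráci
print(attach_preposition("v", "rukávě"))   # vrukávě
```

The point of the sketch is only that the preposition's voicing is decided by the following word rather than by the word boundary, which is what distinguishes v práci from v rukávě above.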
Guinea, Rovníková Guinea, etc.
@Benwing2, @Solvyn: Hi. Different sources give two pronunciations of Guinea and its derivatives: respelled as "Ginea" [ˈɡɪnɛa] or "Gvínea" [ˈɡviːnɛa]. I provided both. Anatoli T. (обсудить/вклад) 01:44, 13 April 2023 (UTC)
- The latest is Papua-Nová Guinea. Anatoli T. (обсудить/вклад) 02:14, 13 April 2023 (UTC)
fixing up this module
@Erutuon, TomášPolonec, Jan.Kamenicek (Notifying Solvyn, Vininn126, Atitarev, Hergilei, Zhnka): I am going to be doing a bit of work on this module. I have some questions:
- Why is stress not generated at all in multiword expressions? This seems wrong.
- In sequences of i or y followed by a vowel, the respellings in IJP regularly insert a /j/. Often, manual respellings add this. Is there any reason not to automatically include this? An alternative is to use the notation /(j)/, if it's not always present depending on speaking speed.
- In sequences of ex- + vowel or exh-, it appears the x is normally pronounced /gz/. I am going to make this the default, any objections?
- There is currently no easy way of forcing a double consonant at prefix boundaries. I am planning on adding notation for this, probably a dot, so bezzemek could be respelled bez.zemek and nadtřída could be respelled nad.třída. Thoughts?
- How does rhyme in Czech work? Is it always the last two syllables?
Benwing2 (talk) 23:57, 10 May 2023 (UTC)
- @Benwing2: On #2. You can have Persie respelled as "Perzije" but also Vietnam respelled as "Vjetnam".
- Also, I wanted to highlight, in multiword terms with a, like Bosna a Hercegovina, "a" is merged with the following word: [bosna aɦɛrt͡sɛɡovɪna]. Not sure if it's right. Anatoli T. (обсудить/вклад) 00:03, 11 May 2023 (UTC)
- When I was working on the module, I simply didn't attempt to figure out multi-word stress and specifically how to handle clitics. That was more complexity than I wanted to figure out. — Eru·tuon 01:15, 11 May 2023 (UTC)
- @Erutuon I see. I imagine it's probably not that hard as I did something similar for Russian; can you point me to any resources you know of concerning clitic stress placement? Benwing2 (talk) 01:33, 11 May 2023 (UTC)
- See e. g. here (in Czech). Summary: Every word has the primary stress on the first syllable, but some one-syllable words (especially conjunctions, pronouns, auxiliaries) tend to lose stress inside sentences ('Viděl ho 'zblízka), although full-meaning words (typically nouns, adjectives) keep it ('Viděl 'lva 'zblízka). However, when there are several one-syllable words one after the other, some of them keep the stress (e. g. 'bohužel 'je to jen 'zklamání). Prepositions followed by words whose inflection is determined by the preposition (typically nouns or adjectives) usually take stress while the following word loses it ('Rožnov 'pod Radhoštěm). Exceptions: when followed by long words or words difficult to pronounce ('přijít do 'nevyvětratelné 'místnosti), or when the speaker feels the need to stress the word after the preposition. When the preposition is followed by an indeclinable word, the word keeps the stress and the preposition loses it ('utíkal za 'rychle 'běžící 'ženou). However, these rules do not apply to one-syllable secondary prepositions created from originally longer words (skrz, dle, ...), which do not take stress. --Jan Kameníček (talk) 17:07, 11 May 2023 (UTC)
- 1) I would agree.
- 2-3) I can't help.
- 4) Seems good.
- 5) I believe that this practice was borrowed from Polish, which has penultimate stress, but Czech has initial stress; however I am not well-versed in Czech poetry (pun intended), so perhaps it is different. Vininn126 (talk) 09:09, 11 May 2023 (UTC)
- Ad 2) Generally true, but not when a word begins with these vowels, when "i" is just pronounced as [j]: yeti [jeti], yard [jard], iatrogenní [jatrogɛɲiː], yankee [jɛnkiː], yorkshirský [jorkʃɪrskiː], ionizovat [jonɪzovat]. The only exception to this I can think of is the interjection iá, pronounced [ɪʔaː]. Also when ie or ye are preceded by a vowel, then the i/y is pronounced [j], like in biedermeier [biːdr̩majɛr] or Zeyer [zɛjɛr]. Other exceptions include many (not all) words of foreign origin; besides the above-discussed Vietnam [vjɛtnam] (and all its derivatives like Vietnamec, Vietnamka, vietnamky (kind of shoes), vietnamský, vietnamsky or vietnamština) I can also think of e. g. pierot [pɪɛrot], tiebreak [tajbrɛjk], diesel [diːzl̩], wiesbadenský [viːzbaːdɛnskiː], or biedermeier [biːdr̩majɛr] etc. Other exceptions are formed by adding a prefix ending with i/y before a vowel, like dielektrikum [dɪʔɛlektrɪkum], antievropský [antɪʔɛvropskiː], antialergikum [antɪʔalɛrgikum], vyexpedovat [vɪʔɛkspedovat] or vyobcovat [vɪʔobt͡sovat].
- Ad 3) Agree.
- Ad 4) Well, sometimes there are two ways of pronunciation possible in these cases, like oddaný ([oddaniː] or [odaniː]), rozzloben ([rozzlobɛn] or [rozlobɛn]), rozsudek ([rossudɛk] or [rosudɛk]). Besides, double consonants accepting two ways of pronunciation may appear in some inflected words like hrdliččin ([hr̩dlɪt͡ʃt͡ʃɪn] or [hr̩dlɪt͡ʃɪn], dative of hrdlička), but when the pronunciation is important to distinguish it from some other word, then there is only one possibility, like racci [rat͡st͡sɪ] (plural of racek), to be distinguished from raci [rat͡sɪ] (plural of rak) or pecce [pɛt͡st͡sɛ] (dative of pecka), to be distinguished from pece [pɛt͡sɛ] (plural of pec). Double consonants are also pronounced separately in compounds like čtvrttón [t͡ʃtvr̩ttoːn], combinations with conjunction -li like přišel-li [pr̝̊ɪʃɛllɪ] and in plural imperatives like uvědomme [uvjɛdommɛ]. See also here. I am not sure whether it is needed to add the dot between the doubled consonants. --Jan Kameníček (talk) 19:57, 11 May 2023 (UTC)
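For illustration, the defaults discussed under points 2 and 3 (together with the word-initial case from Ad 2) could be sketched like this. It is a rough Python sketch under stated assumptions, not the module's code: the function name is hypothetical, it works on the respelling rather than on IPA, it assumes the ex- rule applies only at the start of a word, and it makes no attempt to detect the exceptions listed above (prefixes such as anti-, vy-, or loanwords like Vietnam), which would still need manual respelling.

```python
import re

VOWELS = "aáeéěiíoóuúůyý"


def apply_defaults(respelling: str) -> str:
    """Apply three default rewrites to a lower-cased respelling."""
    text = respelling.lower()
    # Word-initial i/y before a vowel is read as j (yard, ionizovat).
    text = re.sub(rf"\b[iy](?=[{VOWELS}])", "j", text)
    # Elsewhere, i/y followed by a vowel gets a j inserted after it.
    text = re.sub(rf"([iíyý])(?=[{VOWELS}])", r"\1j", text)
    # Word-initial ex before a vowel or h is read as egz (assumed word-initial only).
    text = re.sub(rf"\bex(?=[{VOWELS}h])", "egz", text)
    return text


print(apply_defaults("ionizovat"))   # jonizovat
print(apply_defaults("biologie"))    # bijologije
print(apply_defaults("existovat"))   # egzistovat
```

Because the rewrites are purely orthographic, a word such as vyexpedovat would wrongly receive a j here, which is consistent with the remark that such prefixed words need a respelling anyway.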
- @Jan.Kamenicek Thank you for the details! For (2) it sounds like it will work fine to add the /j/ in i + vowel because the various exceptions you give almost invariably need respelling in any case. For (4) the current practice is to reduce all double consonants to single; I am thinking of keeping this practice but allow you to override it by putting a dot between the consonants. BTW with a pronunciation like [pɛt͡st͡sɛ] do you actually mean what you've written (i.e. you have two affricate releases, and you hear the /s/ twice) or do you mean [pɛtt͡sɛ] with the closure held extra long but only one release? As for (1), it's difficult to know automatically whether e.g. a word is declinable or not. My plan is to do the following: (a) The module will have a list of prepositions, and by default, prepositions will get the stress and the following word will not; but if the user has explicitly marked stress on the following word using ", the preposition will remain unstressed. (b) The module will have a list of clitics (jsem, ho, etc.) and will simply leave them unstressed by default. You can override this by explicitly stressing words with ". I am not sure if it's needed to have a way of forcing a word to be unstressed, but if so I will add it. Thoughts? Benwing2 (talk) 02:08, 12 May 2023 (UTC)
- Ad 1) Sounds good to me, I only forgot to mention that what I wrote about prepositions applied only to one-syllable prepositions. Other prepositions are simply unstressed. Having a possibility to force a word to be unstressed might be useful, unless we are sure that the list of clitics is complete.
- Ad 2) Yes, I think [j] can be inserted between i/y and a vowel by default, but if it is possible, I suggest including some of the mentioned exceptions, like i+vowel in the beginning of a word pronounced as j+vowel (and maybe also words beginning "vie-" pronounced as [vjɛ-], but they are not many, so it might not be necessary).
- As for pecce, it is really pronounced [pɛt͡st͡sɛ]; I have also uploaded audio of its pronunciation. --Jan Kameníček (talk) 16:36, 12 May 2023 (UTC)
- @Jan.Kamenicek Thanks again for your comments. I implemented the auto-addition of /j/ after i/y + vowel, the special handling of exV- and exh-, and defaulting words beginning with iV- and yV- to use /j/ for the initial letter. The other changes are still in progress. Benwing2 (talk) 23:50, 15 May 2023 (UTC)
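The stress plan outlined in (a) and (b) above might look something like the following. This is a hedged Python sketch rather than the module's implementation: the word lists are small hypothetical samples, the function name is made up, the explicit stress mark " is assumed to be written directly before the word, and the stress sign is simply prefixed to the spelled word instead of being placed inside an IPA transcription.

```python
STRESS = "ˈ"
MARK = '"'  # explicit stress mark, assumed to precede the word

# Hypothetical sample lists; a real list would need to be far more complete.
PREPOSITIONS = {"na", "do", "pod", "nad", "za", "od", "po", "bez", "pro", "při"}
CLITICS = {"jsem", "ho", "se", "si", "mu", "mi"}


def assign_stress(phrase: str) -> str:
    """Mark stressed words in a phrase: one-syllable prepositions take the
    stress from the next word, clitics stay unstressed, and an explicit
    mark always wins."""
    words = phrase.split()
    out = []
    prep_took_stress = False
    for i, word in enumerate(words):
        explicit = word.startswith(MARK)
        bare = word.lstrip(MARK)
        low = bare.lower()
        nxt = words[i + 1] if i + 1 < len(words) else None
        is_prep = low in PREPOSITIONS
        if explicit:
            stressed = True
        elif prep_took_stress or low in CLITICS:
            stressed = False
        elif is_prep and nxt is not None and nxt.startswith(MARK):
            stressed = False  # the user stressed the following word instead
        else:
            stressed = True
        # A stressed preposition removes the default stress of the next word.
        prep_took_stress = stressed and is_prep and nxt is not None
        out.append((STRESS if stressed else "") + bare)
    return " ".join(out)


print(assign_stress("Rožnov pod Radhoštěm"))  # ˈRožnov ˈpod Radhoštěm
print(assign_stress('nad "Ohří'))             # nad ˈOhří
```

The sketch has no way of forcing a word to be unstressed, which is the open question raised above; that would need a second mark or a complete clitic list.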
- (Notifying Solvyn, Atitarev, Benwing2, Hergilei, Zhnka, Jan.Kamenicek): I was looking at some Czech poetry and it seems that rhymes are not based on starting from the first word, but I wasn't able to discern what syllable it was. Sometimes it seemed antepenultimate or sometimes penultimate, can anyone weigh in? Vininn126 (talk) 16:49, 28 August 2023 (UTC)
- I don't really know much about poetry. For rhymes, see here and here. Solvyn (talk) 05:10, 29 August 2023 (UTC)
- @Solvyn This first website looks really useful. I'm not sure I 100% understand, but from what I can glean it is determined by the position of the accent. Am I wrong? Vininn126 (talk) 10:12, 29 August 2023 (UTC)
@Jan.Kamenicek: Hi. Did I respell the word right? Pls fix accordingly. Anatoli T. (обсудить/вклад) 01:14, 15 May 2023 (UTC)
- imo OK. --Jan Kameníček (talk) 16:07, 15 May 2023 (UTC)
substitution notation
@Atitarev, Jan.Kamenicek I added "substitution notation", which is borrowed from French, Italian etc. An example is {{cs-IPA|[sur:sir]}} for surrealista, which says to replace sur with sir in the respelling but keep everything else the same. This is especially useful with long words and multiword expressions to avoid having to repeat the whole respelling, e.g. for Fibonacciho posloupnost (“Fibonacci sequence”) you can just write {{cs-IPA|[cc:č]}}. As a further shortcut, if you write something like {{cs-IPA|[ty]}} for mezinárodní fonetická abeceda (“International Phonetic Alphabet”), it is equivalent to writing {{cs-IPA|[ti:ty]}}. In these single-part substitutions, you can write y to indicate an i -> y substitution; you can write a long vowel to indicate a short -> long vowel substitution; and you can write z to indicate an s -> z substitution. For example, ezofagitida (“oesophagitis”) could be written {{cs-IPA|[ty]|[tý]}}, which is equivalent to {{cs-IPA|[ti:ty]|[ti:tý]}}, which in turn is equivalent to writing {{cs-IPA|ezofagityda|ezofagitýda}} (since you can now put multiple respellings in a single call to {{cs-IPA}} if there are multiple possibilities). You can put more than one substitution between brackets, e.g. for wikipedistka you could write {{cs-IPA|[w:v,dy]}}. Benwing2 (talk) 00:29, 16 May 2023 (UTC)
- @Benwing2: Thanks. I just made a complex substitution on diesel: {{cs-IPA|[die:dý,sel:zl]}}, possibly an overkill. Previously I just used {{cs-IPA|dýzl}}. Anatoli T. (обсудить/вклад) 00:42, 16 May 2023 (UTC)
- @Atitarev Yeah probably not needed in this case; I tend to use this form in long words. Benwing2 (talk) 01:52, 16 May 2023 (UTC)
- @Benwing2: Agreed. Consider this an exercise in this new convention :) Anatoli T. (обсудить/вклад) 01:55, 16 May 2023 (UTC)
- @Benwing2: Maybe a parameter could be added indicating that the word is of foreign origin, so that all -di-, -ti- or -ni- would be automatically transcribed as [-dɪ-], [-tɪ-] or [-nɪ-] instead of [-ɟɪ-], [-cɪ-] or [-ɲɪ-]. --Jan Kameníček (talk) 13:06, 20 May 2023 (UTC)
- @Jan.Kamenicek I think that was actually the original plan (see above for 2017), but then people pointed to the existence of words like protifašistický and antihrdina that mix foreign and non-foreign elements (and there are also multiword expressions). But potentially both methods could coexist. Benwing2 (talk) 17:44, 20 May 2023 (UTC)
- Ah, true. In fact it was me who pointed protifašistický out. I have completely forgotten about that :-) --Jan Kameníček (talk) 19:28, 20 May 2023 (UTC)
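To make the substitution notation from the start of this section concrete, here is one way the expansion could work. It is a Python sketch under assumptions, not the module's actual logic: the function names are hypothetical, each substitution is assumed to replace only the first occurrence of its source text, substitutions are applied left to right, and a missing source text is treated as an error.

```python
# Long vowels mapped back to their short counterparts (used by the shortcut form).
SHORTEN = str.maketrans("áéíóúůý", "aeiouuy")


def implied_source(target: str) -> str:
    """For a single-part substitution such as 'ty' or 'tý', work out the text
    it replaces by undoing long -> short, y -> i and z -> s."""
    return target.translate(SHORTEN).replace("y", "i").replace("z", "s")


def expand(pagename: str, arg: str) -> str:
    """Turn a template argument into a full respelling of the page name."""
    if not (arg.startswith("[") and arg.endswith("]")):
        return arg  # an ordinary full respelling is used as given
    result = pagename
    for part in arg[1:-1].split(","):
        if ":" in part:
            src, dst = part.split(":", 1)
        else:
            src, dst = implied_source(part), part
        if src not in result:
            raise ValueError(f"{src!r} not found in {result!r}")
        result = result.replace(src, dst, 1)  # assumption: first occurrence only
    return result


print(expand("surrealista", "[sur:sir]"))    # sirrealista
print(expand("ezofagitida", "[tý]"))         # ezofagitýda
print(expand("diesel", "[die:dý,sel:zl]"))   # dýzl
```

Applying the parts in order is what lets the diesel example work: the first part yields dýsel, and the second part then finds sel in that intermediate result.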