User talk:Thadh/sandbox

Partitive for valkea-type words


Hi @Thadh, there is a difficulty with the "valkea" example: in Ala-Laukaa these words take a -ta partitive, so it probably won't fit the automatic pronunciation generation. KirillW (talk) 17:38, 26 August 2022 (UTC)

@KirillW: that's okay, we can add those manually. I'm more concerned with whether the reduction overall works or not. Thadh (talk) 17:47, 26 August 2022 (UTC)

@Thadh, I understand your intention, but I'm not sure we can come up with a real-word example of such a sequence. I'll discuss this with Mehmet tomorrow. KirillW (talk) 17:59, 26 August 2022 (UTC)

@Thadh, actually there are multiple examples of the pronunciation of valkiata and valken in Kuznetsova's «Phonological systems...» on page 221. Let me know if additional input is needed. KirillW (talk) 19:36, 26 August 2022 (UTC)

@KirillW: I feel like you have misunderstood my question. I wasn't talking about any individual example; I was just wondering whether the reduction logic as it now stands in the module works correctly and displays correct phonetic Ala-Laukaa IPA values. You can try it out yourself if you have any examples that you want to check. Thadh (talk) 00:32, 27 August 2022 (UTC)

@Thadh, I'm a bit confused about how we can discuss the logic if not on the basis of individual examples :) I'm happy to contribute, but what would be a better way to do it? KirillW (talk) 08:44, 27 August 2022 (UTC)

@KirillW: Well, treat the /phonemic/ pronunciation as hypothetical and check whether the [phonetic] pronunciation matches how these words would actually be pronounced. If you're more comfortable with checking words that you know, you can create invocations using

{{T:User:Surjection/izh-pronunciation|A=Lower Luga pronunciation|S=Soikkola pronunciation (you can also ignore this)|title=word in kirjakeeli}}

. I'm purely concerned with whether the module works at the moment, adding the appropriate pronunciation per entry will be an issue for later. Thadh (talk) 09:54, 27 August 2022 (UTC)

Ok, I must have missed what the module actually does! I first assumed it was for generating both the phonemic and the phonetic transcription, rather than generating the phonetic from the phonemic. KirillW (talk) 09:13, 28 August 2022 (UTC)

@KirillW: it generates both, but it needs a respelling to generate phonemic pronunciation most of the time. Thadh (talk) 11:40, 28 August 2022 (UTC)

@Thadh, in the last two weeks I've managed to check only nasals and palatalization.

According to Kuznetsova (p. 31) ŋ occurs only before k and g. So apenikka (from your tests) should be [ˈɑpe̞ˌnʲikː] (unless I'm reading the IPA character wrongly).

ng should become ŋg — on p. 242 it's mentioned that ng becomes ŋŋ only in Hevaha dialect.

dj becomes dʲ: apparently only in Vadja and this could be considered a loanword. So the "default" mode should be adjala [ˈɑdjəɫ]. There's no good summary for that and we need to check more examples with the speakers.

KirillW (talk) 20:47, 4 September 2022 (UTC)
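The nasal points above could be sketched roughly as follows. This is only an illustration in Python, not the module's actual code: the function name `velarize_nasals` is hypothetical, and it assumes the rule is simply "n becomes ŋ when k or g follows", with the following stop left unchanged so that ng yields ŋg rather than the Hevaha-type ŋŋ.

```python
import re


def velarize_nasals(transcription: str) -> str:
    """Turn n into ŋ only when k or g follows (hypothetical helper)."""
    # The lookahead does not consume the stop, so "ng" becomes "ŋg", never "ŋŋ".
    # Both the ASCII g and the IPA ɡ (U+0261) are accepted.
    return re.sub(r"n(?=[kgɡ])", "ŋ", transcription)


print(velarize_nasals("kangas"))    # n before g is velarized
print(velarize_nasals("apenikka"))  # n before i is left alone
```

With these assumptions, an n that is not directly followed by k or g (as in apenikka) is never turned into ŋ.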

@KirillW: Thanks for the responses so far!

<ɲ> is not the same thing as <ŋ> ^_^; The former is a palatal nasal (this is also how нь is most commonly pronounced in Russian), the latter is a velar one.
Oh, I didn't know that! Thanks for the tip, I seem to have remembered it wrongly.
Right, thank you for the fix! I'll get on to fixing it.

I'll be waiting eagerly for a follow-up! :) Thadh (talk) 20:57, 4 September 2022 (UTC)

@Thadh, I actually meant ɲ, but you've answered my question anyway. Now to the point:

ia/ea is usually contracted to e in the West and South, while in the North and East it doesn't seem to be contracted. Kuznetsova calls the contraction ia > i the "Finnish" type (it is indeed popular in "common" spoken Finnish), and it seems to be the least common (see p. 229 and p. 231); this could be Finnish influence.
Similarly, ua/oa and yä/öä should become o and ö in the West and South, but apart from the mention on p. 213 (4.18.6, with a single example, räätyä > räätö), I couldn't find any supporting material for that so far.
Diphthongs with i are reduced as a whole (p. 107). E.g. puutui is currently [ˈpuːtŭi̯], but should be [ˈpuːtŭ]; keltaisen is [ˈke̞ltəis̠e̞n], but should be [ˈke̞ltəs̠e̞n].

KirillW (talk) 20:31, 12 September 2022 (UTC)
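The contraction described above could be sketched like this. It is a rough Python illustration under several assumptions that are not in Kuznetsova or in the module: the function name `contract` and the area labels "west", "south", "north" and "east" are made up, the contraction is applied only to a sequence at the end of the word (optionally before one final consonant), and sequences in the first syllable or right after another vowel are skipped.

```python
import re

VOWELS = "aeiouyäö"
# Vowel sequences and their contracted outcome in the contracting areas.
CONTRACTIONS = {"ia": "e", "ea": "e", "ua": "o", "oa": "o", "yä": "ö", "öä": "ö"}
# A sequence not preceded by a vowel, at word end or before one final consonant.
PATTERN = re.compile(r"(?<![aeiouyäö])(ia|ea|ua|oa|yä|öä)(?=[^aeiouyäö]?$)")
CONTRACTING_AREAS = {"west", "south"}  # hypothetical labels


def contract(word: str, area: str) -> str:
    """Contract a final vowel sequence in the western/southern type (sketch)."""
    if area not in CONTRACTING_AREAS:
        return word
    match = PATTERN.search(word)
    # Skip sequences in the first syllable (no vowel before them).
    if match is None or not any(ch in VOWELS for ch in word[:match.start()]):
        return word
    return word[:match.start()] + CONTRACTIONS[match.group(1)] + word[match.end():]


print(contract("valkean", "west"))   # valken
print(contract("räätyä", "south"))   # räätö
print(contract("valkea", "north"))   # valkea (no contraction)
```

The "Finnish" type ia > i is left out on purpose, since it is described above as the least common outcome.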

Thanks, we'll sort it out. Keep them coming! Thadh (talk) 07:09, 13 September 2022 (UTC)

@KirillW: Could you just quickly give me a thumbs up if it's fine, so we can go ahead and start implementing the template on pages? :) Thadh (talk) 20:43, 8 October 2022 (UTC)

@Thadh, do I have a couple more days to test with some examples I had in mind? KirillW (talk) 20:10, 9 October 2022 (UTC)

@KirillW: Of course you do! Take as much time as you need, just make sure to let me know when you're ready. Thadh (talk) 20:35, 9 October 2022 (UTC)

@Thadh, I'm done with my test cases. This module is indeed a great piece of work!

I observed that the reduction at the end of the word is modeled as "complete". When i is dropped, it leaves the preceding consonant palatalized; this is especially pronounced with l and t (see Kuznetsova, 4.20). Keel, lust, puol and luot would sound really weird to me.
At the same time, it looks like there's no case where a vowel is dropped completely in the middle of a word. On the one hand that would be in line with dropping terminal vowels, but on the other hand it adds more complication (situations where a vowel can't be dropped should be considered separately).
When the module is used with a proper name (e.g. Petteri, "Saint Petersburg"), an error message is rendered for the uppercase character.
It seems that some compound words have a primary stress where a secondary stress would be expected. At least one word is known for sure: paráikaa. We'll need to check if any other compounds (e.g. maailma, maamuna, kanamuna) behave the same way.
The ma-infinitive (3rd) ending needs an additional check. For instance, ostamaa gives [ˈo̞s̠təmɑ], but kuuntelemaa gives [ˈkuːntəˌle̞mɑː]. This is expected from the rules, but I don't remember if this is actually so. Also, I'm not sure if it's a real-world problem, but since you had not only "dictionary" forms in your test cases, I tried this too.

KirillW (talk) 20:00, 10 October 2022 (UTC)

@KirillW:

Thanks for spotting the first one - I knew that, but I seem to have forgotten to tell Surjection about this when designing the module.
If a word consistently drops a non-final vowel (are there any such words?), these could potentially be re-written. I'm not quite sure how to describe Ingrian phonotactics to be honest, but if you know an easy rule that could be used to recognise permitted and disallowed clusters, I'm sure we can manage incorporating it into the module.
AFAIK, all Ingrian grammars and Konkova alike describe paraikaa as the only word in Ingrian that does not have initial stress. I was going to handle this word manually, also considering the fact that "praikaa" is a valid pronunciation.
I'll be waiting eagerly!

Thanks for the checks! Thadh (talk) 20:25, 10 October 2022 (UTC)

Oh, actually, never mind on the first one, I just remembered that we decided to make pre-vocalic palatalisation manual, so you actually need to write |A=keel'i. This is because of the whole ee-ii thing where syyvvä doesn't have a palatal s-, but syy does. Thadh (talk) 20:29, 10 October 2022 (UTC)

@Thadh, I've reread 2.15.4 of Kuznetsova's thesis and listened to some recordings of one of the most hardline "reductionists" (ND from Vanakylä). It now seems to me that it's more about the sound than the position: a/ä, e and (to a lesser extent) i are dropped more often, while o/ö and u/y are mostly preserved. With that in mind, we could consider the following:

Complete vowel drop should happen only for a, ä and e.
i might be dropped with automatic palatalization (but I'm not sure what you mean by the "ee-ii" thing, so I might be missing something).
o, ö, u and y should be transcribed with something like ŏ̥, ø̥̆, ŭ̥, y̥̆ (there's more breathing out in this sound rather than voice, here's an example: https://drive.google.com/file/d/1iN2kSepEgN4eETUWBlTYh3o73tGjxjur/view?usp=sharing).
The rules for dropping vowels inside the word for 3+ syllable words are the same as for the word end (including the example of p(a)raikaa), but there are cases where this is not possible, because the result is "unpronounceable". To me this poses the biggest problem, as there's apparently no summary for that. From the examples it looks like it doesn't work with 3+ consonants in a row and with some pairs like mk and kt.

KirillW (talk) 09:39, 11 October 2022 (UTC)
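To make the proposal above concrete, here is a minimal Python sketch of a quality-based treatment of a word-final short vowel. It is not the module's logic: the name `reduce_final` is hypothetical, the function works on plain respellings rather than IPA, it ignores the prosodic conditions under which reduction applies at all, and it treats dropped i as always palatalizing, which the thread leaves open.

```python
VOWELS = set("aeiouyäö")
DROPPED = set("aäe")  # proposed to be dropped completely
# Proposed devoiced extra-short vowels: base + breve (U+0306) + ring below (U+0325).
REDUCED = {
    "o": "o\u0306\u0325",
    "ö": "ø\u0306\u0325",
    "u": "u\u0306\u0325",
    "y": "y\u0306\u0325",
}


def reduce_final(word: str) -> str:
    """Apply the proposed quality-based reduction to a final short vowel (sketch)."""
    # Only a single vowel after a consonant is treated as short; long vowels
    # and diphthongs (a vowel before the final vowel) are left untouched.
    if len(word) < 2 or word[-1] not in VOWELS or word[-2] in VOWELS:
        return word
    stem, last = word[:-1], word[-1]
    if last in DROPPED:
        return stem
    if last == "i":
        # Assumption: dropped i leaves palatalization on the preceding consonant.
        return stem + "ʲ"
    return stem + REDUCED[last]


for respelling in ("keeli", "lusti", "räätö", "kanaa"):
    print(respelling, "->", reduce_final(respelling))
```

Keeping the three outcomes in separate tables makes it easy to move a vowel from one group to another once more recordings have been checked.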

@KirillW: By ee-ii I mean the effect that the pairs ee-ii, oo-uu and öö-yy are in free variation, but palatalisation primarily (only?) happens before words that originally had i or y. So, syyvvä (< söövvä) doesn't have a palatalised s-, while syy (< syy) does. This was described in Kuznetsova if I'm not mistaken.

paraikaa is special, because the -a- also drops in Soikkola, where there is no widespread vowel reduction, certainly not in non-final positions. Do you think you may come up with a summary at some point, or should we just make word-internal dropping manual (e.g. by giving the vowel in brackets like so: |A=tiit(ä)vät)? The point about o, ö, u, y is doable. Thadh (talk) 12:52, 11 October 2022 (UTC)
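The bracket notation suggested above could be read along these lines. This is only a Python sketch of the idea, not something the template supports as far as this thread shows: `expand_optional` is a hypothetical name, and it assumes brackets are never nested and always enclose a droppable vowel.

```python
import re

# One bracketed, non-nested segment such as "(ä)".
OPTIONAL = re.compile(r"\(([^()]*)\)")


def expand_optional(respelling: str) -> tuple[str, str]:
    """Return the full form and the form with bracketed vowels dropped (sketch)."""
    full = OPTIONAL.sub(r"\1", respelling)  # keep the vowel, drop the brackets
    reduced = OPTIONAL.sub("", respelling)  # drop the vowel and the brackets
    return full, reduced


print(expand_optional("tiit(ä)vät"))   # ('tiitävät', 'tiitvät')
print(expand_optional("p(a)raikaa"))   # ('paraikaa', 'praikaa')
```

A respelling without brackets comes back unchanged in both positions, so existing parameters would not need to be touched.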

@Thadh,

I couldn't find the description you're referring to, but I can see this being an obstacle for the automation. Do I understand correctly that this is the reason why no consonants followed by i are palatalized automatically? Like CiV (where C can be t, l, s and, probably, n): astia [ˈɑs̠tʲe̞], pääsiä [ˈpæːs̠ʲe̞], assia [ˈɑs̠ʲːe̞], helliä [ˈhe̞lʲːe̞] (btw, the latter two show some inconsistent spelling to me; it could be easier to have asia and heliä read in a way similar to paljo).

You're absolutely right about paraikaa — it doesn't even look similar. We might try coming up with an initial version of the rules for dropping vowels, but could we launch without this? I feel it requires effort to define what the "first long syllable" and "unpronounceable syllable" are.

Anyway, I feel an option to show a vowel in brackets (as used in English transcriptions) could be a good way to show a vowel's tendency towards complete reduction.

KirillW (talk) 17:35, 13 October 2022 (UTC)

@KirillW: See page 212 in Kuznetsova: "Accordingly, in those cases where i or ü are newly formed phonemes that arose through the raising or diphthongization of *ȫ, *ē (see sections 3.3.2 ff.), the palatalization t’ : t̄’ before them is inconsistent or absent altogether, even in many idiolects where it is consistent before original i, ī, ü, ǖ." So yes, this is why there is no automatic palatalisation in the module: defining what is or isn't an original ii/yy is a hassle.

Launching an unfinished version only makes sense if the input (so, the parameters of the template) won't be changed after we finish it. So, if you believe that word-internal vowel deletion can be automated, it makes sense to do that before the launch or to launch with the current vowel reduction rules, but using the brackets now when we are aiming towards complete automation in the future would be troublesome. Thadh (talk) 17:52, 13 October 2022 (UTC)

@Thadh: thanks for pointing that out!

Correct me if I'm wrong: I thought that the template is rendered each time (barring any caching). In that case, if we enter the pronunciation data with the word-internal vowels in place and the current version of the rules, we will get it "almost" correct (with vowels which could actually have been dropped). When we update the template later on with the rules for dropping the word-internal vowels, we'll get a more precise transcription automatically. Isn't that the case? KirillW (talk) 18:33, 13 October 2022 (UTC)

@KirillW: Yes, exactly. So we may use the current rules (I can even ask Surjection to fix word-final -u, y, o, ö in the meantime), and we would have the droppable vowels in place until we come up with a way to detect droppability. Thadh (talk) 18:42, 13 October 2022 (UTC)

@Thadh, on kuuntelemaa. Currently it's rendered as [ˈkuːntəˌle̞mɑː], but should be [ˈkuːntəˌle̞mːɑ]. In this case the same logic applies as in kanaa /ˈkɑnɑː/, [ˈkɑnːɑ]. KirillW (talk) 19:34, 18 October 2022 (UTC)

@KirillW: You're absolutely right, Surjection worked his magic and fixed it :) Thadh (talk) 19:52, 18 October 2022 (UTC)

@KirillW The template is now live, I've even added it in lemmas up to the letter b. If you can find the time, could you make sure there are no Laukaa inflectional things I've missed (like what you've mentioned above about valkia-valkiata)? Thanks. Thadh (talk) 23:44, 29 October 2022 (UTC)

So'o and see


@Thadh, I'm not sure I understand the intent with these two: they seem to be some theoretical stems, which should be pronounced in the same way as lu'u- and soo. KirillW (talk) 18:02, 26 August 2022 (UTC)

@KirillW: yeah, those are theoretical, sorry, I was just trying out whether the long o/short o distinction works. Thadh (talk) 00:27, 27 August 2022 (UTC)