Wiktionary talk:About Czech
Basic entries
- I would place an empty line between {{cs-noun|g=m}} and # [[wind]] (''movement of air''). This is the way most Czech entries and most English entries are formatted.
- I would indicate that the most basic entry is one without declensions and conjugations, not one with them. I understand declensions and conjugations are useful. But they take a lot of additional work to create; I think it should be of high priority to have a correctly translated entry in the first place. --Daniel Polansky 11:21, 4 March 2008 (UTC)
- Agreed and updated. ThomasWasHere 15:24, 4 March 2008 (UTC)
References
I think references are not part of the most elementary article. --Daniel Polansky 11:31, 4 March 2008 (UTC)
- Agreed and updated. ThomasWasHere 15:24, 4 March 2008 (UTC)
Proprietary resources on Czech
I would avoid referring to proprietary resources on Czech from this policy page. --Daniel Polansky 11:33, 4 March 2008 (UTC)
- Agreed and updated. ThomasWasHere 15:24, 4 March 2008 (UTC)
Relevance of references
I estimate that dictionaries are not even proper references. There is a policy saying that Wiktionary is a secondary source, for which the primary sources are the texts in which the terms occur. To use another dictionary, a secondary source, as a basis for Wiktionary is to make Wiktionary a tertiary source, which is not wanted. That is at least my understanding of this policy: Wiktionary:Wiktionary is a secondary source.
Based on this consideration, I would propose to drop references from the policy altogether. --Daniel Polansky 16:12, 4 March 2008 (UTC)
- OK. Let's just drop this section, it's anyway not a priority for Czech entries. I have read what you pointed out but I have also read Wiktionary:Entry_layout_explained#References that links to Category:Reference_templates where you can see a lot of public domain dictionaries. Maybe in the far future it will have to be considered again or if there is a public domain Czech-English translation dictionary. Thanks for your reviews. ThomasWasHere 16:36, 4 March 2008 (UTC)
- I see. Thanks for referring me to Wiktionary:Entry_layout_explained#References. I must have misunderstood something; it seems that referring to public domain dictionaries is wanted. --Daniel Polansky 18:41, 4 March 2008 (UTC)
- A common-sense rule seems to be that dictionaries are weak references but still useful. See Wiktionary:Referencing dictionaries. ThomasWasHere 11:17, 5 March 2008 (UTC)
Adjectives - feminine and neuter gender
I am so far not convinced that bílý is a good pattern or model for adjectives. Above all, I would prefer to align the Czech policy for adjectives with multilingual policy for all the languages having gender. Unfortunately, I do not know of such a policy.
So far, I have avoided creating any other than masculine adjectives. And when entering masculine adjectives, I have avoided entering feminine and neuter forms.
So instead of:
==Czech==
===Adjective===
{{cs-adj}}, [[bílá]] {{f}}
# [[white]]
I would prefer
==Czech==
===Adjective===
{{cs-adj}}
# [[white]]
If you really want to have feminine and neuter forms in the policy, then I propose that you research the current use in other languages.
The current proposal, even if accepted, would have to be extended to:
==Czech==
===Adjective===
{{cs-adj}}, [[bílá]] {{f}}, [[bílé]] {{n}}
# [[white]]
And it leaves other questions open: what should the entry for bílá look like? Should it also link to bílý and bílé?
Also, the minimal entry should IMHO not require the statement of feminine and neuter forms; the main point should be that there is a correct translation.
--Daniel Polansky 09:03, 5 March 2008 (UTC)
- Yes, it was not very clever of me to use this example : /. I have replaced the example with malý. The masculine singular nominative form should be the only entry, as for nouns (gender apart), but I propose adding the comparative and superlative forms to the template, as is done in English for small. See also malý in the Czech version and malý in the German version (full table). It will later need a declension template, {{cs-decl-adj}}. ThomasWasHere 10:17, 5 March 2008 (UTC)
- A declension template is still missing; right. --Daniel Polansky 10:42, 5 March 2008 (UTC)
Multiple gender
From the page text it seems that a decision hasn't yet been made on how to handle adjectives with multiple genders like "noční". At Wiktionary:Index to templates the template {{ c }} is listed, standing for "common" - why not use that one? (Or does "common gender" mean something else?) Duncan MacCall 18:41, 15 September 2008 (UTC)
- AFAIK common stands for a merging of masculine and feminine genders in certain languages, such as Swedish, judging from W:Grammatical gender, and specifically W:Grammatical_gender#Common_and_neuter. So common and the template {{c}} are better avoided in Czech entries. --Dan Polansky 08:07, 18 March 2009 (UTC)
Proverbs
I have created a dedicated section for proverbs, as a more detailed policy is probably needed. It is unclear--to me anyway--where to add literal translations, and what to do in case of not finding a semantically equivalent English proverb. --Daniel Polansky 10:47, 5 March 2008 (UTC)
- I haven't found any policy for proverbs in any language, but most entries give the English equivalent, if it exists, and the literal translation. Here are some ideas:
- If there is no equivalent in English we should provide a link to the Czech Wiktionary. Unfortunately there is no proverbs category in the Czech Wiktionary. Maybe we can find a public domain repository of Czech proverbs explained in English to link to?
- As for the literal translation, even if it is useful, I would rather just wikify the words of the proverb, because translating the proverb is then redundant. The only exception is when there is no English equivalent of the proverb. Yet there is still another exception = ): you can use the Etymology header, where you can give the etymology of the Czech proverb. ThomasWasHere 12:02, 5 March 2008 (UTC)
- It seems that providing an explanation of the idiomatic meaning of the proverb is what is wanted, judging from à bon chat, bon rat and falsum in uno, falsum in omnibus, and also from Connel MacKenzie's comment in his RFC on my entry tak dlouho se chodí se džbánem pro vodu, až se ucho utrhne. Explanations of idiomatic meanings are what is found in English proverb entries. I do not think that providing a link to Czech Wiktionary solves the problem of information missing in English Wiktionary.
- There is Q:Czech_proverbs, explaining some of Czech proverbs. I have added a link to it to Category:Czech proverbs some time ago.
- I do not see that it is redundant to translate the proverb. The literal translation is still an interesting piece of information, isn't it?
- I still do not know how to format the literal translations. --Daniel Polansky 12:34, 5 March 2008 (UTC)
- Yes, providing an explanation of the idiomatic meaning of the proverb is useful. However, if there is an equivalent proverb in English, a link to it can be enough. I missed the quotation page; it's a valuable resource, thanks. As for the literal translation, it is useful too, but if you take the principle of Wiktionary to the extreme, you can say that the user should refer to the entry for each word or group of words in the proverb, so wikilinking the proverb could be enough. For the formatting of the literal translation, the nicest form I have seen is the one for Japanese proverbs, which add it after the translation with a Lit. in front. I prefer this to adding the literal translation in brackets after the entry header. Below is a prototype you can copy and paste into tak dlouho se chodí se džbánem pro vodu, až se ucho utrhne to see how it looks. Also, it was a good idea to put the lemma of each word in the wikilinks. Damn, it's not easy to format this = ) ThomasWasHere 14:41, 5 March 2008 (UTC)
==Czech==
===Proverb===
{{cs-prov|sg=[[tak|Tak]] [[dlouho]] [[se]] [[chodit|chodí]] [[se]] [[džbán|džbánem]] [[pro]] [[voda|vodu]] [[až]] [[se]] [[ucho]] [[utrhnout|utrhne]].}}
# '''No English equivalent'''. ''Literally'', so long does one walk with a jug for water, until one day the handle breaks off.
#: Explanation.
From what I now understand, the literal translation should not be after "#". And there is no need to state "No English equivalent" explicitly; that is obvious from not providing a link to that equivalent. The only issue open right now is where to put the literal translation; what belongs after "#" is either a link to the English equivalent if there is one or an explanation of the meaning of the proverb.
I do not know what "the principle of Wiktionary" you are referring to. I assume that you simply mean what you say after the invocation of "the principle of Wiktionary", namely that you can click on the single words. But clicking some ten words in some cases is a lot of work; forming a translation from these is still a further task, not necessarily an easy one for a non-native speaker of Czech. --Daniel Polansky 17:21, 5 March 2008 (UTC)
- Sorry about "the principle of Wiktionary"; I must have been tired : ). I have looked at every language's About page and the talk page that goes with it and found nothing on proverbs. The same goes for a search for "proverb" in all talk pages. I am waiting for Connel's answer too.
Summary
- To summarize:
- Wikilinks on the words or groups of words of the Czech proverb: yes, in lemma form
- Translation with the equivalent proverb in English: after a # below the entry line; if there is none, just omit it
- Literal translation of the Czech proverb: on the same line as the entry, at the end, in brackets, using the tr= parameter of the template {{infl}} (but tr= should only be used for transliteration), or at the end of the definition, in brackets?
- Explanation of the idiomatic meaning of the Czech proverb: after the translation, with #:, in italics?
Formatting of literal translations
The formatting of literal translations in Wiktionary is currently inconsistent, as follows from:
Entry | Note |
---|---|
a todo cerdo le llega su san Martín | Entry line. |
adar o'r unlliw hedant i'r unlle | Entry line. |
man kan inte lära gamla hundar sitta | Entry line. |
betri er krókur en kelda | Etymology section. |
Дурак дурака видит издалека | Definition line. |
船頭多くして船山に登る | Definition line. |
Bindfäden regnen | Definition line. |
Connel MacKenzie mentioned Category:zh-cn:Proverbs as a good model. In there, the literal translations, when present, are found in the etymology section. I find it a bit strange though, as literal translations do not indicate the origin of the term or how the term came about. Sticking to the convention of putting the literal translations into the etymology section may be an okay temporary solution. --Daniel Polansky 10:42, 8 March 2008 (UTC)
- I am not convinced by using the Etymology section either. This is definitely a question to ask in the Grease pit, with a summary of our discussions and links to here and to Connel's talk section. ThomasWasHere 21:37, 8 March 2008 (UTC)
See also
- Requests for deletion - Category:French proverbs
- Wiktionary Talk - Translation of Idioms
- Beer parlour - Choosing the "primary entry" for idiomatic phrases
- Beer parlour - Formatting of Idioms, Proverbs in non-English entries
Phrase punctuation and first letter
I would rather use upper case for the first letter and end punctuation. There seems to be no consensus, because how are you has an upper case first letter and how do you do does not. But anyway, it seems more correct to me.
Another thing is the format of the entry name. An upper case first letter or end punctuation is not necessary there, because a user will most of the time forget to type it when searching, and if the user does type it, the entry will still be first in the possible results. So there is no need to add a redirect for the entry with an upper case first letter or end punctuation. ThomasWasHere 12:22, 7 March 2008 (UTC)
==Czech==
===Phrase===
'''Dobrý den!'''
# [[good day|Good day!]]
[[Category:Czech phrasebook]]
- Hi, IMHO the prevailing Wiktionary practice is that phrases start in lowercase. Proverbs start with lowercase too, despite being complete sentences. How are you is an anomaly, entered in a lower-case entry anyway. --Daniel Polansky 19:43, 7 March 2008 (UTC)
- There is a policy concerning capitalization that suggests using an uppercase first letter if the phrase is a sentence; however, there is no period at the end. --Thomas was here ☻Talk 17:46, 26 March 2008 (UTC)
- Okay. But the policy does not match the current common practice, as you can see from Category:Proverbs. --Daniel Polansky 09:10, 27 March 2008 (UTC)
Punctuation
There is no period at the end of a proverb entry. The punctuation at the end of a phrase could be omitted too. Some English phrase entries have an exclamation mark or a quotation mark at the end of the lemma, while many don't. I would tend to omit the punctuation, based on the proverb model. --Daniel Polansky 20:02, 7 March 2008 (UTC)
- I don't want to change the format of the name of the page but the format of the entry line. However, if you think that both should be the same I understand better why it should stay like this. ThomasWasHere 20:54, 8 March 2008 (UTC)
- I understand that you want to change the format of the entry line, not the name of the page. I think above all that we should stick to what is a common practice: looking for what is already most common, checking examples and models, seeing what the community of the authors of Wiktionary has already been doing, instead of coming up with our own solutions. --Daniel Polansky 07:34, 9 March 2008 (UTC)
Formatting of disambiguations/gloss
I have always formatted the disambiguations in brackets in italic. In the policy, you have changed the formatting to roman. What made you change it? Do you think it is common Wiktionary practice to format it in roman? --Daniel Polansky 21:20, 7 March 2008 (UTC)
- Oops. Sorry, I changed it too fast. I should have asked here first before changing anything. I have put the italics back in the policy. I have to learn to be more patient; it is so tempting to change a page : /
- As for the disambiguation at the end of the definition line, which is called a gloss if I am not mistaken (like the template {{sense}}, but not {{context}} or {{qualifier}}), I have found nothing on its format in the English policy. When I look at the most edited pages, like cat and dog, the gloss seems to be mainly non-italic. But those are English entries; for non-English entries it can be italic or not. There is a template that could be used: {{italbrac}}. It allows people to choose how it looks by modifying their own style sheet. The template {{i}} you are using is a shortcut for {{qualifier}} and is maybe not the best one to use for that. I have just seen in the history of the template that italbrac is a split of qualifier, so I understand better why you use this one. I propose using italbrac, or even sense, because it is better to use a semantic template than a pure formatting template. ThomasWasHere 20:47, 8 March 2008 (UTC)
- I started to use {{i}} when I accidentally came across it, without looking at its documentation, assuming it was equivalent to the longer {{italbrac}}. It seems that {{sense}} would fit the purpose, as the gloss is in a Czech entry to indicate which of the several senses is meant; it is there in an English entry in the synonyms section for the same purpose.
- It seems okay to me if the Czech policy uses (''...'') instead of using a template. I personally am going to let my practice evolve as things change here in Wiktionary, codifying on my user page only those things that I feel a need to get codified. --Daniel Polansky 08:05, 9 March 2008 (UTC)
- Unfortunately, {{sense}} cannot be used because it adds a colon at the end, so only {{italbrac}} can be used, or, as you do now, just (''...''). ThomasWasHere 16:53, 20 March 2008 (UTC)
Model language policy
Is there any nice, model language policy in Category:Wiktionary language considerations that the Czech policy considerations could be modeled on? That could save some work. --Daniel Polansky 07:48, 9 March 2008 (UTC)
- I have formatted this page with Wiktionary:Entry layout explained as a model, and I find this structure easier to read than the variants you can find in other about-language pages. However, we should have only two levels in the table of contents at the beginning of the page. A new section at the end of the page entitled Czech in non-Czech entries does not seem useful for the moment, as the link Wiktionary:Translations given at the beginning of the page is enough. Wiktionary:About Latin, Wiktionary:About Greek and Wiktionary:About Japanese are the biggest pages at this time.
- Here are some rules I have followed:
- Start with basic entries, then more complex
- For each section: first the example, then short explanations, then long explanations
- Give an example with the source code and a link to the entry whenever necessary
- Do not explain something already explained in Wiktionary:Entry layout explained
- ThomasWasHere 16:49, 20 March 2008 (UTC)
- Wiktionary:About Hungarian has also grown to considerable size. Since it was written more recently than some of the other pages, it may contain ideas not in the older pages. --EncycloPetey 17:48, 26 March 2008 (UTC)
Desired entries
In the "Noun form other than singular nominative" section of the policy page, it currently states, "To use only for an already existing page in another language or for a very frequent form." I think this is backwards. The rare forms, not the frequent forms, are the ones people are going to go hunting for in the dictionary (I do, anyway).
I guess I think the better answer is to have complete declension tables and have Wiktionary's search be able to find those things, but until that happens, adding any form of a word should be encouraged, not discouraged. — V-ball 08:36, 26 November 2010 (UTC)
Rejzek 2015
I created the template {{R:Rejzek 2015}}, which is similar to {{R:Rejzek 2007}}. Besides the year there are two differences:
- It does not use {{pagename}} at the beginning, because some information can be referenced by entries with a different name than the Wiktionary page name. For example, orba is covered by Rejzek's entry "orat" (which includes info on the expression "orba", too). There is an optional parameter to be filled with the name of the entry instead.
- ISBN and page number were added.
See also the documentation subpage. --Jan Kameníček (talk) 18:42, 12 July 2015 (UTC)
- I have removed the ISBN as excessive for unique identification. It is IMHO visual noise that the reader should not be presented with. Most reference templates in the English Wiktionary do not provide ISBN; I like that practice. --Dan Polansky (talk) 21:31, 12 July 2015 (UTC)
- I strongly disagree. It helps the reader to find the book on the Internet. I often use ISBN when I search books and so I suppose that there are other people who do it too. Besides that, it was an optional parameter. So I will put it back. Jan Kameníček (talk) 22:27, 12 July 2015 (UTC)
- It is uncustomary in English publications to provide ISBN in references; I checked the references sections of multiple English books including Gödel, Escher, Bach. I am okay with putting ISBN into a tooltip.
- For a comparison of ease of finding a book, here's google:978-80-7335-393-3, and here's google:2015 Český etymologický slovník Rejzek. --Dan Polansky (talk) 07:07, 19 July 2015 (UTC)
- Paper publications do not enable the reader to make full use of ISBN search: for example, the publisher cannot link it to a search engine, and the reader cannot copy it with Ctrl+C/Ctrl+V, so no wonder that paper book publishers do not find it so useful, but this is not our case. I understand that you do not like it, but you do not have to use it; you can still use the other provided information to search for the book. But that does not mean that other people, who are accustomed to using it, should not be able to use it either. It might not be a common practice in English-language paper publications, but it is a common practice at en.wiktionary (and other Wikimedia projects, including English Wikipedia, too). Jan Kameníček (talk) 16:51, 19 July 2015 (UTC)
- It is not a common practice in the English Wiktionary to provide ISBN in reference templates. I admit that, in attesting quotations, many editors are given to providing ISBN, and that I probably cannot do much about it. Nonetheless, the overwhelming majority of attesting quotations are provided without ISBN, fortunately.
- As to the point that I do not have to use it, that is really irrelevant. The presence of ISBN increases the amount of material the eye has to scan through on a page. It makes the user experience for people like me much worse. --Dan Polansky (talk) 17:04, 19 July 2015 (UTC)
- Well, you feel a problem if there are several more digits that your eyes have to scan, but I feel a problem if the digits are not there because I cannot use some common searching methods. I think that my problem is worse. Nevertheless, I asked at Beer parlour if the community could provide here more opinions. Jan Kameníček (talk) 17:33, 19 July 2015 (UTC)
- The links I have posted above show that search methods that do not rely on ISBN are entirely adequate. Furthermore, I am okay with providing ISBN in a tooltip. --Dan Polansky (talk) 18:48, 19 July 2015 (UTC)
- I include the ISBN in every reference template I create, provided there is one. I cannot think of a single reason to exclude such essential information, especially since MediaWiki automatically creates a link allowing the reader to find the book in a library or online bookseller. If anyone considers it "visual clutter", they're not obligated to look at it; if it isn't customary to include ISBNs at Wiktionary, we need to make it so. If anything, they should be required in both reference templates and citations for any work that has one. —Aɴɢʀ (talk) 17:40, 19 July 2015 (UTC)
- I have provided that reason: it is visual noise. The "they're not obligated to look at it" argument is nonsense: people cannot avoid looking at visual noise presented to them. ISBN is not "essential information", and the referencing practice in the books I have checked confirms this. --Dan Polansky (talk) 18:46, 19 July 2015 (UTC)
Template:cs-decl-noun
I added optional parameters to {{Template:cs-decl-noun}}, which a) make it possible to add alternative forms and link them to their entries, and b) add qualifiers after the alternative forms. See Template:cs-decl-noun/documentation. If there are no objections, I will also ask some bots if they could add the optional parameters to the templates where needed, as can be seen e. g. at the dative singular of chlap (the two forms are entered in a single parameter, which does not allow correct linking). --Jan Kameníček (talk) 19:52, 30 July 2015 (UTC)
- I don't object to the above but it seems preferable to luacize the template. With Lua, parameters like "chlapovi, chlapu" could be automatically parsed by Lua and rendered into proper wikilinks, albeit without qualifiers. --Dan Polansky (talk) 21:12, 31 July 2015 (UTC)
- I have nothing against this, but I am not able to do it :-( Jan Kameníček (talk) 22:24, 31 July 2015 (UTC)
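The parsing described above can be sketched in a few lines. The following is a minimal Python illustration only, not the template's actual code: a real implementation would be a Lua module, and the function name `link_forms` is hypothetical. It assumes the forms in a parameter are separated by commas and, as noted above, it does not handle qualifiers.

```python
def link_forms(param: str) -> str:
    """Turn a comma-separated parameter into separate wikilinks.

    Hypothetical helper: splits on commas, trims whitespace,
    drops empty pieces, and wraps each form in [[...]].
    """
    forms = [form.strip() for form in param.split(",")]
    return ", ".join(f"[[{form}]]" for form in forms if form)


# The dative singular of chlap, entered as one parameter:
print(link_forms("chlapovi, chlapu"))
```

With this approach the existing single-parameter entries would not need to be changed by a bot; only forms that need a qualifier would still require the new optional parameters.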
Audio links categories
What is the difference between Category:Czech entries with audio links and Category:Czech terms with audio links? --Jan Kameníček (talk) 21:10, 1 August 2015 (UTC)
- Category:Czech entries with audio links is an explicit category (not from the audio template) that should be removed. DTLHS (talk) 21:26, 1 August 2015 (UTC)
- That is exactly what I thought, but I asked for sure. Thanks for the answer. Jan Kameníček (talk) 21:44, 1 August 2015 (UTC)
I nominated it for deletion. --Jan Kameníček (talk) 22:17, 2 August 2015 (UTC)
Czech "uncountable" nouns
It seems to me that Category:Czech uncountable nouns is redundant to Category:Czech singularia tantum. Czech grammar does not use this term; it uses only terms such as collective nouns, hromadná, or material nouns, látková (which do not have a category here yet, but still fall under the broader category singularia tantum). Therefore I suggest nominating the category Czech uncountable nouns for deletion. Jan Kameníček (talk) 22:37, 2 August 2015 (UTC)
- Let us be careful when applying Czech terminology in the English Wiktionary. Trying to limit the grammatical terminology used in the English Wiktionary to describe Czech to English analogues of Czech terms used by Czech grammarians to describe Czech can all too easily do a disservice to the native English speaker. For a native English speaker, "uncountable noun" is a well understood concept: a noun for which a plural cannot be formed. There certainly are such Czech nouns, and they can be placed into the category. The term "collective noun" refers to the likes of "smečka", as per collective noun, but I am not sure it refers to the likes of "uhlí" or "listí"; maybe it does. Even if it does, "collective noun" is not a hyponym of "uncountable noun" per "smečka". The term "uncountable noun" occurs in Czech: An Essential Grammar, by James Naughton, 2006, and in Legal Translation and the Dictionary, by Marta Chromá, 2004. The following two searches do not suggest to me that "singularia tantum" is unequivocally preferable to "uncountable nouns" in reference to Czech: google books:Czech "uncountable nouns", google books:Czech "singularia tantum". An obvious disadvantage of "singularia tantum" is that it is a Latin term, less accessible than "uncountable nouns". --Dan Polansky (talk) 20:04, 3 August 2015 (UTC)
- For the record: now that I have undone a premature category depopulation, it contains the following items: chudina, rákosí, uhlí, člověk. --Dan Polansky (talk) 20:28, 3 August 2015 (UTC)
- You are right that the English term "collective nouns" is not the same as the Czech term "hromadná podstatná jména", which I did not realize.
- Emptying the category "uncountable nouns" was not my primary goal; originally I only wanted to replace the parameter "uncountable" in {{template:context}} with "singulare tantum", similarly to how "plurale tantum" is used. As a result the category became empty and I suggested deleting it. It does not make sense to me to have both categories, singulare tantum and uncountables, and it does not make sense to me that one of them is a subcategory of the other.
- As for the comprehensibility, when you use the parameter "singulare tantum", it shows text saying "singular only", which is very understandable, I think. If words like plavky are accompanied by the text "plural only", then doubí should be accompanied by "singular only". This is my main point, and if this is fulfilled, I do not care very much whether it is also added to the (imo redundant) category of Czech uncountables or not (e. g. by adding the category manually at the end of the entry in square brackets). Jan Kameníček (talk) 00:53, 4 August 2015 (UTC)
Czech pronunciation module
I've created a module, Module:cs-pronunciation, to generate IPA for Czech entries, based on the Czech phonology and Czech orthography articles on Wikipedia. I think it's almost complete. Could someone more knowledgeable about Czech take a look at it and let me know if anything is missing? — Eru·tuon 09:13, 16 February 2017 (UTC)
- @Erutuon Hi. I think there is the tie in [t͡ʃ], [t͡s], and [d͡ʒ] missing. For the usage of the tie see e. g. w:Czech phonology or Šimáčková et al.: Czech Spoken in Bohemia and Moravia. I can see no other issues. --Jan Kameníček (talk) 19:06, 16 February 2017 (UTC)
- @Jan.Kamenicek: Ah, I removed the tie because I figured it was unnecessary. It is omitted in English transcriptions (for instance, choose is transcribed /tʃuːz/, not /t͡ʃuːz/). But I can add it back. — Eru·tuon 19:39, 16 February 2017 (UTC)
- I believe it is useful because in some cases (though not very frequently) they can be pronounced as two separate phonemes, e. g. t and ʃ, such as in the word podšitý (/ˈpotʃɪtiː/). Compare podšít (/ˈpotʃiːt/) and počít (/ˈpot͡ʃiːt/). --Jan Kameníček (talk) 19:48, 16 February 2017 (UTC)
- I prefer that there be no tie. The cases of ambiguity without the tie are rare. --Dan Polansky (talk) 13:08, 18 February 2017 (UTC)
- And "podšít" can be marked up using syllable division, like IPA(key): /pot.ʃiːt/. Thus, tʃ would mean č unless syllable division is used. This would lead to simpler typography. A related discussion is at User_talk:Jan.Kamenicek#Czech IPA for č. --Dan Polansky (talk) 13:19, 18 February 2017 (UTC)
- They are not that rare, several examples written in a minute are: podšít (podšitý, podšívka), podšálek, podstavec (podstavit), nadstavit, podsekretář, odstavit (odstávka), odsloužit, Potštát, podsouvat, podseknout, odšťavit, odsunout, odšoupnout and many more... --Jan Kameníček (talk) 16:38, 18 February 2017 (UTC)
- My conservative guess is that the ratio of the number of these to the number of uses of š and č is less than 1 in 100; a less conservative guess would be 1 in 1000. In any case, the syllable division proposed above is able to deal with them. --Dan Polansky (talk) 17:11, 18 February 2017 (UTC)
- @Dan Polansky: I would be glad to add a syllable division function to Module:cs-pronunciation, and it would be easy to simply add syllable breaks for the sequence /t.s/; but to handle other consonant clusters, I would need a set of rules. — Eru·tuon 21:36, 18 February 2017 (UTC)
- I am afraid this is not possible. E. g. in words like vystrčit the break is in front of -str (because of the prefix vy-), while in others it can be inside the consonant cluster, as in kostrbatý. What is more, there are often several ways to divide a word into syllables: hospoda can be divided as either ho-spo-da or hos-po-da. I am afraid this cannot be solved automatically. It would also be redundant, because the possible divisions are usually shown using the template {{hyphenation}}. --Jan Kameníček (talk) 23:06, 18 February 2017 (UTC)
- Perhaps a hyphen could be added to the respelling to show where the syllable break should be. Then the module would replace it with the syllable break mark . in the output. In that case, {{cs-IPA|vy-strčit}} would yield IPA(key): [ˈvɪ.str̩.t͡ʃɪt]. — Eru·tuon 23:51, 18 February 2017 (UTC)
- This would be a technical solution if we decided to use the syllable break only in some cases, such as when we need to distinguish tš (with the syllable break between the two phonemes) from č. I do oppose such a solution for the reasons given above; I believe that using the tie is a purer solution. Syllable breaks are shown on a different line of the pronunciation section. It would not be good to use it for all words. E. g. for kostrč we would have to write two different transcriptions, IPA(key): [ˈko.str̩t͡ʃ], [ˈkos.tr̩t͡ʃ], which is unnecessary, because the actual pronunciation is the same. --Jan Kameníček (talk) 23:59, 18 February 2017 (UTC)
- Huh. How do we know that there are two possible syllabifications of kostrč, if the distinction has no effect on the actual pronunciation? — Eru·tuon 00:44, 19 February 2017 (UTC)
- We know that there are 2 syllables, but the border between them is simply not clear. If the border has to be specified in such words for some reason, there are sometimes more options where the border can be determined. It is typical for clusters beginning with s or š, the border can be determined voluntarily either before s/š or after s/š. It is discussed for example at CzechEncy where they give an example of čeština: češ-ti-na or če-šti-na. A page of Masaryk University on phonetics and phonology gives an example of hrstka: hrst-ka, hrs-tka or hr-stka. --Jan Kameníček (talk) 01:11, 19 February 2017 (UTC)
- Ahh, so it's somewhat like ambisyllabic consonants in English. Butter can be syllabified as /bʌt.ɚ/ (because the checked vowel /ʌ/ is supposed to only occur in closed syllables) or /bʌ.tɚ/ (so that the syllable has an onset). The module could add syllable breaks in clear cases, and omit them in these uncertain cases. (Perhaps the module could also automatically generate hyphenation too.) — Eru·tuon 02:04, 19 February 2017 (UTC)
- I personally am not very fond of unsystematic edits, so I would prefer not to use syllable breaks at all.
- As for the hyphenation: it should be possible in principle, because there are rules, but they are quite many, see Internetová jazyková příručka. Possible problems could be words like vystrč, which has just one possibility vy-strč (because of the prefix vy-), and kostrč, which can be hyphenated ko-s-trč. --Jan Kameníček (talk) 02:20, 19 February 2017 (UTC)
- @Erutuon: In fact the hyphenation rules mentioned at Internetová jazyková příručka are (with some exceptions) the same as the rules for dividing words into syllables, from which it can be seen that implementing syllable divisions into the module would be very difficult. --Jan Kameníček (talk) 13:03, 19 February 2017 (UTC)
- They are not that rare, several examples written in a minute are: podšít (podšitý, podšívka), podšálek, podstavec (podstavit), nadstavit, podsekretář, odstavit (odstávka), odsloužit, Potštát, podsouvat, podseknout, odšťavit, odsunout, odšoupnout and many more... --Jan Kameníček (talk) 16:38, 18 February 2017 (UTC)
- @Jan.Kamenicek: Ah, I removed the tie because I figured it was unnecessary. It is omitted in English transcriptions (for instance, choose is transcribed /tʃuːz/, not /t͡ʃuːz/). But I can add it back. — Eru·tuon 19:39, 16 February 2017 (UTC)
- Let me add that the same problem as in Czech podšít exists in English batshit (sorry for the vulgar example), where the markup is IPA(key): /ˈbæt.ʃɪt/. Therefore, it seems that Czech does not differ from English as regards the existence of the problem (tʃ vs. t.ʃ) and the applicability of the solution. As for how to automate it, it should be quite simple: once the code sees "tš", that is, t and š next to one another, the code should place a syllable break between t and ʃ; if this rule does not work universally, maybe it can be refined to work, or "-" can be used as proposed above to let the user do manual markup. --Dan Polansky (talk) 13:30, 19 February 2017 (UTC)
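For illustration only, the two ideas in this comment (an automatic break inside "tš", plus a manual "-" in the respelling) could be sketched roughly as below. This is Python rather than the Lua of Module:cs-pronunciation, the function name `mark_syllable_breaks` is hypothetical, and it works on a phonetic respelling such as "potšít" rather than on the spelling podšít; none of this is taken from the actual module.

```python
import re


def mark_syllable_breaks(respelling: str) -> str:
    """Hypothetical helper: add "." breaks to a Czech respelling.

    Assumption: the input is already a respelling in which the
    sequence t + š is written "tš" (e.g. "potšít" for podšít).
    """
    # Rule from the comment above: t directly followed by š
    # gets a syllable break between the two letters.
    text = re.sub(r"t(?=š)", "t.", respelling)
    # Manual markup: a hyphen typed by the editor becomes a break.
    return text.replace("-", ".")


print(mark_syllable_breaks("potšít"))     # automatic rule
print(mark_syllable_breaks("vy-strčit"))  # manual hyphen
```

The sketch applies the "tš" rule unconditionally, so it would have to be refined, or overridden by hand, for any word where the rule turns out not to hold.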
- And again, my preference is not to worry about syllable division unless it is necessary for disambiguation of tʃ and ts. This seems similar to what is currently being done for English, that is, e.g. there is ˈpəːtɪnənt without syllable break, but there is ˈbæt.ʃɪt with a syllable break, although I have seen some other English entries that do use syllable breaks, e.g. ˈtɛmp.lət. --Dan Polansky (talk) 13:36, 19 February 2017 (UTC)
- Other English entries that need disambiguation include courtship, hotshot, nutshell, nutshot, outshine, outshout, outshop, and outshoot. --Dan Polansky (talk) 13:48, 19 February 2017 (UTC)
- Looking at what Germans are doing: Klatsch is marked up as "klatʃ", while de:Klatsch is marked up as "klaʧ", so they use the ʧ ligature. I must have read somewhere that the ligature markup was proposed in IPA and then discontinued. Tratsch has tʀaːtʃ. --Dan Polansky (talk) 13:58, 19 February 2017 (UTC)
- Yes, the ligature was abandoned in favour of the tie above t and ʃ.
- As for the automation: I am afraid it is not that simple, as two variants of pronunciation are possible if the phonemes appear between the root and the suffix, see větší [vjɛtʃiː] and [vjɛt͡ʃiː] (applies also to its declensions and other forms), lidský [lɪtskiː] and [lɪt͡skiː] (+ decl.), lidštější [lɪtʃcɛjʃiː] and [lɪt͡ʃcɛjʃiː] (+ decl.), dětský, dětštější, kratší, čistší, studentský, mladší, většina (all including declensions) and others... [1] [2] --Jan Kameníček (talk) 14:11, 19 February 2017 (UTC)
- Does any contemporary linguistic literature on Czech phonology distinguish between t͡ʃ and tʃ using the syllable break? I have never seen it anywhere, and the reason is that syllable break is not meant to help distinguish between phonemes. That should be done using proper IPA characters for individual phonemes. --Jan Kameníček (talk) 14:26, 19 February 2017 (UTC)
- I don't know what contemporary literature on Czech phonology does, but I know I like what the English Wiktionary does for English. The disambiguation argument you are using would apply to English as well, as I pointed out above. The tie screams "workaround" to me much more than a sporadic use of syllable mark does. --Dan Polansky (talk) 14:59, 19 February 2017 (UTC)
- No, you are mistaken. The tie is an official IPA symbol which is meant to do exactly what we use it to do. If we do not want to confuse readers reading entries with Czech phonology, we should follow the customs and rules kept in the literature on phonology of Czech language. --Jan Kameníček (talk) 16:26, 19 February 2017 (UTC)
- I report to you, honestly and accurately, that the tie is screaming workaround to me, based on observing my mental state. More to the point: I do not deny that the tie is an official IPA symbol, but it is an ugly fix added later, after using the ligature was considered. I do not see why we should follow the literature on the phonology of Czech rather than what is customary in the English IPA markup (you have not addressed the point that English has the same problem), but even then, Bičan[3] uses a tie that is below, and Bičan 2008[4] uses the ligature instead of the tie. It is not obvious that all such literature uses the tie; what search did you do to find as many and as varied items from that literature as possible, and what items did you find? --Dan Polansky (talk) 09:32, 25 February 2017 (UTC)
- One another item: Krčmová's chapter 2.3 Transkripce[5] in Fonetika a fonologie, 2009, uses ligature and does not use tie. --Dan Polansky (talk) 09:54, 25 February 2017 (UTC)
- I do not know why you keep arguing with your feelings (such as screaming workaround to me) instead of with what is used. In the context of the Czech language the syllable marks are not typically used. English Wiktionary accepts using ties for foreign languages, for example Latin entries use them frequently together with syllable marks, and there are reasons to do the same in Czech entries. Yes, some literature uses ligatures because they were dismissed only a short time ago and some authors may have been using them for some time after that. I am personally not against ligatures either, but the Wikiprojects tend to abandon them.
- By the way, there are some words like dštít [tʃciːt] where the syllable mark workaround would not help in any way. The only way to show the difference in pronunciation between dšti [tʃcɪ] and čti [t͡ʃcɪ] is the tie (or the ligature). --Jan Kameníček (talk) 14:50, 25 February 2017 (UTC)
- We are not arguing facts about pronunciation but rather a convention for marking it up. Therefore, a personal preference does play a role, whether mine as an English Wiktionary contributor or that of the authors of the works that use the tie or other means. I have a general disregard for authority as a matter of principle. Of course, if IPA saw the tie as mandatory, then we would have to use it to provide what is properly IPA, but that is not the case as far as I know. You keep on repeating that the tie is used without supplying a list of works that actually use the tie, whereas I provided at least one example above that does not use the tie. As for "dštít", that seems to be an argument I do not know how to deal with right now; I thought it would be pronounced "dš" rather than "tš", but I am not so sure. --Dan Polansky (talk) 15:07, 25 February 2017 (UTC)
- As for wikiprojects abandoning ligatures: which wikiprojects used ligatures and then abandoned them? --Dan Polansky (talk) 15:15, 25 February 2017 (UTC)
- When talking about conventions, i. e. generally accepted standards and norms, then we have to have a look at what the convention is in our context (IPA transcription of Czech language) and forget about personal feelings.
- A quick example where ties (and ligatures) are used: http://fonetika.ff.cuni.cz/o-fonetice/foneticka-transkripce/transkripce-cj-ipa/ . It is clear that ligatures are also still used, although they are not an official IPA symbol anymore. I have nothing against them, although I slightly prefer using only official symbols.
- I have often seen replacing ligatures with ties in English Wikipedia and Czech Wiktionary also prefers ties.
- The official IPA chart says: Affricates and double articulations can be represented by two symbols joined by a tie bar if necessary. I have shown numerous examples proving that in Czech language it is necessary. IPA chart offers just ties above or below the two characters. Some publications on Czech phonology use the ties and also (dated?) ligatures. Therefore I suggest ties. --Jan Kameníček (talk) 20:11, 25 February 2017 (UTC)
Jan Kameníček: I also dislike having inconsistency in the marking of syllable breaks, but I would rather have some syllabification than no syllabification, because it makes transcriptions easier to read for someone who doesn't speak the language like me. It breaks the transcription into meaningful units, rather than it appearing as a mass of undifferentiated symbols. — Eru·tuon 04:34, 20 February 2017 (UTC)
It might be useful to take the syllabification or hyphenation rules (linked to in Module talk:cs-pronunciation) and to translate them into module code, because it would save the effort of editors to know the rules and apply them. Unfortunately, I can't understand Czech.
I was thinking perhaps the module could show variant syllabifications in a collapsible list, so that they do not clutter up the entry, but they are still available if someone wants to see them. — Eru·tuon 20:42, 21 February 2017 (UTC)
- I cannot imagine what it would look like. Could you show an example, please? --Jan Kameníček (talk) 21:07, 21 February 2017 (UTC)
- It could look something like the box below. If there is only one possible syllabification (as with words consisting of CV syllables), the syllabification could be displayed as the main transcription, and the collapsible box could be omitted. — Eru·tuon 21:54, 21 February 2017 (UTC)
- Hmmm, looks fine. --Jan Kameníček (talk) 21:57, 21 February 2017 (UTC)
- I think two IPAs in one row would not really clutter the entry, certainly no more than the English entries are cluttered with their U.S. and U.K. pronunciations. What I mean is something like this: "IPA(key): [ˈɦos.po.da], [ˈɦo.spo.da]". Are there words that would need more than two IPAs because of syllabification? --Dan Polansky (talk) 12:00, 25 February 2017 (UTC)
- Yes, there are, and they are not rare, e. g. [ˈno.stri.fi.ko.vat], [ˈnos.tri.fi.ko.vat] and [ˈnost.ri.fi.ko.vat]. Another one: [ˈhr.stka], [ˈhrs.tka] and [ˈhrst.ka]. A more complicated one: [ˈpro.sto.pá.šný], [ˈpros.to.pá.šný], [ˈpro.sto.páš.ný] and [ˈpros.to.páš.ný]. Jan Kameníček (talk) 14:27, 25 February 2017 (UTC)
- Could this be marked up as [ˈpro.s.to.pá.š.ný]? The implication would be that any two syllable marks separated by a single letter are a pair of alternatives to choose from. How does this existence of a multitude get handled when {{hyphenation}} is used? --Dan Polansky (talk) 14:45, 25 February 2017 (UTC)
- I know such a principle is used to indicate hyphenation. Is there any precedent of such syllable indication? --Jan Kameníček (talk) 20:24, 25 February 2017 (UTC)
- I think that would imply that the [s] and [ʃ] were syllabic ([s̩, ʃ̍]), something like Mandarin syllabic fricatives. — Eru·tuon 22:16, 25 February 2017 (UTC)
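To make the compact notation proposed above concrete, here is a rough Python sketch of how a string such as pro.s.to.pá.š.ný could be expanded into the individual syllabifications it stands for. The function name `expand_alternatives` is hypothetical, the input is the plain pseudo-transcription used in this thread (no brackets or stress mark), and the reading of the notation ("a single letter between two dots attaches to one side or the other") is an assumption based on the comment above.

```python
import itertools
import re


def expand_alternatives(compact: str) -> list[str]:
    """Hypothetical helper: expand ".x." pairs into explicit variants."""
    # Split on ".<single letter>." and keep the letter via the group:
    # "pro.s.to.pá.š.ný" -> ["pro", "s", "to.pá", "š", "ný"]
    parts = re.split(r"\.(\w)\.", compact)
    fixed, letters = parts[0::2], parts[1::2]
    variants = []
    # For each ambiguous letter, choose: break before it or after it.
    for choice in itertools.product((True, False), repeat=len(letters)):
        out = fixed[0]
        for letter, break_before, chunk in zip(letters, choice, fixed[1:]):
            out += ("." + letter) if break_before else (letter + ".")
            out += chunk
        variants.append(out)
    return variants


for variant in expand_alternatives("pro.s.to.pá.š.ný"):
    print(variant)
```

One weakness the sketch shares with the notation itself: a genuine one-letter syllable between two breaks would be indistinguishable from an ambiguous consonant.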
Tie, syllabification and the test module
There is no agreement on the usage of tie and syllabification, so the test module will invariably fail. Adding automatic syllabification doesn't seem practically possible without manually supplying syllable boundaries, so the tie should be preserved as well.--Anatoli T. (обсудить/вклад) 20:32, 19 February 2017 (UTC)
- That's not an argument for tie. The relatively rare case where disambiguation is required can be entered manually by providing a respelling that uses "-" to mark syllable separation. It is not clear what should be done in the absence of consensus, whether omit the tie or preserve it. A minimalist defaulting would lead to no tie, but I am not sure a minimalist defaulting in the absence of consensus is generally acceptable. --Dan Polansky (talk) 09:20, 25 February 2017 (UTC)
Conjugation tables
There is quite a mess in the conjugation sections of the entries on verbs.
- Sometimes they try to explain the whole grammar connected with the specific verb, like in ovládat. I believe this is a wrong attitude, because Wiktionary is a dictionary and not a grammar book. The table is too complicated, which is the reason why the template was not created for other verbs as well. Conjugation of Czech verbs is even much less regular than it may seem and we would either need many templates to cover all possible variations, or we could use just one general template where all the tenses, conditionals and so on would have to be filled manually one by one (extremely exhaustive). Most of the "forms" in the conjugation template used with ovládat are a combination of real forms of the verb with auxiliaries, like byl bych ovládal, meaning "(I) would have controlled". Entries on English verbs do not show conditionals and alike, and so I believe there is no reason to show them in entries on Czech verbs.
- Additional comment: The current conjugation table used e. g. with the verb ovládat is not very consistent either. It lists both past participles and passive participles, but only past participles are listed also combined with auxiliaries, while passive participles are not. Thus the table is full of e. g. various active conditionals (such as "byli bychom ovládali"), while passive conditionals (like "byli bychom ovládáni") are missing completely. The same applies for transgressives: the table offers space only for active transgressives (like "ovládav") but not for passive ones (e. g. "byv ovládán"). There are two ways to make it consistent: 1) blow the table even more, or 2) get rid of all the combinations with auxiliaries. --Jan Kameníček (talk) 00:38, 13 March 2017 (UTC)
- Sometimes they have a simple conjugation table like the one in the entry prosit. This is used most often. The main disadvantage is that the table does not show all forms of the verb (like passive forms). Besides that it does not show the difference between some past forms like prosili/prosily/prosila (this could be solved by adding the info into the table, but it would make the table more complicated again).
For these reasons I suggest to change the attitude and to show all existing forms, but not their combinations with auxiliaries. An example what it may look like for the verb psát:
- Conjugation
--Jan Kameníček (talk) 19:39, 1 March 2017 (UTC)
- I am interested particularly in opinions of the following people: @Dan Polansky, Droigheann. --Jan Kameníček (talk) 23:21, 12 March 2017 (UTC)
- Personally I don't mind large tables, as long as they are collapsible (like en-wikt French conjugations) or on a separate page (like fr-wikt French conjugations). Question is, how far do you want to go in the direction of "user-friendliness" and how far in "being academic". Somebody reasonably acquainted with the language indeed only needs to know that e.g. the neuter passive participle of psát is psáno to deduce that the future tense passive mood is bude psáno, but I guess that many a learner would prefer a conjugation table to have the full form bude psáno to its having forms like byli bychom ovládali or psav, which I suspect 99% of native speakers never came across outside high school classrooms. (Frankly, I would expect anybody advanced enough to care about přechodníky to look them up in cs-wikt.)
- I would probably prefer something like this, possibly expanded for passive forms, but given I don't need these tables as a reader and don't intend to use them myself as an editor (too much like work whatever the template) I don't advocate this, just giving you my opinion since you've asked for it. --Droigheann (talk) 23:18, 14 March 2017 (UTC)
- We have to decide how much grammar Wiktionary should teach. It is the same for English: some people might not be able to deduce the future tense passive mood will be written just from the fact that the past participle of write is written. Despite that, entries on English verbs do not contain such grammar tables.
- As for the table from fr.wikt: it is quite a bit clearer and better arranged, but it does not show some less common forms (transgressives). It offers grammar structures similarly to our tables, but again, some less common ones (past conditionals) are missing, which is confusing. I believe that we should decide what we want to show (forms or grammar) and then show it in as complete a way as possible. The fr.wikt table may mislead the reader into thinking that there are no past conditionals in Czech, which is not true. It is better not to show the grammar than to show it incompletely and thus confuse the readers.
- I also believe that it should be unified with the attitude of the English part of en.wiktionary. --Jan Kameníček (talk) 00:13, 15 March 2017 (UTC)
- I think having five collapsible sections is not so nice. If this can be reduced to two collapsible sections, it could be okay. The first collapsible section could contain the most commonly used forms and information that is supposed to be in the layer 1 of importance, as it were. Transgressives are an example of what does not belong to layer 1, I think, being an archaic feature. Similarly, the section on conditionals now present in psát via {{cs-conj-psát}} basically reuses the past participles, and seems outside of layer 1. By contrast, present forms, imperatives and past participles are all part of layer 1, in my view.
- fr:Annexe:Conjugaison en tchèque/psát looks fine; that could be the layer 1 collapsible table. --Dan Polansky (talk) 09:35, 18 March 2017 (UTC)
- In my understanding all the forms except the transgressives would go into the layer 1 of importance (it may seem that the passive participle psán is not used very often, which is true, but passive participles of many other verbs are used very frequently). I am also not sure how the two layers should be named. So, if we decided not to have 5 collapsible sections, then we could put them all into one.
Present forms | ||
singular | plural | |
1st person | píši, píšu | píšeme |
2nd person | píšeš | píšete |
3rd person | píše | píší, píšou |
Imperatives | ||
singular | plural | |
2nd person | 1st person | 2nd person |
piš | pišme | pište |
Past participles | ||
singular | plural | |
masculine animate | psal | psali |
masculine inanimate | psaly | |
feminine | psala | psaly |
neuter | psalo | psala |
Passive participles | ||
singular | plural | |
masculine animate | psán | psáni |
masculine inanimate | psány | |
feminine | psána | psány |
neuter | psáno | psána |
Transgressives | ||
present | past | |
masculine singular | píše | psav |
feminine + neuter singular | píšíc | psavši |
plural | píšíce | psavše |
- I think this solution is also not bad, although I would slightly prefer the 5 separate sections so that the readers could open and close just the section they need. What do you think? --Jan Kameníček (talk) 15:40, 18 March 2017 (UTC)
- I prefer the linked French template fr:Annexe:Conjugaison en tchèque/psát to the one you just posted, since yours is very narrow, which it compensates by being rather tall; it is so tall that it does not fit a single screen on my notebook while the French one does. OTOH, one could argue that the narrow template is better for mobile devices. I also like how the French template is formatted as one table and not a series of tables with varying column widths. Furthermore, I like how the French template uses much less boldface. Passive participle psán probably belongs to layer 1, agreed. As for the number of collapsibles, I find 5 collapsibles too many, especially if each collapsible is to contain only a small table.
- If we should go for the narrow table version, it can still be improved by being formatted as one table, which removes part of the vertical whitespace. As per my preference, it would be improved by using less boldface, perhaps using italics instead or just a different background color. In fact, the table headings could be just in normal font: they can stand out by having no hyperlinks, unlike the data cells. --Dan Polansky (talk) 16:31, 18 March 2017 (UTC)
- I understand. The fr.wiki table includes an incomplete list of various combinations of verb forms with auxiliaries and thus it is not compatible with my suggestion. However, I tried to modify it, so that it included less vertical space. (I also changed the model verb from "psát" to "nedopsat" to show what it would look like with a longer verb.)
Present forms | ||||
indicative | imperative | |||
singular | plural | singular | plural | |
1st person | nedopíši, nedopíšu | nedopíšeme | – | nedopišme |
2nd person | nedopíšeš | nedopíšete | nedopiš | nedopište |
3rd person | nedopíše | nedopíší, nedopíšou | – | – |
Participles | ||||
Past participles | Passive participles | |||
singular | plural | singular | plural | |
masculine animate | nedopsal | nedopsali | nedopsán | nedopsáni |
masculine inanimate | nedopsaly | nedopsány | ||
feminine | nedopsala | nedopsaly | nedopsána | nedopsány |
neuter | nedopsalo | nedopsala | nedopsáno | nedopsána |
Transgressives | ||
present | past | |
masculine singular | – | nedopsav |
feminine + neuter singular | – | nedopsavši |
plural | – | nedopsavše |
- Is this better? --Jan Kameníček (talk) 18:57, 18 March 2017 (UTC)
- Thank you, I like your above proposal.
- One thing I wonder about is the future tense. Is this something to be completely omitted? Or could it be covered in a summary note below the tables, like "The future tense is created by combining budu, budeš, bude, budeme, budete or budou with psát"? That would be for the imperfective "psát", not for "dopsat". --Dan Polansky (talk) 11:39, 19 March 2017 (UTC)
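As a small illustration of the summary note suggested above, the compound future of an imperfective verb can be generated mechanically from the six forms of the auxiliary listed in the comment. The sketch below is Python, not template or module code, and the names `BUDU_FORMS` and `compound_future` are made up for this example; it says nothing about perfective verbs such as dopsat.

```python
# Future forms of the auxiliary, in the order given in the comment above.
BUDU_FORMS = {
    ("1st", "singular"): "budu",
    ("2nd", "singular"): "budeš",
    ("3rd", "singular"): "bude",
    ("1st", "plural"): "budeme",
    ("2nd", "plural"): "budete",
    ("3rd", "plural"): "budou",
}


def compound_future(infinitive: str) -> dict[tuple[str, str], str]:
    """Hypothetical helper: auxiliary + infinitive for imperfective verbs."""
    return {slot: f"{aux} {infinitive}" for slot, aux in BUDU_FORMS.items()}


for (person, number), form in compound_future("psát").items():
    print(person, number, form)
```

Because the forms are fully predictable, they could equally be left out of the table and covered by a single sentence, which is the alternative raised in the comment.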
- Thanks. Now I will have to think about how to make an easy-filling template.
- I am not sure about the future tense. I understand the arguments for it, but if it is added, readers might start wondering why the future active combining "budu" + infinitive is explained, while the future passive combining "budu" + passive participle is not. There is also the problem with perfective and imperfective verbs: perfective verbs (though I am not sure if all of them) use the present forms to express the future tense and in fact do not have a present tense, which would have to be explained too. Besides that, there are also numerous present passives, conditionals (including present, passive, past and past passive conditionals) and passive transgressives to be explained too. English entries do not explain the grammar, which makes the situation much easier. However, I will keep it in mind, because mentioning these things can be useful, and if I come to some easy solution, I will try to deal with it in the next step. --Jan Kameníček (talk) 14:58, 19 March 2017 (UTC)
- You are right that English entries do not explain the English future tense will, and the tense grammar in general, nor do they explain conditionals and such. That's a good point. However, the question remains whether this is the most user-friendly option. German machen contains a collapsible table "Composed forms of machen", which I find to be quite a good solution. --Dan Polansky (talk) 15:30, 19 March 2017 (UTC)
- I see. As I said, I will think about it too. --Jan Kameníček (talk) 17:46, 19 March 2017 (UTC)
Conjugation template
I have prepared the conjugation template, see {{Template:cs-conj-forms}}.
I was thinking about two possibilities: either to have several templates for various conjugation classes or patterns (nést, prosit…) or to make one universal template. The first option seemed to have the advantage of fewer parameters, but I am afraid that various irregularities would require either adding more parameters anyway or creating other special templates, and I liked neither of these options. E. g. skákat should "officially" follow class I, pattern péct, but forms following class V (skákají) can in fact be seen too. So I decided on one universal template. One of its advantages is that the user does not have to work out which template to choose, which reduces the chance of some mistakes that I have seen before. I think the basic usage is simple and only some verbs need additional parameters. --Jan Kameníček (talk) 00:28, 21 March 2017 (UTC)
Proscribed entries
I oppose the label proscribed as misleading, unscientific and against the spirit of descriptivist lexicography, but I am in a small minority.
I wonder what kind of principles the supporters of the label intend to use. Let's look at some examples:
- Should Galicie be labeled as proscribed since Pravidla českého pravopisu (PČP) only has Galície? Source: [6]. The same source, Internetová jazyková příručka (IJP), indicates that "V pořádku jsou obě dvě, v úzu převažuje krátká podoba", that is, it indicates that both Galicie and Galície are okay, and that Galicie is more commonly used.
- Should banjo be labeled as proscribed since SSČ and ASCS only have benžo? Source: [7].
- Should scenérie be labeled as proscribed since it is absent from SSČ, which is more modern than SSJČ, which has the form?
- Should obhajoba be labeled as sometimes proscribed since it is proscribed in Naše řeč 1918[8], and if not, why not?
What is the status of IJP as for proscription? Do pronouncements of IJP override PČP? Since, if absence from PČP is read as "proscribed", then "sometimes proscribed" is descriptively accurate and IJP cannot change the fact.
Some discussions: Talk:tchýně, Talk:scénárista. Category: Category:Czech disputed terms.
--Dan Polansky (talk) 09:04, 3 March 2019 (UTC)
Updating some Czech
@Solvyn @Benwing2 (and I don't know who else to ping, to be honest; I would really love some other input). I think trying to make (West) Slavic languages more uniform in appearance would be nice; Solvyn, how do you feel about the use of + templates? (compare głowa). I also think the derived and related terms sections should be updated to use some sort of {{col}} and (recently made by me) {{cs-derived verbs}} (compare razit). I also think a pronunciation module could easily be made for Czech, similar to {{pl-pronunciation}}. If you know other regular Czech editors, please ping them.
Ben, how hard would it be to implement some of these things, and how can I help? Vininn126 (talk) 12:56, 21 March 2023 (UTC)
- Thanks for your ping. The Polish entries are well formatted, and it would be great if their formats could be applied to the Czech entries. It is very necessary to use + templates. If there are more than two derived or related terms, I think the use of {{col}} looks good. Also, I'm looking forward to the use and promotion of {{cs-p}} and {{cs-derived verbs}}. Finally, I'm sorry that I have no programming skills and cannot create a precise template or module. Solvyn (talk) 16:11, 21 March 2023 (UTC)
- @Surjection Do you think you'd be able to help with a pronunciation module? It wouldn't be too different from the Polish one - in fact easier because Czech doesn't have nearly as many polygraphs. The syllabification should be more or less the same, and the rhymes will be from the first vowel to the end, as Czech has fixed stress. Vininn126 (talk) 19:45, 21 March 2023 (UTC)
- I'm too busy these days to promise to commit to anything like this in the near future. — SURJECTION / T / C / L / 19:55, 21 March 2023 (UTC)
- @Solvyn, Vininn126 I've already fixed {{cs-verb}} to use more standard params, similar to {{pl-verb}}. Should not be hard to apply the + templates and fix up categorization and such. As for {{cs-p}}, I will work on that after my current work on Persian IPA. Benwing2 (talk) 20:15, 21 March 2023 (UTC)
- Thank you Ben, I appreciate all the help. Vininn126 (talk) 20:18, 21 March 2023 (UTC)
- @Vininn126 I did a bunch of cleanups to Czech lemmas and am currently pushing the results. Benwing2 (talk) 04:17, 22 March 2023 (UTC)
- Cheers. Vininn126 (talk) 09:19, 22 March 2023 (UTC)
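The rhyme rule mentioned in this thread (fixed initial stress, so the rhyme runs from the first vowel to the end of the word) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: it works on spelling rather than on IPA, the function name `rhyme_part` is hypothetical, and a real pronunciation module would have to deal with syllabic r/l, diphthongs and phonetic respelling, none of which is attempted here.

```python
# Hypothetical helper for illustration; not part of any existing Wiktionary module.
# Assumption: the rhyme is approximated on the spelling, not on the IPA transcription.
CZECH_VOWEL_LETTERS = set("aáeéěiíoóuúůyý")


def rhyme_part(word):
    """Return the spelling from the first vowel letter to the end of the word.

    Returns None when the word has no vowel letter at all (for example words
    whose only syllable nucleus is a syllabic r or l, such as "krk").
    """
    lowered = word.lower()
    for index, letter in enumerate(lowered):
        if letter in CZECH_VOWEL_LETTERS:
            return lowered[index:]
    return None


if __name__ == "__main__":
    for example in ("hlava", "číslo", "krk"):
        print(example, "->", rhyme_part(example))
```

Returning None for vowel-less words keeps the sketch honest about what it does not handle, rather than guessing a rhyme for them.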
- @Zhnka What do you think? Vininn126 (talk) 16:12, 22 March 2023 (UTC)
- Oh, I missed one. The parameter for the diminutive form may need to be added to Module:cs-headword. Solvyn (talk) 11:23, 23 March 2023 (UTC)
- We can look into that. @JeffDoozan Do you still have the code that converted the Polish derived and related terms? Do you think you could modify it for Czech, maybe using {{col-auto}} (unless you all have a preference)? We could run a bot through the pages to see what different environments exist. Vininn126 (talk) 16:40, 23 March 2023 (UTC)
- @Vininn126: I didn't save it, which is either a sign that I thought it would be easy to recreate or that I thought it wouldn't be useful again. Let me know exactly what needs to be done and I can take a look at it. JeffDoozan (talk) 00:42, 24 March 2023 (UTC)
- @JeffDoozan Basically the same thing as with Polish, as @Solvyn and I were discussing. I think it would be better to use {{col-auto}} instead. Vininn126 (talk) 09:45, 24 March 2023 (UTC)
- @Vininn126: I can have the bot convert ~11,000 sections that contain just a list of lines formatted * {{l|cs|term}} to {{col-auto}}, like this edit on ostřelovat. It will skip ~1200 sections with errors. JeffDoozan (talk) 15:34, 24 March 2023 (UTC)
- @JeffDoozan Interesting, I'm seeing a lot of lists of {{l}} on the errors (like domácí). I also see some * [[TERM]] or *[[TERM]], a lot with * See {{l|cs|hydro-}}. I suppose those can be just switched, i.e. collapsed into a single {{col-auto}} template. As for the affixes, those should probably be left alone? Vininn126 (talk) 15:40, 24 March 2023 (UTC)
- The list of fixes/errors is generated from the March 20 export. At that time, domácí was using [[term]] but since then Benwing fixed it to use {{l}}. I've already told the bot to ignore the sections that just contain a single * See {{l|cs|term}} line, so all of the reported errors are sections with * See {{l|cs|term}} mixed with other lines that should probably be cleaned up manually. I can have the bot skip the affixes (you mean -ač through -ův, right?) if you want. What about the prefixes like beze- and polo-? JeffDoozan (talk) 16:01, 24 March 2023 (UTC)
- Yeah, prefixes, suffixes, and interfixes should be skipped because those use templates like {{prefixsee}} and the like. Vininn126 (talk) 16:07, 24 March 2023 (UTC)
- I updated the list of fixes/errors to exclude any page that starts or ends with "-". Let me know if I should run the bot on this list or if you see anything else that needs addressed. JeffDoozan (talk) 16:12, 24 March 2023 (UTC)
- Thanks, Jeff! Vininn126 (talk) 16:13, 24 March 2023 (UTC)
- Unless @Solvyn has any objections, I think we can convert those 11,000 sections, and we can look at the remaining skipped sections. Vininn126 (talk) 16:18, 24 March 2023 (UTC)
- I have no objections. Just do it. Solvyn (talk) 16:34, 24 March 2023 (UTC)
- @Solvyn, Vininn126, I found a bug in the bot code that was causing it to under-report the sections it was skipping and slightly overestimate the pages it was going to fix. It is still going to fix most of the same sections (down to 10,250 now), but it has now identified 5000 sections to be skipped. Most of the sections being skipped already use {{col3}} or similar, and many others use {{l}} with the g= parameter. I can still run the bot on the pages it can fix, but let me know how I can make the error page more helpful for anyone who might be going through that manually (e.g., is there a good way to handle the gender qualifiers, should the bot just ignore sections with {{col3}} and friends?) JeffDoozan (talk) 19:25, 24 March 2023 (UTC)
- I'd say skip gender, and convert col3 to col-auto. Vininn126 (talk) 21:14, 24 March 2023 (UTC)
- Done JeffDoozan (talk) 14:32, 25 March 2023 (UTC)
- Cheers! Vininn126 (talk) 15:58, 25 March 2023 (UTC)
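The conversion rule agreed in this thread can be sketched in Python as follows. This is not JeffDoozan's bot code, which is not shown anywhere in the discussion; it is a minimal illustration assuming that a section is converted only when every line is a plain `* {{l|cs|term}}` link, that pages whose title starts or ends with "-" are skipped, and that the output takes the form `{{col-auto|cs|term1|term2}}`. The function names are hypothetical.

```python
import re

# A list line consisting of exactly one plain link: "* {{l|cs|term}}".
# Links with extra parameters (such as g=) do not match, so those sections are skipped.
PLAIN_LINK_LINE = re.compile(r"^\*\s*\{\{l\|cs\|([^|{}]+)\}\}\s*$")


def is_affix_page(title):
    """Affix pages (prefixes, suffixes, interfixes) start or end with a hyphen."""
    return title.startswith("-") or title.endswith("-")


def convert_section(title, lines):
    """Return a single {{col-auto}} line, or None if the section must be skipped."""
    if is_affix_page(title):
        return None
    terms = []
    for line in lines:
        match = PLAIN_LINK_LINE.match(line)
        if match is None:
            # Anything else ("* See ...", g= qualifiers, bare [[links]]) is left
            # for manual cleanup in this sketch.
            return None
        terms.append(match.group(1))
    if not terms:
        return None
    return "{{col-auto|cs|" + "|".join(terms) + "}}"


if __name__ == "__main__":
    section = ["* {{l|cs|ostřelovač}}", "* {{l|cs|ostřelování}}"]
    print(convert_section("ostřelovat", section))
```

Skipping the whole section on the first unrecognized line mirrors the conservative approach described above, where mixed sections end up on an error list for manual review instead of being partially rewritten.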
- @Vininn126, Solvyn Based on this discussion I created a Czech adjective module at Module:User:Benwing2/cs-adjective. I started with adjectives because they are the easiest to handle in Slavic languages of the three of nouns, verbs and adjectives. You can see tests in User:Benwing2/test-cs-adecl. The code is based on the Ukrainian adjective module and, like that module, puts feminine after neuter to take advantage of masculine-neuter syncretization. I also omitted the vocative since it's always the same as the nominative. Let me know if you disagree with either of these choices. Benwing2 (talk) 01:17, 24 March 2023 (UTC)
- Sounds good to me! Vininn126 (talk) 10:12, 24 March 2023 (UTC)
- Amazing work! Solvyn (talk) 11:54, 24 March 2023 (UTC)
- It's a big change. I am glad the short forms are now listed. Placing the feminine forms last seems odd to me. I think all Czechs are used to putting the feminine forms between the masculine and neuter, as when listing pronouns (on, ona, ono, oni, ony, ona). But I could get used to it. Nevertheless, there is a big mistake: the feminine and neuter plural forms are incorrect. Hard adjective feminine plurals should end in "-é" and neuter plurals in "-á". Please fix it! Zhnka (talk) 07:55, 25 March 2023 (UTC)
- The same goes for "ten" (feminine plural = ty, neuter plural = ta). Zhnka (talk) 08:01, 25 March 2023 (UTC)
- @Zhnka Thanks for pointing that out, I have fixed it. @Vininn126, Solvyn What do you think about the gender order? (Note, if I could redo things I'd actually totally reorder the cases something like the way it is in Sanskrit: nom voc acc gen loc dat ins, to take advantage of various syncretizations. The current order nom gen dat acc etc. is based on Latin and makes little sense for Slavic languages. But I think that would be far too confusing vs. existing Slavic-language resources.) Benwing2 (talk) 17:56, 25 March 2023 (UTC)
- I think the order of cases should stick to western tradition here. I think feminine should come before neuter. Vininn126 (talk) 18:24, 25 March 2023 (UTC)
- Agree with Vininn126. Solvyn (talk) 02:10, 26 March 2023 (UTC)
- Thanks for fixing it. You just forgot to fix the short forms. Please, fix them as well. Zhnka (talk) 15:20, 26 March 2023 (UTC)
- @Zhnka Can you verify now? Benwing2 (talk) 00:35, 28 March 2023 (UTC)
- The singular accusative feminine and neuter are switched. The feminine accusative of "ten" is "tu", and in hard adjectives the feminine accusative forms end in "-ou". Zhnka (talk) 10:14, 28 March 2023 (UTC)
- I've fixed it. Zhnka (talk) 04:05, 30 March 2023 (UTC)
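The endings corrected in this thread can be summarised in a small Python sketch. It is not the code of Module:User:Benwing2/cs-adjective; the table and function names are hypothetical, and only the slots discussed above (plus the nominative singular) are covered. Masculine accusative singular and masculine nominative plural are left out on purpose, since they depend on animacy and, in the animate plural, on consonant alternations that this sketch does not model. Genders are listed in the order masculine, feminine, neuter, as preferred in the discussion.

```python
import unicodedata

# Hypothetical, partial ending table for hard adjectives (type "nový").
HARD_ENDINGS = {
    ("nom", "sg"): {"m": "ý", "f": "á", "n": "é"},
    ("acc", "sg"): {"f": "ou", "n": "é"},   # masculine omitted: depends on animacy
    ("nom", "pl"): {"f": "é", "n": "á"},    # masculine omitted: animacy and softening
}
GENDER_ORDER = ("m", "f", "n")


def hard_adjective_forms(lemma):
    """Build the partial table for a hard adjective given in the masculine nominative."""
    lemma = unicodedata.normalize("NFC", lemma)
    if not lemma.endswith("ý"):
        raise ValueError("expected a hard adjective lemma ending in -ý")
    stem = lemma[:-1]
    table = {}
    for slot, endings in HARD_ENDINGS.items():
        table[slot] = {
            gender: stem + endings[gender]
            for gender in GENDER_ORDER
            if gender in endings
        }
    return table


if __name__ == "__main__":
    for slot, forms in hard_adjective_forms("nový").items():
        print(slot, forms)
```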
Czech noun resources
@Vininn126, Solvyn I am trying to create a Czech noun module based on the Ukrainian one, because there seem to be a lot of similarities between the two. I found a pretty good grammar in English by Laura Janda and Charles Townsend but ultimately I need better resources. For example, this grammar doesn't go into a lot of detail about stem alternations in nouns. Can you point me to (a) online dictionaries that list the declension of Czech nouns, (b) ideally any grammars (even if written in Czech) that do go into this sort of detail? Benwing2 (talk) 04:00, 25 March 2023 (UTC)
- One other question, is it ultimately necessary to make an animacy distinction between people and animals? This turned out to be necessary in Ukrainian and Belarusian, and I gather it's necessary in Polish (although currently I think Polish uses the an code for people, when it should use pr). Benwing2 (talk) 04:07, 25 March 2023 (UTC)
- For declension and stem alternation of Czech nouns, you can visit IJP for more information. It's unnecessary to make an animacy distinction between people and animals. Solvyn (talk) 05:44, 25 March 2023 (UTC)
- @Benwing2 FWIW Polish does have a 3 way animacy distinction which we use, it's just that some people are treated as an. I have found a book Česká morfologie a korpusy which claims a few things (page 20): indeterminable, feminine, feminine or neuter (not masculine), inanimate masculine, animate masculine, neuter, feminine singulare tantum, neuter plural, animate masculine (the last few seem to refer to substantivized adjectives as well as virile/non-virile as in Polish), and one referring to "any gender", I do not know what that entails. It seems to be similar to Polish, just without the 3-way distinction in animacy. Vininn126 (talk) 08:04, 25 March 2023 (UTC)
- @Vininn126, Solvyn Thanks. Czech declension seems really messy in terms of alternative forms for nom pl, loc sg, etc. I will do my best to set sensible defaults but it looks like a lot of nouns will need overrides. I don't know if Polish is similarly messy; Russian is a bit less so, although in Russian and other East Slavic languages you have the whole business of accent class to contend with. Benwing2 (talk) 04:59, 26 March 2023 (UTC)
- I'd say Polish declension is fairly tidy, as far as the module works and everything there isn't any need for overhauls, except for using modern shortcuts and the like, but that's a small thing. Vininn126 (talk) 08:16, 26 March 2023 (UTC)
- Can you add more customized parameters to replace the defaults? {{cs-decl-noun-auto}} is less friendly to customized parameters, while {{cs-decl-noun}} is completely customized. (I don't know if this is the right way to express it.) Solvyn (talk) 03:16, 27 March 2023 (UTC)
- @Solvyn The new {{cs-ndecl}} will have sophisticated defaults but also allow you to override any individual case/number combination. It will work something like {{uk-ndecl}}; see the docs for that template. The only thing you will need to specify in all cases is the gender. You also need to specify the animacy if the noun is masculine animate, and the number if the noun is singular-only or plural-only. There will be parameters to explicitly specify the reducibility (i.e. whether there is an e/ě vs. no vowel alternation in the stem), although that will have defaults according to the rules given in IJP. For example, syn might use a declension {{cs-ndecl|<m.an.nomplové>}}, which specifies the gender m, the animacy an, and overrides the nominative plural to end in -ové (you could also write {{cs-ndecl|<m.an.nompl:synové>}} and spell out the nominative plural entirely; if the case/number spec nompl is followed by a colon, a full form comes after it, otherwise just the ending). The syntax also lets you specify the declension of multiword expressions such as atomové číslo, which might be indicated as {{cs-ndecl|atomové<+> číslo<n.loce>}}; here, + means to decline the word atomové adjectivally (and it inherits its gender from číslo), while for číslo I think the locative singular has to be overridden (loce means "locative singular ends in -e") because I don't think the locative in -e is predictable. Note that the reducible alternation in the číslo genitive plural čísel is handled by default. Benwing2 (talk) 04:15, 27 March 2023 (UTC)
- číslo is missing alt. locative form "číslu" (https://prirucka.ujc.cas.cz/en/?slovo=%C4%8D%C3%ADslo#bref2) Anatoli T. (обсудить/вклад) 04:29, 27 March 2023 (UTC)
- @Atitarev Thanks, I think in that case the locative override is unnecessary because I think -e/-u is the default (this is what Janda and Townsend p. 19 says). Benwing2 (talk) 04:31, 27 March 2023 (UTC)
- Thanks. I was just looking at the current table where it's missing. Same with město (all manual params). Anatoli T. (обсудить/вклад) 04:55, 27 March 2023 (UTC)
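The indicator syntax described above for {{cs-ndecl}} can be illustrated with a small Python parser. This is a sketch of the notation as explained in the thread, not the actual module code: the function names are hypothetical, only the two override slots mentioned in the discussion (nompl and loc) are recognized, and the rule that a bare `<...>` argument applies to the page name is an assumption based on the syn example.

```python
import re

WORD_WITH_SPEC = re.compile(r"([^\s<>]+)<([^<>]*)>")
GENDERS = {"m", "f", "n"}
# Assumption: only the two override slots that appear in the discussion.
OVERRIDE_SLOTS = ("nompl", "loc")


def parse_indicator(indicator):
    """Parse the dot-separated text between angle brackets, e.g. "m.an.nomplové"."""
    spec = {"gender": None, "animate": False, "adjectival": False, "overrides": {}}
    for part in indicator.split("."):
        if part in GENDERS:
            spec["gender"] = part
        elif part == "an":
            spec["animate"] = True
        elif part == "+":
            spec["adjectival"] = True  # decline adjectivally, gender inherited
        else:
            for slot in OVERRIDE_SLOTS:
                if part.startswith(slot):
                    rest = part[len(slot):]
                    if rest.startswith(":"):
                        spec["overrides"][slot] = ("form", rest[1:])
                    else:
                        spec["overrides"][slot] = ("ending", rest)
                    break
            else:
                raise ValueError("unrecognized indicator: " + repr(part))
    return spec


def parse_ndecl(argument, pagename=None):
    """Return a list of (word, spec) pairs for a {{cs-ndecl}}-style argument."""
    if argument.startswith("<") and argument.endswith(">"):
        # Bare indicator: assumed to apply to the page name.
        return [(pagename, parse_indicator(argument[1:-1]))]
    return [(word, parse_indicator(ind)) for word, ind in WORD_WITH_SPEC.findall(argument)]


if __name__ == "__main__":
    print(parse_ndecl("<m.an.nomplové>", pagename="syn"))
    print(parse_ndecl("atomové<+> číslo<n.loce>"))
```

Keeping the distinction between an ending override and a full-form override in the parsed result reflects the colon rule described above, so that a later declension step can decide whether to attach the value to the stem or use it as is.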
- @Hergilei You should read this conversation, as a regular Czech editor! Namely there will be new modules and such. Vininn126 (talk) 18:12, 27 March 2023 (UTC)
- @Vininn126, Solvyn, Hergilei, Atitarev See User:Benwing2/test-cs-ndecl. It's still rough and needs a good deal of work but as a first approximation it works. Benwing2 (talk) 20:05, 27 March 2023 (UTC)
- BTW I notice that bratr has a separate plural pattern bratří etc. that follows the neuter plural -í declension. Is this noun actually neuter when declined in this fashion, e.g. do you say dva bratří "two brethren" or dvě bratří? There seem to be other nouns that switch to a different gender's pattern in the plural, e.g. kníže, hrabě; do these nouns actually switch gender in the plural or do they simply adopt another gender's pattern? Benwing2 (talk) 20:09, 27 March 2023 (UTC)
- A good way may also be to look at adjectives and verbs - i.e. Polish mężczyzna is masculine in syntax, but the endings are feminine. (However, you get certain (Sprachbund) effects where the word for "girl" is grammatically neuter in many languages). Vininn126 (talk) 20:12, 27 March 2023 (UTC)
- @Benwing2, I don't know if I can follow @Vininn126's explanation but there's a note on "bratří": "vedle tvaru bratrů se užívá v určitých spojeních tvar bratří, např. bratří Čapků, českých bratří (tento tvar je reliktem starého skloňování)". It must be "dva bratří" (my assumption) but this form is only used in certain combinations according to https://prirucka.ujc.cas.cz/en/?slovo=bratr#bref5. Anatoli T. (обсудить/вклад) 22:47, 27 March 2023 (UTC)
- The declension of hrabě and kníže is rare for masculine nouns. They do become neuter nouns in their plural form. Similarly, the neuter noun dítě becomes a feminine noun in its plural form. Solvyn (talk) 16:28, 28 March 2023 (UTC)
- kuře also belongs there but it's inanimate (accusative sg = nominative sg). Anatoli T. (обсудить/вклад) 05:57, 30 March 2023 (UTC)
Conjugation table
@Solvyn @Benwing2 @User:Vininn126 Now that we've started introducing big changes, I suggest expanding {{cs-conj-forms}}. I suggest including these four forms in this template:
1) both possible infinitives (dělat ~ dělati, péct ~ péci)
2) verbal noun (dělání, pečení)
3) active verbal adjective (dělající, pečící/pekoucí)
4) passive verbal adjective (dělaný, pečený)
I think these forms should be included somewhere, and it would certainly be nice if the verbs could be found by searching for these forms within pages. I will add them if you agree. But I understand the current conjugation table is beautiful for its simplicity. I'd like to know your opinion. Zhnka (talk) 07:47, 31 March 2023 (UTC)
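The first item in the list above (the second infinitive variant) follows a simple spelling pattern in the two examples given, which can be sketched in Python. The function name is hypothetical and the rule is inferred only from the pairs dělat ~ dělati and péct ~ péci; it is not taken from {{cs-conj-forms}}, and other verbs may need separate handling.

```python
def long_infinitive(infinitive):
    """Derive the longer infinitive variant from the short one.

    Rule inferred from the two examples in the discussion:
    -ct becomes -ci (péct, péci), otherwise -t becomes -ti (dělat, dělati).
    """
    if infinitive.endswith("ct"):
        return infinitive[:-1] + "i"
    if infinitive.endswith("t"):
        return infinitive + "i"
    raise ValueError("expected an infinitive ending in -t or -ct")


if __name__ == "__main__":
    for verb in ("dělat", "péct"):
        print(verb, "~", long_infinitive(verb))
```

The verbal noun and the verbal adjectives are deliberately not derived here, since, as noted elsewhere in this thread, they are not always predictable from the infinitive.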
- @Zhnka I agree with you. We should include all parts of the inflection. This is how the Ukrainian and Russian tables work, for example. Benwing2 (talk) 08:04, 31 March 2023 (UTC)
- @Zhnka, @Benwing2 Note that verbal nouns are not part of the inflection table for Russian or Ukrainian. It can be provided as |vn= in the headword. They are not always easily derivable. Anatoli T. (обсудить/вклад) 08:10, 31 March 2023 (UTC)
- @Atitarev I agree with you about verbal nouns not being part of the inflection in Ukrainian and Russian; this is similar to English. But apparently they *are* part of the Bulgarian and Czech verb inflection. (What I meant by "all parts of the inflection" are those that are considered inflections instead of derivations, on a language-by-language level.) Benwing2 (talk) 08:14, 31 March 2023 (UTC)
- The nominative form of the verbal noun is usually included in Polish declension tables. Some dictionaries like WSJP include all forms, whereas some like Doroszewski's dictionary give them a separate lemma, and he only lists enough information in the verb to tell you its type and if it's transitive or not. I like having at least the nominative there. Vininn126 (talk) 08:29, 31 March 2023 (UTC)
- Yep those definitely need covered. Vininn126 (talk) 08:07, 31 March 2023 (UTC)
- Btw @Benwing2 if you need help implementing certain changes you should let me (and perhaps others) know. Vininn126 (talk) 08:27, 31 March 2023 (UTC)
- @Zhnka I agree with your opinion. Solvyn (talk) 09:57, 31 March 2023 (UTC)
- @Vininn126 Thanks. I won't be looking into Czech verbs until after I finish nouns and do some work on Persian. They probably need a total rewrite. As with nouns they are half-implemented (if that ...) and in template code only. If you want to make changes to the template code in the meantime, please feel free :) ... Benwing2 (talk) 10:26, 31 March 2023 (UTC)
- I will try to add the forms which I mentioned into the template {{cs-conj-forms}}, but I don't fully agree with its being completely rewritten. I think mentioning only the forms looks better arranged and clearer than mentioning all possible tenses, which can be simply composed from these forms. As far as I know, there used to be a different template which mentioned all those tenses, yet it was abandoned and replaced by the currently used {{cs-conj-forms}} in the course of time. Zhnka (talk) 10:42, 31 March 2023 (UTC)
- @Zhnka I think you misunderstood me. I am suggesting that the code underlying {{cs-conj-forms}} needs total rewriting, not that the table itself needs restructuring beyond what we have already discussed here. I agree there's no particular point in mentioning all the composed tenses, since they're generally easy to derive. Even for a language like Bulgarian with amazingly complex composed tenses, we normally don't enumerate all the possibilities, e.g. in бия (bija, “to beat”), where the composed tenses are mentioned but just with text indicating how to derive them. For Bulgarian, the code to enumerate all possibilities *IS* present and is used in a few verbs such as съм (sǎm, “to be”) as an example, again due to the complexity of these tenses. Czech does not have comparable craziness so I don't think there's any need to even write the code to enumerate all the composed tenses. Benwing2 (talk) 02:04, 1 April 2023 (UTC)
- I added those forms as cleverly as I was able to. Now it's up to you how you plan to rewrite the template. Zhnka (talk) 05:39, 7 April 2023 (UTC)
- @Benwing2 Can we make these tables a type of blue? Twould be nice! Vininn126 (talk) 11:43, 15 May 2023 (UTC)
- @Solvyn @Benwing2 @User:Vininn126 @User:Zhnka @User:Atitarev: I do not think this was a good idea, because verbal nouns, active adjectives and passive adjectives are all usually considered to be different parts of speech (kinds of nouns or adjectives) derived from verbs, not conjugation forms of verbs. A conjugation form of a verb is still a verb. If it is a noun, it cannot be a form of a verb. So, if still possible, I suggest removing them. However, if you decide to keep them, the table should be fixed, because e.g. "nakupovat" is missing the passive adjective "nakupovaný". --Jan Kameníček (talk) 16:36, 15 May 2023 (UTC)
- @Jan.Kamenicek In most Slavic monolingual dictionaries I see, i.e. WSJP, gerunds and participles are given under the verb declination, and this is in line with other practices for other Slavic languages on Wiktionary. Vininn126 (talk) 16:38, 15 May 2023 (UTC)
- Well, I cannot speak for other Slavic languages, I can speak only for Czech. While I can imagine that some tables include them, I cannot imagine that tables called "conjugation" include them, because these forms are not a result of conjugation. BTW: I forgot to mention that "nakupovat" misses also the verbal noun "nakupování". --Jan Kameníček (talk) 16:44, 15 May 2023 (UTC)
- One more reason, as all active and passive adjectives and verbal nouns are adjectives and nouns, they have their own declension forms. Verbal forms, unlike adjectives and nouns, cannot be declined. --Jan Kameníček (talk) 16:51, 15 May 2023 (UTC)
- Yes, this is normal. Most often that form is at least linked, and then on its own page has the appropriate declension table. It's usually still considered a form. Gerunds are a bit more up for debate, but most people do not consider most participles lemmas. Vininn126 (talk) 16:53, 15 May 2023 (UTC)
- Not sure what is normal... Declining Czech verbs is not normal. If something is declined, it is not a verb form, and so it looks weird when listed among verb forms. --Jan Kameníček (talk) 17:22, 15 May 2023 (UTC)
- @Jan.Kamenicek Nobody is saying that e.g. nakupování is a verb, but it is nevertheless still a verbal form - it's derived from a verb and generally speaking you can form a gerund from every verb. It is often included in dictionaries - see [9]. --TomášPolonec (talk) 18:13, 15 May 2023 (UTC)
- @Jan.Kamenicek I agree here with User:TomášPolonec and User:Vininn126. It is normal in Wiktionary and other dictionaries to include participles, gerunds and the like in the conjugation table for verbs; this is done for all languages, Slavic and non-Slavic. This is because they are conceptually part of the paradigm of the verb and take part in syntactic verbal constructions; for example, the passive participle is used in forming the passive voice, the l-participle is used in forming the past tense, the Czech transgressives are the verbal heads of adverbial clauses, etc. Whether the verbal noun is included depends on how predictable it is from the verb and how closely it is felt to be part of the paradigm; it is normally included e.g. in Czech, Arabic and Irish but not Russian, English or Greek. In any case it's not always possible to cleanly separate declension and conjugation, since for example the Czech past tense has elements of both (it has both person and gender, for example). Benwing2 (talk) 18:36, 15 May 2023 (UTC)
- @Benwing2: In Czech the difference between declension and conjugation is very clear and cannot be confused. I do not object against listing verbal nouns as derived from the verbs, I object against listing them under the title conjugation. Wiktionary is the only dictionary that does it, you cannot find it in any linguistic publication on Czech language. It would be enough if they were listed in the derived terms section. --Jan Kameníček (talk) 22:13, 15 May 2023 (UTC)
- @TomášPolonec: (with edit conflict) 1) This is the first time I have heard that a noun or an adjective can be a verbal form; I have always understood verbal forms to be forms of verbs. Although English gerunds are generally considered forms of verbs, Czech verbal nouns are generally considered nouns derived from verbs. 2) I did not say that dictionaries do not list them; in fact I said that I can imagine them listed in some kinds of tables. What I objected to was that the current table suggests that these forms are a result of conjugation, while no Czech nouns are a result of conjugation. The difference is that while conjugation is done by verbal endings, the aforementioned kinds of adjectives are formed by derivational suffixes. By the way, the issue of verbal adjectives is more complex and there are more verbal adjectives than those currently added: for čistit we have listed čištěný and čistící, omitting čisticí, čistivý, and čistitelný. So my suggestion is either to put those among the derived terms, or to create a separate table for them, which would not suggest that they are a result of conjugation. --Jan Kameníček (talk) 18:46, 15 May 2023 (UTC)
- @Jan.Kamenicek Sure, gerunds are not a result of conjugation, but having a separate table just to have these few things separate would not be very practical from the users' point of view. It's just a matter of convenience and having all these forms in one place. If the dictionary that I linked can list the verbal noun form among the conjugated forms, we can do it as well. I have found an encyclopedic article that addresses this issue: [10], these forms are called non-finite verbal forms. --TomášPolonec (talk) 19:19, 15 May 2023 (UTC)
- @TomášPolonec Well, the difference is that the dictionary does not title the list "Conjugation" as the Wiktionary table does. While linguists may just raise their brow upon seeing it, many ordinary readers may be puzzled by that. I would not like to see Wiktionary being given as a source of information that derivation of adjectives from verbs is a kind of conjugation. Besides, as I wrote above, it is not clear why some verbal adjectives are listed while others are not. Why is "čistící" given preference to "čisticí" (both being verbal adjectives with different meaning)? --Jan Kameníček (talk) 22:13, 15 May 2023 (UTC)
- Depends on the semantic meaning. "Interesting" has become a "full-fledged" adjective, whereas "reading" has not. And if you ask anyone about a gerund that has not become a full-fledged noun, they'll tell you (incorrectly!) that it's a verb. Only some verbal nouns become lexically different. Vininn126 (talk) 22:16, 15 May 2023 (UTC)
- @Jan.Kamenicek The reason is that čištěný and čistící are participles ([11]), which are non-finite verbal forms, while the others are simply adjectives derived with different suffixes. --TomášPolonec (talk) 22:21, 15 May 2023 (UTC)
- I do not think English language examples are useful for linguistics on Czech language. I am afraid that in all semantic meanings the words derived from verbs e.g. with "-icí" or "-ící" are considered adjectives and the words derived from verbs with "-ní" or "-tí" are typically considered nouns in serious linguistic publications on Czech language, and not forms of verbs, and their formation is never called conjugation, irrespective of semantic meaning. It is also not typical to call Czech adjectives "participles". Linguistics on Czech language usually distinguishes past (or active) participles like čistil and passive participles like čistěn, while "čistící" and "čištěný" are considered to be adjectives. (BTW: should we consider "čistící" to be a participle, why is it not listed among the participles in the conjugation table then?) Wiktionary tables on Czech verbs should not diverge from the prevailing linguistic practice studying and explaining Czech language. --Jan Kameníček (talk) 23:01, 15 May 2023 (UTC)
- @Jan.Kamenicek In the (Czech) source that I have linked for you, you can see that all of these forms are called participles (see the sections "‑cí‑ové participium verbální" (čistící) and "‑n‑/‑t‑ové participium verbální" (short form čištěn, long form čištěný). The article itself cites different sources. --TomášPolonec (talk) 23:29, 15 May 2023 (UTC)
- Not sure what is normal... Declining Czech verbs is not normal. If something is declined, it is not a verb form, and so it looks weird when listed among verb forms. --Jan Kameníček (talk) 17:22, 15 May 2023 (UTC)
- Yes, this is normal. Most often that form is at least linked, and then on its own page has the appropriate declension table. It's usually still considered a form. Gerunds are a bit more up for debate, but most people do not consider most participles lemmas. Vininn126 (talk) 16:53, 15 May 2023 (UTC)
- @Jan.Kamenicek In most Slavic monolingual dictionaries I see, i.e. WSJP, gerunds and participles are given under the verb declination, and this is in line with other practices for other Slavic languages on Wiktionary. Vininn126 (talk) 16:38, 15 May 2023 (UTC)
- I will try to add the forms which I mentioned into the template
- @Vininn126 Thanks. I won't be looking into Czech verbs until after I finish nouns and do some work on Persian. They probably need a total rewrite. As with nouns they are half-implemented (if that ...) and in template code only. If you want to make changes to the template code in the meantime, please feel free :) ... Benwing2 (talk) 10:26, 31 March 2023 (UTC)
- There were many posts in this discussion. I personally support the inclusion of participles, and the verbal noun, if it's feasible. I would change the order of sections, so that participles stand alone. And as with the Russian де́лать (délatʹ) (equivalent of dělat), a note could be added like "Note: For declension of participles, see their entries. Adverbial participles are indeclinable." I don't understand the opposition.
- Participles are included in Polish, Bulgarian, Russian, Ukrainian, Belarusian, also French, German, Arabic, etc conjugation tables. Czech editors must have gotten used to reduced conjugation tables. There is no reason users should be denied this information. Adding them to derived terms sections won't be consistently done. Anatoli T. (обсудить/вклад) 23:24, 15 May 2023 (UTC)
- @Jan.Kamenicek čistící and (vy)čistivší are explicitly called the present active and past active participles in Janda and Townsend p. 37 [12], which is an English-language grammar of Czech. čištěný is not specifically mentioned but would then logically be the (long) past passive participle. Participles are strange in that they are verbal forms that are declined as adjectives; this issue exists for every language and I don't think we should deviate from the Wiktionary practice of other languages. I have no issue rearranging the order of non-finite forms and listing them below the finite forms, or whatever other people (e.g. User:Atitarev) think is best; I have just listed them above for convenience so that the infinitive, participles and verbal noun all come together. I am not sure what čisticí, čistivý and čistitelný mean but I have never seen them listed in verbal conjugation tables; as User:Vininn126 notes they must be derivational adjectives. Benwing2 (talk) 23:34, 15 May 2023 (UTC)
- @Benwing2: 1) I did not say that nobody considers them participles, I said it is not typical to consider them participles. I would even say it is quite exceptional among the tons of literature on Czech linguistics, which most commonly labels them as adjectives. 2) I also said that their formation is not conjugation and so should not be titled as conjugation. 3) Active past participles (like čistil) are listed here because they can be conjugated. Words like "čistící" cannot be conjugated, but they can be declined as adjectives. Only words which can be conjugated belong into conjugation tables. 4) BTW: If you consider them participles, why are they not listed among the participles in the table?
Despite the fact that none of these arguments was disproved, they were simply labeled as non-sense by Vininn126 below, which is partly funny and partly sad. Having understood that arguments have no weight here because the real reason is that people here simply want to have it the way they want to have it, I am leaving so that you can continue discussing much more important things like "can we make the tables blue?" :-D Please, do not ping me here anymore. Cheers, --Jan Kameníček (talk) 13:37, 20 May 2023 (UTC)
- OK, apologies for the disagreement. I will implement what User:Atitarev proposes, which is to move the participles and other non-finite forms to the bottom of the table. Hope that is OK with you. Benwing2 (talk) 17:50, 20 May 2023 (UTC)
- @Benwing2 All of this nonsense aside, can we make the tables blue? lol. Vininn126 (talk) 23:37, 15 May 2023 (UTC)
- @Vininn126 Which tables? do you mean the existing conjugation tables? Sure, I can do that although my hope is to replace them with the new tables before too long. Benwing2 (talk) 23:44, 15 May 2023 (UTC)
- Oh! I thought I saw one of the new ones that was gray, but I must have been mistaken. Vininn126 (talk) 23:45, 15 May 2023 (UTC)
- @Benwing2: Thanks. Re: "rearranging the order of non-finite forms", perhaps, a "Participles" section and a footnote, as in the East Slavic verbs will make it more acceptable. Anatoli T. (обсудить/вклад) 23:47, 15 May 2023 (UTC)
- @Atitarev What about "Non-finite forms" instead of "Participles", including also verbal nouns (and transgressives, which are sort-of participles)? Benwing2 (talk) 00:13, 16 May 2023 (UTC)
- @Benwing2: I would move them all up (or all the way down), separate from verbal forms. "l-participle" is the only true verb form in the top section, IMO. dělal is both the l-participle and the 3rd person past tense form, as in "(he) did" Anatoli T. (обсудить/вклад) 00:33, 16 May 2023 (UTC)
- I realise that often there are other adjectives which can be sometimes derived. I think those are these three:
- 1) from masculine past transgressive and the suffix -ý, e. g. “spavý” (“sleepy”), “hravý” (“playful”)
- 2) from masculine past participle and the suffix -ý, e. g. „vzniklý” (= “vzniknuvší”, “which has arisen”), “koplý” (= “kopnutý”, “kicked”)
- 3) from infinitive and suffixes like -cí or -elný, etc.
- All of them can be derived from many verbs, but aren't derived from all verbs. As for the second one, it is sometimes used as a substitute for the past active participle (as the past active participle often sounds dated) or for the passive adjective and sometimes sounds a little informal.
- But the active adjectives ending in -cí or -vší and the passive adjectives can be derived from all verbs. I thought it would be easier if they were mentioned in the template. Zhnka (talk) 09:15, 17 May 2023 (UTC)
- @Zhnka: The #1 are derived from verbs but not verb forms. #2 depends first of all on transitivity, no passive participles from intransitive verbs and some missing forms, which are dependent on the verb class, can have a “-“ where the form can’t be created. A normal practice with current verb modules. Anatoli T. (обсудить/вклад) 09:27, 17 May 2023 (UTC)
status update
@Solvyn, Vininn126, Atitarev I cleaned up the Czech reference templates. Czech nouns are getting closer but I'm discovering so many edge cases, e.g. all the weird foreign-term declensions. (Foreign terms are mostly indeclinable in Russian and Ukrainian so this issue never came up.) At a certain point I will push the code live so it can be used, but before that I want to make sure there aren't major interface changes needed. In particular, the defaults are still in flux, which significantly influences how declensions are coded in the {{cs-ndecl}} template. Benwing2 (talk) 02:12, 1 April 2023 (UTC)
- @Benwing2 Perhaps edge cases are best left to manual declension, as we sometimes do in Polish. Vininn126 (talk) 08:08, 1 April 2023 (UTC)
- @Vininn126 I believe in automating as much as possible because otherwise you're more likely to get errors as people who don't completely know what they're doing try to enter in declensions. (Not to say errors can't happen in the best of circumstances but you can minimize them.) In any case I'm basically done with masculines and feminines and addressing some remaining issues on neuters now. Probably will push live tomorrow, if not the next day. Benwing2 (talk) 08:46, 1 April 2023 (UTC)
- @Benwing2 Do you think a pronunciation module would be next? I have a sneaking suspicion this should be easier, because it should just need to incorporate the IPA module and I think syllabification will be easier because Czech has considerably fewer digraphs than Polish, and I think it could be modelled on the Polish module. Czech has first syllable stress, so rhymes should be easy (I don't know if there are exceptions, but the module should be able to detect respellings). I think we could even do like with Polish and replace current instances of cs-IPA. Vininn126 (talk) 14:30, 1 April 2023 (UTC)
- @Vininn126 That should be possible, although I need to finish the changes to the Persian IPA module as well. Benwing2 (talk) 17:18, 1 April 2023 (UTC)
- @Solvyn, Vininn126, Atitarev The template is now live. There are some things still to do involving certain foreign declensions; nouns with 'mixed' declension like kotel "cauldron", kámen "stone"; pluralia tantum; nouns that change gender in the plural; etc. But I don't think the basic format of the indicators will change. Benwing2 (talk) 19:23, 2 April 2023 (UTC)
- @Solvyn, Vininn126, Atitarev, Hergilei Also beware, a lot of the existing nouns whose declension is set using {{cs-decl-noun-auto}} have wrong declensions. User:Hergilei I think you've been particularly active adding such declensions. Please (a) switch right away to the new {{cs-ndecl}} (see User:Benwing2/test-cs-ndecl for examples), (b) be more careful checking the results of the declension output against IJP ([13]). I will be running a script to auto-convert all the existing uses of {{cs-decl-noun-auto}}; normally I would check the output of the old template against the new output to make sure nothing changes, but I don't think that's possible here due to the wealth of existing mistakes. Benwing2 (talk) 19:35, 2 April 2023 (UTC)
- Also, if you are in doubt about how to code things up using {{cs-ndecl}}, please do *NOT* guess unless you're willing to carefully check the results against IJP; instead, use {{cs-ijpdecl}} to manually enter in the declension, and/or ask me how to proceed. Thanks! Benwing2 (talk) 19:37, 2 April 2023 (UTC)
- @Solvyn, Vininn126, Atitarev, Hergilei FYI I semi-manually converted all the masculine animate nouns going through Template:cs-decl-noun-auto and deleted the 19 underlying declension-specific templates used by this template. I did it semi-manually because of the large number of declension mistakes, both inherent to the tables and due to sloppy use of the templates (e.g. treating velars in -ch/h/g the same as nouns in -n, not bothering with -ové vs. -é vs. -i differences in the nom pl). Benwing2 (talk) 09:26, 3 April 2023 (UTC)
- Cheers. I hope we'll be able to better represent Czech declension because of this. Kinda sad such a major language was represented so poorly. Vininn126 (talk) 09:36, 3 April 2023 (UTC)
- @Benwing2 Sorry for not getting back to you in time, I was very busy this weekend. Thank you for all your hard work. Solvyn (talk) 05:06, 4 April 2023 (UTC)
meaning of symbols in SSJC
@Solvyn, Vininn126, Atitarev, Hergilei I am trying to understand some of the notation in SSJC. There's no guide anywhere on the site explaining the symbols. For example, the second meaning for justice [14] has a † symbol by it; what does this mean? I am guessing "archaic" but some entries also have zast. by them which I take as zastaralý "outdated", so maybe not. And what does ob. mean in the entry for tchyně [15]? Benwing2 (talk) 00:06, 5 April 2023 (UTC)
- @Benwing2: "ob." stands for "obecná čeština". It's non-standard but not sure what label is appropriate (colloquial, low colloquial, etc.). Anatoli T. (обсудить/вклад) 00:20, 5 April 2023 (UTC)
- Actually, it's just colloquial, maybe same as ano vs jo. --Anatoli T. (обсудить/вклад) 00:22, 5 April 2023 (UTC)
- @Atitarev, @Benwing2 I was taught that obecná čeština refers to the colloquial language commonly spoken in Bohemia and Moravia, which is slightly different from the colloquial language of standard Czech. BTW, the abbreviation expr. (expresivní), which often appears in Czech dictionaries, refers to expressions with emotion. I have no idea what the English equivalent of this term would be. Maybe figurative? Solvyn (talk) 03:05, 5 April 2023 (UTC)
- @Solvyn: You may be right about obecná čeština. I don't know. Anatoli T. (обсудить/вклад) 03:23, 5 April 2023 (UTC)
- Obecná čeština on Czech Wikipedia, w:Czech_language#Common_Czech - in English. Anatoli T. (обсудить/вклад) 03:25, 5 April 2023 (UTC)
- Visit pravidla.cz for more information about abbreviations in the Czech dictionary. Solvyn (talk) 00:34, 5 April 2023 (UTC)
- @Solvyn Thank you. I think the † symbol does mean "archaic" or "obsolete"; Czech Wikipedia reports that "gallows" (the definition by which the † symbol appears) is an archaic meaning of justice. As for "expressive", there is no such similar term that I know of in English dictionaries. "figurative" doesn't seem quite right, as it just means anything that isn't literal. Can you give me some examples of words marked as "expressive"? Benwing2 (talk) 04:04, 5 April 2023 (UTC)
- BTW we are down to less than 50 uses of {{cs-decl-noun-auto}}, all of which are "difficult" declensions that I need to chip away at (e.g. the "mixed i-stem" type of feminine nouns, which have six distinct subtypes, as explained here: [16]). The existing declensions with {{cs-decl-noun-auto}} for these nouns are almost certainly all wrong. I will add some documentation for {{cs-ndecl}} as it has a lot of options. Benwing2 (talk) 04:07, 5 April 2023 (UTC)
- @Benwing2: Thank you. When you have a chance, pls add the good headword parameters (dependent on PoS), you added to other Slavic languages, such as |adj= and others. I just noticed this was missing.
- BTW, |adj= is a nice parameter a great number of languages could use. Anatoli T. (обсудить/вклад) 05:07, 5 April 2023 (UTC)
- I have a question, why is Česko reported as having a reducible stem on {{cs-ndecl|n.sg}}? Anatoli T. (обсудить/вклад) 05:11, 5 April 2023 (UTC)
- @Atitarev I'll add those params. Česko is reported as reducible because I added default reducibility; nouns in -Cko default to reducible, and if this is wrong you have to disable it using -*. In this case it's a singular-only noun so even with default reducibility it shouldn't be reported as reducible, as the reducibility applies only to the genitive plural. I'll fix. Benwing2 (talk) 05:27, 5 April 2023 (UTC)
- @Benwing2: Thanks. I tried it without ".sg" to check why it was reporting so. There is a big number of countries and other nouns with -sko suffix. Anatoli T. (обсудить/вклад) 05:32, 5 April 2023 (UTC)
- @Atitarev It looks like most nouns in -sko are countries, which are singular-only; the remainder are mostly nouns in -isko, which have two possible genitive plurals, in -isek and -isk (and I have it default to displaying both forms). There are only 3 others I found: fiasko (reducible), tělísko (reducible), vojsko (non-reducible). Benwing2 (talk) 05:42, 5 April 2023 (UTC)
- @Benwing2: Thanks and sorry! I didn't check properly. Anatoli T. (обсудить/вклад) 05:44, 5 April 2023 (UTC)
- @Atitarev I missed tuzemsko, which can be either reducible or non-reducible. Benwing2 (talk) 05:45, 5 April 2023 (UTC)
- @Benwing2: Sorry for the confusion but it was only a question. I didn’t check properly. If -sko nouns are reducible by default, even if they are not used in the plurals, they should stay reducible. Eg Russian Ивановка has a reducible stem, even if it’s a singularia tantum. Anatoli T. (обсудить/вклад) 08:44, 5 April 2023 (UTC)
- @Benwing2: Hi. I didn't use a good example on the phone. A better example with reduced stem Константи́новка (Konstantínovka). It can potentially/theoretically have plurals (not necessary), so genitive plural would be Константи́новок. Perhaps the same with (ze dvou) "Česek". What do you think? Sorry again at "05:32, 5 April 2023" I posted incorrectly. Anatoli T. (обсудить/вклад) 06:00, 6 April 2023 (UTC)
- @Atitarev I see what you mean and apologies for not responding to your previous ping, it got missed among other pings. I guess the problem here is if it's not attested, in some cases it's just guessing whether it would be reducible in the gen pl. Benwing2 (talk) 06:11, 6 April 2023 (UTC)
- @Benwing2: Thanks. For East Slavs, many such words are intuitively reducible - grammatically, there is no difference between Вя́тка (Vjátka) and ве́тка (vétka), even if the former is a proper noun and normally is used in sg. only. One can form a diminutive for Вя́тка (Vjátka): Вя́точка (Vjátočka) (with an "о" inside). I wonder if some such Czech words are perceived as inherently reducible. Let's see if any regular native speakers chime in on this. Anatoli T. (обсудить/вклад) 06:20, 6 April 2023 (UTC)
- @Benwing2: číslo or město don't add alternative locative forms "číslu" or "městu". They probably should. Anatoli T. (обсудить/вклад) 05:29, 5 April 2023 (UTC)
- Regarding expresivní, for example, hrubec and hrubián both mean "rude man", but the latter is expresivní. Solvyn (talk) 05:51, 5 April 2023 (UTC)
- @Solvyn Hmm, maybe 'colloquial' or something? Maybe an analogous pair in English is push vs. shove; the latter has more expressive content and is more informal. Benwing2 (talk) 06:11, 5 April 2023 (UTC)
- I know what you mean, but that's a bit hasty. Solvyn (talk) 13:41, 5 April 2023 (UTC)
- @Solvyn For now I think we can just label them as 'expressive'. I can even make these categorize as Category:Czech expressive nouns or something, since we support language-specific labels. Benwing2 (talk) 18:19, 5 April 2023 (UTC)
- Great! And the Czech Wiktionary also has this category. Solvyn (talk) 02:17, 6 April 2023 (UTC)
- ASSC is better. Solvyn (talk) 05:31, 5 April 2023 (UTC)
- @Solvyn, Vininn126, Atitarev, Hergilei I have eliminated all uses of {{cs-decl-noun-auto}}. Please don't use it any more; I'm about to delete it. Benwing2 (talk) 18:24, 5 April 2023 (UTC)
- Excellent, a good day for Czech. Vininn126 (talk) 19:49, 5 April 2023 (UTC)
- Cheers! Solvyn (talk) 02:05, 6 April 2023 (UTC)
- BTW, Please convert all {{cs-decl-noun}} in Category:Czech terms suffixed with -ník and Category:Czech terms suffixed with -dlo to {{cs-ndecl}}. Solvyn (talk) 09:34, 6 April 2023 (UTC)
- Benwing2, impressive work! Hergilei (talk) 02:20, 7 April 2023 (UTC)
- @Solvyn Yup, I have a script to analyze manually-specified Czech declensions and try to work out what the correct {{cs-ndecl}} spec is, and another script that compares the results with the manually-specified declensions to make sure they're the same. I was working today on implementing support for the various declension categories but that is now essentially done. Benwing2 (talk) 03:44, 7 April 2023 (UTC)
- @Solvyn, Vininn126, Atitarev, Hergilei I am doing a bot run to auto-convert as many manually-specified declensions as possible. When done it should have converted about 3300 of 4000 terms. The remainder have weirdnesses in them: Either they have errors in the manual declension tables, which is unfortunately very common, or they legitimately have irregularities in their declension that my analyze script didn't correctly work through. Both cases need to be handled manually. Benwing2 (talk) 08:02, 9 April 2023 (UTC)
- Can you show a list of all these terms? I am willing to convert manually. Solvyn (talk) 09:53, 9 April 2023 (UTC)
- @Solvyn You can see them here: Special:WhatLinksHere/Template:cs-decl-noun. Be warned however that most of these are "hard" in the sense that they typically require special features of {{cs-ndecl}} that I haven't yet documented well, and sometimes actually need updates to the code to be handled properly; the easy ones have all been done. So you might want to wait until I finish the documentation for {{cs-ndecl}}, which I'll get done in a couple of days. Benwing2 (talk) 21:43, 9 April 2023 (UTC)
- OK, I'll wait. As for the obsolete declension of týden, I think there is no harm in keeping it. Solvyn (talk) 03:04, 10 April 2023 (UTC)
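The default-reducibility behaviour described in this thread for neuter nouns in -Cko can be sketched roughly as below. This is only an illustration of the rule as stated in the discussion, not the actual code behind {{cs-ndecl}}: the function name, the `reducible` parameter and the vowel set are assumptions made for the example, and hledisko is an -isko noun picked here for illustration. The sketch also does not model the singular-only case, where no genitive plural should be reported at all.

```python
from typing import List, Optional

# Assumed vowel inventory, used only to check for "consonant + ko".
VOWELS = set("aeiouyáéíóúůýě")


def genitive_plural_candidates(noun: str, reducible: Optional[bool] = None) -> List[str]:
    """Return candidate genitive plurals for a neuter noun in consonant + 'ko'.

    reducible=None applies the defaults described in the thread;
    True or False is a hypothetical manual override.
    """
    lower = noun.lower()
    if len(lower) < 4 or not lower.endswith("ko") or lower[-3] in VOWELS:
        raise ValueError("expected a noun ending in consonant + 'ko'")
    stem = noun[:-1]                        # drop the final -o: Česko -> Česk
    reduced = stem[:-1] + "e" + stem[-1]    # insert e before the k: Česk -> Česek
    if reducible is None:
        if lower.endswith("isko"):
            return [reduced, stem]          # -isko shows both -isek and -isk
        return [reduced]                    # other -Cko nouns default to reducible
    return [reduced] if reducible else [stem]


print(genitive_plural_candidates("Česko"))                    # default: reducible
print(genitive_plural_candidates("vojsko", reducible=False))  # override: non-reducible
print(genitive_plural_candidates("hledisko"))                 # -isko: both forms
```

A noun like tuzemsko, which can be either reducible or non-reducible, would need both results combined by the caller.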
plurals of proper names
@Solvyn, Vininn126, Atitarev, Hergilei Can any of you help with plurals of proper names? IJP has a whole section on declining given names and surnames, see [17], but consistently lists them as singular-only and doesn't say anything about plurals. I'm sure names like Zbyněk can be pluralized but how? The issue is especially in the nominative plural, which could take -i or -ové or maybe -é, and sometimes the locative plural. Benwing2 (talk) 18:40, 7 April 2023 (UTC)
- @Benwing2: I've done Zbyněk based on https://prirucka.ujc.cas.cz/en/?slovo=Zbyn%C4%9Bk#nadpis4_1 and https://sklonuj.cz/jmeno/Zbyn%C4%9Bk, which has the plural forms. Anatoli T. (обсудить/вклад) 23:09, 7 April 2023 (UTC)
- @Atitarev Thank you! I couldn't find any reference to plurals in the first link (IJP) but I didn't translate all the text so maybe it's there. The second mentions -ové, which is in fact the default for animate proper names (somewhere else in IJP it says this), so all is well. Benwing2 (talk) 23:13, 7 April 2023 (UTC)
- @Benwing2: Yep, Ivan should default to Ivanové in the plural. (I haven't edited it.) Anatoli T. (обсудить/вклад) 23:18, 7 April 2023 (UTC)
- @Atitarev Hmm. It's defaulting to -é/i because of another rule that says that animate nouns in -an following a soft consonant or labial have -é/i (compare Moravan, Kyjevan, Varšavan and others). Unfortunately there's no way to distinguish these from cases like Ivan. Benwing2 (talk) 23:24, 7 April 2023 (UTC)
- @Atitarev I think https://sklonuj.cz/ is unreliable. I put in fest, which is an archaic word meaning "mummy" or "indestructible person" and is masculine; the site outputs a feminine i-stem declension. I think it's guessing based on the form of the word. When I put in háček, which has both animate and inanimate meanings with different declensions, it outputs only the animate meaning, with a declension that disagrees with IJP. So I wouldn't trust it. Benwing2 (talk) 05:45, 8 April 2023 (UTC)
- @Benwing2: Thanks, that’s right. It was okay for two names I checked. BTW, I confirmed reducibility of -sko proper nouns with web examples like “dvou Polsek”, “dvou Němecek”. Anatoli T. (обсудить/вклад) 13:05, 8 April 2023 (UTC)
- I am in the countryside for the weekend and will be unable to add much of value. Vininn126 (talk) 11:17, 8 April 2023 (UTC)
- Okay, I am back in the land of the living, is there anything I can help with? Vininn126 (talk) 14:58, 13 April 2023 (UTC)
nouns that change gender in the plural
@Solvyn It seems there are some nouns that change gender in the plural like dítě, oblak, kníže, hrabě (and related words markrabě, starohrabě, falckrabě, lankrabě/lantkrabě) and others that assume a different gender's form in the plural but don't actually change gender (i.e. the agreement is still the same as the singular gender, like člověk pl. lidé/lidi). We discussed this issue above but I didn't get a clear answer on bratr pl. bratří, I assume this remains with masculine agreement but declines like a neuter plural? Are there other words in either of the above two categories (either actually changing gender in the plural, or declining like a different gender in the plural but not actually changing)? BTW I created a category Category:Czech nouns that change gender in the plural; so far it only has oblak in it because the others haven't been switched to use {{cs-ndecl}}. Thanks! Benwing2 (talk) 03:21, 10 April 2023 (UTC)
- Another question: mekáč (“McDonald's”): the manual decl had mekáčovi as alternative for the dat/loc sg. Normally this would be animate-only, whereas this term is inanimate. There do exist Google hits for v mekáčovi but they are rare compared with v mekáči (400 vs. 102,000 or so). Is it worth mentioning the -ovi endings? Are these "expressive" endings that indicate that the speaker is viewing mekáč as animate? Benwing2 (talk) 03:58, 10 April 2023 (UTC)
- As for an animate mekáč, I think some people do think so. Solvyn (talk) 06:43, 10 April 2023 (UTC)
- Regarding bratří, you're right. An example is the Czech translation of Howard Fast's novel My Glorious Brothers, Moji stateční bratří. Both moji and stateční are animate nominative masculine plural forms. (Interestingly, the title of the web page uses bratři, whereas the book cover has bratří.) Solvyn (talk) 06:12, 10 April 2023 (UTC)
@Benwing2: Hi. I pinged you on Uruguay edit. These two feminine proper nouns are soft-declined like Brunej. Just in case you missed it. Anatoli T. (обсудить/вклад) 04:43, 13 April 2023 (UTC)
Czech verb groups
@Solvyn, Atitarev, Vininn126 I am looking into creating a Czech conjugation module. The issue I'm running into is that different authors seem to use different subdivisions. Wikipedia here w:Czech conjugation and here w:Morphological classification of Czech verbs has a division into groups I through V, each with subtypes. Janda and Townsend [18] have a different division into types I (pres 1sg -ám), II (pres 1sg -ím) and III (pres 1sg -i/-u), each with several subtypes (especially type III). The Wikipedia article w:Morphological classification of Czech verbs actually describes several different ways of subdividing Czech verbs before settling on one (without clear reasons for doing this). In Russian, it was easier as there's Zaliznyak's division into types 1 through 16 which seems fairly standard, and I followed the same division for Ukrainian and Belarusian, which are very similar. Similarly, Germanic strong verbs have a standard division into 7 types. But it doesn't seem so clear for Czech. Solvyn, as a native speaker, how are you taught Czech verb morphology? Is there a division into primary groups or types and if so, what is this? Also, Wikipedia uses the term "transgressive" for what are called gerunds in Janda and Townsend and which correspond approximately to adverbial participles in Russian (although they decline for gender and number in Czech). Examples are dělaje "while doing", dělav "having done". What do native speakers call these forms? Benwing2 (talk) 19:43, 17 April 2023 (UTC)
- BTW IJP refers to dělaje as the adverbial present active participle. Benwing2 (talk) 20:11, 17 April 2023 (UTC)
- @Benwing2: Not sure I could help right now, but what if you start with the obvious classes/types and then add more as you go along? The lack of stress patterns should make it somewhat easier. "Internetová jazyková příručka" (IJP) should have most, if not all, conjugation examples in the present. Past tense seems relatively simple. Anatoli T. (обсудить/вклад) 01:12, 18 April 2023 (UTC)
- @Atitarev It's not quite so simple as starting with the obvious classes and expanding over time; I need to have the overall structure in mind from the very start otherwise I'll end up rewriting it several times. I am going to use the Wikipedia classification (for now at least) as the page covering it is very detailed. Note that although Czech has no accent patterns, it has vowel length and seems to have a lot more morphological alternations due to various sound changes that didn't occur in Russian (vowel fronting after soft consonants, contraction across /j/, etc.). For similar reasons, the Czech noun declension module is longer than the Ukrainian one even without all the accent patterns. Benwing2 (talk) 16:45, 18 April 2023 (UTC)
- @Benwing2: Thanks. Appendix:Czech verbs must be an attempt to describe but it's small. Good luck! I hope you can do it. Belarusian verbs must be too similar to Ukrainian and Russian, since you did them so well with even less documentation, although, slounik.org is pretty good. Anatoli T. (обсудить/вклад) 00:02, 19 April 2023 (UTC)
- @Atitarev Yes, Belarusian conjugation and declension is extremely similar to Ukrainian and Russian. Thanks for that link; it describes yet another classification of verbs that is slightly different from the two mentioned above. How do you think we should lay out the table? The tables in Appendix:Czech verbs have a lot more information in them than the tables e.g. in dělat (imperfective) and dodělat (perfective). Overall the dělat/dodělat tables look pretty good to me except for labeling the perfective future as "present". Maybe we can find some middle ground like in the Russian tables, where we show the periphrastic imperfective future but not other composed tenses, and instead just have a few lines describing how to construct the composed tenses? Cf. the Bulgarian verb tables. Also the tables in Appendix:Czech verbs have "informal" = Common Czech forms, which I don't think we should include for now. Benwing2 (talk) 03:38, 19 April 2023 (UTC)
- @Benwing2: Thanks.
- The Wikipedia (Czech conjugation) also includes alternative colloquial forms, e.g. kryji, kryju but only on Class III table. The only (consistent) difference, IMO is -ji/-ju (1st pers. sg) and -jí/-jou (3rd pers. pl). This is what I learned a while ago but not sure if it is the only difference and there are different levels of "colloquial".
- I don't like the way they label future as present on perfective verbs. The Bulgarian is also confusing, if, e.g. Bulgarian даде́ш (dadéš) and Czech dáš mean "(you) will give".
- I'd offer the uk/be table format as a start for Czech verbs and see the feedback. I think I mentioned this issue to Dan P. about present/future confusion long before he was blocked but he didn't like my idea. Anatoli T. (обсудить/вклад) 06:36, 19 April 2023 (UTC)
- @Atitarev Thanks for your comments. Take a look at Slovak dať. With different colors this might be a good arrangement. See also Polish dawać and dać. Benwing2 (talk) 09:06, 19 April 2023 (UTC)
- @Benwing2: Slovak looks good and correct. Anatoli T. (обсудить/вклад) 10:58, 19 April 2023 (UTC)
- Polish is good as well. Anatoli T. (обсудить/вклад) 11:04, 19 April 2023 (UTC)
- @Benwing2: Hi. I would settle for Slovak template but Slovak verb conjugations are not in a great state. Anatoli T. (обсудить/вклад) 00:10, 21 April 2023 (UTC)
- @Atitarev Yes, a lot of verbs are missing conjugations. There's actually a reasonable-looking Slovak conjugation module at Module:sk-verb written about 1.5 months ago by User:TomášPolonec and apparently modeled after Module:ru-verb; but this user didn't do the work to add the appropriate conjugations to most existing Slovak verbs. BTW this Slovak module uses its own division of Slovak verbs (again into five main groups) which is different from Wikipedia's and all the others. So there are at least the following mutually incompatible divisions: Wikipedia (for Czech); IJP (for Czech); Janda and Townsend (for Czech); Appendix:Czech verbs (for Czech); Module:sk-verb (for Slovak). I am currently working on getting all the forms generated for the various verb types and haven't yet gotten around to the layout/presentation, but I may model it after the Slovak one but with standard Slavic blue colors. Benwing2 (talk) 00:35, 21 April 2023 (UTC)
- @Benwing2: Thanks. Sounds like a good plan. Can you (re-)link me to Janda and Townsend file/website, please? Anatoli T. (обсудить/вклад) 00:45, 21 April 2023 (UTC)
- Is http://www.seelrc.org:8080/grammar/pdf/compgrammar_czech.pdf the one? Anatoli T. (обсудить/вклад) 00:46, 21 April 2023 (UTC)
- @Atitarev Yup, that's it. Benwing2 (talk) 03:11, 21 April 2023 (UTC)
- @Benwing2 Hi, just wanted to say that I am planning to add the {{sk-conj}} template to the rest of the Slovak verbs; I just haven't found the time yet. The division of the Slovak verbs is taken from the book Morfológia slovenského jazyka (pp. 459-460), which are the official guidelines for the Slovak morphology (regulated by a legal norm). This book is super detailed, so the work wasn't that difficult.
- Also, forms such as dělaje and dělav are called transgressives (in Czech "přechodník" or "transgresiv") - dělaje/dělajíc/dělajíce are the forms of the present transgressive ("přechodník přítomný") and dělav/dělavši/dělavše are the forms of the past transgressive ("přechodník minulý"). Good luck with the Czech module! --TomášPolonec (talk) 14:48, 21 April 2023 (UTC)
- @TomášPolonec Thanks. Do you know if there is an equivalent of this book for Czech? I am guessing not since different sources have different verb type divisions and specify different forms as being standard, archaic or colloquial. Benwing2 (talk) 16:47, 21 April 2023 (UTC)
- @Benwing2 I guess Příruční mluvnice češtiny would be the book you are looking for. See p. 50. There are two ways to divide verbs into classes: by the present stem (under §510 on that page) and by the infinitive stem (§512). Both in Czech and in Slovak, there are 5 classes according to the first criterion and 6 classes according to the second one. I used the former system for the Slovak module (and the book I used preferred it as well), but the Czech book that I linked is structured mainly according to the latter one, which seems to be more traditional.
- Another problem that you are referring to is that there are more layers of "standardness" of Czech (see here), so you will have standard forms, tolerated forms and substandard forms. --TomášPolonec (talk) 18:57, 21 April 2023 (UTC)
- @TomášPolonec Thank you! It looks like the present-tense classification in that book is the same as the Wikipedia article I mention above. I think it makes the most sense to use this because the infinitive is already available to the conjugation module as the page name, so having users specify the present-tense group should be most of the needed information for conjugating the verb. Benwing2 (talk) 20:27, 21 April 2023 (UTC)
- @Benwing2 Exactly. Usually you only need those two stems to conjugate a verb (plus to know whether some forms, e.g. participles exist), the system is virtually the same as for Slovak, you can even find corresponding conjugational patterns. Should you need any help, let me know. --TomášPolonec (talk) 22:06, 21 April 2023 (UTC)
- @TomášPolonec, @Benwing2: Hi.
- On this site they seem to use the former system - five classes (třída) by the present stem: https://www.diktaty-online.cz/slovni-druhy-slovesa/ Anatoli T. (обсудить/вклад) 03:13, 25 April 2023 (UTC)
- Also I think Bulgarian даде́ш (dadéš) is actually present. Benwing2 (talk) 09:09, 19 April 2023 (UTC)
- No, даде́ш (dadéš) is future (it's perfective), да́ваш (dávaš) is present (imperfective).
- Compare with Russian дашь (dašʹ) and даёшь (dajóšʹ),
- Easier to compare 1st pers. plural:
- Bulgarian: даде́м (dadém) (future, pf), да́ваме (dávame) (present, impf)
- Russian: дади́м (dadím) (future, pf), даём (dajóm) (present, impf) Anatoli T. (обсудить/вклад) 11:04, 19 April 2023 (UTC)
- @Atitarev Hmmm, in that case the verb tables are wrong because they specify a separate future for дам (dam) using the particle ще (šte). Benwing2 (talk) 13:16, 19 April 2023 (UTC)
- @Benwing2:
- Bulgarian verb tenses are a bit complicated but "present" tense of perfectives without ще is normally used in subordinate clauses:
- "Обади ми се, когато ти даде книгата" (Call me when he gives you the book). I guess, no need to change if Bulgarian grammar considers it (grammatical) present tense. Anatoli T. (обсудить/вклад) 13:48, 19 April 2023 (UTC)
Verb complications
@Solvyn, TomášPolonec, Atitarev I don't know if any of you can help me with this, but there are several intransitive and even reflexive verbs listed in IJP with past passive participles (PPP's). What do such participles mean and under what circumstances are they used? For example, vyhoupnout se is given in IJP with PPP 'vyhoupnut'; mihotat se is given in IJP with PPP 'mihotán'; tknout se is given with PPP 'tknut'; dotknout se is given with PPP 'dotčen ~ dotknut'; plesat "to dance (intr.)" is given with PPP 'plesán'; same with klouzat "to slide" with PPP 'klouzán'; same with plavat "to swim, to float" with PPP 'plaván'; same with žehrat "to grumble; to lament" with PPP 'žehrán'; same with umírat "to die, to die away" with PPP 'umírán'; etc. These are just the ones I've encountered so far. BTW many of the Czech verb classes are quite complicated to implement due to the different variations. For example, after a bunch of analysis it seems that verbs in the V.2 class such as tesat "to carve", hýbat "to move", orat "to plow" have 9 different variations because the present can be formed in one of three ways ([1] -ám + -u; [2] -u + -ám; [3] -u only) and similarly the imperative ([1] -ej only; [2] no ending + -ej; [3] -ej + no ending), and these variations are somewhat independent of each other so all can occur in any combination. I'm looking into verbs in -nout now, which are even more complicated as there are various ways of forming the past tense, various ways of forming the PPP, various ways of forming the past transgressive and various ways of forming the verbal noun and they occur in different combinations. Benwing2 (talk) 20:04, 23 April 2023 (UTC)
- Also, IJP says that verbs like dotknout se, where the stem ends in a consonant, form the feminine only as 'dotkla' and similarly the masculine animate plural only as 'dotkli', and likewise for the other forms apart from the masculine singular, but the existing Wiktionary paradigm at dotknout lists both 'dotkla' and 'dotknula' and similarly 'dotkli' and 'dotknuli', etc. Which one is correct? Finally, if the stem ends in a vowel like minout, hrnout, trnout, does it form past 'minul, minula, minulo, minuli' etc. or 'minul, mila, milo, mili'? I assume the former? Benwing2 (talk) 21:17, 23 April 2023 (UTC)
- @Benwing2: You can find some explanation in the book that I linked above (pp. 51-54), e.g. you can see that hýbat, tesat and orat all belong to the type mazat, but this group has a lot of verbs that gradually move towards the more regular type volat. This also happens in Slovak, but I didn't implement it, in those cases I simply use two conjugational tables under each other, as these are two separate sets of forms.
- The type minout always has the -nu- in the past, so e.g. minul, hrnul, trnul. The conjugational patterns tisknout, minout and tnout are on the page 52 in the PDF. For the type tisknout (where dotknout belongs as well), see here for a detailed explanation in Czech.
- The past participles in question are only used rarely, e.g. I only found one hit for 'plaván' on Google (excluding typos): "Na začátku května jsme se zúčastnili dálkového závodu Námořní míle, který je, jako jeden z mála dálkových závodů, plaván na bazéně a ne na otevřené vodě." ("At the beginning of May, we participated in the long-distance race called the 'Nautical Mile', which is one of the few long-distance races that is swum in a pool rather than in open water."). --TomášPolonec (talk) 23:15, 23 April 2023 (UTC)
- @TomášPolonec Thank you! This makes sense. I also found a long section on these verbs in the IJP; having it online makes it easier for me to cut and paste into Google Translate as I'm not able to read Czech. I guess what I really need is a standardized system that describes all the variation and assigns codes to each variant, and ideally lists, for each verb, the variant codes for that verb (this is what Zaliznyak's Russian grammar does). However, it looks like this doesn't exist and I have to go by IJP and SSJČ. I've essentially made up my own set of variant codes. Benwing2 (talk) 06:39, 24 April 2023 (UTC)
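The nine V.2 combinations described in this thread (three ways of forming the present times three ways of forming the imperative) can be enumerated mechanically. The following is only a minimal Python sketch of that idea: the code strings such as `pres1.imp2`, the dictionary names and the function `v2_variants` are made up for illustration and are not the variant codes used in any actual module.

```python
from itertools import product

# Present-tense patterns for class V.2 verbs, with endings kept in the order
# they are listed in the discussion. The keys "1".."3" are hypothetical labels.
PRESENT_PATTERNS = {
    "1": ("-ám", "-u"),
    "2": ("-u", "-ám"),
    "3": ("-u",),
}

# Imperative patterns, again in the order listed in the discussion.
IMPERATIVE_PATTERNS = {
    "1": ("-ej",),
    "2": ("no ending", "-ej"),
    "3": ("-ej", "no ending"),
}


def v2_variants():
    """Return every present/imperative combination with a made-up code."""
    return [
        {
            "code": f"pres{p}.imp{i}",
            "present": PRESENT_PATTERNS[p],
            "imperative": IMPERATIVE_PATTERNS[i],
        }
        for p, i in product(PRESENT_PATTERNS, IMPERATIVE_PATTERNS)
    ]


if __name__ == "__main__":
    for variant in v2_variants():
        print(variant["code"], variant["present"], variant["imperative"])
```

Because the two dimensions are treated as independent, adding a further dimension (for example the several past-tense formations of the -nout verbs) would only mean adding another dictionary to the `product` call, at the cost of a quickly growing number of codes.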
lemmatizing reflexive-only verbs
(Notifying Solvyn, Vininn126, Atitarev, Hergilei, Zhnka): User:TomášPolonec I am thinking we should lemmatize Czech reflexive-only verbs with the reflexive particle se or si following. This is what IJP and SSJČ do, for example, but it's not current Wiktionary practice; e.g. stát is two different verbs, one of which is always reflexive 'stát se' but doesn't have the reflexive particle in either the pagename or the conjugation table. This would mean that stát would split into stát (“to cost, to be worth”) and stát se (“to become”). What do you think? Benwing2 (talk) 15:25, 25 April 2023 (UTC)
- @Benwing2 I'm reminded of our talk about this with Polish - one reason to create the reflexive page at LEAST as a redirect would be for interwiki linking. Many monolingual Polish dictionaries also include different headwords/pages specifically for reflexive verbs.
- Another solution we proposed was to include it in the head, i.e. a different headword with |head=PAGENAME reflexive pronoun. Vininn126 (talk) 15:32, 25 April 2023 (UTC)
- @Benwing2: I had this dilemma last month with the Slovak verbs: [19]. I thought the same as you, but then I noticed that e.g. Polish (bać), German (freuen) and other Wiktionary entries have the reflexive particle specified only with the meaning, not directly in the page name, so I did the same with the Slovak verbs. Therefore it would be a rather big change. I agree with Vininn126 that a redirect would be a good solution. --TomášPolonec (talk) 15:36, 25 April 2023 (UTC)
- @TomášPolonec It really depends on the language. Spanish and Portuguese, for example, follow the solution I'm proposing where verbs that have both reflexive and non-reflexive variants are lemmatized at the non-reflexive verb, but reflexive-only verbs are lemmatized with the reflexive particle in the pagename. Italian lemmatizes all reflexive verbs (whether or not they have non-reflexive variants) with the reflexive particle in the pagename. Bulgarian sometimes follows Spanish/Portuguese practice and sometimes Italian practice. Macedonian follows Italian practice, with the particle се preceding the verb. Russian, Ukrainian and Belarusian follow Italian practice as well (although in these languages the reflexive particle is always joined to the verb, so this situation is a bit different). Benwing2 (talk) 15:40, 25 April 2023 (UTC)
- BTW changing this isn't quite as hard as you think; it was done for Spanish and Portuguese, which used to follow Italian practice. User:JeffDoozan did this by bot for these languages. Benwing2 (talk) 15:42, 25 April 2023 (UTC)
- @Vininn126 Not sure I like the idea of having the reflexive particle only in the head; usually I think it's best to avoid having heads that differ from the pagename like this (although this is sometimes done with terms in English, Spanish, German and other languages that always come with an associated article). Benwing2 (talk) 15:44, 25 April 2023 (UTC)
- @Benwing2: If there is a tendency to shift towards doing it this way, then it can by all means also be done with Czech and Slovak verbs, since dictionaries normally list reflexive verbs as such separately. But I think it should be consistent, at least among all the West Slavic languages (so Polish should then be included as well, in my opinion). --TomášPolonec (talk) 15:55, 25 April 2023 (UTC)
- Agreed. Vininn126 (talk) 15:57, 25 April 2023 (UTC)
- I think the reflexive particle should, at least, be shown on the definition line, as it is, e.g. with Czech bát: {{lb|cs|reflexive-se}} or Slovak báť: {{lb|sk|reflexive-sa}}.
- A similar label for Polish doesn't exist but I think it should, e.g. bać. Bulgarian and Macedonian verbs have them.
- An inconsistency exists with Macedonian: they lemmatise reflexive verbs with the particle, but not always. I'm OK either way, as long as it's consistent. The trend seems to be to lemmatise without the particle. It makes more sense to lemmatise reflexive verbs with the reflexive particle if they are never used without it (reflexive only), e.g. smát se, bát se, etc. Anatoli T. (обсудить/вклад) 23:26, 25 April 2023 (UTC)
- Thanks everyone. I think I will do what Anatoli suggests and lemmatize reflexive-only verbs with the reflexive particle but otherwise not. This will happen after I finish the verb module. This is proving a bit slow going due to all the variations and alternative forms, but I am getting there. Benwing2 (talk) 01:24, 26 April 2023 (UTC)
- @Benwing2: Just to be clear. I'm OK either way but it makes more sense to lemmatise reflexive-only verbs but the consistency is important as well. If something is applied, then it would be great, if that's done for all West Slavic languages, as TomášPolonec said.
- E.g. Bulgarian сме́я се (sméja se, “to laugh”) is lemmatised with the reflexive particle.
- сме́я (sméja, “to dare”) is an unrelated verb. Anatoli T. (обсудить/вклад) 02:02, 26 April 2023 (UTC)
- @Atitarev Right. If I write a script to convert reflexive-only verbs to have the verb lemmatized with the particle, it should be possible with only a few modifications to run this for all West Slavic languages (provided of course that the Polish editors are OK with this). Benwing2 (talk) 02:39, 26 April 2023 (UTC)
- I've added a label to Module:labels/data/lang/pl to show reflexive with się Anatoli T. (обсудить/вклад) 02:57, 26 April 2023 (UTC)
- I created something similar for Old Polish. Silesian and Kashubian should probably also get that. Vininn126 (talk) 09:01, 26 April 2023 (UTC)
- Oh, and Old Czech. Vininn126 (talk) 09:04, 26 April 2023 (UTC)
- Please note that adding it to the module, doesn't change entries automatically. So, e.g. label {{lb|pl|reflexive}} in Polish reflexive verbs needs to change to {{lb|pl|reflexive-się}} to display the reflexive particle. Anatoli T. (обсудить/вклад) 23:01, 27 April 2023 (UTC)
- I am aware. We could probably have a bot convert them. Vininn126 (talk) 08:55, 28 April 2023 (UTC)
Adding Template:rfinfl to Czech nouns, proper nouns and adjectives
(Notifying Solvyn, Vininn126, Atitarev, Hergilei, Zhnka): I'm planning on doing a bot run to add {{rfinfl|cs|noun}} to all Czech nouns lacking a declension, and similarly for proper nouns and adjectives. This should make it easier to find the terms where a declension should be added. Any objections? Benwing2 (talk) 18:30, 27 April 2023 (UTC)
- Are there any indeclinable nouns/adjectives? (Also no I don't care). Vininn126 (talk) 18:31, 27 April 2023 (UTC)
- @Vininn126 Yes and my script handles this as long as they have |indecl=1 in the headword. Benwing2 (talk) 18:36, 27 April 2023 (UTC)
- Support. I have added inflections to most, if not all, country names in Czech. Many personal names have no inflections. Anatoli T. (обсудить/вклад) 22:59, 27 April 2023 (UTC)
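The check described in this thread (add {{rfinfl|cs|noun}} unless a declension is present or the headword has |indecl=1) might look something like the sketch below. It is not Benwing2's actual script. The function names are hypothetical, detecting an existing declension by a "Declension" header is an assumption, and so is placing the request under a new level-4 header at the end of the section. Only nouns are handled here.

```python
import re

# A {{cs-noun ...}} headword line, e.g. {{cs-noun|g=m}}.
HEADWORD = re.compile(r"^\{\{cs-noun\b[^\n]*\}\}$", re.MULTILINE)
# Assumption: an existing declension always sits under a "Declension" header.
DECL_HEADER = re.compile(r"^=+\s*Declension\s*=+\s*$", re.MULTILINE)
REQUEST = "{{rfinfl|cs|noun}}"


def needs_rfinfl(section: str) -> bool:
    """True if a Czech noun section lacks a declension and is not indeclinable."""
    head = HEADWORD.search(section)
    if head is None:
        return False  # not a noun section this sketch understands
    if "|indecl=1" in head.group(0):
        return False  # indeclinable nouns are skipped, as stated above
    if DECL_HEADER.search(section):
        return False  # a declension (or an earlier request) is already there
    return REQUEST not in section


def add_rfinfl(section: str) -> str:
    """Append a declension request to the section when one is needed."""
    if not needs_rfinfl(section):
        return section
    return section.rstrip("\n") + "\n\n====Declension====\n" + REQUEST + "\n"


entry = "===Noun===\n{{cs-noun|g=m}}\n\n# [[wind]] (''movement of air'')\n"
print(add_rfinfl(entry))
```

Running `add_rfinfl` twice on the same text leaves it unchanged the second time, which is the property a bot run needs so that repeated runs do not stack requests.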
(Notifying Solvyn, Vininn126, Atitarev, Hergilei, Zhnka): @TomášPolonec I am working on a Czech verb module and I will be adding support for the imperfective future (e.g. budu dělat), past tense (e.g. dělal jsem), conditional (e.g. dělal bych) and conditional past (e.g. byl bych dělal). I have a question though about reflexives. If the verb has an attached reflexive pronoun se or si, where should it be positioned in the multiword forms? I gather in single-word forms it goes by default after the verb (this is what IJP does, for example), but it's less obvious for multiword forms. I know that the reflexive particles can move around but there should still be a default position. Benwing2 (talk) 18:35, 27 April 2023 (UTC)
- Take my words with a grain of salt, I believe they go after typically. Vininn126 (talk) 18:37, 27 April 2023 (UTC)
- @Vininn126 Do you mean directly after the verb or after the auxiliary in the forms dělal jsem and dělal bych? Benwing2 (talk) 18:38, 27 April 2023 (UTC)
- I mean the end, but now I'm beginning to doubt where it should be when there are at least 2 other elements... Vininn126 (talk) 18:39, 27 April 2023 (UTC)
- @Benwing2: For example the verb ptát se ("to ask") has these forms: ptám se (present), budu se ptát (future), ptal jsem se (past), byl jsem se ptal (past perfect) ptal bych se (present conditional) and byl bych se ptal (past conditional). So the reflexive particle goes after the forms of být ("to be"). For the single-word forms, it goes after the form. --TomášPolonec (talk) 18:47, 27 April 2023 (UTC)
- But let's not forget about the 2nd person singular forms ses and sis: ptal ses (past), byl ses ptal (past perfect), ptal by ses (present conditional) and byl by ses ptal (past conditional). Zhnka (talk) 18:53, 27 April 2023 (UTC)
- @Benwing2: It looks like you can totally model the position of se on the Slovak conjugation of báť sa (where sa is the Slovak reflexive particle). @TomášPolonec, pls confirm that the order is the same. Pls also see comments you may have missed above. Anatoli T. (обсудить/вклад) 01:00, 28 April 2023 (UTC)
- @Atitarev: That's true, but as Zhnka (who is Czech, so you can trust him more than me) correctly noted, the conditional is a little bit tricky because of the contractions (by jsem se = bych se, by jsi se = by ses...). --TomášPolonec (talk) 08:40, 28 April 2023 (UTC)
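The placement rules given in the three comments above can be put together in a short sketch. It covers only the first- and second-person singular forms quoted in this thread; `place_reflexive` and its input format (the form as a list of words without the particle) are hypothetical, the auxiliary list is deliberately partial, and the fallback of putting the particle after the first word when no auxiliary is present is an assumption that matches the single-word case stated above.

```python
# Finite 1sg forms of být after which the particle is placed (partial list).
AUX_1SG = ("budu", "jsem", "bych")
# 2sg contractions: jsi + se -> ses, jsi + si -> sis (and bys + se -> by ses).
CONTRACTED = {"se": "ses", "si": "sis"}


def place_reflexive(words, particle="se"):
    """Insert the reflexive particle into a form given without it.

    `words` is e.g. ["ptal", "jsem"]; `particle` is "se" or "si".
    """
    out = []
    placed = False
    for w in words:
        if not placed and w == "jsi":
            out.append(CONTRACTED[particle])           # ptal jsi -> ptal ses
            placed = True
        elif not placed and w == "bys":
            out.extend(["by", CONTRACTED[particle]])   # ptal bys -> ptal by ses
            placed = True
        elif not placed and w in AUX_1SG:
            out.extend([w, particle])                  # right after the auxiliary
            placed = True
        else:
            out.append(w)
    if not placed:
        out.insert(1, particle)  # no auxiliary: particle follows the first word
    return " ".join(out)


for form in (["ptám"], ["budu", "ptát"], ["ptal", "jsem"], ["byl", "bych", "ptal"],
             ["ptal", "jsi"], ["byl", "bys", "ptal"]):
    print(place_reflexive(form))
```

Handling the contractions before the general auxiliary rule is what keeps the second-person forms from coming out as the uncontracted "jsi se" or "bys se".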
- @TomášPolonec, @Zhnka, @Benwing2: Thank you! Please assess the table at the German Wiktionary: Flexion:bát_se. It must be sourced from the book "Tschechische Verben: 100 konjugierte Verben". I don't see another Czech verb conjugation reference as complete as this one, which covers reflexives. Anatoli T. (обсудить/вклад) 09:13, 28 April 2023 (UTC)
- @Benwing2: I am not sure if anyone else will respond with an assessment but I did mine. The tables from that book seem accurate, although there is no classification by conjugation type. You can import the German templates (as they are) and we can translate the headings to have an idea what the output should look like. The templates are nested and depend on each other, though. Anatoli T. (обсудить/вклад) 05:32, 1 May 2023 (UTC)
- @Atitarev Thanks. So far I have been designing my own table (sorry, I got Covid from my trip to Spain, which has slowed down work on the Czech verb module) but I'll take a look and see what the German templates look like. Benwing2 (talk) 05:53, 1 May 2023 (UTC)
- @Benwing2: Oh, sorry to hear! Get well soon. Take it easy, mate. Anatoli T. (обсудить/вклад) 05:56, 1 May 2023 (UTC)
- Get well soon! Vininn126 (talk) 09:08, 1 May 2023 (UTC)
- Wishing you a speedy recovery! --TomášPolonec (talk) 10:22, 2 May 2023 (UTC)
- @Atitarev, Vininn126, TomášPolonec Thank you all for your well wishes! I am starting to feel better. Please see User:Benwing2/test-cs-conj for an example of a conjugated verb, and let me know what you think. Benwing2 (talk) 03:34, 4 May 2023 (UTC)
- @Benwing2: It's a great start! Anatoli T. (обсудить/вклад) 07:54, 4 May 2023 (UTC)
- BTW it appears for jmenovat that the long passive participle is jmenovaný with short a even though the short passive participle is jmenován and the verbal noun is jmenování, both with long á. Why is this and what is the rule here? Benwing2 (talk) 03:59, 4 May 2023 (UTC)
- @Benwing2: I can't answer this and couldn't find an answer. I've asked on an online forum. Anatoli T. (обсудить/вклад) 07:54, 4 May 2023 (UTC)
- @Benwing2 Looks great! The short -a in the ending -aný is a thing for all verbs that end with -ovat in the infinitive (class III.2). --TomášPolonec (talk) 09:48, 4 May 2023 (UTC)
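- @Benwing2, @TomášPolonec: Is it only for this class? From web searches:
- "Jak je vidět výše, u některých sloves je v příčestí trpném dlouhý vokál á. Tento vokál se zkracuje, pokud je od tohoto tvaru tvořeno adjektivum: dělán > dělaný, kupován > kupovaný, mazán > mazaný, brán > braný. U ostatních sloves k vytvoření tohoto typu adverbií stačí přidání koncovky –ý (nesený, pečený, tištěný, začatý atd.)." (In English: "As can be seen above, some verbs have a long vowel á in the passive participle. This vowel is shortened if an adjective is formed from this form: dělán > dělaný, kupován > kupovaný, mazán > mazaný, brán > braný. For the other verbs, adding the ending –ý is enough to form this type of adverbs [sic] (nesený, pečený, tištěný, začatý, etc.).") Anatoli T. (обсудить/вклад) 11:27, 4 May 2023 (UTC)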
- From the above, I can tell that -án > -aný transformation is common. Anatoli T. (обсудить/вклад) 11:29, 4 May 2023 (UTC)
- These two links are interesting: https://dspace5.zcu.cz/bitstream/11025/28250/1/Honzova%20-%20DP.pdf and http://www.morgannilsson.se/9197348732.pdf (in Swedish) Anatoli T. (обсудить/вклад) 11:32, 4 May 2023 (UTC)
- That vowel alteration has to do with West Slavic often lengthening vowels in closed syllables. Vininn126 (talk) 11:32, 4 May 2023 (UTC)
- @Vininn126: Thanks! In the "dělat" table here, should "děláný" be changed to "dělaný"? Anatoli T. (обсудить/вклад) 11:36, 4 May 2023 (UTC)
- I'm fairly sure that it should be. Vininn126 (talk) 11:38, 4 May 2023 (UTC)
- @Vininn126: Did you mean "open syllables"? (Also Polish vowels are always short, must be Czech/Slovak only). Anatoli T. (обсудить/вклад) 11:39, 4 May 2023 (UTC)
- Up until the Middle Polish period Polish had vowel length distinction, where in closed vowels they lengthened and in open vowels they shortened, but most of them died out in Polish except with some alteration for example in lód/lodu. Vininn126 (talk) 11:44, 4 May 2023 (UTC)
- @Vininn126: I see. Thanks. Anatoli T. (обсудить/вклад) 12:07, 4 May 2023 (UTC)
- @Atitarev, Vininn126 BTW I checked the list of Czech adjectives; there are no adjectives ending in -áný and 191 ending in -aný so I assume the shortening is general. Benwing2 (talk) 20:29, 4 May 2023 (UTC)
- Just FYI there are at least four sources of vowel length in Czech: (1) contraction across /j/; (2) lengthening before a lost yer; (3) length preserved in Proto-Slavic stressed syllables bearing the acute or neoacute tone (circumflex syllables shortened); (4) length preserved in Proto-Slavic pretonic syllables. AFAIK Slovak and Polish length has similar origins except that syllables with old acute were shortened. Polish length manifests today in two places: ó vs. o, and ą (represents a formerly long nasal vowel) vs. ę (represents a formerly short nasal vowel). Slovak also has the rhythmic law that prevents adjacent syllables from bearing vowel length. Benwing2 (talk) 20:35, 4 May 2023 (UTC)
- @Benwing2: Thanks. Please change "děláný" to "dělaný" in the "dělat" table. Anatoli T. (обсудить/вклад) 22:54, 4 May 2023 (UTC)
@Atitarev Done. Benwing2 (talk) 23:00, 4 May 2023 (UTC)
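For reference, the shortening settled on in this thread (short passive participle in -án, long form in -aný; other participles simply add -ý) can be written as a tiny sketch. The function name is hypothetical, only the masculine nominative singular is produced, and the rule is taken from the quoted passage and the adjective count above rather than from any module code.

```python
import unicodedata


def long_passive_participle(short: str) -> str:
    """Derive the long (adjectival) passive participle from the short one.

    Hypothetical helper: -án shortens to -aný; every other ending just
    takes -ý, as in the examples quoted in this thread.
    """
    # Normalize so that "á" is one precomposed character before slicing.
    short = unicodedata.normalize("NFC", short)
    if short.endswith("án"):
        return short[:-2] + "aný"
    return short + "ý"


for participle in ("dělán", "jmenován", "nesen", "začat"):
    print(participle, ">", long_passive_participle(participle))
```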
- @Benwing2: Great work! The module/template looks cool and is the most comprehensive out there! It's perhaps a bit too wide and big. You will probably fix it soon enough.
- Not sure if you're aware of repeated forms on short passive participle: čištěn, čistěn, etc. Anatoli T. (обсудить/вклад) 04:02, 5 May 2023 (UTC)
- @Atitarev Do you have suggestions on how to reduce the size of the table? E.g. potentially we could hide the conditional forms separately. As for čištěn vs. čistěn, I have both of them in the table for čistit, are you referring to something else? Benwing2 (talk) 04:48, 5 May 2023 (UTC)
- @Benwing2: Sorry, my eyes let me down. Pls ignore the "čištěn, čistěn" comment.
- As for the size - wrapping texts, automatic resizing? The fonts can be slightly reduced (like Bulgarian tables). If it can't be, it's not that critical. Anatoli T. (обсудить/вклад) 05:00, 5 May 2023 (UTC)
- @Benwing2 I would have a suggestion for the visual side of the table. It looks good, but the lighter blue is very light and it kind of blends in with the neutral white background for me and my eyes have a little bit of trouble distinguishing where the header ends and the content begins. It's just an idea, it's not a big problem. --TomášPolonec (talk) 09:48, 5 May 2023 (UTC)
- @TomášPolonec This must depend on the particular monitor, as it looks OK for me. Can you experiment with different colors and see which one works for you? Just edit the code at Module:User:Benwing2/cs-verb and preview User:Benwing2/test-cs-conj. I have used blue because that's what other Slavic conjugation and declension templates use, so you might try different shades of blue. Benwing2 (talk) 17:47, 5 May 2023 (UTC)
- I'd also prefer keeping to shades of blue :p Vininn126 (talk) 18:43, 5 May 2023 (UTC)
- @Benwing2 Something just slightly darker might do the job: #e7f3ff in comparison to the current #eff7ff, which seems almost white to me. But of course, as you said, it will depend on the particular monitor as well. --TomášPolonec (talk) 19:30, 5 May 2023 (UTC)
- @Vininn126, TomášPolonec I applied your change. This resulted for me in the two shades of blue being too close, so I darkened the other one a bit. Let me know how it looks for you. Benwing2 (talk) 21:00, 5 May 2023 (UTC)
- @Benwing2 Thanks, that's better. --TomášPolonec (talk) 23:12, 5 May 2023 (UTC)
- Cheers! Vininn126 (talk) 06:22, 6 May 2023 (UTC)
declension of compound nouns like hřib satan
@Atitarev, Vininn126, TomášPolonec, Zhnka How do you correctly decline terms like hřib satan, hřib koloděj, hřib kovář in the plural? In each of these three cases, the second term by itself is animate but hřib is normally declined inanimate. So do you get hřiby satani or hřiby satany (or even hřiby satan)? Benwing2 (talk) 04:29, 4 May 2023 (UTC)
- I think each element declines according to its non-compound declension. Vininn126 (talk) 08:16, 4 May 2023 (UTC)
- The second element in all these names is indeclinable, also in sýkora uhelníček, pánev wok and babočka admirál. Those second elements are just attributes (nominativus appellativus), they are usually not declined if they are nouns. --TomášPolonec (talk) 09:57, 4 May 2023 (UTC)
- @TomášPolonec Thanks! I wonder if it's different for Czech though. The Wikipedia page babočka admirál has an image titled Housenky babočky admirála, which suggests that both parts decline. Similarly, there are only 6 hits for hřiby satan (three of which are not real hits), but 89 hits for hřiby satany and around 12 for hřiby satani. I find an academic article titled NEOBVYKLÁ LOKALITA HŘIBU SATANA – RUBROBOLETUS SATANAS – V PRAZE NA STRAHOVĚ which cites another one, Hřiby satani u Vysoké nad Labem. On the other hand, locative pánvi wok predominates over pánvi woku; I'm actually having a hard time finding any real hits for the latter but the former has thousands. User:Solvyn, any thoughts? Benwing2 (talk) 10:31, 4 May 2023 (UTC)
- Yes, I am sorry, I was typing quickly, I wanted to write that they can be indeclinable. I haven't done the research for it, but I think it is an alternative option. Sorry for the confusion. --TomášPolonec (talk) 10:57, 4 May 2023 (UTC)
@Benwing2: I tried searching for genitives of these terms in korpus.cz and in Google books, i. e. combinations of both variants of the genitive of "hřib" (hřiba and hřibu) with genitive or nominative of satan/koloděj/kovář.
Results in www.korpus.cz:
| | hřib satan [20] | | hřib koloděj [21] | | hřib kovář [22] | |
| | satana | satan | koloděje | koloděj | kováře | kovář |
| hřiba | attested | not attested | attested | not attested | attested | not attested |
| hřibu | attested | attested | attested | not attested | attested | not attested |
Results in Google books:
| | satana | satan | koloděje | koloděj | kováře | kovář |
| hřiba | attested | not attested | not attested | not attested | attested | not attested |
| hřibu | attested | not attested | attested | not attested | attested | not attested |
So it seems that declining both elements prevails. --Jan Kameníček (talk) 00:18, 21 May 2023 (UTC)
- @Jan.Kamenicek Thanks! Any idea about terms like babočka paví oko and bršlice kozí noha? I am guessing in this case the second part is indeclinable but I'm not sure. Benwing2 (talk) 00:46, 21 May 2023 (UTC)
- @Benwing2: I am afraid that they all have to be attested individually.
- babočka paví oko has both variants (with first element declined and with both elements declined) in korpus.cz
- babočka admirál has only the genitive babočky admirála there, but both babočku admirála and babočku admirál in the accusative. Google Books seems to attest both genitives, so I would accept both kinds of declension here.
- bršlice kozí noha has only forms with both elements declined in korpus.cz. --Jan Kameníček (talk) 01:50, 21 May 2023 (UTC)