Module talk:bg-headword


Change requests

@Benwing2: Hi. Could you please add missing PoS's such as adverbs? Do you think we should (eventually) force an accent on terms with missing accents?

For nouns, can we have some inflected forms in the headwords: definite subject form, definite object form and indefinite plural form? If these are populated with stress marks, for example бряг -> брегъ́т -> брега́ -> брегове́, they can be used to determine inflection patterns. What do you think? --Anatoli T. (обсудить/вклад) 12:49, 16 March 2020 (UTC)

@Atitarev Sure, I can do these things. Benwing2 (talk) 13:12, 16 March 2020 (UTC)
@Benwing2: Thanks, maybe optional for now? There are too many entries to go through. The ones with accents populated should be considered right for stress patterns.
BTW, the current declension templates don't allow stresses for ALL forms, and there is no template for when the stress is on the ending. --Anatoli T. (обсудить/вклад) 13:19, 16 March 2020 (UTC)
@Atitarev: Sorry, what do you mean by "don't allow stresses for ALL forms and no template for when the stress is on the ending"? Benwing2 (talk) 13:42, 16 March 2020 (UTC)
@Benwing2: 1. If you look at the inflection table at телефо́н (telefón), only two forms have stresses; the rest take their form from the page title. 2. In бряг (brjag), I want to show stresses on endings - брегъ́т, брега́, брегове́, брегове́те, бря́га. --Anatoli T. (обсудить/вклад) 13:53, 16 March 2020 (UTC)
@Atitarev: Yeah I'm gonna totally rewrite the noun inflection tables so there's a single template {{bg-decl-noun}} or {{bg-ndecl}} along with:
  1. common plural patterns (+и [which automatically palatalizes final к, г, х, which appears to be regular], ++и [which does not palatalize final к, г, х], +ове, +о́ве, +ове́, +ьове, etc.)
  2. indications of (a) reducible nouns (maybe using *), (b) nouns that flip between -ръ- and -ър- (maybe using ър), (c) nouns that flip between -я- and -е- (maybe using я if this isn't predictable)
  3. ways of specifying different stems (e.g. for бряг there are at least three, бря́г-, брег- and брегове́-, although most of the time the above indications should be enough and we shouldn't have to manually specify different stems; but there are totally irregular nouns like чове́к, pl. хо́ра or archaic лю́де)
  4. as a last resort, ways of overriding individual forms
  5. ways of adding footnotes, as in the current Russian templates
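The difference between the palatalizing and non-palatalizing plurals in -и from item 1 can be sketched roughly as follows. This is a Python illustration only, not the module's Lua code: the function name `plural_in_i` and the `palatalize` flag are hypothetical, and the к → ц, г → з, х → с mapping is assumed to be the alternation meant by "palatalizes".

```python
# Hypothetical sketch of a palatalizing vs. non-palatalizing plural in -и.
# Assumption: "palatalizes" means final к -> ц, г -> з, х -> с before -и.
PALATALIZE = {"к": "ц", "г": "з", "х": "с"}


def plural_in_i(stem: str, palatalize: bool = True) -> str:
    """Append the plural ending -и, optionally palatalizing a final к, г or х."""
    last = stem[-1:]  # empty string for an empty stem, so no IndexError
    if palatalize and last in PALATALIZE:
        stem = stem[:-1] + PALATALIZE[last]
    return stem + "и"


if __name__ == "__main__":
    print(plural_in_i("език"))                     # palatalizing pattern
    print(plural_in_i("успех", palatalize=False))  # non-palatalizing pattern
```

Stress marks are left out here on purpose; a real implementation would also have to decide where the accent lands.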
I'm not sure exactly what syntax I'll use. There are at least two examples that work differently: (1) the current Russian noun template syntax, which might look like {{bg-ndecl|бряг|я+ове́}}, (2) the current Latin template syntax, which might look like {{bg-ndecl|бряг<я+ове́>}}. The advantage of the Latin syntax is that it is somewhat more intuitive to specify the declension of multiword expressions. E.g. for senātus populusque rōmānus you'd say
{{la-ndecl|senātus<4> populus<2>que rōmānus<+>}},
indicating that senātus is 4th declension, populus is 2nd declension and rōmānus is an adjective; and for pars ōrātiōnis you would say
{{la-ndecl|pars<3.F.abl-e-occ-i> ōrātiōnis}},
indicating that pars is third-declension, feminine, ablative singular in -e or occasionally -ī, and ōrātiōnis is invariant. The equivalent in Russian template syntax might be
{{la-ndecl|4|senātus|2|populus|_|que|$|rōmānus|+}} and
{{la-ndecl|3|pars|F.abl-e-occ-i|ōrātiōnis|$}}.
Both syntaxes can handle alternative declensions, e.g. Russian
{{ru-noun-table|ка́мень|*m|or||*m-ья|каме́н}}
where the equivalent in Latin syntax would be something like
{{ru-noun-table|((ка́мень<*m>,ка́мень/каме́н<*m-ья>))}} or maybe {{ru-noun-table|((ка́мень<*m>,ка́мень<*m-ья/каме́н>))}}. Benwing2 (talk) 15:01, 16 March 2020 (UTC)
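To make the Latin-style angle-bracket syntax concrete, here is a minimal Python sketch of how an argument such as `senātus<4> populus<2>que rōmānus<+>` might be split into words and per-segment declension specs. The function name `parse_angle_spec` is hypothetical, and the sketch assumes that specs contain no whitespace or nested brackets; it does not handle the `((...,...))` alternation shown above.

```python
import re

# One segment is a run of text optionally followed by a <spec>.
# A segment without a spec (e.g. "que" or "ōrātiōnis") is treated as invariant.
SEGMENT = re.compile(r"([^\s<>]+)(?:<([^<>]*)>)?")


def parse_angle_spec(text):
    """Return one list per space-separated word, each holding (segment, spec) pairs."""
    words = []
    for word in text.split():
        words.append([(m.group(1), m.group(2)) for m in SEGMENT.finditer(word)])
    return words


if __name__ == "__main__":
    for word in parse_angle_spec("senātus<4> populus<2>que rōmānus<+>"):
        print(word)
```

Keeping the spec attached to the word it modifies is what makes multiword expressions easier to read than a flat list of positional arguments.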
@Atitarev: BTW for Bulgarian nouns I might have {{bg-noun}} follow the same syntax as {{bg-ndecl}} (this is how {{ru-noun+}} and {{la-noun}} work), rather than having separate arguments to explicitly indicate the plural, definite, and count forms. Benwing2 (talk) 15:10, 16 March 2020 (UTC)
@Benwing2: Thanks, I don't understand a few things in your post but it's fine. It looks like you're making sense of the project. I will work on more cases, entries, examples. --Anatoli T. (обсудить/вклад) 22:14, 16 March 2020 (UTC)

Support for diminutives

@Benwing2: diff - thanks for that. I once suggested adding support for Russian diminutives, but it was rejected; the argument was that there are too many possible diminutives (even levels of diminutives) and the senses are not the same for each diminutive (they could also be endearing or somewhat pejorative). I still think it's a good idea, as it would take fewer resources to provide common Russian diminutives in the headword. So this improvement (support for dim= for diminutives of nouns) can benefit other languages. It is also used for Dutch and Polish, at least. Please consider later. --Anatoli T. (обсудить/вклад) 01:42, 3 April 2020 (UTC)

Unknown stress or inflections

From Talk:вям

@Benwing2 Hi. Please add an additional parameter/method for when the stress or inflection is unknown for rare or partially attested words. I'm sure you did it for some obscure Russian terms but I can't find it. --Anatoli T. (обсудить/вклад) 05:25, 5 May 2020 (UTC)

@Atitarev I'll have to think how to do this. Benwing2 (talk) 14:01, 5 May 2020 (UTC)

Acceleration tags for adverbs?

Hi @Benwing2,

{{bg-adv}} is used to specify the (optional) comparative and superlative forms of adverbs; I was wondering if we could add acceleration tags so that creating entries for those forms could be simplified. I've just gone and changed the handful of extant Category:Bulgarian adverb forms to use e.g. {{head|bg|comparative adverb}} and {{infl of|bg|whatever||comd}}, so they should serve as viable examples of what accel tags we'd need. If you're overbooked, I can try to make that code change - I'd only ask for a link to a good example on adding accel tags.

Thanks,

Chernorizets (talk) 10:32, 21 October 2023 (UTC)

@Chernorizets It shouldn't be too hard to add. See Module:es-headword#L-242 for an example of adding the accelerator 'form' key, which should take the value "comd" here. You probably also have to specify the 'pos' key to have the value "comparative adverb", since it would default to "adverb form", which isn't right. I would modify insert_infl() to take an additional param for the accelerator structure, and pass it in so it has the appropriate value e.g. {form = "comd", pos = "comparative adverb"} or similar for the superlative. You shouldn't need to modify Module:accel/bg with any additional rules. Benwing2 (talk) 20:17, 21 October 2023 (UTC)
@Benwing2 thanks! I'll take a look. Chernorizets (talk) 22:33, 21 October 2023 (UTC)
@Benwing2 done - I tried it on the adverb безшумно and it did the right thing. Chernorizets (talk) 04:59, 22 October 2023 (UTC)
@Chernorizets Cool, and you are right to use the "comparative" and "superlative" form values; I forgot about the special handling of these in Module:accel. Benwing2 (talk) 05:14, 22 October 2023 (UTC)
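For readers following along, the change discussed in this thread might look roughly like the sketch below. It is a Python stand-in for the Lua module, not the code that was actually committed: the dictionary keys `label`, `terms` and `accel` are assumptions about how an inflection entry could be modelled, and the forms по-безшумно and най-безшумно are used purely as sample input.

```python
def insert_infl(inflections, label, forms, accel=None):
    """Append one headword inflection entry, optionally carrying accelerator data.

    Hypothetical model: `accel` plays the role of the extra parameter suggested
    above, e.g. {"form": "comparative"}.
    """
    if not forms:
        return  # nothing to display for this slot
    entry = {"label": label, "terms": list(forms)}
    if accel is not None:
        entry["accel"] = dict(accel)  # copy so callers can reuse their dict
    inflections.append(entry)


if __name__ == "__main__":
    infl = []
    insert_infl(infl, "comparative", ["по-безшумно"], accel={"form": "comparative"})
    insert_infl(infl, "superlative", ["най-безшумно"], accel={"form": "superlative"})
    for entry in infl:
        print(entry)
```

Passing the accelerator structure in as a parameter keeps the helper generic, so the same function can serve slots that need no acceleration at all.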

Diminutive forms of adverbs

@Benwing2 I'm considering adding support for diminutive forms of Bulgarian adverbs. These are found in temporal adverbs, e.g. сега (sega, now) → сегичка (segička), рано (rano, early) → раничко (raničko). Such diminutives are often colloquial or dialectal, and have certain stylistic properties. Any concerns? Chernorizets (talk) 01:36, 23 October 2023 (UTC)

@Chernorizets I think this is fine; we already have diminutives added to Russian nouns and maybe adjectives and adverbs as well. One thing to keep in mind is that some months ago I added support for diminutives, augmentatives, endearing forms, pejoratives, etc. to Italian noun headwords and populated some common nouns based on the Treccani dictionary. See libro and famiglia for examples. This resulted in some complaints that many of these forms are rare or dialectal and would be better placed in the "Derived terms" section below. I did add support for qualifiers appended to specific terms; you can see an example in libro, where one of the headword terms has a "literary" qualifier by it, and a couple of terms qualified as "rare" or "literary" in famiglia. I'm going to clean this up at some point but the issue I'm running into is that I'm not a native speaker so I don't have a good sense of which terms are common and which aren't. Since you are a native speaker of Bulgarian you shouldn't run into this issue, but I'd advise either not adding the rare or dialectal terms to the headword, or adding support for qualifiers to the diminutive forms and including appropriate qualifiers in the headword. If you want to do the latter, you can reuse the Italian code that implements this; see Module:it-headword#L-335 for the top-level entry point, and the two functions or so above this that implement the actual parsing. What it does essentially is use parse_inline_modifiers() from Module:parse utilities to parse the qualifier modifiers specified using the inline modifier syntax like |dim=famigliòla,famigliuòla<q:literary>,famiglìna<q:rare>, and store the result into an object that is then stored directly into the inflections field that is passed to Module:headword; the latter module already knows how to display qualifiers attached to inflections. 
Module:headword has good documentation for how the inflections field works, and Module:parse utilities should also have pretty good docs. There's a bit of extra complexity in the Italian code to handle different types of separators (not only commas, but also slashes and semicolons) in the input, which is preserved in the output (the intention is to use the separators for grouping at different levels, e.g. semicolon to separate groups of similar terms, and slash to group together terms that are trivial variants of each other). There's also some extra code to remove non-final accents when creating links. Neither of these extra bits of complexity is needed for Bulgarian; accents are automatically stripped and the different types of separators are probably unnecessary. Benwing2 (talk) 02:42, 23 October 2023 (UTC)
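As a rough illustration of the inline modifier syntax described above, the following Python sketch parses a value like `famigliòla,famigliuòla<q:literary>,famiglìna<q:rare>` into terms with attached qualifiers. It is not a port of `parse_inline_modifiers()`; the function name `parse_dim_param` and the output shape are hypothetical, only the `q` prefix is recognised, and only commas are treated as separators.

```python
import re

# Split on commas that are not inside a <...> modifier.
SEPARATOR = re.compile(r",(?![^<]*>)")
# A term followed by zero or more <prefix:value> modifiers.
TERM = re.compile(r"^([^<>]+)((?:<[^<>:]+:[^<>]*>)*)$")
MODIFIER = re.compile(r"<([^<>:]+):([^<>]*)>")


def parse_dim_param(value):
    """Parse 'term1,term2<q:...>' into a list of {'term', 'qualifiers'} dicts."""
    result = []
    for item in SEPARATOR.split(value):
        match = TERM.match(item.strip())
        if match is None:
            raise ValueError(f"malformed term: {item!r}")
        entry = {"term": match.group(1), "qualifiers": []}
        for prefix, text in MODIFIER.findall(match.group(2)):
            if prefix != "q":
                raise ValueError(f"unsupported modifier prefix: {prefix!r}")
            entry["qualifiers"].append(text)
        result.append(entry)
    return result


if __name__ == "__main__":
    for entry in parse_dim_param("famigliòla,famigliuòla<q:literary>,famiglìna<q:rare>"):
        print(entry)
```

A headword module could then hand each parsed entry to the display layer, which decides how to render the qualifier next to the term.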
@Benwing2 we'd def only be using this for diminutive forms that aren't rare, but rather in common circulation. Thanks for the info! Chernorizets (talk) 04:59, 28 October 2023 (UTC)

Nix "splithyph" and "nolinkhead" parameters or make them useful

Hi @Benwing2,

This module currently "supports" the parameters splithyph and nolinkhead. There is no code that explicitly uses the former, and there is broken code that uses the latter, relying on the uninitialized variable auto_linked_head. Per YAGNI, I'm inclined to simply remove them, unless you think that one or both should be made functional for Bulgarian headwords. Lemme know.

Thanks,

Chernorizets (talk) 05:04, 28 October 2023 (UTC)

@Chernorizets IMO nolinkhead is useful; it's used in general for multiword terms that have been borrowed wholesale from some other language and thus it doesn't make sense to link the individual words (cf. a fortiori in English, and lots of English phrases borrowed wholesale into other languages). splithyph is less obviously useful so you might just remove it; it depends on whether Bulgarian frequently creates compounds joined with a hyphen. The idea is that in languages like English and French that frequently create hyphenated compounds, we want to autolink the components of hyphenated compounds if there aren't any spaces in the compound, but we may or may not want to do the same when there are both spaces and hyphens (vs. just linking the space-separated components), so we provide a param to control this. This sort of thing doesn't make sense e.g. in German, which normally has closed compounds (i.e. the words are directly concatenated, without spaces or hyphens) and uses hyphens for different purposes. BTW something is currently broken in the autolinking functionality in {{bg-noun}}; for example, остър ъгъл (ostǎr ǎgǎl) should have the two individual words autolinked, but it doesn't. Benwing2 (talk) 05:36, 28 October 2023 (UTC)
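The intended behaviour of the two parameters, as described in the comment above, could be sketched like this. It is a Python illustration under stated assumptions, not the headword module's actual logic: the function name `autolink_head` is hypothetical, and single-word heads are simply returned unchanged.

```python
import re


def autolink_head(head, nolinkhead=False, splithyph=False):
    """Wrap the components of a multiword head in [[...]] wikilinks.

    - nolinkhead: leave the head untouched (e.g. phrases borrowed wholesale).
    - splithyph: also split on hyphens when the head contains spaces; a head
      with hyphens but no spaces is always split on its hyphens.
    """
    if nolinkhead:
        return head
    has_space = " " in head
    split_hyphens = "-" in head and (splithyph or not has_space)
    pattern = r"([ -])" if split_hyphens else r"( )"
    pieces = re.split(pattern, head)  # the capture group keeps the delimiters
    if len(pieces) == 1:
        return head  # single word: nothing to link
    return "".join(p if p in ("", " ", "-") else f"[[{p}]]" for p in pieces)


if __name__ == "__main__":
    print(autolink_head("остър ъгъл"))
    print(autolink_head("a fortiori", nolinkhead=True))
```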
@Benwing2 thanks for the explanations. I'll put it on my list of things to look into. I found other headword modules where these params are functional, so I have examples. Chernorizets (talk) 05:57, 28 October 2023 (UTC)

"first-singular present indicative" in verb headwords

Hi @Benwing2,

I've noticed that after a recent change you made, all BG verb headwords now show whether the form is 1ps or 3ps present indicative. I can understand the utility for headwords that use an impersonal (3ps) form, because that's not the usual lemmatization for BG verbs, but it seems extraneous for the regular case. We already have the usual lemmatization documented in Wiktionary:About Bulgarian#Dictionary forms. Was there a discussion I missed where it was agreed that all verbs should now show this in their headword line?

Thanks,

Chernorizets (talk) 01:14, 1 December 2023 (UTC)

@Chernorizets: I am not sure about the change, but I think it would be beneficial to display the lemma along with its description in the conjugation table (or somewhere).
Compare:
  1. Russian жить (žitʹ) has "infinitive" + lemma on the top of the conjugation table, making it clear what the lemma is
  2. Bulgarian живе́я (živéja) has no such thing in the table. I suspect that was the reason for that change by @Benwing2.
Anatoli T. (обсудить/вклад) 02:18, 1 December 2023 (UTC)
@Atitarev the 1ps present slot in the conj table for живе́я (živéja) is bolded and unclickable, because it matches the lemma form. Unlike Russian, we don't have a separate lemma form that's distinct from the conjugation - the lemma form is the 1ps present indicative. Macedonian - another language without an infinitive - chooses to lemmatize at the 3ps present indicative. Either way, even in Russian, the headword for жить doesn't specify that it's an infinitive - you have to go to the table.
Maybe Bulgarian verb headwords could be something like:
живе́я (živéja) lemma form, impf
where "lemma form" is a clickable link to the section in WT:ABG explaining lemma forms. Chernorizets (talk) 02:34, 1 December 2023 (UTC)
@Chernorizets. @Benwing2: I'll leave it to you, as long as it's clear in the entry that this is the lemma. Keep up the good work. Anatoli T. (обсудить/вклад) 04:07, 1 December 2023 (UTC)
@Chernorizets @Atitarev This was based on the discussion in the Beer Parlour about converting Latin verb definitions to read "to ..." instead of "I ...", where some users requested that there be an indication that the lemmatized form is not the infinitive. This didn't end up getting added for Latin, but I added the ability to display an explanatory note generally and thought it might be especially helpful for cases like Bulgarian and Macedonian, which (it seems to me) are less familiar to the average Wiktionary user than Latin and where (IMO) it's very confusing that Bulgarian uses pres1s and Macedonian uses pres3s. I don't have any particular attachment to the current format but I'd like to have *SOME* way of indicating what's going on; a clickable link is totally fine. The difference between Bulgarian and Russian is that the infinitive form is what's used in most languages and maps directly to the English infinitive, whereas for Bulgarian and other languages that don't lemmatize on the infinitive, there's a mismatch between the lemma form and the definition that could use some explanation. Benwing2 (talk) 06:01, 1 December 2023 (UTC)
@Benwing2 I've started an appendix on Bulgarian verbs; it's very much WIP at the moment, but the link to the lemma form is going to be Appendix:Bulgarian verbs#Lemma form. I'll be adding some information in there about "impersonal" verbs that are lemmatized at the 3ps. One idea might be to make the aspect marker (e.g. impf) clickable, and have it link either to that, or to its parent section: Appendix:Bulgarian verbs#Dictionary conventions. Chernorizets (talk) 07:02, 2 December 2023 (UTC)
@Chernorizets Not sure about having the impf be clickable as that might be confusing but I wonder whether it might work to have something clickable that reads [more info] or just [info] or [help] following the lemma, either before or after the translit. Benwing2 (talk) 20:07, 2 December 2023 (UTC)