Wiktionary:Beer parlour/2020/October

Request for rights

Hi. I sometimes take a look through recent changes here, and would like to be able to patrol edits that are fine (I'm a global rollbacker, so I can and do filter recent changes to look for unpatrolled edits to investigate, but if I see edits that are fine there is nothing I can do right now). I couldn't find where to request patroller rights (Wiktionary:Patrollers is a redirect) but Wiktionary:Rollbackers says to request rollback here so I figured patroller requests made here would be fine as well.

Also, can I request local rollback? I have global rollback rights already, but since I plan to be watching recent changes more regularly I'd prefer to have local rights.

Finally, as a global rollbacker I have autopatrol rights on all wikis, meaning that my edits haven't been as visible to other users to confirm that they are okay. If indeed they are fine, can I be locally autopatrolled to confirm that? Thanks, --DannyS712 (talk) 04:12, 1 October 2020 (UTC)[reply]

We appreciate help with vandal-fighting, but I don't really see sufficient evidence of your familiarity with Wiktionary to merit being a patroller. It can be very challenging to patrol Wiktionary, because recognising potentially problematic edits often requires a great deal of background knowledge. Similarly, you have global autopatrol rights, so I'm sure you're a trustworthy person, but I wouldn't grant you local autopatrol, because you don't seem to have made any substantive edits. If you ever choose to help build the dictionary, I'm sure you will make mistakes like all newbies do, and then learn from them — as long as someone checks your edits, of course. —Μετάknowledgediscuss/deeds 04:55, 1 October 2020 (UTC)[reply]
Edits that I make outside of patrolling are more likely to be WikiGnome cleanup in nature, like my work implementing WT:NORM in [1] or cleaning up syntaxhighlighting errors, but I understand your concerns. DannyS712 (talk) 05:44, 1 October 2020 (UTC)[reply]
Implementing NORMs is a bot job, anyway. It's a better use of your time to do gnomish work that only humans can do 100% correctly, like fixing misspellings. —Μετάknowledgediscuss/deeds 07:23, 1 October 2020 (UTC)[reply]
@DannyS712, Metaknowledge BTW in my experience there are (at least) two kinds of bad edits: those that are basically vandalism, which are fairly easy for anyone to spot and correct, and those that are (at least somewhat) in good faith, but are wrong because the editor doesn't know (and often doesn't care to learn) the formatting rules or the language being edited. The second type of edits are much more problematic because the people doing them often make large amounts of changes and fixing (or even spotting) them can require significant domain knowledge of the language in question. Furthermore the people who do these sorts of edits are often not very receptive to being told they need to change their ways. I think Metaknowledge's point is that we especially need people who can fix the second type of problems, but to do this requires at the very least a good knowledge of standard Wiktionary practices. Benwing2 (talk) 01:05, 4 October 2020 (UTC)[reply]
I focus on the first type, but the explanation makes sense DannyS712 (talk) 01:07, 4 October 2020 (UTC)[reply]
@Metaknowledge would you mind taking another look? The patrolling I would plan on doing would be marking as reviewed edits I revert, so that other patrollers don't waste their time on them, or clearly valid edits. Thanks, --DannyS712 (talk) 03:40, 29 October 2020 (UTC)[reply]
Are the edits you rollback not marked as patrolled? I haven't observed this. As for "clearly valid edits", this is where the complexity comes in. An edit may be valid, but fail to follow formatting niceties or orthographic principles that you aren't aware of. The only way that I'll be able to have complete faith in your abilities to patrol others' edits is if you make good substantive edits yourself. In other words, I only trust contributors to this wiki. —Μετάknowledgediscuss/deeds 17:55, 29 October 2020 (UTC)[reply]
That's why I noted "clearly" valid, like a simple typo fix, or edits outside of the content namespace (like a post here) that are fine. Edits that are rolled back are automatically silently marked as patrolled, but manually reverted edits are not. DannyS712 (talk) 20:25, 29 October 2020 (UTC)[reply]

Umm… do we really need this category? Do we need etymologies confirming that a word like accommodating is just accommodate with -ing? Or that shipping is just ship with -ing? Maybe I’m just being a smart aleck here, but last I recalled the consensus was that noncanonical, derivative forms should not have etymologies, presumably because they would be redundant (and uninteresting), but I see that we have some exceptions that are more elaborate.

Personally I would prefer that we at least add some sort of disclaimer, or reminder, in the category pages that they are intended for very specific derivations, but I suppose that this confusion isn’t common enough to justify that. —(((Romanophile))) (contributions) 04:47, 2 October 2020 (UTC)[reply]

But why shouldn't they constitute a category? I believe you're missing the point. Such categories don't necessarily serve to clarify anything. There are entire categories that contain 100% the same set of words as one another. They only exist for the purposes of categorization, which may or may not come in handy to someone who wants to go through a list of all such words.
Also, the etymological difference between a recent formation with -ing and one that was inherited from Middle English or earlier is completely irrelevant to whether a word belongs in the category. Such categories do not distinguish between synchrony and diachrony. For an extreme example: A modern Greek word ending in -σις that was inherited all the way from a Proto-Indo-European word with its ancestral form in *-tis still belongs in Category:Ancient Greek words suffixed with -σις, even though the formation may be 5000 years old or older. — 69.120.64.15 15:58, 10 October 2020 (UTC)[reply]
My feeling is that such words that are just present participles should not be in this category, but nouns should be. SemperBlotto (talk) 16:01, 10 October 2020 (UTC)[reply]
I agree that they should be distinguished, as something like Category:English words suffixed with -ing (nominal) versus Category:English words suffixed with -ing (verbal). Though, this would be a hell of a lot of additional work, and I would argue that the nominal use of -ing is so productive that basically any gerund can be used as a noun, even when alternative nominal formations exist. (For an arbitrary example: “I am studying dechlorinating.”) The only difference between a noun like hiking in the sentence Hiking is a popular activity and a gerund like in the sentence He enjoys hiking is that in the former case we can interpret the word as a sort of "set phrase" which everyone agrees to be a noun, whereas for words like dechlorinating or thematicizing there is no such broad, implicit consensus (to which contrast dechlorination, thematicization). And lumping them all together as "-ing words" is actually useful and economical, because it reflects the fluidity of -ing to be used for various parts of speech (noun, gerund, verb, adjective) interchangeably. — 69.120.64.15 16:40, 10 October 2020 (UTC)[reply]

Learned borrowings template

I am in trouble when using Template:learned borrowing. It does not behave like other kinds of borrowings e.g. Template:semantic loan. I expected it to add the word at 1) Cat:XXX learned borrowings from XXX and 2) at the Derived general category. Example: all modern greek words that are learned borrowings from Ancient Greek, and they are many. Manual application at Category:Greek learned borrowings from Ancient Greek. On the other hand, Template:borrowed for regular borrowings works fine. The distinction of the two kinds is marked in greek dictionaries and is extremely frequent. Thank you. ‑‑Sarri.greek  | 07:44, 3 October 2020 (UTC)[reply]

I don't understand what isn't working the way you want it to. If I put {{learned borrowing|el|grc|...}} on a Greek entry, it gets automatically categorized into CAT:Greek terms borrowed from Ancient Greek, CAT:Greek learned borrowings, and CAT:Greek terms derived from Ancient Greek. What's the problem? —Mahāgaja · talk 12:47, 3 October 2020 (UTC)[reply]
My training in etymology, @Mahagaja, is minimal and I cannot support my view adequately. Probably I am wrong. I understand that learned is a subcategory of borrowings. Like all the other kinds, which categorize by language donor and language receiver. It is impossible to have a direct modern greek loan from ancient greek, because we have never met. A Category:Modern Greek borrowed from Ancient Greek seems to me impossible. Possible is either inheritance or 'lbor' the only channel being texts and authors. Why does this template place the words at Cat.Borrowed? Derived, yes, it is a parallel categorization for all borrowings. Example:
Borrowed terms = XX borrowings from XX. Clear borrowings, without characteristics of the following subcategories which are kinds of borrowings:
  • Calques = XX calques from XX. eg eng.cal But they are placed outside Borrowings. why?
  • Semantic loans = XX sl from XX. eg eng.sl Also placed outside. Are they not borrowings too?
  • Unadapted borrowings = XX unadapted borrowings from XX. But the words are also thrown into normal borrowings, which are assumed adapted, otherwise they would be marked. eg eng.unadpt from lat
  • Learned borrowings: eg eng.lbor Here, we do not have the pattern XX learned borrowings from XX at all, but they are directed to normal borrowings.
  • etc
Of course, +learned could be attached to subcategories as well, which would complicate things. So, I focus strictly on 'learned borrowings'. ‑‑Sarri.greek  | 16:51, 3 October 2020 (UTC)[reply]
Well, I would say that probably everything in CAT:Greek terms borrowed from Ancient Greek should be in CAT:Greek learned borrowings from Ancient Greek, since learned borrowing is the only type of borrowing that exists between these two languages. I don't know why the category tree is the way it is, but I don't disagree with the way the templates categorize entries. —Mahāgaja · talk 17:47, 3 October 2020 (UTC)[reply]
@Mahagaja, Sarri.greek I am probably going to change the categorization so that you get 'FOO learned borrowings from BAR' (and similarly for semi-learned borrowings) unless someone objects. Benwing2 (talk) 00:05, 4 October 2020 (UTC)[reply]
@Benwing2 thank you. I do not know if the definition of 'borrowings' here in en.wiktionary is broad (all kinds of borrowings) or strict (adapted, not learned, not calques, not ...). The tree of borrowings could also be
  • Loans (all)
    • Borrowings (strict)
    • Learned borrowings.
    • Calques
    • etc
I do not know much about etymology, and I do not wish to upset things. I am sure, that there are experts here who can decide, you included. ‑‑Sarri.greek  | 00:21, 4 October 2020 (UTC)[reply]
@Sarri.greek The {{borrowed}}/{{bor}} template is normally reserved for direct borrowings. This includes learned and semi-learned borrowings (which can also use the more specific templates {{learned borrowing}}/{{lbor}} and {{semi-learned borrowing}}/{{slbor}}). Other sorts of borrowings should usually use the appropriate template: {{calque}}/{{cal}} (or the more specific {{semantic loan}}/{{sl}} and {{partial calque}}), {{orthographic borrowing}}/{{obor}}, {{phono-semantic matching}}/{{psm}}. Note that the categories associated with some of these other sorts of borrowings are typically subcategories of some of the "... borrowed ..." categories (not always in a consistent fashion; this should be fixed). Benwing2 (talk) 00:34, 4 October 2020 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Sarri.greek I implemented the more specific categories for all four subtypes of borrowings (learned, semi-learned, orthographic and unadapted). I also changed Category:Greek learned borrowings from Ancient Greek to use {{auto cat}} now that it recognizes these categories; please change the words in this category to use {{lbor}} and not set the category manually, thanks! Benwing2 (talk) 05:52, 8 October 2020 (UTC)[reply]

Thank you, thank you @Benwing2, now, etymologies for Greek can be done. I hope it will be useful for other languages too. You are great. ‑‑Sarri.greek  | 08:07, 8 October 2020 (UTC)[reply]
But I still cannot move them from Cat.borrowed to Cat.lbor. They are still repeated in both. e.g. ακρόαση ‑‑Sarri.greek  | 08:38, 8 October 2020 (UTC)[reply]
@Mahagaja I too have been wondering why there isn't a technical solution for this. It's a waste of time to go through the entire category CAT:Greek terms borrowed from Ancient Greek and replace all {{bor}} with {{lbor}} (or {{slbor}}). Why not build in a feature to the modules on which {{bor}} etc. run that recognizes a particular set of languages as classical languages and others as modern, and then modify {{bor}} accordingly to automatically categorize learned borrowings? A modern word borrowed from a classical language is definitionally a learned borrowing, and there will never, ever be an exception. — 69.120.64.15 16:07, 10 October 2020 (UTC)[reply]
To addend and clarify my point: I don't mean to say that some aren't semi-learned, just that semi-learned borrowings should also be considered learned borrowings. They effectively have two etymologies, in the same way that a word can be inherited yet have borrowed semantics from another language and therefore be in both inherited and borrowed categories. — 69.120.64.15 16:51, 10 October 2020 (UTC)[reply]
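The proposal in the two posts above could be sketched roughly as follows. This is only an illustration of the idea, not how {{bor}} or its modules work: the function name, the `CLASSICAL_LANGUAGES` set and its contents are assumptions made for the example, and whether such a rule really has no exceptions is the poster's claim, not something the sketch establishes.

```python
# Hypothetical sketch of the proposed heuristic; nothing here is real module code.
# The set of "classical" language codes is an assumption chosen for illustration.
CLASSICAL_LANGUAGES = {"grc", "la"}


def suggested_template(dest_code: str, source_code: str) -> str:
    """Suggest 'lbor' when a non-classical language borrows from a classical one."""
    if source_code in CLASSICAL_LANGUAGES and dest_code not in CLASSICAL_LANGUAGES:
        return "lbor"
    # Everything else stays an ordinary borrowing and needs human judgment.
    return "bor"


print(suggested_template("el", "grc"))
print(suggested_template("el", "en"))
```

A rule like this would only pick a default; semi-learned borrowings would still need {{slbor}} set by hand.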
@Sarri.greek In response to your last concern about repeated categories, this is intentional and mirrors the previous behavior. Perhaps we shouldn't repeat things like this, but IMO it makes a certain amount of sense as all learned borrowings are borrowings. Benwing2 (talk) 01:45, 11 October 2020 (UTC)[reply]
@Benwing2, thank you again! It is a matter of definition of 'borrowing'. See above: Loans (all). Like Category:en:Beverages includes Category:en:Coffee whose word-members are not repeated in Beverages. This is not only about classical languages. It is about different kinds of loans. ‑‑Sarri.greek  | 02:07, 11 October 2020 (UTC)[reply]
@Sarri.greek Yes, it's true that we tend not to repeat words in lower topic categories into higher categories, although we do also in the case of categories like Category:Greek terms borrowed from English and Category:Greek terms derived from English (there was actually a vote on this issue). Benwing2 (talk) 03:42, 11 October 2020 (UTC)[reply]
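The categorization behavior discussed in this thread (a learned borrowing lands in the specific category and is also repeated in the general "borrowed" and "derived" categories) can be sketched as below. The function name is hypothetical, and the category name pattern for the semi-learned, orthographic and unadapted subtypes is assumed by analogy with "Greek learned borrowings from Ancient Greek"; the real logic lives in the etymology modules.

```python
# Illustrative only; a sketch of the described behavior, not the module's code.
SUBTYPES = {"learned", "semi-learned", "orthographic", "unadapted"}


def borrowing_categories(dest, source, subtype=None):
    """Return category names for a borrowing, repeating the general categories."""
    categories = [
        f"{dest} terms derived from {source}",
        f"{dest} terms borrowed from {source}",
    ]
    if subtype is not None:
        if subtype not in SUBTYPES:
            raise ValueError(f"unknown borrowing subtype: {subtype!r}")
        # The specific category is added on top of the general ones, not instead of them.
        categories.append(f"{dest} {subtype} borrowings from {source}")
    return categories


for name in borrowing_categories("Greek", "Ancient Greek", "learned"):
    print(name)
```

Dropping the second list entry whenever a subtype is given would produce the non-repeating tree that Sarri.greek describes.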

FWOTD should be more obviously editable

The English word of the day has a nice "edit" link by it, but the FWOTD does not. There was a typo in it ("adress") and it was not obvious to me how to fix it, so someone who doesn't know the template system will have no hope of finding it. Benwing2 (talk) 00:30, 5 October 2020 (UTC)[reply]

I agree, insofar as we ensure that all the pages have been protected beforehand. @Lingo Bingo Dingo Μετάknowledgediscuss/deeds 01:07, 5 October 2020 (UTC)[reply]
I'm fine with that, but obviously cannot implement that. By the way it seems that the entries are only protected with regard to creation? ←₰-→ Lingo Bingo Dingo (talk) 16:52, 5 October 2020 (UTC)[reply]
They need to be edit-protected (which is vastly more important). @Benwing2, maybe you can make that happen? —Μετάknowledgediscuss/deeds 16:55, 5 October 2020 (UTC)[reply]
Perhaps the implementation of an edit link can be delayed until the entries of the current and next few days are edit-protected. ←₰-→ Lingo Bingo Dingo (talk) 19:01, 5 October 2020 (UTC)[reply]

Splitting RFD non-English

RFD non-English is getting large. How about splitting it into Latin and non-Latin scripts, or splitting out the Romance languages? Vox Sciurorum (talk) 13:09, 5 October 2020 (UTC)[reply]

It may seem like it with @Dentonius basically filibustering the deletion request pages because he doesn't like the concept itself (if we ever have an rfd for "that guy in the band - you know, the one with the funny hair", I know how he'll vote). We also have some people who have made it their mission recently to clean up neglected corners of the project. I'm not sure how long the current volume is going to last. Chuck Entz (talk) 14:13, 5 October 2020 (UTC)[reply]
@Chuck Entz, filibustering: I nearly fell off my chair. :-) Don't get me wrong. There are lots of things I skipped over which I think should probably be deleted. I just saw a lot of things there that I like and which I thought could be improved in order to add value to this dictionary. The current set of people have their way of doing things but the future Wiktionarians, undoubtedly, will be very different from them ... And, on the contrary, I love the RFD page. That's the people's page right there. -- Dentonius (my politics | talk) 14:17, 5 October 2020 (UTC)[reply]
Well, describing it as filibustering wasn't accurate, but you were adding a whole lot of votes en masse. More to the point, rfd isn't a popularity contest: we're supposed to be assessing whether the entries meet our Criteria for inclusion. You disagree with a major chunk of that, but those are the rules that have been arrived at by over a decade of community consensus, debate and votes. In a way, rfd is a sort of judicial body, and you're doing the equivalent of acquitting a bunch of people because you don't like the idea of stop signs. Chuck Entz (talk) 15:24, 6 October 2020 (UTC)[reply]

My 2¢ on RFD non-English: Split according to close-knit language families. The language families which come up a lot get their own page: Germanic, Romance, Hellenic, Turkic, Finno-Ugric, Balto-Slavic, Japonic, Sino-Tibetan, Indo-Aryan, Semitic, ... . Anything not covered: Other. -- Dentonius (my politics | talk) 20:48, 5 October 2020 (UTC)[reply]

@Dentonius Language families aren't always a good fit for this purpose: there's a lot of overlap between Chinese, Japanese, Korean and Vietnamese in spite of the fact that they're unrelated, while Chinese people are a lot less likely to know Tibetan or Burmese than Japanese. Chuck Entz (talk) 15:24, 6 October 2020 (UTC)[reply]
As of a month ago, the RFD count through the end of 2019 was 43 for Romance languages (including Latin and Esperanto), 16 Germanic (including a giant Old English poly-RFD), 16 Latin script (Translingual, Asian, European, American languages), 20 Chinese and Japanese, 7 for Asian languages with non-Latin scripts, 5 Cyrillic, 3 Greek script, 2 Arabic. From my point of view, I have nothing meaningful to say about RFD requests for languages written in non-Latin scripts. I may comment on Romance, Germanic, and Translingual which is really Latin. Vox Sciurorum (talk) 16:00, 6 October 2020 (UTC)[reply]

All pre-reform spelling variants should be removed from the category. The category is for terms with obsolete senses, not spellings. Allahverdi Verdizade (talk) 14:38, 5 October 2020 (UTC)[reply]

Right. I removed the "obsolete" label. The correct category for these is Category:Russian superseded forms. --Vahag (talk) 05:35, 6 October 2020 (UTC)[reply]
Should we also remove these spellings from Category:Russian obsolete forms? How is it different from Category:Russian superseded forms? --Vahag (talk) 05:43, 6 October 2020 (UTC)[reply]
Obsolete has to do with use, and superseded with the decision of a national language institute. People might still use superseded forms in a diary, or on Facebook, but they won't use them in a résumé or a college application essay, because they are now seen as officially incorrect. Eventually, if the institute's decision holds firm and their prestige remains high, superseded forms will be rendered obsolete as well. —Μετάknowledgediscuss/deeds 06:14, 6 October 2020 (UTC)[reply]
Then both categories are appropriate. The only person still using pre-revolutionary orthography is User:Fay Freak. --Vahag (talk) 06:24, 6 October 2020 (UTC)[reply]
@Vahagn Petrosyan, Allahverdi Verdizade Thank Vahag for removing Category:Russian terms with obsolete senses. Currently a term like дѣло (dělo) goes into two categories: Category:Russian obsolete forms and Category:Russian pre-1918 spellings. The latter is a subcategory of Category:Russian superseded forms so we probably don't need to add that category in addition. Benwing2 (talk) 13:13, 7 October 2020 (UTC)[reply]

Welsh Mutation template on entries that do not undergo mutation

Welsh words can undergo initial consonant mutation if they start with the letters p, t, c, b, d, g, m, ll, or rh (also initial vowels can undergo some changes, e.g. eglwys). In no circumstance can a word that begins with any other letter undergo any mutations. However, there exist quite a few Welsh entries which do not undergo mutations, but include the mutation template (e.g. heddwas, ffermwr), and others which omit the template (e.g. newydd, ffermio). I think there should be some consistency in these entries, either that every entry includes a mutation table regardless of initial consonant (which seems unhelpful to me), or that only entries which would actually undergo mutation should include a mutation table (which makes more sense to me).

Is this something that should be put to a vote? – Guitarmankev1 (talk) 15:13, 5 October 2020 (UTC)[reply]
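The initial-letter rule in the post above can be sketched as a small check. This is a hypothetical helper written for illustration, not code from any Wiktionary template or module. It assumes Welsh digraphs such as ch, ff and th count as single letters (so ffermwr starts with ff, not f), it ignores the vowel-initial changes mentioned for eglwys, and it knows nothing about lexical exceptions.

```python
# Hypothetical helper for illustration only.
# Welsh digraphs are single letters, so they are matched before single characters.
DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")

# Initial letters that can undergo consonant mutation, as listed in the post.
MUTABLE_INITIALS = {"p", "t", "c", "b", "d", "g", "m", "ll", "rh"}


def initial_letter(word: str) -> str:
    """Return the first Welsh letter of a word, treating digraphs as one letter."""
    lowered = word.lower()
    for digraph in DIGRAPHS:
        if lowered.startswith(digraph):
            return digraph
    return lowered[:1]


def is_mutable_by_initial(word: str) -> bool:
    """True if the word's first letter is one that can mutate."""
    return initial_letter(word) in MUTABLE_INITIALS


for word in ("camlas", "heddwas", "ffermwr", "newydd"):
    print(word, is_mutable_by_initial(word))
```

A check like this only answers whether the initial letter is mutable in principle; individual words can still behave differently and would need to be marked by hand.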

I try never to put mutation tables on Irish entries that begin with non-mutatable consonants, but the practice of including them seems fairly well established for Welsh, with some exceptions as you noted. In my opinion, the only words that don't undergo mutation that definitely should include a table are the ones that begin with a normally mutatable consonant, such as braf and gêm, which do not undergo soft mutation to *fraf (at least not in the standard language) and *êm. I'd prefer ffermwr and ffermio not to have tables, but I don't feel particularly strongly about it. —Mahāgaja · talk 16:47, 5 October 2020 (UTC)[reply]
On the other hand, I remember when I was learning Welsh that a lot of people (sometimes even the teacher!) mistakenly thought that ff underwent soft mutation to f and would say things like o fenest y tŷ (from the window of the house) in place of the correct o ffenest y tŷ. So maybe keeping the mutation template would be a good reminder to learners not to do that. Lack of a table might be interpreted as forgetfulness rather than deliberate omission. —Mahāgaja · talk 16:51, 5 October 2020 (UTC)[reply]
Indeed, there's an argument to be made both ways. While it could be helpful to remind learners of which consonants don't undergo mutation (e.g. ffenestr), it seems like it doesn't provide much info otherwise. Like, many prepositions undergo inflection in Welsh, but there's no table in ger that lists the non-inflected forms to remind learners not to inflect ger...
And I do very much agree that words which always resist normal mutation circumstances should have mutation tables which reflect that, like in the case of gêm. – Guitarmankev1 (talk) 21:24, 5 October 2020 (UTC)[reply]
@Guitarmankev1, Mahagaja Another possibility is just to modify {{cy-noun}} to automatically display no mutation in the headword on all words that don't mutate based on their initial consonant, and add a flag to indicate no mutation for words that don't mutate but begin with a mutating consonant. Benwing2 (talk) 00:49, 6 October 2020 (UTC)[reply]
It would have to be all the headword-line templates, though; braf, for example, is an adjective, and proper nouns resist mutation more than common nouns do. —Mahāgaja · talk 06:27, 6 October 2020 (UTC)[reply]
That isn't hard to do; if it's agreed to do it I can probably implement it. Benwing2 (talk) 06:36, 6 October 2020 (UTC)[reply]
It might be nice to have a CAT:Welsh words that resist mutation, too. We mustn't forget, though, that some words resist one mutation but not the other (gêm resists soft mutation but not nasal mutation, for example). —Mahāgaja · talk 06:49, 6 October 2020 (UTC)[reply]
This can be done using a param like |nomut= that takes different values: |nomut=y or |nomut=1 or |nomut=all displays resists mutation or resists all mutations, |nomut=nasal displays resists nasal mutation, |nomut=soft,aspirate displays resists soft and aspirate mutations, etc. Terms that resist some but not all mutations should still have a mutation table, and if the situation is too complex to easily display in the headword, use a usage note (e.g. at braf). Benwing2 (talk) 07:04, 6 October 2020 (UTC)[reply]
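As a rough illustration of the |nomut= values proposed above, the mapping from parameter value to headword text might look like this. The function name and the error handling are assumptions made for the sketch; only the accepted values and the displayed phrases come from the post, and "resists all mutations" is picked from the two wordings it offers for the y/1/all case.

```python
# Hypothetical sketch of the proposed |nomut= parameter; not actual template code.
KNOWN_MUTATIONS = ("soft", "nasal", "aspirate")


def nomut_label(value: str) -> str:
    """Turn a |nomut= value into the text shown on the headword line."""
    value = value.strip().lower()
    if value in {"y", "1", "all"}:
        return "resists all mutations"
    kinds = [part.strip() for part in value.split(",") if part.strip()]
    if not kinds or any(kind not in KNOWN_MUTATIONS for kind in kinds):
        # Rejecting unknown values is an assumption; the post does not say what happens.
        raise ValueError(f"unrecognised nomut value: {value!r}")
    if len(kinds) == 1:
        return f"resists {kinds[0]} mutation"
    return "resists " + ", ".join(kinds[:-1]) + " and " + kinds[-1] + " mutations"


print(nomut_label("nasal"))
print(nomut_label("soft,aspirate"))
```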
I'm not sure all that is necessary, since it seems like we already have all the tools we need in order to express what we're trying to express. If we want entries to be explicit that non-mutatable consonants indeed do not have mutated forms, we can include {{cy-mut-auto}} on every entry by default. Then for unusual circumstances like braf or gêm we can use {{cy-mut-table}} with its existing parameters to list specific unusual mutations. Any unusual circumstances can also be explained in usage notes, as is currently the case with words like braf, nain, and tan. – Guitarmankev1 (talk) 13:41, 6 October 2020 (UTC)[reply]
This is silly and a waste of space:
Mutated forms of heddwas
radical    soft         nasal        aspirate
heddwas    unchanged    unchanged    unchanged
"Some of these forms may be hypothetical." There are no forms. Replace the box with a one line note or remove it entirely. Or add it to the head line, "heddwas m (plural heddweision, not mutable)". Vox Sciurorum (talk) 11:17, 7 October 2020 (UTC)[reply]
@Vox Sciurorum, Guitarmankev1 I completely agree. This is the same issue as giving a full declension table for indeclinable nouns and adjectives in languages like Russian. It makes no sense and just looks silly; a headword note is much more practical and helpful. If Guitarmankev1's concern is that having a |nomut=soft,aspirate param is too complex, then that's a well-taken concern and we can include the headword note only when there is no mutation at all, but when there is no mutation we should not have a mutation box. Benwing2 (talk) 13:10, 7 October 2020 (UTC)[reply]
It seems like the circumstances in question here are 1) words which begin with normally-mutable letters and undergo expected mutations (e.g. camlas), 2) words which begin with normally-immutable letters and expectedly do not undergo mutations (e.g. heddwas), 3) words which begin with normally-mutable letters but do not undergo all expected mutations (e.g. gêm), and 4) words which begin with normally-immutable letters but may undergo unexpected mutations (e.g. nain).
Entries for words in Category 1 should look as they currently do, with an automatic mutation table. I think that the current practice for entries in Categories 3 and 4 is acceptable as well, with a non-automatic mutation table showing the irregular mutation pattern and the irregularities being detailed in a usage note.
The main debate here appears to be what to do with entries in Category 2. The mutation tables for words which begin with normally-immutable letters are largely predictable even if you are unfamiliar with the word, except for rare Cat 4 exceptions. That is, if you assume that a word beginning in a non-mutable consonant doesn't undergo mutations, then you would be correct >99% of the time. I suppose that if we include something in the headline indicating that a normally-immutable word indeed does not mutate, then the worst that would happen would be the reader thinking "well duh". It's not technically incorrect but it seems redundant to me, like adding a note to vowel-terminating English entries that they do not have a different form when being used before a vowel sound, because of the existence of an (not a perfect comparison but hopefully you can see what I'm getting at). – Guitarmankev1 (talk) 13:47, 7 October 2020 (UTC)[reply]
I agree the entire table (complete with note about some forms being hypothetical, which is unnecessary and should be suppressed when no forms are being listed!) is probably excessive for cases where a word does not mutate. I am inclined to think a short mention on the headword line that something doesn't mutate (when it doesn't mutate at all) is helpful. If it does mutate but only in some ways, I'd think that's the kind of situation where the table (with some parameters set to "unchanged"), or at least a usage note, would be more appropriate, and a headword-line note would be redundant / unnecessary, no? I don't speak Welsh, so my perspective would be looking for this information to see how a short phrase would translate or to check what the mutation of a word was. If there are cases where a word that would be expected not to mutate nonetheless does mutate, then either a table or usage notes should definitely point that out. I also continue to think "words that start with [...] don't mutate" is the kind of thing we could document on WT:ACY...especially if we decide not to document it on headword lines...? - -sche (discuss) 15:57, 7 October 2020 (UTC)[reply]
PS the template on nain should not be generating a link (green to encourage someone to create an ACCELerated entry!) to "unchanged", lol. - -sche (discuss) 16:03, 7 October 2020 (UTC)[reply]
Yeah, the {{cy-mut-table}} template needs to be modified to allow non-linkable arguments! I might get around to that later if I have some time. – Guitarmankev1 (talk) 18:03, 7 October 2020 (UTC)[reply]
Looking at Module:cy-mut, a starting point is to put a guard of if mut1 or mut2 or mut3 then around the series of concatenations of result, with the else branch generating "This form does not mutate." Vox Sciurorum (talk) 18:51, 7 October 2020 (UTC)[reply]
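The guard suggested above can be sketched like this. Module:cy-mut itself is Lua and is not reproduced here; the function below is a Python stand-in with assumed names, where the three optional arguments play the role of mut1, mut2 and mut3, and the plain-text row layout is invented for the example.

```python
from typing import Optional


def render_mutations(radical: str,
                     soft: Optional[str] = None,
                     nasal: Optional[str] = None,
                     aspirate: Optional[str] = None) -> str:
    """Build a plain-text mutation listing, or a one-line note if nothing mutates."""
    # Guard: only build the table when at least one mutated form exists.
    if soft or nasal or aspirate:
        rows = [("radical", radical), ("soft", soft),
                ("nasal", nasal), ("aspirate", aspirate)]
        return "\n".join(f"{name}: {form or 'unchanged'}" for name, form in rows)
    else:
        return "This form does not mutate."


print(render_mutations("heddwas"))
print(render_mutations("camlas", "gamlas", "nghamlas", "chamlas"))
```

With the guard in place, an entry like heddwas gets the one-line note instead of a table whose cells all read "unchanged".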
@Guitarmankev1, Mahagaja, Vox Sciurorum, -sche I converted Welsh noun, proper noun, adjective and verb templates to use Lua (Module:cy-headword) and added the non-mutable indication in the headword. We should now remove the {{cy-mut-auto}} templates from non-mutable words (which are in Category:Welsh non-mutable terms, although it's incompletely populated as of yet). In the process I cleaned up a lot of things that were incompletely implemented in the template code. There are a ton of adjectives where the comparative type wasn't properly specified; they display a red unknown comparative message and are placed in Category:Requests for inflections in Welsh adjective entries. I eliminated |alt1s= in verbs in favor of |1s2= (if |alt1s= was given without |1s=, you now need to use {{cy-verb|1s=+|1s2=FOO}}). I merged the singulative support in Template:cy-noun/new into Template:cy-noun; if there are multiple singulatives, you specify them with |sg2=, |sg3=, ... instead of |pl2=, |pl3=, ..., as was the case with Template:cy-noun/new. Otherwise, the new templates should be compatible with the old ones. There is additional support for various things, e.g. {{cy-noun}} supports |f=, |f2=, ... for female equivalent, |m=, |m2=, ... for male equivalent, |dim=, |dim2=, ... for diminutive. There were some badly specified words that I need help with:
  1. da-da (gender was just p, I converted to m-p and f-p)
  2. teithi (likewise)
  3. jac-do, jam, je (there was no mutation rule for initial j; I made them non-mutable)
  4. ès (there was no mutation rule for initial grave-accented vowels; I made them work like other vowels)
Benwing2 (talk) 06:49, 10 October 2020 (UTC)[reply]
Forgot to mention, all of nouns, proper nouns, adjectives and verbs support |nomut=1 to indicate non-mutable terms that would otherwise be mutable, and |mut=1 to indicate mutable terms that would otherwise be non-mutable. Benwing2 (talk) 07:26, 10 October 2020 (UTC)[reply]
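How the two override flags described above could interact with the default is sketched below. This is a guess at the decision logic, not Module:cy-headword's code: the function and argument names are hypothetical, `mutable_by_initial` stands for whatever the module infers from the first letter, and treating both flags at once as an error is an assumption.

```python
# Hypothetical sketch of the |nomut=1 / |mut=1 overrides; not the real module logic.
def is_mutable(mutable_by_initial: bool, nomut: bool = False, mut: bool = False) -> bool:
    """Resolve mutability from the initial-letter default and the two override flags."""
    if nomut and mut:
        # Assumption: the two overrides contradict each other, so reject the combination.
        raise ValueError("nomut and mut cannot both be set")
    if nomut:
        return False  # mutable initial, but the term is marked as not mutating
    if mut:
        return True   # non-mutable initial, but the term is marked as mutating
    return mutable_by_initial


print(is_mutable(True, nomut=True))
print(is_mutable(False, mut=True))
```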
@Benwing2: Thanks for all your hard work! Yes, j is also an immutable consonant, and vowels with diacritics behave exactly the same as vowels without them. —Mahāgaja · talk 08:39, 10 October 2020 (UTC)[reply]
Still seems strange to me to make something that's accurately predictable 99% of the time so explicit in the headword-line by default... – Guitarmankev1 (talk) 20:24, 10 October 2020 (UTC)[reply]
@Benwing2: I went through Category:Welsh non-mutable terms and removed all instances of {{cy-mut}} on non-mutable entries. You mentioned that this list may be incomplete - are you referring to non-lemma entries? Any other types of entries you may have had in mind that aren't on that list? – Guitarmankev1 (talk) 15:03, 16 October 2020 (UTC)[reply]
I'm also finding many "form of" entries which had mutation templates for non-mutable words, but since they use the standard "head" template instead of the cy-specific ones, they lack a "not mutable" line in the headword (e.g. ffyrdd, chwardd, and possibly hundreds of others). Any idea on how to resolve that? – Guitarmankev1 (talk) 16:34, 16 October 2020 (UTC)[reply]

@Benwing2: The accelerated entry defaults generated by the links in {{cy-mut}} still contain the template for {{cy-mut-auto}}. I'm not knowledgeable enough with the ins-and-outs of accelerated entry stuff to know how to change the default template to use {{cy-mut}}... are you? Might save your bot some effort in the future! – Guitarmankev1 (talk) 23:19, 11 October 2020 (UTC)[reply]

@Guitarmankev1: Thanks! I fixed it; it's in Module:accel/cy. Benwing2 (talk) 23:22, 11 October 2020 (UTC)[reply]
@Guitarmankev1 Apologies for the late response. What I meant above by "the list may be incomplete" is that when you add a new category to a template, it takes a few days for the category to be completely populated. Since you acted a couple of days after I added the category, it was probably completely populated already. As for non-lemma forms using {{head}}, that is the standard practice in most languages, but I can change it by bot so that they use a Welsh-specific template that correctly generates the not mutable tag. Benwing2 (talk) 13:55, 26 October 2020 (UTC)[reply]
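
The mutability rule discussed in this thread (initial j treated as non-mutable, accented vowels treated like plain vowels, and |nomut=1 / |mut=1 as overrides) can be sketched roughly as follows. This is only an illustrative sketch and not the actual logic of Module:cy-headword: the function name `is_mutable`, the letter sets, and the keyword arguments are assumptions made for the example.

```python
import unicodedata

# Assumed letter sets for illustration; not taken from Module:cy-headword.
NON_MUTABLE_DIGRAPHS = ("ch", "dd", "ff", "ng", "ph", "th")
MUTABLE_DIGRAPHS = ("ll", "rh")
MUTABLE_CONSONANTS = set("ptcbdgm")
VOWELS = set("aeiouwy")  # vowels take h-prothesis, so they count as mutable here


def strip_diacritics(text):
    """Remove combining marks so that e.g. 'è' is treated like 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_mutable(word, nomut=False, mut=False):
    """Guess whether a Welsh word is mutable from its initial letter(s).

    nomut/mut mirror the |nomut=1 and |mut=1 overrides mentioned above.
    """
    if nomut:
        return False
    if mut:
        return True
    plain = strip_diacritics(word.lower())
    # Check digraphs first: 'ch' is non-mutable although 'c' is mutable.
    if plain.startswith(NON_MUTABLE_DIGRAPHS):
        return False
    if plain.startswith(MUTABLE_DIGRAPHS):
        return True
    first = plain[:1]
    return first in MUTABLE_CONSONANTS or first in VOWELS
```

Under these assumptions, `is_mutable("jam")` is false, `is_mutable("ès")` is true, and forms such as ffyrdd and chwardd come out non-mutable because the digraph check runs before the single-letter check.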

New Cantonese–English dictionary


FYI: https://languagelog.ldc.upenn.edu/nll/?p=48643 Justin (koavf)TCM 05:29, 6 October 2020 (UTC)[reply]

Recently created accounts and RFV / RFD


Hi all, We should probably prevent recently created accounts from initiating RFV and RFD procedures. As it stands, a mischievous user could create a phony account, use a VPN, and attack entries. Due to our existing policies, the requests would have to be considered. Now, imagine a skilled attacker using several hijacked PCs to do this.

Take for example: Wiktionary:Requests_for_verification/Non-English#from_me_born started by Dubitator. This person showed up last night, didn't contribute anything to our dictionary, and began an RFV. This person is obviously an experienced user who has another account here. -- Dentonius (my politics | talk) 03:57, 7 October 2020 (UTC)[reply]

And you are not a recently created account? DTLHS (talk) 04:59, 7 October 2020 (UTC)[reply]
There are a couple of IPs who have been active in rfv and rfd in the past month or two, so we don't have grounds for running a check on this account- or I would have done so already. The name (Latin for "doubter") makes it look like it was created solely for posting rfvs and rfds, but that, in itself, isn't a violation of the rules. Chuck Entz (talk) 07:54, 7 October 2020 (UTC)[reply]
Inquisitor would have been more apt. – Jberkel 08:17, 7 October 2020 (UTC)[reply]

Latin-based scripts of the peoples of the USSR


@Atitarev has removed a lot of historically attested renditions of workers of the world, unite in the Latin scripts used during w:Latinisation in the Soviet Union, that is 1920s and 1930s.

His argument is that Module:languages/data2 doesn't state that Chechen can be written in Roman letters. The use of Roman letters in Chechen was really very fleeting and had no impact for the sources we consider acceptable for WT:CFI. According to WT:CFI, even one year would be enough: use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year. In this case, the scripts in question were used for more than a decade. A peculiar case is Yakut, which used two different Latin scripts, one in 1917—1929 and the other in 1929—1939. This particular motto was included on the front page of every single newspaper in the USSR, so it's quite well attested in the scripts in question in permanently recorded media.

So, should the translations section of workers of the world, unite include renditions in those scripts, or not? --Crash48 (talk) 10:50, 7 October 2020 (UTC)[reply]

@Crash48, Atitarev I personally don't really see the point of including words from the Soviet Latinization period into Wiktionary unless the word isn't otherwise attested, because it will just be confusing and it will give a false impression that the word can also be used in the Latin script (which is certainly no longer the case). And I *definitely* don't think such words or phrases belong in a translation section. The point of a translation section is to aid the reader by giving the *typical* translation, not to give every possible rendition. 13:05, 7 October 2020 (UTC) — This unsigned comment was added by Benwing2 (talkcontribs).[reply]
@Crash48 Yes, we can have entries for terms in obsolete scripts, but no, we shouldn't have translations in them. I don't have much time to explain, but it basically boils down to what is referred to over at Wikipedia as "undue weight". The normal modern usage should be used in the translation table, and one could make an argument for linking to it in that entry with a gloss stating that it's obsolete, but even that is debatable. The obsolete-script form will be findable in the search box for those that run into it elsewhere, and it will show up in categories- that should suffice. Chuck Entz (talk) 15:19, 7 October 2020 (UTC)[reply]
Basically what Chuck says. If Latin-script spellings are attested and were at some point a/the normal spellings (as opposed to the one-off use of Cyrillic to write English, etc, that one sometimes sees, which we've excluded when cases have come up in the past), then I would be inclined to have entries for them, but in general would list only the modern spelling(s) in the translations tables. (We also shouldn't list e.g. 1800s-era obsolete German spellings in translations tables, etc.) - -sche (discuss) 15:46, 7 October 2020 (UTC)[reply]
@Chuck Entz @-sche That makes sense, but there are two fine points:
  1. I certainly don't propose adding obsolete spellings into translations of everyday words, but this particular motto was much more widely used a century ago than it is now, so its "normal modern usage" should not overweigh its historic usage. By way of example, tehdä huorin gives the English translation "Thou shalt not commit adultery." which is in no way the "normal modern language".
  2. One could argue that a couple dozen individual entries, as opposed to a compact list in one entry, attributes even more of "undue weight" to the obsolete scripts.
--Crash48 (talk) 16:12, 7 October 2020 (UTC)[reply]
"So, should the translations section of workers of the world, unite include renditions in those scripts, or not?" No, it shouldn't. Allahverdi Verdizade (talk) 08:42, 8 October 2020 (UTC)[reply]
Re the second point, the entries would just use "form of" templates to soft-redirect to the main spelling, like вис#Romanian. (Or, if even that was deemed to be granting them too much weight, they would be mentioned as alternative forms in the entry for the main, modern spelling.) - -sche (discuss) 18:34, 8 October 2020 (UTC)[reply]

Call for feedback about Wikimedia Foundation Bylaws changes and Board candidate rubric


Hello. Apologies if you are not reading this message in your native language. Please help translate to your language.

Today the Wikimedia Foundation Board of Trustees starts two calls for feedback. One is about changes to the Bylaws mainly to increase the Board size from 10 to 16 members. The other one is about a trustee candidate rubric to introduce new, more effective ways to evaluate new Board candidates. The Board welcomes your comments through 26 October. For more details, check the full announcement.

Thank you! Qgil-WMF (talk) 17:11, 7 October 2020 (UTC)[reply]

Shouldn't categories Category:Doves and Category:Pigeons be collected under a common Category:Columbidae? Currently, we have Category:Birds of prey which do not constitute a genetic clade (the similarities in this order are due to evolutionary convergence), but species which are genetically related are split. It's quite misleading from a record-keeping point of view. Безименен (talk) 10:04, 8 October 2020 (UTC)[reply]

@Bezimenen: Yes, they should. This has already been requested at Wiktionary:Requests for moves, mergers and splits#Category:Doves, Category:Pigeons, where I suggested the title CAT:Columbids. —Mahāgaja · talk 10:51, 8 October 2020 (UTC)[reply]

Informal discussion: that SOP term isn't as bad as you think ...


Me again. I was having a conversation with @Lambiam but I didn't want to continue it in the RFD section. What I've seen in the short time I've been here as an editor is that a lot of people put a lot of effort into making great entries. Along come the RFV/RFD folks and they see their effort go up in smoke. Don't get me wrong: I do believe that there needs to be a standard (and we have one). We do need to set the bar high. People here instinctively know that something like world map deserves to be here. However, the SOP guys come along and say, well, no, any smart person can figure out what it means by analysing the separate words. I won't repeat what I said above about convenience and not having to click around. People codify these ideas and then we buy into them as if they're immutable, as if they're the best ideas ever which need never be revised. And pretty soon everybody's out to get everybody. We start looking around trying to tear down each other's contributions instead of trying to build an awesome useful dictionary. "That guy sent my entry to RFV and then to RFD. I'm going to see how I can poke a hole in his entry." This dictionary doesn't need to be a lean and trim dictionary. And I'll say it again and again: we are not paper. So put your green leaf tea in if it makes you happy. Heck, I'd love to see green leaf tea with a goddamn picture of that teabag.

There's this slippery slope argument which is used here all the time, that if we let in a few SOP's we'll have to let them all in. No, no, and no again! Lambiam gave me an example in Italian: nella stessa casa, nello stesso istante, nello stesso libro, nello stesso respiro, nello stesso tempo. All SOP, sure. But if you look at it from a language learner's perspective, only a few of those expressions are extremely useful and will come up in typical conversation again and again. There are many editors here who speak a foreign language, so I don't need to explain that.

There are so many things which SOP terms have to offer which can't be covered on the pages of the separate parts. All of the following would all be extremely helpful to language learners -- you know, the people using this dictionary:

  • audio
The words joined together are not necessarily pronounced the same way as they are pronounced separately.
  • convenience
Who wants to click around several times visiting several pages when there are dictionaries which show you all you need to know on one page?
  • photos
A picture speaks a thousand words.
  • structural analysis
I saw a question in RFD about what part of speech was at work in fin dall'inizio. These things aren't as obvious as you'd all think.
  • usage notes
There are some things which you can say about the collocation which would be inappropriate to say on any one of the pages of the words which make it up.

etc.

Why cripple Wiktionary if we don't have to? I've seen several SOPs here (e.g., a lot of) which I don't think anybody wants to get rid of. Food for thought. -- Dentonius (my politics | talk) 21:06, 8 October 2020 (UTC)[reply]

Agreed! But I think SOP entries should not be given the same status as regular entries. Collocations should be distinguished from regular entries, but still included somehow. It drives me nuts that we can never agree on this, since we're trying to be a multilingual dictionary, and multilingual dictionaries are only useful if they include SOP collocations. Andrew Sheedy (talk) 04:45, 9 October 2020 (UTC)[reply]
I agree: As I think mentioned somewhere else recently, I support somehow including lists of selected common and natural (SoP) collocations, including the opportunity to provide translations and notes etc., but not as full entries. One possibility would be to just have a "Collocations" section in each article, where necessary, where we can list these. My slight concern with this is that we already have "Related terms" and "Derived terms", and in practice we can't even distinguish these consistently in articles, so if we add "Collocations" too, it might get even more random and confusing. Also, it would ideally need to be intuitively clear to the user why "Related terms" linked to other articles while "Collocations" did not. I support the idea in principle, but I think there needs to be some thought put into the exact way it is presented. Mihia (talk) 08:34, 9 October 2020 (UTC)[reply]
I think they should be regular entries. Some worry about the possible spike in SOP contributions we'd receive. An easy thing we could do is simply to require a certain number of quotations / references for SOP terms. Win-win. -- Dentonius (my politics | talk) 10:30, 9 October 2020 (UTC)[reply]

Imagine that in some alternate English language toothpaste were written as two separate words: "tooth paste". To write the words together would be incorrect in the same way that we perceive "worldmap" to be incorrect. Would we be calling for the removal of "tooth paste" as SOP? (By the way, just about all the other Germanic languages write words together whereas in English we use spaces to separate the parts. These words in German and Dutch are all here but nobody's saying remove them because they're SOP). I sincerely believe we need to stop looking at the space between the words and ask ourselves: Does the thing in question correspond to a single (real/imaginary) entity? -- Dentonius (my politics | talk) 12:11, 9 October 2020 (UTC)[reply]

@Dentonius: Already someone left because of people tying the SOP criterion to spaces. Wiktionary:Beer parlour/2017/July § Delete SoP compounds in languages like German and Dutch. So Niggerschwanz will be kept, if you create that, while nigger cock the majority wants deleted. Although WT:SOP does not ordain such views, only presupposes their existence. And although it is clear that many things written together have to be deleted as SOP, e.g. one-letter Arabic prepositions + some noun because such is always written together and no reasonable dictionary includes such, although this failed for لله. Fay Freak (talk) 12:44, 9 October 2020 (UTC)[reply]
@Fay Freak, thanks for bringing that conversation to my attention. I'll go over it soon to see what took place. I won't create that German version of N-word cock. Mine needs to stay true to its roots. ;-) -- Dentonius (my politics | talk) 12:59, 9 October 2020 (UTC)[reply]
But @Fay Freak, maybe you should create Niggerschwanz. Ich frage mich, was danach passieren würde. -- Dentonius (my politics | talk) 17:09, 9 October 2020 (UTC)[reply]
Nothing would happen to it, as I have forespoken. I am not interested in this term. Besides I think that quotes containing compounds can also be employed at the simplicia. The bloke who decided that the quote containing Muselmanenmäusken cannot go to Muselman was a buffer lacking problem awareness (Problembewusstsein is a set term, so what’s with the English, entryworthy?).
And hyponyms can be SOP? I found most of these weighttraining splits are too much private language to be entryworthy but still they are opaque and demand usage notes or glosses in the dictionary so I added a weighttraining definition, and even if by itself “split”, outwith compounds, lacks use because without qualifiers (either by compound or by adjective or prepositional phrase, e.g. “the split of your training”) it is incomprehensible in contexts the gloss should be there (because we can’t create as durably attested or sufficiently used all the compounds for weightlifting splits – but if we could create all and it is not used alone some would argue the meaning “is only found in compounds” so should not be present in the simplex. But we can label “in compounds”, as I have done on Persian خر (xar). I have also added to مَغْرِبِيّ (maḡribiyy) and طُومَار (ṭūmār) glosses “concerning a certain style of Arabic calligraphy” because nobody would look up the whole phrases and there are several possible). Fay Freak (talk) 18:21, 9 October 2020 (UTC)[reply]
It seems they were coming at it from the opposite angle (negative culture) again. I'd like us to recognise our compound words which are formed with spaces -- the same ones which refer to a single entity. -- Dentonius (my politics | talk) 13:38, 9 October 2020 (UTC)[reply]
The main non-esthetic reason for excluding SOP terms with full entries is that they diffuse content. If you want to learn everything about a term that's part of a number of SOP terms, you have to go to all of those entries. This is especially a problem with a wiki, because a change to one of the parts will not be propagated to all of the SOP phrases, and a change to one of the SOP phrases that's relevant to a part won't get propagated to that part, let alone to all the other SOP phrases that contain the part. It's the same reason that only lemmas have the definitions, etymology, etc. that apply to every form of the term.
It's very easy to say that we should just make sure they're all in sync. In practice, though, coordination between entries is rather boring and time-consuming, so it doesn't get done. It would be nice, but it would also be nice to give people a dollar every time they used our site- lovely as an abstract idea, but it's never going to happen.
I would dearly love to have a collocations section, but the proposal was shot down by advocates of the translation hubs: they were afraid that translations in a collocation section would make translation hubs unnecessary, and they considered translation hubs a better way to host the translations. Chuck Entz (talk) 19:12, 9 October 2020 (UTC)[reply]
I'm not buying the consistency argument just like I no longer buy the "volunteer effort" argument. If repetitive mindless tasks are boring, the bot guys can do them. We can have collocations and translation hubs. I don't see any reason why we shouldn't. Bring it up again. I'll support it. -- Dentonius (my politics | talk) 19:55, 9 October 2020 (UTC)[reply]
Y'all need to stop shooting each other down, by the way. That's why we can't have nice things. -- Dentonius (my politics | talk) 19:58, 9 October 2020 (UTC)[reply]
Synchronising those kinds of content changes across articles would be almost impossible to reliably automate. Mihia (talk) 20:42, 9 October 2020 (UTC)[reply]
So you mean if we allow world map and somebody makes a mistake in world or map, there will be unimaginable pain to rectify world map? ;-) I think you all are making the assumption that the SOP page would duplicate the content of the parts. That's so not true. The world map page looks nothing like world or map. -- Dentonius (my politics | talk) 20:48, 9 October 2020 (UTC)[reply]
If collocation sections make translation hubs unnecessary, great, all the more reason to have them. I do not like translation hubs. They are just another annoying reason, along with COALMINE, whereby we are obliged to keep blatant SoP terms as full entries. Mihia (talk) 20:01, 9 October 2020 (UTC)[reply]

Serious question here. Who indoctrinated you all into this anti-SOP culture? It would make sense for a paper dictionary or for back in the day when this entire database was an RDBMS but for us newcomers it really is ridiculous. You guys are operating as if we don't live in the age of Big Data. -- Dentonius (my politics | talk) 20:10, 9 October 2020 (UTC)[reply]

It looks like a witch-hunt. -- Dentonius (my politics | talk) 20:25, 9 October 2020 (UTC)[reply]
On the contrary, it's having individual full entries for a trillion phrases that mean nothing more than exactly what the individual words say that would be "ridiculous". Mihia (talk) 20:47, 9 October 2020 (UTC)[reply]
Meaning is only a part of it. I pointed out a few things at the top which could be done with those SOP pages. There's probably a lot more that you can do too: connotations, example sentences, translations, anagrams, etc. The most common collocations won't amount to "trillions" of pages. It doesn't in German or Dutch, so I don't see why it would happen in English. -- Dentonius (my politics | talk) 21:34, 9 October 2020 (UTC)[reply]
Literally anything could be done. Proper-noun articles could be expanded to Wikipedia-like proportions, the whole of English grammar could be explained, whatever you like. The question is not what could be done if the whole of human knowledge were to be incorporated, but what is in scope within the dictionary space of this project, on which opinions may vary, but where personally, in the case of phrases that mean no more than the sum of their parts, and are readily understandable as such, I would not go further than "Common collocations" sections under main headwords or head-phrases. Mihia (talk) 22:12, 9 October 2020 (UTC)[reply]
Could you give me a few concrete examples? -- Dentonius (my politics | talk) 22:17, 9 October 2020 (UTC)[reply]

I thought of a few things at home and made the list below. They're all SOP. According to the established culture here, we'd expect them to all be red. If they aren't, how do you feel about these terms? Should they stay or go? Which? Why?

  1. disk drive
  2. door handle
  3. game controller
  4. hardwood floor
  5. media player
  6. picture frame
  7. USB flash drive
  8. window frame
  9. wireless printer
  10. wireless router

-- Dentonius (my politics | talk) 21:59, 9 October 2020 (UTC)[reply]

@Dentonius: These terms are held by translation considerations (WT:THUB), except 1. USB flash drive, because one can connect drives in various fashions (USB, SATA, M.2, U.2 …). The entry flash drive is crappy and from 2005 and must be updated (it hosts the translations for what we term USB-Stick but an SSD is a flash drive too; I don’t remember the usage in 2005 but now this is clearly misleading and a colloquialism that should be avoided by context, a move to rarer pen drive or memory stick or memory key is and was appropriate since flash drive is only a hypernym). 2. disk drive I don’t know what this is supposed to be, the gloss is opaque. What’s “a computer drive that reads disks”, how am I supposed to translate this nonsense to German? Especially since it apparently is nothing other than drive (probably used because of the polysemy of “drive”, but not SOP because of this peculiarity, “disk” meaning rather nothing at all so not making a sum). 3. wireless … is nothing special, like with USB + random device. 4. not hardwood floor, it is just a random material + floor, which works with any language. Fay Freak (talk) 23:55, 9 October 2020 (UTC)[reply]
@Fay Freak, aren't you all tired of making up these legalistic sounding excuses and justifications? Just call it what it is: you all know that some terms are so important that you could never leave them out. What you all need to do is find out the core reason why these terms are so important that they can't be left out. I've already stated why several times. -- Dentonius (my politics | talk) 06:27, 10 October 2020 (UTC)[reply]
@Dentonius, Fay Freak: I agree. See also tennis player, basketball player, soccer player, etc., which have survived multiple RFD/RFV attempts and exist primarily as translation hubs (it is argued that they carry the non-SOP meaning of "usually professionally" but I don't really buy that). Some of the red-linked terms you mention above should (potentially) exist for the same reason. Benwing2 (talk) 00:07, 10 October 2020 (UTC)[reply]

A question of mine from above was also avoided. Nobody's answered it yet. If it were only acceptable to write toothpaste as "tooth paste" in English, would we want to remove it for being an SOP? -- Dentonius (my politics | talk) 06:30, 10 October 2020 (UTC)[reply]

I think this has been touched on above, but in the past there was a proposal to have a "Collocations:" namespace (not just a "section" on entries), accessed via a tab at the top of the page like "Citations:" are. One benefit to having them in their own namespace is that it's easier to then include translations of them without ballooning the main entries to the point of breaking (since translations are what has broken several of the long-term residents of CAT:E). As Chuck touches on, this fell victim to fans of THUBs or the Phrasebook worrying such a namespace would weaken the argument for those other things. It was a decent idea, though (at least, if one thinks collocations should be included; if one thinks they shouldn't, then any manner of including them is bad), and perhaps now that THUBs are on a more solid footing, it could be re-proposed. - -sche (discuss) 06:48, 10 October 2020 (UTC)[reply]
@-sche, thanks for that bit of history. I appreciate it. I was wondering about your username. Am I right in thinking that it suggests that you're Swabian? -- Dentonius (my politics | talk) 08:17, 10 October 2020 (UTC)[reply]
We would, I hope, definitely keep "tooth paste" if it were written that way. I don't think the purpose of it is at all obvious from "tooth" + "paste". Mihia (talk) 09:26, 10 October 2020 (UTC)[reply]
Let's look at multi-word expressions ("MWEs") from a search for "a* paste" in OneLook Dictionary Search, of which only almond paste appears in Wiktionary:
  1. achiote paste: appears in a specialized Spanish food glossary
  2. alimentary paste: appears in several dictionaries, including MWOnline
  3. almond paste: appears in several dictionaries, but not MWOnline
  4. aluminum paste: appears only in one sci-tech dictionary
  5. Amish Paste: appears only in WP.
  6. anchovy paste: appears in several dictionaries, but not MWOnline

Following the lemming principle, aka keeping up with the Joneses, we would have all of these with the likely exception of Amish Paste because we don't usually include encyclopedias among the lemmings we follow. My personal inclination is to ask why MWOnline doesn't have a given MWE. In this case, the three (omitting Amish Paste) they exclude would have definitions of the form: X paste: ("paste with principal ingredient X"). I believe that almost all English NPs have definitions that can be reduced to such formulas. It should really be only MWEs that have definitions that can't be reliably defined by such formulas that merit our attention. But YMMV.

But we certainly haven't done a very good job of following the lemming principle for discovering and including MWEs. Until such time as we have all the definitions for MWEs that the lemmings have, it seems to me that we are better off not frittering away our lexicographic energies on entries that have so little lexicographic merit that no other dictionary or glossary has them. DCDuring (talk) 15:03, 10 October 2020 (UTC)[reply]

  • Support a relaxation of SOP to allow more phrases that are common collocations that might be looked up together. I think that having a citation requirement for such phrases is an excellent gatekeeper against absurd numbers of them being made. I would add that we could easily categorize phrases that are non-idiomatic collocations, so that it would be easier to review them. bd2412 T 17:21, 10 October 2020 (UTC)[reply]
Unfortunately a citation requirement will not help in this respect. I can cite you enough instances of "big fat pencil" to satisfy requirements, so we should have an entry for "big fat pencil", right? Mihia (talk) 22:49, 10 October 2020 (UTC)[reply]
@Mihia, just to point out the first obvious thing: "big fat pencil" => adjective + adjective + noun. Whereas "world map" => noun + noun. Am I saying no adjective + adjective + noun types? No. When you say "big fat pencil", I think this is what you're talking about: image But, ahem, we can use something similar to Occam's Razor (the law of parsimony). Parsimony test: Can we remove a part of it and still describe the same thing? Searching for fat pencil, we find image. I would say that "big fat pencil" doesn't say anything more than "fat pencil". In this case, I would be in favour of the creation of fat pencil. -- Dentonius (my politics | talk) 04:31, 11 October 2020 (UTC)[reply]
Please tell me you're not being serious right now? I'm one of those SoP guys who would absolutely hate it if Wiktionary suddenly became a playground for all sorts of utterly useless collocations. What more information do you need than fat + pencil? It's hard enough as it is managing legit entries, making sure they keep a certain kind of standard, that they don't get vandalised by anons, that they provide value for native speakers and learners of the English language, and the discussion currently taking place here aims at opening the floodgates. I'm far from a conservative type of person, but relaxing the rules completely (N.B. I encourage revising rules now and again) like this will not help Wiktionary become anything other than a slightly more advanced version of UrbanDictionary. Strong oppose on my part. --Robbie SWE (talk) 09:39, 11 October 2020 (UTC)[reply]
I'm being perfectly serious. A thousand years from now when there's still Wiktionary and there's no longer any of us or any fat pencils, the person reading may well imagine that we're talking about a pencil of a larger size. But it sure would be nice to have an accompanying picture on that page. As for your concerns about the proliferation of bullshit pages, we could simply copy the Germans. On their version of Wiktionary, you have to provide references for everything. If you don't have any kind of standard reference, the Germans require that you provide five separate references for your entry, the five quotation rule. On their version of Wiktionary, every new entry must be reviewed by someone with a privileged role (probably not practical here since our Wiktionary is huge). Nevertheless, what I'm trying to say is: yes, you can decipher meaning from words but you'll never really truly know them until you explore all their aspects (image, audio, etc.) I'll give you an example from my dialect, Jamaican Creole: mout a massy. I could tell you that mout is mouth, a is have, and massy is mercy, so: "mouth have mercy." You could probably guess that this is a chatterbox. But even then, you'd never truly know what these words sound like when you put them together, how you use them, etc. etc. We need to stop thinking that Wiktionary is just for us. It's also for generations yet unborn who are totally removed and disconnected from anything we think is common knowledge. -- Dentonius (my politics | talk) 10:10, 11 October 2020 (UTC)[reply]
You've made it perfectly clear by your comment that this proposal is not a genuine improvement of this project. It's your personal crusade to prove a point. In reference to your Jamaican Creole example, it would possibly be considered a phrase and be included as such. I have nothing more to add to this discussion. --Robbie SWE (talk) 10:31, 11 October 2020 (UTC)[reply]
@Robbie SWE, to suggest a thing like that is rude. I specifically joined Wiktionary to help improve it, to grow it, and to help it become the dream dictionary I know it can be. Sunt foarte dezamagit (I am very disappointed). -- Dentonius (my politics | talk) 12:16, 11 October 2020 (UTC)
Oppose Dentonius's initiative per Robbie SWE. I am for the sanity of this dictionary and keeping it a dictionary, not a collection of collocations. I do support phrasebook and translation hubs, however, which have a clear and defined purpose and have nothing to do with this proposal. --Anatoli T. (обсудить/вклад) 10:43, 11 October 2020 (UTC)
I second what Robbie SWE said. And Chuck nailed it below: "We've lost as many editors over loosening the rules as tightening". Of course our crusader here doesn't understand that. PUC 18:15, 15 October 2020 (UTC)

I think a lot of these problems come from our poor treatment of translations. I am not interested in having millions of individual "collocation" entries. I am interested in better ways to organize translations: forcing every translation for a single language onto one line does not leave room for these sorts of nuances. DTLHS (talk) 17:31, 10 October 2020 (UTC)

Where the spurious "translation hub" entries are concerned, I agree that translations are somewhat to blame, but I would also mention the "coalmine" nonsense, another of my pet hates, whereby we are forced to keep clear SoP phrases because someone found an instance where a writer could not spell properly, and our related present inability to recognise, or cater for the fact that, sum-of-parts applies to solid words such as "coalmine" as well as it does to spaced phrases such as "coal mine". Not that I am saying I have ready solutions. Btw, I am not clear what you mean by "onto one line"? Mihia (talk) 22:58, 10 October 2020 (UTC)[reply]

There is a set of people here who have been here for a long time. First, I would like to thank them for all their hard work. I have seen what happens to neglected Wikis. Those who started a Wiki in my dialect have long abandoned it and it's now a haven for spammers and malware. English, however, is in no such danger. There are many eyes here looking out for the well-being of this project. Once again, to those who have been here for a long time, who have made our Wiktionary the success that it is today: Thank you.

However, I get the impression that they have long lost whatever fire burned inside of them originally. I pay attention to their words and their arguments amount to: that sounds like extra work for me. That would mean more patrolling. That's all it is and it's sad. Because this Wiktionary will not stay stagnant. The longer it's here, the more it will grow. So you need to ask yourselves: Why don't you have more active users?

People come and people go, sure. After poring through some of the archives here, I get the impression that there are people here who have left because of what they viewed as a rigid (almost fundamentalist) devotion to the ideas we've discussed above. There are no technical reasons why the things we've discussed here can't be done. For some, it's work avoidance. For others, they have this view of what a dictionary should be and fear change. In a sense, they're declaring over my dead body.

Now, the old guard are certainly influential people. They have been here long enough to become autopatrollers, administrators, bureaucrats, etc. Their word carries weight and I've seen the cliques they've formed. I would simply ask them: Do you plan to take Wiktionary to the grave with you?

There is a native speaker bias when considering SOP terms. For the native speaker, it's simply A + B = (A + B). For the native speaker, nothing emerges as clearly as it does for an idiomatic term. Take the word bath robe. One could say that it's simply bath + robe = bath robe and there's no new information. I disagree, as you can see from my earlier statements. In real life, I've helped many people to learn foreign languages. The bath robe example could even lead to confusion. A learner might (initially) believe that we're talking about a swimming dress, a dress you keep in your bathroom, etc. Are you in disbelief? I'm not. I've taught many people. I know the kinds of mistakes they make with languages.

I want to invite you all to think about the future of this Wiktionary, 10 years from now, 100 years from now, 1,000 years from now. Don't cripple the dictionary for your convenience. We're all here to work. The more permissive you are, the more of us there will be to help you. Let's keep the people who sign up and stop driving them away. - Dentonius (my politics | talk) 13:04, 11 October 2020 (UTC)[reply]

First off, permissiveness for the sake of permissiveness is not a good approach. From what I've seen on this wiki there is plenty of discussion and good-faith attempts to reach consensuses when changes to policy are proposed. I don't think that people are offering disagreement to proposed changes only in order to "avoid work", or out of some "fundamentalist devotion" to the existing policy, but rather they appear to have good-faith rationale.
Agreed. I've seen it too. But regarding "work avoidance", it's hard not to think that that's a part of it when the people themselves write that (not quite in those words, of course). -- Dentonius (my politics | talk) 14:43, 11 October 2020 (UTC)[reply]
Secondly, I think a big reason why there aren't more active users is that many people just want to jump right in and start editing without the effort of learning the standard layout and practices. So yes, relaxing the standards would encourage more user participation, but at the cost of a lack of standardization across entries, which would make the wiki less accessible to the average reader. It's difficult to learn how to work within organized and standardized limits, but it's often worth it.
I think you're right to an extent. However, people are willing to learn the standard ways of doing things. When I first got here, Sarri.greek was one of the first ones to take me under her wings. That's something I'm truly grateful for. She taught me a lot about editing. I think people are actually happy that there is a standard entry layout. What they aren't happy about is seeing their work reverted for no reason or seeing their entries deleted. They also don't get any proper explanations as to why. And, if they're like me, some of these reasons strike them as absurd. In the end, after being shot down one too many times, they just say screw it and leave. Now, as for me, I have no place else to go. Wiktionary has been teaching me for years. There's no foreign language I could learn at this point without the help of Wiktionary (and I still plan to learn a few more). I just wish Wiktionary would grow up and become what it truly can be, or better yet, what it's destined to be. -- Dentonius (my politics | talk) 14:43, 11 October 2020 (UTC)[reply]
Thirdly, on the topic of SOP entries, it does appear to be a fine line. I agree that the logic of A + B = (A + B) seems to apply; for example, the phrase blue car is SOP since it merely describes a car that is blue, whereas the phrase car wash conveys meaning more distinct than merely car + wash. However, each term would need to be evaluated separately by native speakers through discussion when disputed. As for the terms tooth paste and bath robe, the far more common terms toothpaste and bathrobe are present, although I would be in favor of the 2-word entries being created as alternative forms. And there do exist many English compounds where the 2-word variation is more common than the 1-word, e.g. vice president vs vicepresident. – Guitarmankev1 (talk) 14:03, 11 October 2020 (UTC)
Agreed. We have called for a "moderate relaxation" of the SOP rules. To some here, it's egregious. How do you reason with people like that? You can't. It's why I view some of them as fundamentalists. -- Dentonius (my politics | talk) 14:43, 11 October 2020 (UTC)[reply]
@Dentonius: Please don't insert replies within someone's post like that. It makes it harder to keep straight who said what when. Chuck Entz (talk) 16:50, 11 October 2020 (UTC)[reply]

What would we gain by relaxing the overly strict SOP rules a bit (as per what was described above: common collocations; citation requirement; parsimony test):

  • Time
We would spend less time debating the merits of multi-word entries: Is blah blah an SOP? Well, blah..and...blah..and more ..blah.. Some person thinking up some clever weird crap.. Okay, we can add a second definition.. Okay it's not an SOP. That stuff is super annoying. For some of the old guard, that's fun, to figure out ways to get SOPs past the filters. Relax the rules as per common collocations with the constraints described and this dictionary would be so much better. Heck, even the paper dictionaries do this! The time saved could be used for improving the dictionary.
  • Editors
More editors will stick around and help us do our work. A slight relaxation of the rules would mean fewer frustrated people leaving the project.
  • Fewer RFVs
Improved citation requirements imply fewer RFVs. More time saved.
  • Lexicon growth
English would be able to catch up to its other Germanic cousins: Danish, Dutch, German, Swedish, Norwegian. They don't have these conversations because their words look like this: Digitalmediaplayer. Just to reiterate what was said in a conversation which occurred before I got here: it makes no sense that they're allowed to have certain words and we can't simply because our writing system includes more spaces. I think it's really the monolinguals here who have trouble seeing that.

A question to any and all: What drives you crazy about some of the rules here? -- Dentonius (my politics | talk) 15:01, 11 October 2020 (UTC)[reply]

Spoken like someone who doesn't understand the status quo. Your enthusiasm is great, but the problem with changing everything every time someone comes along with a fresh perspective is that someone else will come along later with a different perspective, and then someone else with yet a different perspective. We have over six million entries, so reworking even a fraction of those is a huge, huge project. It's entirely possible that one reworking will be only half done before the next one comes along. Just as in law, the principle of stare decisis is there for a reason.
Besides, nothing you're saying is new. The war of the inclusionists and deletionists (yes, we have names for both sides) has been going on since the beginning of the site. We've debated this over and over again, we've had votes, etc., etc., etc. We've arrived at uneasy compromises that people have worked hard to reach. Everybody has their pet peeves about things they'd like to change, but they're often diametrically opposed.
Basically what you're doing is like showing up at the United Nations and saying "this whole conflict over the status of Israel is a waste of time. Here's what you should do." and expecting everyone to say "wow, I never thought about that, you're right." Good ideas are welcome, but you can't simply ignore the context.
As for your points: the time spent on SOP would just be shifted to deciding whether entries should be included on other grounds. Relaxing the rules also means making things more subjective and unclear, and thus more prone to politics. We've lost as many editors over loosening the rules as tightening: good editors with knowledge that's hard to find, and who did lots of hard work on difficult subjects. Increasing citation requirements means lots of entries that passed will become vulnerable, which means more RFVs. Growth by itself isn't necessarily good: are we talking about muscle mass or tumors? Catching up with other sites begs the question of whether we want to be like other sites in that respect. We're way behind Urban Dictionary in uncitable BS, but that's a good thing. Chuck Entz (talk) 16:50, 11 October 2020 (UTC)
WT:REDIRECT lists these as acceptable uses of redirects:
  • Minor variants of phrases where there is little or no chance of the entry title being valid for another language, including inflected forms, should be redirected to the main entries.
  • Other forms of multi-word idioms: for example, burn his fingers may redirect to the pronoun-neutral, uninflected form burn one’s fingers.
  • Sum-of-part terms that are likely to be searched, to the part that the meaning mainly derived from.
I think we can and should use this to avoid complete deletion of entries that are, strictly speaking, SOP, but that are useful and likely search terms for non-native speakers. So instead of deleting fin dall'inizio, we could redirect fin dall'inizio → fin da. Or maak je geen zorgen (IMO SOP) → zich zorgen maken. Or only too well (IMO also strictly speaking SOP) → (a yet to be created) only too. Redirect to ... should be a respectable option in RfD debates, next to Keep and Delete. --Lambiam 12:22, 12 October 2020 (UTC)
Generally speaking, I am not a fan of automatic redirects, as they are presently implemented, since it may not be clear to the user why something different to what they typed in has been displayed, or what the connection is between the two. Mihia (talk) 23:54, 13 October 2020 (UTC)[reply]
  • With respect to arguments about maintenance costs of having SOP entries, I think that ship has sailed. We now have well over six million entries. I don't know that we have even deleted more than a few thousand entries as SOP, but presuming we have, if we were to restore the 6,000 closest calls or most contentious cases for SOP deletion, that would add less than 0.1% to the corpus. I would say to this end, rather, that we should err more on the side of inclusiveness, to the extent that there is any question that a reader at any level of language skill may reasonably be unclear on the meaning of a collocation. bd2412 T 18:44, 15 October 2020 (UTC)[reply]
    • I don't think your assessment makes sense, User:BD2412. The fact that there are not as many potential SoP entries, especially in well-looked-after languages, is thanks to the fact that editors have mostly done their job well. If we allow this dictionary to become a full collocation/phrase or translation dictionary, it will grow exponentially with no one to look after the quality of entries. For example, Category:Russian multiword terms (mostly idiomatic, but it may include some SoP terms that survived RFD or haven't been through the process) currently stands at 2,084, or only 4.4% of 47,232 Russian lemmas. It's a good and manageable percentage. --Anatoli T. (обсудить/вклад) 01:47, 16 October 2020 (UTC)
      • We have historically deleted tons of well-formatted entries, and sometimes even well-cited entries, as SOP. bd2412 T 03:31, 16 October 2020 (UTC)[reply]
        • @BD2412: How does well-formatted and well-cited help your argument? A "blue car" and "bedroom window" can be both well-formatted and well-cited (even include etymologies, pronunciations, translations and usage examples) but they are still SoP. Who will take care of endless possible word combinations? Not you but somebody else, I suppose? --Anatoli T. (обсудить/вклад) 04:22, 16 October 2020 (UTC)[reply]
          • The people who created the entries will take care of them. It gets added to their watchlist. They usually care about what happens to their entries. -- Dentonius (my politics | talk) 10:11, 16 October 2020 (UTC)[reply]
          • There is, by definition, not an endless supply of cited word combinations. Nor, for the record, have I proposed to include all citable combinations. A combination of color-and-noun for a physical object would remain outside the scope (though we do have red book, orange squash, yellow card, green pepper, blue corn, and violet wand; I include these not to indicate a relationship between the color and the object, but because even something as straightforward as "green pepper" as the term for a pepper which is green apparently meets our standards for inclusion). No one is advocating for the inclusion of any citable combination of words, but for combinations that could reasonably pose a problem for readers unfamiliar with the language. As to your other question, I've been here for fifteen years, and I imagine I'll be here until the caretaking is taken over by an artificial intelligence (which will probably be less than another fifteen years), so why would I stop taking care of things now? bd2412 T 04:58, 16 October 2020 (UTC)
            • @BD2412: You listed mostly entries that are obviously idiomatic or may have passed RFD. Combinations that merit an entry DO get included. The main reason is idiomaticity; in case of doubt, there is a vote in RFD discussions, which makes entries either pass or fail. You can always challenge the decision when you think it's unfair, can't you? (But things don't always go the way we want.) I don't see any need to change that. Valid, common possible combinations can easily be cited, so there is an endless supply of those, but that is not our current CFI.
            • Re: "No one is advocating for the inclusion of any citable combination of words". They do actually, if you're paying attention. Our CFI are already loose enough to allow inclusion of idiomatic multiword entries; you just need to prove that they are (in cases where it's not obvious). Occasionally idiomatic combinations get deleted or unidiomatic combinations are kept; we are human and to err is human, but overall there is a good balance. --Anatoli T. (обсудить/вклад) 05:35, 16 October 2020 (UTC)
              • @Atitarev: Thank you. "Our CFI are already loose enough to allow inclusion of idiomatic multiword entries, you just need to prove that they are (in cases where it's not obvious)": exactly.
              Notice also that the time people like Dentonius spend on trying to convince the rest of us to "relax the CFI" (but without proposing any concrete, actionable plan) is time not spent on creating multiword idiomatic entries that no one would think to challenge, even hardboiled deletionists like myself.
              Frankly, it's really maddening. I have created thousands of multiword entries, I could create thousands more, and I'm being called a fundamentalist because I refuse to allow a free-for-all that would be of no damn use to anyone. And I'm being called that by someone who probably has one of the most unnuanced opinions of all around here. And who, incidentally, has been here, what? Three months? And has created how many entries exactly? PUC 12:39, 16 October 2020 (UTC)
"So you need to ask yourselves: Why don't you have more active users?" @Dentonius: I've often asked myself this, but I don't think it's related to a lack of SOP entries. Relaxing SOP rules could have some benefits, maybe in terms of users as "readers" (more findable content, more "inbound links"), but I don't think these users would then start to contribute. The main hurdles to contribution are a mix of technical and cultural/community aspects. – Jberkel 09:56, 16 October 2020 (UTC)[reply]
You might be right. I hope there will be a lot of active editors at some point in the future :-) -- Dentonius (my politics | talk) 10:09, 16 October 2020 (UTC)[reply]

"idiomatic" label (again)

I raised this once before at Wiktionary:Beer_parlour/2020/May#"idiomatic"_label, but no real decision was reached, so I am raising it again for further input before I change things unilaterally. The label {{lb|en|idiomatic}} creates a link to Appendix:Glossary#idiomatic, where the word is defined as "Pertaining or conforming to the mode of expression characteristic of a language". Yes, this is one definition, but is it the one relevant to this label? Shouldn't this label apply only to "idiomatic" in the sense "not (easily) understandable from the individual words/parts"?

It was suggested in the previous thread that the label isn't appropriately applied and may be redundant altogether. I sympathise with this viewpoint. Strictly speaking, wouldn't, or shouldn't, all of our multi-word entries be "idiomatic" in the Wiktionary sense? But for now, short of proposing abolition of the label altogether, I intend to at least fix the glossary definition to explain how we use the term "idiomatic" in our labels, i.e. to mean not (obviously) SoP — unless anyone thinks that our "idiomatic" label actually does, or should, apply to "mode[s] of expression characteristic of a language"? Mihia (talk) 17:52, 9 October 2020 (UTC)[reply]

What's the difference between the label "idiomatic" and "figurative"? -- Huhu9001 (talk) 04:52, 10 October 2020 (UTC)[reply]
"figurative" usually indicates that a meaning has been transferred from the original thing to something perceived to be similar in some way, such as a concrete meaning to an abstract meaning. This needn't be the case with either sense of "idiomatic" (though it could also be, of course). For example, "find out", in the sense of "discover", would be idiomatic (not a predictable phrase from regular meanings of "out"), but not, I would say, "figurative". "Figurative" is often applied to single words, while "idiomatic" (both senses) is more for phrases and word patterns. Mihia (talk) 17:36, 10 October 2020 (UTC)[reply]
I suppose (or hope) everyone would agree that whichever sense is meant should be linked. But it's unclear that "not (easily) understandable from the individual words/parts" is what's meant; a number of uses do seem to intend the "pertaining or conforming to the mode of expression characteristic of a language" meaning, and another swath of uses make no sense at all: neither definition applies (or at best, perhaps "...characteristic..." applies), because the label has been applied to some sense of a single polysemous word. And, with multi-word entries, defining it as "not (easily) understandable from the individual words/parts" highlights how useless it is, since if a multi-word entry were understandable from its parts, then it would generally not be included. My suggestions are: 1) remove inappropriate/daft uses (on single words, etc) and see what it means in whatever cases are left, and/or 2) consider just linking to the entry idiomatic and not any one sense... - -sche (discuss) 07:17, 10 October 2020 (UTC)
The label {{lb|en|idiom}} also creates a link to Appendix:Glossary#idiomatic. If it is made instead to link to Appendix:Glossary#idiom, “A phrase whose meaning is unapparent or unobvious from the individual words that make it up ...”, the intention will be clearer. Are there entries that are idiomatic (and worth being labelled thus) that are not idioms? --Lambiam 18:58, 13 October 2020 (UTC)
  • Thanks for the comments. The more I think about this, the less clear I am about what we should do. Presumably the very great, or even overwhelming, majority of all phrases in Wiktionary are "idiomatic" in the "non-SoP" sense, otherwise, as has been mentioned, why would we include them? Add in an additional possible use for phrases "Pertaining or conforming to the mode of expression characteristic of a language", and it's hard to see an includable phrase that would not fall within either category. My feeling is that the "idiomatic" label may have originally been intended to (or ought to) apply to "highly idiomatic" phrases and expressions, such as at the drop of a hat or carry the can or look up (e.g. in the sense "look up the word in a dictionary"), and not to more feebly non-SoP entries such as book in or beauty queen to which it sometimes seems to be applied. Problem here is how on earth would we define "idiomatic enough", so as to restrict the label to a sensible subset of phrases? Mihia (talk) 17:37, 20 October 2020 (UTC)
    There are some 8,600 English lemmas that have one or more definitions labelled "idiomatic". The label has different meanings. In [[according to]] the label appears on the true preposition definition, apparently to distinguish it from the two non-preposition definitions (which nonetheless appear under the preposition PoS), which are arguably simply "accord(ing) + to". Alternatively, the label might mean that the labeled sense is more common in current spoken English than the others. In [[chestnut]] it appears next to the "old joke" definition (but not the definition relating to a certain plate-like thing that sometimes occurs on horses' legs), to distinguish it from the tree, wood, and nut definitions. But is the "old joke" more idiomatic or just harder to connect to the early nut/tree/wood definitions? At [[good for nothing]] (and [[good-for-nothing]]) it seems to be there to assert that the term is entryworthy. These are not atypical examples.
The label seems to me to be used more often to make a point against deletion of the entry than to convey useful information to normal users. DCDuring (talk) 00:48, 21 October 2020 (UTC)

My entries

Just a note to any administrator on this site: I give them permission to delete any of the entries I have made in the past that don't meet the criteria for inclusion on this site, without going through the usual RFV process. I am not going to check back in on this post after I make it, so I hope that my intentions have been made clear. Furthermore, I would prefer not to be notified of any such instances. Thanks! Razorflame 02:01, 10 October 2020 (UTC)

And once again, some of the old guard here have managed to demotivate and frustrate someone who was here to help. What a pity. -- Dentonius (my politics | talk) 12:06, 11 October 2020 (UTC)[reply]
@Razorflame, for what it's worth, I would like you to stay. -- Dentonius (my politics | talk) 12:08, 11 October 2020 (UTC)
@Dentonius It may be oversimplifying the situation to say that Razorflame was driven off by the "old guard". In fact it looks like he has not been active since 2014. This was before my time here so I don't know the history, although I have occasionally seen people mentioning him (usually not in a positive way); someone else might be able to explain better. I have only seen a couple of long-term contributors get frustrated and leave, and it seems usually to be due to conflicts with specific other people, rather than due to the "old guard". In general I'd caution against lumping long-term editors into an "old guard"; different contributors have very different views and personalities.
More common is new editors getting frustrated because they want to just jump in and contribute without learning the process for doing so. After they have created a mess, someone who has had to fix the mess complains to them about this, causing them in some cases to get frustrated and leave. I agree there is a lot to learn, but I'm not sure there's a way around this; dictionaries are much more structured than encyclopedias and consistency is vitally important. Perhaps there could be better-established procedures for (a) helping new editors get up to speed, and (b) speaking to them nicely in a way that doesn't "bite" them. If you have ideas and want to contribute in this fashion, I'm sure your contributions would be welcome. Benwing2 (talk) 16:58, 11 October 2020 (UTC)[reply]
You don't understand. Razorflame spent a lot of time adding content in languages he didn't know, so a significant portion of his edits are simply wrong. Worse, they were mostly in languages that very few people know, so finding the errors takes time that knowledgeable people could have spent on adding their own content. Imagine the damage someone could do to Jamaican Creole if all they had was a word list online somewhere and whatever wikis there are in the language. Yes, he got blocked several times, but not because he was challenging the status quo; he was blocked for recklessly adding incorrect junk.
Considering how young he was at the time he was active, I read this as looking back over the mistakes he made and wanting to make things right, but not wanting to stay now that he's moved on to other things in his life. Chuck Entz (talk) 17:15, 11 October 2020 (UTC)
He did try to improve after a lot of criticism and engaged with User:Stephen G. Brown (no longer here to help) to help him with Kannada pronunciations before the transliterations were fully automated. Yes, he tried to make entries in languages for which there were poor resources. Even if he got one thing right (such as correct transliteration), he produced sophisticated words, which were hard to verify for the right sense. I mean, how do you translate the English word "get" into a language you don't know? In my opinion, good editors start with small things and make edits that are easier to manage and verify. Editors do add translations and entries in languages they don't speak, but they spend a lot of effort and go the extra mile to get them right. In any case, I don't know why he left. I know he was improving. --Anatoli T. (обсудить/вклад) 00:31, 14 October 2020 (UTC)
As an editor who's been active in Esperanto and Ido, two languages in which Razorflame had been very prolific in the past, I can testify that what Chuck Entz says is correct, with attestation being a particular issue. I am sure that most of the Ido terms that I have sent to RFV and have been deleted had been created by Razorflame, to whom I bear no ill will by the way. There is a systemic problem in constructed languages that most existing dictionaries and vocabularies are prescriptive and include many unattested lemmas, so many newcomers bump their noses when they add words in those languages. I've been there as well; for Ido the main problem is Dryer's dictionary by the way. ←₰-→ Lingo Bingo Dingo (talk) 20:11, 16 October 2020 (UTC)[reply]

I'd like to note a few things that other users have said: What Chuck Entz has said is absolutely correct. I was a teenager when I made the majority of my edits to the English Wiktionary, and it is true that the majority of my edits were made in languages that I did not fully understand; however, as Anatoli did state, I did try to improve and I tried to engage with other people who knew about the languages in which I was making edits, such as Stephen G. Brown with Kannada transliterations before they were fully automated. I also engaged with users of the Ido Wiktionary, and I always asked them about attestation before I posted the entries over here on the English Wiktionary. I did everything that I could to try to make things work here, but in the end, I felt that my time was better spent elsewhere, and I've grown out of this phase in my life. My original intent with this post was to say that I do take responsibility for my past mistakes, and that instead of having them dragged out in public through the RFV process, I would rather anything found to be incorrect in my past contributions be removed silently, as there is no need for anything regarding them to be public any longer. The final thing I want to say is that I was not driven away by anything or anyone in particular (with two exceptions which I will keep to myself), and I have moved on with my life for the most part. I might eventually come back to this project and continue my work on Kannada with simple words that I can find attestation for, but I don't foresee that to be until next year at the earliest. I also want to stress that I AM aware of the attestation issues that I have had in the past, and I will give a lot more due diligence to finding reliable attestation for any new words I add in Kannada in the future. I will also work with other Kannada editors and will bounce new entries off of them BEFORE I post them to make sure that they are attestable.
Either way, this won't be for at least 4-6 months. Thank you for taking the time to read this response, and I will see you guys then! Razorflame 19:36, 6 November 2020 (UTC)[reply]

Western Armenian as a separate language

Here, I am proposing the addition of Western Armenian as a separate language alongside the already existing [Eastern] Armenian section. While I acknowledge that this issue has been raised previously, there is new and upcoming interest in the development of Western Armenian, especially by the Calouste Gulbenkian Foundation. Therefore, I would like to present the issue again.

Down below are eight different arguments for the inclusion of Western Armenian as a separate entity. These are all lexicographical in nature, but arise from para-lexicographical issues. While dictionaries like the Malkhasian dictionary don’t make a distinction, Wiktionary is much more than a dictionary. Currently the only (!) Western Armenian dictionary online is Nayiri. This will make Wiktionary an alternative dictionary that will include words that have not been integrated into Western Armenian dictionaries quite yet. (Work on Western Armenian is quite limited).

1. Grammatical Argument: Noun and verb declensions in Western Armenian are different than Eastern Armenian. Currently, Wiktionary only provides Eastern Armenian tables for both noun and verb declensions, which hinders learning of Western Armenian as a second language.

2. Phonological Argument: Phonetic differences are many, including in the transcription of foreign words, as seen in the IPA table on Wikipedia (Link: https://en.wikipedia.org/wiki/Help:IPA/Armenian). When foreign words, including toponyms, are spelled differently, this becomes confusing for Western Armenian speakers. Take the city of Chicago as an example. As in Russian, in Eastern Armenian the first phoneme is /tʃ/ (“ch” sound) (i.e. Չիկագո), but in Western Armenian it is /ʃ/ (“sh” sound) (i.e. Շիքակօ). Notice that 4 out of the 6 letters are different.

3. Orthographic Argument: Apart from the phonetic differences that contribute to orthographic variation, Western and Eastern Armenian use different orthography standards, the classical (դասական) and modern (արդի) orthography standards. Although the classical variation does exist, it is given as an alternative, when in fact it is the main orthographical form for 1 million speakers of Western Armenian.

4. Pure Lexicographic Argument: Lexicographic differences arise in both languages. This one exists in many languages and therefore is not as significant, but to consider Western Armenian as a separate language will help prioritize these differences. Examples include chess (շախմատ vs. ճատրակ) and egg (հաւկիթ vs. ձու).

5. Etymological Argument: Etymologies of foreign words are not reflective of Western Armenian. Many foreign words have entered Eastern Armenian through Russian, while French and Turkish are the source of many foreign words in Western Armenian. As a result, the “etymology” section of entries is not reflective of Western Armenian, and many might make erroneous judgments on word origins.

6. Colloquialism Argument: Similarly, colloquial expressions are a separate issue that arises between the two varieties. There are hundreds of words where the two languages diverge. One example is the word “swimsuit”. The proper Armenian word is լողազգեստ, but the words կուպալնիկ (Eastern) and մայօ (Western) are used. The former is derived from Russian while the latter is either from French directly or from French through Turkish.

7. Lexical-Cultural Argument: Western Armenian has priorities of its own. Since its cuisine and its customs are different, there are words that don't deserve the same attention in Eastern Armenian, as in Western Armenian. One famous example is the word դարձանուշ, a sweet that used to be offered to guests in Istanbul. While this can easily exist under the General Armenian page, many Wiktionarians of Armenian probably don’t even know what it is. Therefore, a Western Armenian section will give priority to these words. This will acknowledge the historical divide between the Diaspora (Western) and Soviet Armenia (Eastern), quite like the divide between Turkey (Turkish or Turkey Turkish) and Soviet Azerbaijan (Azeri or Azerbaijan Turkish).

8. Other: Other subvarieties of languages such as varieties of Arabic (e.g., Levantine Arabic, Gulf Arabic) are represented. These varieties also only have ISO 639-3 codes, but are still treated as separate when necessary. Western Armenian now has its own separate page on Wikipedia, and although it is not as developed as its [Eastern] Armenian counterpart, it is still significant. If Wikipedia found that Western Armenian deserves its own page, a similar proposal for Wiktionary should be considered.

Pompyxmori (talk) 02:52, 10 October 2020 (UTC)[reply]

They're very similar, overall, and most of your concerns are political: you don't like that a dialect spoken by fewer people does not have the pride of place compared to a dialect spoken by more people. This is not a linguistic concern; the two dialects are mutually intelligible, and splitting them would entail massive duplication of effort. If you really care about Western Armenian on Wiktionary, the best way to rectify it is for you to spend time adding entries, pronunciation, and inflectional templates for Western Armenian to our Armenian entries. @Vahagn Petrosyan —Μετάknowledgediscuss/deeds 03:49, 10 October 2020 (UTC)
There is little Western-Armenian-specific content in Wiktionary not because there is no separate code for it, but because Wiktionary's only dedicated Armenian editor speaks Eastern Armenian and focuses on it. You are welcome to add Western Armenian content under the code hy. I can help you create pronunciation and inflection templates. If there is a Western word or sense you want to add and do not know how and where to treat it, let me know. --Vahag (talk) 04:53, 10 October 2020 (UTC)[reply]

Metaknowledge, most of the reasons are not political, they are linguistic (reasons 1-6), except the last two; also, your comment concerning a "pride of place" is not appreciated. As I mentioned, the Calouste Gulbenkian Foundation, the leading foundation of Portugal, is also promoting the same efforts. I am dismissing your comment as misinformed. In theory, the languages are mutually intelligible, but in practice, this greatly varies across domains. Thus, one ought to reconsider the claim of mutual intelligibility, which is rarely the last word. Although I agree more Western Armenian entries should be added in the meantime, this case should not end here. Things change. People realize new things. For the time being, I urge you to read the story of Western Armenian Wikipedia (https://www.evnreport.com/raw-unfiltered/international-recognition-for-the-western-armenian-language). Pompyxmori (talk) 05:29, 10 October 2020 (UTC)

Wikipedia is a different thing. It is composed of texts, which can be in either of the standard literary variants of Armenian. In Wiktionary we deal with lemmas, which are the same 95% of the time; the rest are dealt with context labels. If we separate Standard Western Armenian from "Armenian", then we also need to separate Standard Eastern Armenian, the 50+ dialects and Early Ashkharhabar (from 1600 until the codification of the literary variants in the 19th century). Keep contributing under "Armenian" for a few months and let us see if there are practical problems for handling Western Armenian. --Vahag (talk) 06:15, 10 October 2020 (UTC)[reply]

I do appreciate the help, Vahagn. For now, I would like to put Western Armenian pronunciations for new entries. For Eastern Armenian, do you generate the IPA or manually insert it? Pompyxmori (talk) 06:18, 10 October 2020 (UTC)

I generate it automatically using Module:hy-pronunciation and Template:hy-pron, with manual override where appropriate. Come to Template talk:hy-pron to discuss adding functionality for Western Armenian. --Vahag (talk) 06:34, 10 October 2020 (UTC)[reply]

Disabled WT:NORM tag

I've finally disabled the abuse filter that adds the WT:NORM tag. New edits will no longer have this tag added to them (but edits that currently have it will keep it). Minorax pointed out that it takes up a lot of space in Special:AbuseLog, making it hard to find edits that trigger more important abuse filters. I've been thinking of doing this for almost a year, soon after I created the filter, because it's caused people to think their edits were problematic, and the conditions it detects aren't all that terrible and they should be detected by dump analysis and fixed by bots because there are far better things for humans to spend time on. But seeing the abuse log made me finally do it.

If anyone really wants the abuse filter enabled, please speak up. But if you just need a list of WT:NORM-violating pages, I can create that from the dump or by examining recent changes. — Eru·tuon 05:04, 10 October 2020 (UTC)[reply]

Hooray!!! Thank you! Mihia (talk) 09:27, 10 October 2020 (UTC)[reply]
In the frozen land of Nador they were forced to eat the normalization abuse filter. And there was much rejoicing. Vox Sciurorum (talk) 12:46, 10 October 2020 (UTC)[reply]
What a relief! DonnanZ (talk) 10:14, 14 October 2020 (UTC)[reply]

Label for Chinese non-predicative adjectives (非謂形容詞)

Should we tag these words with {{lb|zh|non-predicative}} (link to Appendix:Chinese parts of speech?)? Some of these adjectives are tagged with "attributive" (linked to Appendix:English nouns), but I think it's not accurate. — This unsigned comment was added by 沈澄心 (talk • contribs) at 15:30, 10 October 2020 (UTC).

I don't know Chinese, but do you mean "non-predictive" or "non-predicative"? Mihia (talk) 17:21, 10 October 2020 (UTC)[reply]
Oh. I misspelled it. -- 07:50, 11 October 2020 (UTC)[reply]
(Notifying Atitarev, Tooironic, Suzukaze-c, Justinrleung, Mar vin kaiser, Geographyinitiative, RcAlex36, The dog2, Frigoris, 恨国党非蠢即坏, Thedarkknightli, Michael Ly): -- 13:32, 11 October 2020 (UTC)[reply]
I think it is good. Maybe more generally, we can link those labels to an Appendix:Glossary of Chinese for jargon dedicated to Chinese grammar. Some information about the recently discussed "inner structures for Chinese words" can also be put there. 恨国党非蠢即坏 (talk) 13:43, 11 October 2020 (UTC)
@沈澄心, the "attributive" label may be slightly more concise. How do you feel about "attributive" vs. "non-predicative"? Are they substantially different to warrant two distinct labels? --Frigoris (talk) 14:14, 11 October 2020 (UTC)[reply]
@沈澄心: I also agree with @Frigoris. Are there other non-predicative uses of adjectives that are not attributive? — justin(r)leung (t...) | c=› } 18:09, 11 October 2020 (UTC)[reply]
@Frigoris, Justinrleung: the text "attributive" is OK, but the link is not appropriate. And how should we label the predicative adjectives that are used not in the "first pattern" (described in w:Chinese adjectives, e.g. 她很漂亮) but in the "second pattern" (e.g. 他是的)? -- 05:39, 13 October 2020 (UTC)

Medefaidrin

An IP added a translation for this lect to !. We currently don't recognize it or its language code, dmf, so it caused a module error, and I'm not quite sure what to do with it.

On the one hand, according to w:Medefaidrin, it's an artificial language with no native speakers, and we apparently don't have an entry for the name. On the other, it has some use as a religious language, and has managed to get not only a language code, but a Unicode block for its script. Chuck Entz (talk) 18:47, 10 October 2020 (UTC)[reply]

We should include the code for it, at least as an appendix-only language, because it has a valid ISO code (as of just this year, which explains why we didn't have it before), and seems to have seen some use. (This reminds me to look through the 2019-2020 code changes and see if there's anything else we need to add or discuss.) Now, whether it should have mainspace entries or be in an appendix requires more thought. - -sche (discuss) 02:49, 14 October 2020 (UTC) (edited after refreshing myself on Wiktionary:Criteria_for_inclusion#Constructed_languages, and that we do exclude some minor conlangs) - -sche (discuss) 20:05, 14 October 2020 (UTC)[reply]
Balaibalan is only attested in a dictionary, and apparently was not used, so I think it can be left excluded. - -sche (discuss) 20:05, 14 October 2020 (UTC)[reply]
In the absence of further input, I noted zba Balaibalan as excluded (for now) on WT:LT since we seem to exclude minor constructed languages, and tentatively included dmf as appendix-only. If it should be allowed in mainspace (or disallowed entirely) let's discuss further. - -sche (discuss) 20:07, 21 October 2020 (UTC)[reply]

Template:ko-etym-native is giving non-Middle Korean forms as Middle Korean

Middle Korean ends in c. 1600, but Template:ko-etym-native is currently designed to treat everything up to 1895 (!) as Middle Korean. Could someone fix this please? No Korean form first attested after 1600 should be in categories like Category:Korean terms inherited from Middle Korean.--Karaeng Matoaya (talk) 09:55, 11 October 2020 (UTC)[reply]

I would prefer having Early Modern Korean as an etymology-only language. Actual EMK lemmas can continue to be treated as obsolete or dialectal Modern Korean lemmas, of course, as done in e.g. 옥슈슈 (oksyusyu).--Karaeng Matoaya (talk) 11:00, 11 October 2020 (UTC)[reply]
@Karaeng Matoaya What is your proposal exactly for how to fix the issue? What should the template display when the source is > 1600? BTW one source is from 1601, not sure if that counts as Middle Korean. Also, there's a comment in the template text:
"The shorthand parameter for first attestations will be deprecated soon in favor of creating actual entries or citation pages for Middle Korean"
"See discussion at Wiktionary:Beer parlour/2019/September#First attestations in the etymology section"
Also, are you sure that every one of the "first attested" sources reflects contemporary speech? It seems quite possible to me that a post-1600 source might be quoting texts from an earlier era, in which case the "Middle Korean" category might be valid. Benwing2 (talk) 21:58, 11 October 2020 (UTC)[reply]
@Benwing2 The template should display "Early Modern Korean" instead of "Middle Korean", and the automatic category should be Category:Korean terms inherited from Early Modern Korean. The rest of the template text should stay the same.
The current template (including that template text) is creating confusion for editors not as experienced with MK, leading to them creating erroneous Middle Korean entries that are actually Early Modern Korean and are sometimes outright illegal according to the vowel harmony rules of MK. Just today I had to fix this for 옥슈슈 oksyusyu.
On your second point, yes, we're quite sure that most post-1600 first attestations represent contemporary speech. There was a break in the scribal tradition associated with the Japanese invasions of Korea between 1592 and 1598, and seventeenth-century editions of Middle Korean works already make "corrections" of the sort that show that educated Koreans were no longer familiar with characteristically Middle Korean grammar or vocabulary. The 1601 work would be EMK, as a post-Japanese invasion work.--Karaeng Matoaya (talk) 23:09, 11 October 2020 (UTC)[reply]
@Karaeng Matoaya OK, I'll fix it. You mentioned above "I would prefer having Early Modern Korean as an etymology-only language". It already is, using the code ko-ear. Benwing2 (talk) 23:26, 11 October 2020 (UTC)[reply]
@Karaeng Matoaya OK, fixed. See 이루다 for an example. It's currently impossible to get a category like Category:Korean terms inherited from Early Modern Korean added by normal means because Early Modern Korean is an etymology language variant of Korean and a language can't inherit from itself. Benwing2 (talk) 05:36, 12 October 2020 (UTC)[reply]
Thank you so much! Wow, 이루다 (iruda) is a mess, I should fix it.--Karaeng Matoaya (talk) 07:16, 12 October 2020 (UTC)[reply]
@Karaeng Matoaya Can you take a look at 드러나다? It references a nonexistent reference work "kk" which is causing an error. Thanks! Benwing2 (talk) 03:41, 13 October 2020 (UTC)[reply]
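
The cutoff discussed in this thread can be sketched as follows. This is only an illustration in Python, not the actual logic of Template:ko-etym-native; the function names are hypothetical, and treating 1600 itself as the last Middle Korean year is an assumption (the thread says "c. 1600" and counts a 1601 work as Early Modern Korean).

```python
from typing import Optional

# Assumption: 1600 is the last year counted as Middle Korean.
MIDDLE_KOREAN_LAST_YEAR = 1600


def stage_for_first_attestation(year: int) -> str:
    """Return the stage name to display for a first-attestation year."""
    if year <= MIDDLE_KOREAN_LAST_YEAR:
        return "Middle Korean"
    return "Early Modern Korean"


def inheritance_category(year: int) -> Optional[str]:
    """Return an inheritance category only for Middle Korean attestations.

    Early Modern Korean is an etymology-only variant of Korean, so no
    "inherited from" category is produced for it.
    """
    if stage_for_first_attestation(year) == "Middle Korean":
        return "Category:Korean terms inherited from Middle Korean"
    return None
```

With this sketch, a form first attested in 1601 is displayed as Early Modern Korean and gets no inheritance category, which matches the behaviour described above.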

Optional Russian Locative

Going through a few entries in Russian, I noticed that Russian words with a non-obligatory locative case (e.g. both в це́хе and в цеху́ are correct, while only в шкафу́ is correct) are annotated inconsistently.

  • мыс lists мы́се, мысу́
  • хлев lists в хлеву́* (*optionally in footnotes)
  • цех only lists цеху́ (explanation that в це́хе is also correct could be written in the footnotes)

Which of these approaches should be taken to ensure consistency and clarity? 1998alexkane (talk) 21:36, 11 October 2020 (UTC)[reply]

@1998alexkane, Atitarev Good question. I'm thinking we should use a footnote. Forms like мы́се and це́хе aren't really the locative case, they're just the prepositional. Also if we put them under the "locative" category, my script to auto-create Russian forms will tag them as being locative in addition to prepositional, which isn't correct. Now, I could add a special check for this, but I think the footnote approach is better. Benwing2 (talk) 21:48, 11 October 2020 (UTC)[reply]
@Benwing2, 1998alexkane:. It's not easy. Generally, if locative is listed, it means, it's allowed but it doesn't mean that prepositional is not. Also the prepositional may be used for a different sense only, cf. о го́де, в году́. BTW, I would use в хлеву́ more often than в хле́ве, it sounds more natural, so not sure the footnote is right. I am OK with footnotes as the approach. --Anatoli T. (обсудить/вклад) 22:52, 11 October 2020 (UTC)[reply]
@Benwing2, Atitarev: Given some of the intricacies of the usage of the locative, it probably makes sense to use the usage notes section rather than a small footnote. In cases where it is not as complicated (e.g. the locative variant is rare, dated, or colloquial), I think a footnote in the declension box makes sense. 1998alexkane (talk) 16:54, 12 October 2020 (UTC)[reply]
@Benwing2, 1998alexkane: It depends how much info we really want to put into declension tables. It can't be automated, so someone would need to go and insert term-specific footnotes. It doesn't have to be long. Perhaps a user/learner perspective would be interesting. For example, "в хлеву́" is more common and natural than "в хле́ве" , same with "в цеху́"/"в це́хе". Do we need to provide footnotes for that? What about "о го́де"/"в году́", "о сне́ге"/"в снегу́", "на двери́"/"при две́ри", "на дому́"/"на до́ме" where prepositional/locative have different usages? Let me know your thoughts. --Anatoli T. (обсудить/вклад) 00:39, 13 October 2020 (UTC)[reply]
@1998alexkane, Atitarev: All this info would be useful for me. We should definitely adopt the perspective of the language learner and use footnotes/usage notes as appropriate. Benwing2 (talk) 01:27, 13 October 2020 (UTC)[reply]
@Benwing2, Atitarev: I agree with Benwing for the most part. However, I don't feel it is necessary to mention cases like "о го́де"/"в году́", "о сне́ге"/"в снегу́", "на двери́"/"при две́ри" since the locative in -у́/-и́ (as in двери́) is only used with the prepositions в/на (never with о or при). A note on these general grammatical rules belongs in the Russian noun appendix. — This unsigned comment was added by 1998alexkane (talk • contribs).
@Benwing2, 1998alexkane: OK, let's see if we can achieve what we want with short footnotes. I've just added a footnote to цех (cex) - в цеху́* as "Preferred." (the meaning being it's preferred over the prepositional form "в це́хе"). Does it actually help? --Anatoli T. (обсудить/вклад)
@1998alexkane, Atitarev: This does help. I think it would be even clearer if it said "preferred over the prepositional form" or something like that. Benwing2 (talk) 07:32, 13 October 2020 (UTC)[reply]
@Benwing2, Atitarev: I think even better would be "preferred in conversation" since, at least according to a quick Ngram search, "в цехе" appears to be more common in writing. 1998alexkane (talk) 13:49, 13 October 2020 (UTC)[reply]
@Benwing2, 1998alexkane: 1998alexkane is right, actually. "в цеху́" rolls easier off the tongue but "в це́хе" may be more common in writing. The same deal is with о́тпуск (ótpusk) (the entry has so many SoP red links, which is Wanjuscha's habit). --Anatoli T. (обсудить/вклад) 22:19, 13 October 2020 (UTC)[reply]
@Benwing2, 1998alexkane: I can help put terms with locatives in a couple of categories, so that footnotes could be added but I don't feel like going through entries myself, so if you give me short lists, I will have a look, so they can be updated similar to о́тпуск (ótpusk) or цех (cex). --Anatoli T. (обсудить/вклад) 00:38, 14 October 2020 (UTC)[reply]
@Atitarev, 1998alexkane: The list of Russian nouns with locatives can be found here: Category:Russian nouns with locative. I converted the list into a single page, here: User:Benwing2/ru-nouns-with-locative Maybe you can add short notes by some of them, e.g. indicating whether the locative form is mandatory (with в or на), preferred or optional. Benwing2 (talk) 01:06, 14 October 2020 (UTC)[reply]
@Benwing2, 1998alexkane: Aha, OK, thanks. It's not such a big list. I'll get to them over time. --Anatoli T. (обсудить/вклад) 01:11, 14 October 2020 (UTC)[reply]
@Benwing2, Atitarev: Another good reference for this may be the online version of Русская грамматика (1980) § 1182 and § 1193. 1998alexkane (talk) 01:52, 14 October 2020 (UTC)
@1998alexkane, Benwing2: Thanks, this is helpful. I can use that. --Anatoli T. (обсудить/вклад) 04:36, 15 October 2020 (UTC)[reply]

Remaining Kurdish lemmas

@Calak, Vahagn Petrosyan, Balyozxane, Şêr I moved almost all the "Kurdish" lemmas to Northern Kurdish or Central Kurdish. The vast majority of them got moved to Northern Kurdish per User:Balyozxane's investigations. For the remainder (mostly in Arabic script), I tried to move them by hand into either Northern Kurdish or Central Kurdish. What remains are of the following types:

  1. Words borrowed from Armenian, created by User:Vahagn Petrosyan. These are cited as coming either from a Kurdish-French dictionary from 1879 by Jaba and Justi or from the Hayerēn armatakan baṙaran from 1926-1935 by Ačaṙean. They are written in a highly nonstandard Arabic script (as compared with Sorani conventions). Do these really belong? Per Vahag, we can't even identify which Kurdish language they come from.
  2. Some unverifiable Latin-script words created in 2006 by User:Drago, who has not been active since 2006. They include ren (hip), spar (errand), tab (patience), tas (cup) (this one looks like French), Tiwana (a male name), varsin (freedom), xwigî (extortion).
  3. A few miscellaneous unverifiable terms: ئائل (a'li, unworthy; unacceptable; incorrect); برج (birc, tower; zodiac; constellation) (this one comes with an extensive etymology but it's not in normal Sorani spelling); سڕ (sirr, numbness).

For (2) and (3), I think we should submit them to RFV. For (1), I'm not sure.

One more question has to do with Arabic-script lemmas in Northern Kurdish. Should these exist? I gather that Northern Kurdish was in fact written in Arabic script until 1932, but (a) I'm not sure it was spelled the way the lemmas have it (which follow Sorani conventions), (b) I'm concerned having them as lemmas adds undue weight (per User:Chuck Entz). An alternative is not to have them as lemmas but just list them as alternative spellings on the corresponding Latin-script pages. Benwing2 (talk) 04:08, 13 October 2020 (UTC)[reply]

Category (1) contains real Kurdish words and should not be deleted just because they are in a non-standard spelling. A knowledgeable Kurdish editor could normalize them to the Latin script. They are all Northern Kurdish, since that is the dialect that borrows from Armenian. --Vahag (talk) 06:02, 13 October 2020 (UTC)[reply]
Don't delete Category (1). Delete all entries in Category (2) and just ئائل (a'li) in Category (3).--Calak (talk) 08:34, 13 October 2020 (UTC)[reply]
Move this Category:Kurdish proper nouns to kmr.--Calak (talk) 10:10, 13 October 2020 (UTC)[reply]

@Calak, Vahagn Petrosyan, Balyozxane, Şêr, Chuck Entz, Metaknowledge Thanks everyone. There are now no more Kurdish lemmas and no more Kurdish non-lemma forms. Should we delete the ku language code or add an abuse filter to prevent people from creating new entries? Benwing2 (talk) 01:30, 14 October 2020 (UTC)[reply]

There's a few more that aren't in the correct categories. DTLHS (talk) 01:33, 14 October 2020 (UTC)[reply]
I support removal of the code once it is not being used anywhere on the site. @-sche as well. —Μετάknowledgediscuss/deeds 01:35, 14 October 2020 (UTC)[reply]
We would also need to fix translation tables, descendants sections, etymologies, etc. which use this code before deleting it, e.g. in medicine (translation), jineology (etymology). Try insource:ku (although this catches chaff) or search a database dump for "|ku", "=ku" etc; there seem to be a few thousand such entries. After that, yes, if we're not going to have entries in it, remove it as a full code. Do we want to keep it as an etymology-only code for any cases where we don't know what variety of Kurdish a word in some other language is borrowed from? (We have some terms currently derived from generic "Kurdish", including the curious Category:Central Kurdish terms derived from Kurdish.)
Btw, one thing to consider is leaving a short "comment" indicating that the code has been deleted, to deter well-meaning but under-informed readdition: I found this to be helpful after someone added back...some code, "fat" I think; and in any case you can see what I did if you Ctrl-F "fat" in Module:languages/data3/f. (At the risk of getting off topic, it might be helpful if someone went through WT:LT and systematically added pointers along the lines of "foo" has been removed, see WT:LT for any removed codes that don't already have a note of some kind.) - -sche (discuss) 02:31, 14 October 2020 (UTC)[reply]
@-sche Thanks for pointing out the issue with translation tables in particular. Fixing all of them will be difficult; in some cases the entry has "Sorani" or "Kurmanji" by it, and fixing those can be automated, but the remainder have to be done by hand by someone knowledgeable in the Kurdish languages. We can probably assume that the ones in Latin script are Kurmanji, but it appears the ones in Arabic script may be either (or occasionally even Southern Kurdish). This suggests to me that for the moment we should create an abuse filter to block new entries and translation table items and such. Benwing2 (talk) 02:50, 14 October 2020 (UTC)[reply]
@Mahagaja You added the various instances of Central Kurdish lemmas inheriting from "Kurdish". What was your intention behind this? Benwing2 (talk) 04:03, 14 October 2020 (UTC)[reply]
I have no particular recollection of doing that; I was probably cleaning up uses of {{etyl}}. I must have noticed that Kurdish is listed as an ancestor of Central Kurdish and figured in that case, it must have inherited terms from Kurdish (which I thought was defined as the macrolanguage rather than as a specific dialect). If that's wrong, feel free to fix the etymologies. —Mahāgaja · talk 06:05, 14 October 2020 (UTC)[reply]
@Mahagaja, Benwing2 I cleaned the category Category:Central Kurdish terms derived from Kurdish. They were all Northern Kurdish cognates.--Balyozxane (talk) 15:09, 14 October 2020 (UTC)
@Balyozxane Thanks! I'll see if I can clear the remaining members of Category:Terms derived from Kurdish. Benwing2 (talk) 00:48, 15 October 2020 (UTC)[reply]

-sche is right, still we need ku code for etymology. For example Persian آسو is from which dialect of Kurdish (with regard to Kurdish ō > Persian u)?--Calak (talk) 08:56, 14 October 2020 (UTC)[reply]

Etymology-only languages have to be varieties of some existing language. What language would ku be an etymology-only variant of? —Mahāgaja · talk 10:36, 14 October 2020 (UTC)[reply]
Ah, that's true, it would need to be a family code. (We have done this before, move a macrolanguage code over to being a family code, including recently IIRC.) - -sche (discuss) 10:40, 14 October 2020 (UTC)[reply]
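
The dump search suggested above (looking for "|ku", "=ku" and the like) could be sketched roughly as below. This is a hedged illustration in Python: the `pages` mapping of titles to wikitext and the function name are hypothetical, reading the dump itself is left out, and the pattern only catches "ku" used as a complete template parameter, so it would still need manual review.

```python
import re

# "ku" as a whole template parameter: preceded by "|" or "=", and followed
# (after optional spaces) by "|" or "}". Longer codes such as "kmr" and
# ordinary words beginning with "ku" are not matched.
KU_PARAM = re.compile(r"[|=]ku(?=\s*[|}])")


def pages_using_ku(pages):
    """Return the sorted titles whose wikitext uses "ku" as a parameter."""
    return sorted(title for title, text in pages.items() if KU_PARAM.search(text))
```

A list produced this way only says where the code occurs; deciding whether each use should become Northern, Central or Southern Kurdish remains a job for someone who knows the languages.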

Redirects from hyphenated entries

What's the policy on redirecting from typographic variants where the only difference is a hyphen vs. a space? For example, fishing-frog hard redirects to fishing frog. Do we have an established policy on this? Is it preferable to use soft redirects instead? Andrew Sheedy (talk) 14:48, 13 October 2020 (UTC)[reply]

I don't see the point. Why not just create alt-form pages? On fishing frog you follow the link and end up on the same page. – Jberkel 15:36, 13 October 2020 (UTC)[reply]
I'm not sure I follow... Currently, there is a mix of alt-form pages (soft redirects) and hard redirects. I'm asking if we have a preference, for consistency's sake. Andrew Sheedy (talk) 02:41, 15 October 2020 (UTC)[reply]
Sorry, I misunderstood the concept of soft-redirects. Perhaps we could automatically identify these redirect loops and then check where the hard-redirects are actually appropriate, and convert the rest. – Jberkel 09:43, 16 October 2020 (UTC)[reply]
My understanding is that we prefer soft redirects to hard redirects, unless the hyphenated form is merely the "attributive" form, in which case there is not supposed to be a 'form of' entry (but I would prefer a hard redirect rather than just nothing to point someone to the lemma). - -sche (discuss) 02:37, 14 October 2020 (UTC)[reply]
OK, thanks. That's what I've understood to be the case as well, but since I've come across hard redirects for this sort of thing quite frequently, I wasn't sure if there was maybe a policy that had been implemented that I didn't know about. Andrew Sheedy (talk) 02:41, 15 October 2020 (UTC)
In many of the instances I've seen, the redirects are either very old (from before practices were standardized), or the result of page-moves where someone forgot or neglected (or was unable, as a non-admin) to tick the 'suppress creation of a redirect' box. - -sche (discuss) 02:57, 15 October 2020 (UTC)[reply]
OK. I'll go ahead and convert them to soft redirects when I see them (if they're attestable, of course). Andrew Sheedy (talk) 04:09, 15 October 2020 (UTC)[reply]
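
The automatic identification mentioned in this thread could start from something like the sketch below. It is an assumption-laden illustration in Python: the `redirects` mapping (hard-redirect title to target title) is hypothetical input that would have to come from a dump or the API, and the function name is made up for this example.

```python
def hyphen_space_redirects(redirects):
    """Return (title, target) pairs that differ only by hyphen vs. space."""
    matches = []
    for title, target in redirects.items():
        # Normalise hyphens to spaces on both sides and compare.
        if title != target and title.replace("-", " ") == target.replace("-", " "):
            matches.append((title, target))
    return sorted(matches)
```

Each pair found this way would still need a human check for attestation before the hard redirect is converted to a soft one.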

2019-2020 ISO code changes

This year, the ISO adopted a number of changes (changes we've made or rejected have been struck through):

  • They created "ckm" (Chakavian, a form of Serbo-Croatian I presume we will not be adding except perhaps as an etymology code), "fif" (Faifi, "an Old South Arabian language spoken by about 50,000 people"; @Fay Freak if you have knowledge/interest/input), "nsb" (Lower Nossob, extinct but attested), "dmf" (Medefaidrin, discussed in another section, above), "zba" (Balaibalan, also mentioned above) (for "dmf" + "zba" see #Medefaidrin), "gmr" (Mirning), "xnm" (Ngumbarl), "wlh" (Welaun).
  • They retired "xtz" and created "xpl" Port Sorell Tasmanian (we have this as "aus-psl"), "xpb" Northeastern Tasmanian (our "aus-pye" = "Pyemmairre"), "xpz" Bruny Island Tasmanian (our "aus-bru"), "xpd" Oyster Bay Tasmanian (our "aus-par" = "Paredarerme"), "xpf" Southeast Tasmanian (our "aus-set"), "xpx" Southwestern Tasmanian ("aus-too", "Toogee"), "xpv" Northern Tasmanian ("aus-tom", "Tommeginne"), "xpw" Northwestern Tasmanian (we have this as "aus-pee" under the fantastic name "Peerapper"), "xph" North Midlands Tasmanian (Tyerrernotepanner, which we were discussing but had not added yet). (We were already working on this: 1, 2; sometimes I wonder if some of the people who request codes read WT:RFM, heh.) @Metaknowledge. We also have one lect name they don't seem to have covered(?), "aus-lsw" for "Little Swanport Tasmanian", and we were discussing a couple others at link 2.
  • They retired "sdm" and created separate codes for "gef" Gerai, "ebc" Beginci, and "sdq" Semandang.
  • They changed the names of "adb" (which I raised the outright spuriousness of at RFM; sometimes I wonder if...), "huc", and "moe".
  • They retired "aoh" (which I already flagged as spurious, but had not yet removed), "ayy" (Wikipedia says "ays" and "dyg" are also spurious, but ISO did not retire them, and we have not retired any of the three yet), "bpb" (unattested, already called out by us), "cca" (undemonstrated), "cdg" (a spurious language we already retired), "dgu" (a spurious language we already retired), "kjf" (ostensibly an Indo-Iranian lect, apparently only the Turkic Khalaj language exists, "klj"; we already handled this), "lmz" (Lumbee, on the grounds that it is unattested), "plp" (Palpa language (Indo-Aryan), which is attested but arguably spurious, a form of "ne"), "tbb" (apparently not a language), and "zir" (considering it to be the same as "scv"; @Meta again, if you have knowledge/input).
  • They created "csp" and "cnp" for Southern and Northern Ping Chinese (Pinghua); my limited understanding is that we include codes for Chinese lects even if they don't get separate headers, because they are used in some templates(?) (e.g. we have "wuu"). Some of our Chinese editors should figure out whether we need/want to do anything here; I see we have Pinghua as a single variety in Module:zh-dial-syn.
  • They merged "gli" and "drr" into "kzk" (we already did this). They merged "kxl" into "kru" (and changed the latter's name). They merged "nxu" into "bpp". They merged "thw" into "ola", whereas we have already merged both into "bo". They merged "xrq" into "dmw".

I expect we should mostly follow these changes, where they're not following moves we ourselves already made, except that I'll leave "Ping Chinese" up to our Chinese editors, I know we're not adding Chakavian as a full language, and I also wouldn't make any of the language renames except on a case-by-case basis because we have somewhat different criteria from them for names. If anyone has input, excitement or dismay about any particular change, speak up. In cases where they granted codes to things we already have exceptional codes for, it would be helpful for someone to change the codes over by bot (edit: if there are a lot of entries, but there aren't in the cases I checked). - -sche (discuss) 04:35, 14 October 2020 (UTC)[reply]
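For the changeover from exceptional codes to the new ISO codes, a bot pass could look roughly like the sketch below. It is only an illustration: the mapping is copied from the Tasmanian pairs listed above, while the function name `migrate_codes` and the regular expression are hypothetical, and it only touches codes that appear as whole template arguments (positional or named). Language headers and module data would still need separate handling.

```python
import re

# Exceptional code -> new ISO code, as paired in the list above.
CODE_MAP = {
    "aus-psl": "xpl",
    "aus-pye": "xpb",
    "aus-bru": "xpz",
    "aus-par": "xpd",
    "aus-set": "xpf",
    "aus-too": "xpx",
    "aus-tom": "xpv",
    "aus-pee": "xpw",
}

# Match an old code only when it is a complete template argument,
# e.g. "|aus-pee|" or "|lang=aus-pee}}", never inside running text.
_OLD_CODES = "|".join(re.escape(code) for code in CODE_MAP)
_ARG_PATTERN = re.compile(
    r"(\|\s*(?:[A-Za-z0-9_]+\s*=\s*)?)(" + _OLD_CODES + r")(?=\s*[|}])"
)


def migrate_codes(wikitext: str) -> str:
    """Return wikitext with exceptional codes replaced by their ISO codes."""
    return _ARG_PATTERN.sub(
        lambda m: m.group(1) + CODE_MAP[m.group(2)], wikitext
    )
```

Since there seem to be few affected entries, a dry run that only prints the proposed replacements for manual review may be enough.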

Sometimes I think we should just write up our recommendations so we don't have to wait so long for new ISO codes. Some of these (e.g. the sdm split) look like they'll take some real effort to assess. I'm surprised Faifi didn't have a code already, by the way! Chakavian looks like the only one we obviously don't want, and I'd want to see some evidence that it's needed in etymology sections before using it as an etymology-only code. @Justinrleung for the Pinghua business. —Μετάknowledgediscuss/deeds 05:26, 14 October 2020 (UTC)[reply]
I think we can include the two varieties of Pinghua, just like we include the other varieties. For naming, I think "Pinghua" is more common than "Ping" based on what I've seen. We could also make adjustments to Module:zh/data/dial to differentiate the two varieties, though the classification is still unclear for some dialects. We don't currently have much coverage of Pinghua (just 3 dialects), so it shouldn't be hard to classify. — justin(r)leung { (t...) | c=› } 05:43, 14 October 2020 (UTC)[reply]
[Re adding an ety-only code for Chakavian:] Here are etymologies which manually refer to Čakavian or Chakavian: sadza, сажа; šta, što; торить; пряжа (and probably others, there was chaff and my search wasn't exhaustive). nonna has a descendants section with Chakavian. (Other entries qualify alt forms with it, which doesn't benefit from a code, like kći, or else are categorized as Chakavian, like mošnja.) It doesn't seem to be common in etymologies; OTOH it is a distinct, identifiable thing. - -sche (discuss) 10:38, 14 October 2020 (UTC)[reply]
For ease of seeing what still needs to be done, I'm going to strike out the ones that are "done". - -sche (discuss) 06:06, 14 October 2020 (UTC)[reply]
• Hearing first time about “Faifi”, a language so rarely talked about that you won’t be able fast to attest its name, and that only in English. I find no mention of it in any German, French or Arabic sources. Apparently added because talked about in the last two years or something like that. For some reason الْلُغَة الْفِيفِيَّة (al-luḡa(t) al-fīfiyya) redirects to the article on the alleged Himyarite language, which is already argued to be a ghost language, as I have mentioned on a previous discussion. But I find some Arabic mentions of a “Faifi dialect” – لَهْجَة (lahja), لهجة أهل فيفا, لهجات قبائل جبال فيفا, of people who appear to converse with Arabic speakers with differences in minor details.
• Chakavian has been handy for some descendant lists, e.g. Proto-Slavic *arębъ – maybe -sche has only searched the mainspace, but parts of Čakavian are relevant for the Proto-Slavic accent like Slovene; those mentions in etymologies of course are just cognate lists which would ideally go into Proto-Slavic descendant sections. Fay Freak (talk) 14:15, 14 October 2020 (UTC)[reply]
Faifi hasn't been well studied, but it was historically seen as an Arabic dialect with substrate influence. More recent work, including the grammar by Alfaife (who, as his name suggests, is a native speaker), upgrades Faifi to the same status as Razihi: a descendant of the Old South Arabian languages subjected to "hypercontact" with Arabic. It is certainly remarkable that the most divergent features are the phonemic inventory and core features like demonstratives, the definite article, etc, while the lexicon is mostly shared with Arabic. —Μετάknowledgediscuss/deeds 16:42, 14 October 2020 (UTC)[reply]
It is probably a bubble, the main point of interest for the notion of Old South Arabian origin being its novelty, it will stay a suspicion never confirmed. With this Estonian scholar exactly the core features phonemic inventory, demonstratives, definite article are in common with Arabic rather than with Ancient South Arabian: “it is much more likely that both Rāziḥīt and Faifi are local variants of Arabic that are on the one hand extremely archaic while also having underwent some substratal influence of local pre-Islamic languages. This is clear, amongst under things, by the amount of Sabaic lexical items in Yemeni Arabic in general.” Yea, extraordinary claims require extraordinary evidence—who could imagine that under omnidirectional Arabic pressure there Old South Arabian has survived into the bargain in a six-digit-number but unnoticed so far! Both Faifī and Rāziḥī are best treated as Abstandsprachen of Yemeni Arabic, their own dialect branches amongst the descendants of Arabic, with own language codes to make sure the corpora are kept separate, and also not merged with Yemeni Arabic by reason that they both lie afar, right at the overgang from Yemeni to Ḥijāzī (Saudi) dialects—which may be another reason for their distinction. Fay Freak (talk)
I see we actually already had ckm = sh-cha as an etymology code; I've updated the mainspace entries I found to use it; any Reconstruction: Chakavian entries that need to be updated could be. - -sche (discuss) 17:14, 21 October 2020 (UTC)[reply]

Romanization of Gothic ⟨𐌲⟩ = [ŋ]

Why does Wiktionary romanize Gothic ⟨𐌲⟩ = [ŋ] as g instead of n? A strict letter-to-letter transliteration which does not accurately represent pronunciation is redundant when the romanization is given side-by-side with the original script. Currently, Greek ἄγγελος (“angel”) is romanized as ángelos rather than ággelos, but Gothic 𐌰𐌲𐌲𐌹𐌻𐌿𐍃 (“angel”) is romanized as aggilus rather than angilus. Given that Gothic and Greek orthography are in this respect one and the same, I think it would be desirable to introduce some consistency in their romanization.

So you'd have, for instance:

And so on. Rhemmiel (talk) 06:31, 14 October 2020 (UTC)[reply]

It's really a matter of tradition. All scholarly works on Gothic I've ever seen romanize ⟨𐌲⟩ as ⟨g⟩ regardless of whether it stands for [ɡ] or [ŋ], whereas scholarly works on Greek romanize ⟨γ⟩ as ⟨n⟩ when it stands for [ŋ]. So people coming here who are used to romanized Gothic from other resources will expect to find drigkan and aiwaggēljō, not drinkan and aiwangēljō, however inconsistent it may seem. Incidentally, we do the same thing for ⟨𐍅⟩, romanizing it as ⟨w⟩ both when it stands for [w] and when it stands for [y] (e.g. 𐍃𐍅𐌽𐌰𐌲𐍉𐌲𐌴 (swnagōgē), not synagōgē or sünagōgē). That's also consistent with other scholars' transliteration systems. —Mahāgaja · talk 07:00, 14 October 2020 (UTC)[reply]
The difference here is that Wiktionary gives the original script and the romanization side-by-side. Gothic scholarly works are typeset entirely in the Latin alphabet, and as a consequence a one-to-one character mapping is required so that scholars can reconstruct the original Gothic spelling being referenced. Since Wiktionary is not bound by this constraint and is used in a very different capacity from Gothic scholarly literature (namely, as a general-audience, etymology-heavy dictionary), I still feel that a Greek-type romanization system would better suit our purposes. Rhemmiel (talk) 10:37, 14 October 2020 (UTC)[reply]
Oppose, that's what the pronunciation section is for. 212.224.239.136 10:44, 14 October 2020 (UTC)[reply]
Basically every source (scholarly or otherwise) uses <g> here as Mahagaja noted, which works just fine. The given reasons for changing this system on Wiktionary don't seem particularly compelling. — Mnemosientje (t · c) 14:32, 14 October 2020 (UTC)[reply]
To throw in another (maybe not too relevant) datapoint, this is also consistent with how we treat Old Cyrillic, so e.g. in the quote at херовимъ (xerovimŭ) we get а҅гг҄ели (a҅ggʹeli, angels) transliterated with ⟨gg⟩ despite the first ⟨г⟩ representing [ŋ]. — Vorziblix (talk · contribs) 21:01, 14 October 2020 (UTC)[reply]
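To make the two schemes in this thread concrete, here is a minimal sketch of how a Greek-type romanization could be derived from the strict one. It assumes, by analogy with Greek, that g is to be rewritten as n only when it directly precedes another velar letter (g, k or q); the function name `greek_style` is hypothetical and this is not how any existing module works.

```python
import re

# A g written directly before g, k or q is taken to stand for [ŋ]
# (assumption modelled on Greek γγ, γκ).
_NASAL_G = re.compile(r"g(?=[gkq])")


def greek_style(strict_romanization: str) -> str:
    """Rewrite nasal g as n in a strict letter-to-letter romanization."""
    return _NASAL_G.sub("n", strict_romanization)


for word in ("aggilus", "drigkan", "aiwaggēljō", "swnagōgē"):
    print(word, "->", greek_style(word))
```

The conversion only goes one way, since the original spelling cannot be recovered from the output, which is the reconstruction concern raised above. It would also rewrite any gg that does not contain a nasal, so a real implementation would need an exception list.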

Pronunciation of 'z' in Finnish

I noticed the Finnish entry for Azerbaidžan has the IPA /ˈɑtserbɑi̯dʒɑn/ [ˈɑts̠e̞rˌbɑi̯dʒɑn], which I've never heard in my life. Whenever Azerbaijan has been mentioned in the news and whenever I've talked with people about it, the <z> has been pronounced as identical to /s/ or very close to it with an inconsistent difference.

It's true Finns often pronounce <z> as /ts/ or even [t͡ʃ~tʃ] or whatever, but in names like Azerbaidžan, Kazakstan, Uzbekistan, etc. that sounds just as grating as eg. Belgia as [ˈpe̞lkiɑ]. Even the Wikipedia article on Finnish phonology gives azeri as an example of the very marginal use of <z>, pronounced as /s/. If you went on ultraprescriptivist hyperdrive, it'd have /z/ and that's what the Finnish Wiktionary gives as one of the options (the other being with /s/, of course) but practically no one would pronounce the /z/ differently from /s/ at least consistently.

One solution could be /ˈɑzerbɑi̯dʒɑn/ [ˈɑs̠e̞rˌbɑi̯dʒɑn], prescriptivist phonemically and descriptivist phonetically, but it might only cause confusion. Well, in any case, the <z> shouldn't be /ts/ in the case of Azerbaidžan, Kazakstan, Uzbekistan, etc. I won't fix it myself because, like I said, I'm not entirely sure what it should be replaced with; maybe just go with /z/ even if it's hyperprescriptivist, or...? VHGW (talk) 14:38, 14 October 2020 (UTC)[reply]

Pinging recently-active Finnish native speakers @Hekaheka, Surjection, Tropylium. - -sche (discuss) 21:13, 14 October 2020 (UTC)[reply]
"z" is a problematic letter in terms of Finnish pronunciation. It is often pronounced as "ts" as in Azorit (Azores) or Venezuela, but as VHGW comments, that's not always the case. I don't think there's one "canonized" pronunciation, but to my ear the first "z" does not sound like "s" but rather the same as the "ž" further down. Luckily, this is easy to fix. Writing {{fi-pronunciation|ažerbaidžan|r=ɑidʒɑn|h=A‧zer‧baid‧žan}} yields --Hekaheka (talk) 21:46, 14 October 2020 (UTC):[reply]
  • IPA(key): /ˈɑʒerbɑi̯dʒɑn/, [ˈɑ̝ʒe̞rˌbɑ̝i̯dʒɑ̝n]
  • Rhymes: -ɑidʒɑn
  • Hyphenation(key): A‧zer‧baid‧žan
I've definitely heard and used /utspekistan/ and wouldn't be too surprized to hear a spelling pronunciation along the lines of something like /atserbaitsan/ either, but yeah the prescribed standard pronunciation is just /aserbaitšan/; perhaps /azerbaitšan/ as the hyperprescriptivist option. There most certainly isn't any [dʒ] in there though, this seems like the most obvious inaccuracy in this to me. --Tropylium (talk) 02:15, 15 October 2020 (UTC)[reply]
I basically agree with this (except for the /dʒ/) and have changed the page accordingly. Perhaps IPA(key): /ˈɑʒerbɑi̯dʒɑn/ could be mentioned, but I'd still consider it to be a mistake, probably caused by the /dʒ/ later on in the word (Finnish speakers aren't that used to this variety of sibilant sounds). — surjection?? 07:01, 15 October 2020 (UTC)[reply]
Shouldn't we try to stick, as far as feasible, to "standard" pronunciation so as not to be overly confusing for our eventual English-speaking users? They should be assured that they don't get laughed at if they follow our pronunciation guidelines. I mean, we don't and shouldn't have the pronunciation IPA(key): /ˈolumpːialaiset/ for olympialaiset (well, currently we have none). Along these lines, I would erase IPA(key): /ˈɑtserbɑi̯dʒɑn/ or at least label it nonstandard. --Hekaheka (talk) 09:43, 15 October 2020 (UTC)[reply]
Surjection, I see that you changed the way the pronunciation template handles the letter "z", but did it go right? Isn't it so that "z" is still, as a rule, pronounced "ts"? Check e.g. "venezuelanpassio" in kielitoimistonsanakirja.fi. Also, I would say Firenze, La Spezia, Arizona and Azorit are pronounced with "ts", but then on the other hand, Zimbabwe, Swazimaa and Zaire I have usually heard with "z" or even "s". Mozambique we even write as Mosambik. The question is, which should be the rule and which the exception. --Hekaheka (talk) 10:10, 15 October 2020 (UTC)[reply]
I had to change it because there would otherwise be no easy way to represent /z/ through {{fi-pronunciation}}. The pronunciation as /ts/ can be specified manually as ts (as I have done appropriately for all of the entries that used it; note that zz still becomes /ts/). Since <z> is only present in some loanwords, it's feasible to check them manually. — surjection??10:32, 15 October 2020 (UTC)[reply]
After reading this topic, I can't tell with certainty what pronunciation is right for these words but it's good that all valid pronunciations are now listed at e.g. Azerbaidžan. --Anatoli T. (обсудить/вклад) 10:10, 15 October 2020 (UTC)[reply]
There is a reason I listed it last - if you feel that pronunciation is too rare to be listed, feel free to remove it entirely. — surjection??10:33, 15 October 2020 (UTC)[reply]

Suggestion for new bot

Just in case any bot-writers are looking for an interesting project, something that would be useful in my opinion would be a process to automatically resync the order of translations to the order of English senses, which is a pain to do manually, especially for entries that have dozens of senses. I understand, of course, that translation headings do not always exactly match, word for word, the English definitions, so some "fuzzy" matching would be necessary, and if no "good enough" match is found then the bot would have to give up, or at least give up on that entry. Anyway, thought I'd mention it. Mihia (talk) 19:30, 14 October 2020 (UTC)[reply]
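As a starting point for anyone who takes this up, the fuzzy-matching step might look something like the sketch below. Everything here is an assumption for illustration: the function names, the greedy strategy, and the 0.6 threshold are hypothetical, and the similarity score comes from the standard library's `difflib.SequenceMatcher`. Extracting the definitions and translation-table glosses from the wikitext is left out.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resync_order(definitions, glosses, threshold=0.6):
    """Reorder translation glosses to follow the order of the definitions.

    Returns the reordered list, or None if any gloss has no
    "good enough" match, in which case the bot gives up on the entry.
    """
    remaining = list(glosses)
    ordered = []
    for definition in definitions:
        if not remaining:
            break
        # Greedily take the unused gloss most similar to this definition.
        best = max(remaining, key=lambda g: similarity(definition, g))
        if similarity(definition, best) >= threshold:
            ordered.append(best)
            remaining.remove(best)
    # Any leftover gloss means an uncertain match: leave the entry alone.
    return ordered if not remaining else None
```

Senses without a translation table are simply skipped. Because glosses are often much shorter than the definitions they summarise, a plain ratio may score real matches quite low, so the threshold and the scoring function would need tuning against real entries before any live run.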

WT:Votes/2020-09/Removing Old English entries with wynns ends in a week. Pinging active users in Category:User ang who have yet to vote.

@Burgundaz, Frigoris, Leasnam, Leornendeealdenglisc, DerRudymeister, EncycloPetey, Garnetskull, Hk5183, Malku H₂n̥rés, Reordcraeft, Rua, Stardsen, StrongestStrike, Talonpedia, ThaesOfereode, TheSilverWolf98, Widsith --{{victar|talk}} 20:00, 14 October 2020 (UTC)[reply]

Bermud(i)an English

Hi all. I have seen that the category for words specific to the vernacular spoken in Bermuda is called "Bermudan English," which I propose should be changed to "Bermudian English" in order to match with the Wikipedia article. Cheers. SisyphusOfTheMoors (talk) 02:05, 15 October 2020 (UTC)[reply]

Also, it probably should be in Category:North American English rather than Category:Caribbean English. Daleusher (talk) 07:24, 19 October 2020 (UTC)[reply]

As we recently did something I don't really understand with the Kurdish language, Wiktionary:About Kurdish is no longer useful. We should probably rewrite the page from scratch. --Daleusher (talk) 22:57, 18 October 2020 (UTC)[reply]

@Daleusher Yes. We need to rewrite it. What we did is to split "Kurdish" lemmas into Category:Northern Kurdish lemmas and Category:Central Kurdish lemmas. This is because "Kurdish" is not a language but a family of languages. (For example, Northern Kurdish aka Kurmanji has four cases and three genders and is written in the Latin script, whereas Central Kurdish aka Sorani has no cases and no genders and is written in the Arabic script.) Benwing2 (talk) 03:33, 19 October 2020 (UTC)[reply]
I speak Central Kurdish and my dialect has both gender and case. Many people know so little about Kurdish.--Calak (talk) 15:09, 19 October 2020 (UTC)[reply]
WT:Language treatment also needs to be updated, and Module:languages/data2 and Module:families/data need to be updated to make ku a family rather than a language. —Mahāgaja · talk 15:49, 19 October 2020 (UTC)[reply]
@Calak Apologies, I'm going by what Wikipedia says, presumably referring to the standard form of Central Kurdish (Sorani). I didn't realize there are dialects with gender and case, although it makes sense because Kurdish is a dialect chain. BTW I have some questions/concerns about Kurdish phonology as we handle it. For one, it appears that Northern Kurdish has aspirated stops; if so, this isn't handled properly by Module:ku-pron (which is broken in this respect) and isn't reflected in Wiktionary:Kurdish transliteration. Also, per Wikipedia, Central Kurdish has 9 pure vowels but our transliteration pretends it has only 8, using a to represent long /ɑː/ and not distinguishing short /a/ from /ɛ/, both of which are written e. This presumably means that Module:ckb-pron is also broken. Benwing2 (talk) 03:52, 20 October 2020 (UTC)[reply]
BTW I've now earned the dubious honor of blocking WF. Benwing2 (talk) 04:00, 20 October 2020 (UTC)[reply]
The honor lies not in the blocking, but in the request to block. Ultimateria (talk) 17:01, 21 October 2020 (UTC)[reply]

WP is not a reliable source, forget it. All Kurdish dialects (NK, CK, SK) have 8 vowels (like MIr.). All NK accents don't have aspirated consonants, so it has no reflection in standard NK.--Calak (talk) 17:52, 20 October 2020 (UTC)[reply]

Japanese onomatopoeias

Japanese onomatopoeias like どき, びく, ぶら, ざわ, たぷ may merit their own entries. So some questions:

  • What are their POS?
  • Because some of them are rarely used alone today, but rather in some modified forms/derived terms like たぷたぷ, たっぷり, should they have some {{lb}} to clarify this?

-- Huhu9001 (talk) 10:50, 19 October 2020 (UTC)[reply]

Korean has a Template:ko-root which seems very relevant here. See e.g. 곰실 (gomsil) for an illustration of how it works.--Karaeng Matoaya (talk) 16:02, 19 October 2020 (UTC)[reply]
At least for etymology purposes, I would very much support the creation of entries for these elements. These have been very productive, but as noted by Huhu9001, they are seldom found in isolation.
I quite like Karaeng Matoaya's suggestion. It's perhaps a minor thing, but I see that Ideophonic root is not included in the list of allowed POS headers over at Wiktionary:Entry_layout#Part_of_speech. There's Ideophone, and there's Root, but no Ideophonic root. ‑‑ Eiríkr Útlendi │Tala við mig 17:51, 19 October 2020 (UTC)[reply]
@Eirikr Neither is "dependent noun", which are important in Korean, e.g. 것 (geot). More importantly, the standard dictionaries of Korean actually classify dependent nouns separately as 의존 명사 (依存 名詞, uijon myeongsa). So that's definitely a POS header that needs to be added.
Ideophonic roots behave differently from other Korean roots, as they take frequentative suffixes and can always be reduplicated to further mark frequentative action. By contrast, 후줄근후줄근 (hujulgeunhujulgeun) is completely meaningless, showing that that particular root isn't an ideophone. So a separate header probably makes sense for them. But admittedly the monolingual dictionaries of Korean do not distinguish between ideophonic roots and non-ideophonic ones in their own POS classifications.--Karaeng Matoaya (talk) 18:05, 19 October 2020 (UTC)[reply]
@Eirikr: If we are to invent a new POS, it needs to be added to t:ja-pos. -- Huhu9001 (talk) 04:41, 20 October 2020 (UTC)[reply]

Add Contemporary Latin

From Wikipedia: New Latin: “between c. 1375 and c. 1900s”; Contemporary Latin: “since the end of the 19th century”. Currently “Contemporary Latin” is not supported in modules to use in labels and etymology sections (which link to the Wikipedia article when “New Latin” is used, which says the period ended in c. 1900s but has to be used incorrectly for terms only used after c. 1900s because “Contemporary Latin” is not supported). J3133 (talk) 05:50, 20 October 2020 (UTC)[reply]

Notifying users interested in Latin: (Notifying Fay Freak, Brutal Russian, JohnC5, Benwing2, Lambiam, Mnemosientje): J3133 (talk) 08:04, 20 October 2020 (UTC)[reply]

Contrary to that Wikipedia definition, 21st century Latin is of course also New Latin. Wikipedia here just ignores it because it is rather dead since Arcadius Avellanus and original Latin works increasingly do not appear. If a word is contemporary Latin in this sense, it is either private language of someone, so the inclusion is rejectionable, or the property of it being Contemporary Latin is mostly obvious anyway because of the thing denoted. Fay Freak (talk) 13:58, 20 October 2020 (UTC)[reply]
@Fay Freak: Even if Wikipedia is wrong putting a New Latin label to a term only used after c. 1900s will link it to Wikipedia which says New Latin ended in c. 1900s, which makes the label contradictory. J3133 (talk) 14:32, 20 October 2020 (UTC)[reply]
Some nice IP fixed that Wikipedia entry in this relation 😂. Fay Freak (talk) 16:26, 20 October 2020 (UTC)[reply]
@Fay Freak: I think terms only used in Contemporary Latin should need three citations instead of one because any one person can make new Latin words, especially because it is no one’s native language, comparable to one person making new terms in a constructed language (to compare, Esperanto terms need three citations). J3133 (talk) 16:41, 20 October 2020 (UTC)[reply]
@J3133: That’s what I meant with “the inclusion is rejectionable”. Although I do not share this opinion. It would exclude contemporary Latin altogether (since see, without Stephanus Berard nobody has even written longer Latin prose at all in the last quarter of a century; it’s not a constructed language and the user number is really lower than them all, although as a natural language it has more legitimacy than them). I am content with one use – as opposed to including a proposal for a term in a dictionary of Latin neologisms, as is so common, which is half-baked. If someone has managed to make one use durable that’s already an accomplishment. Fay Freak (talk) 17:18, 20 October 2020 (UTC)[reply]
@Fay Freak: I did not claim that Latin is a constructed language (that does not make the terms people make in Latin have more legitimacy than the terms people make in Esperanto, etc.). My point is any person who is not fluent in Latin could make any word, grammatical or otherwise. J3133 (talk) 17:28, 20 October 2020 (UTC)[reply]
If one would be claimed to be more legitimate then it is Esperanto because it has more native speakers. J3133 (talk) 17:35, 20 October 2020 (UTC)[reply]
It does not work like Esperanto nor Klingon, I don’t know why one is obsessed with equating Latin to planned languages.1 The view that properly adapted Latin is artificial is alien to me. One can as well compare it with Lower Sorbian or Manx. Any one Lower Sorbian speaker can also make new Lower Sorbian words if desired, grammatical of course – I don’t know what “otherwise” means here.
1: Actually I do know. Though you may be incapable to own it, it is heavily based on English nativization, deriving from the fact that Anglosaxons habitually pronounce Latin words in perverse ways. If one even pronounces one of these international terms like a Roman, for instance Musculus erector spinae in an English sentence, one is charged with artificiality. But the reverse is the case, and he who speaks Latin only reverts the things to their natural states. Fay Freak (talk) 19:17, 20 October 2020 (UTC)[reply]
“Obsessed” would be your responses to my example, even after the claim that constructed new words in this dead language are more legitimate than one with native speakers. J3133 (talk) 19:25, 20 October 2020 (UTC)[reply]
I have not claimed this. I have claimed the whole language is more legitimate than Desperanto – which latter one constructs, no matter the native speakers the Zamenhof sect has spawned as strawmen (“think about the children”). So it’s logical that Latin material is esteemed higher. Constructed languages have higher requirements because they are an evil – figuratively speaking an agglomeration of malignant tumors that accretes and overgrows everything like cancer and holds the documentators of language up to ridicule and uglification by reason of their users lacking the natural restrictions of pudity. Fay Freak (talk) 19:51, 20 October 2020 (UTC)[reply]
I will not discuss, which is futile, your absurd claims any more because, evidently, your point is insulting Esperanto. It is obvious who is “obsessed”. J3133 (talk) 20:01, 20 October 2020 (UTC)[reply]
Absurd it is how much somebody seeks to pigeonhole the Latin language without understanding a soupçon of it. Assuming your languagebox is exhaustive, the insult is to the detriment of Latin because of a slant towards Esperanto. I always respect people who count more languages with them than I do, but I could only warn against engagement in planned auxiliary languages, because what attitude towards language must one have to take it that easy? We call persons with such attitudes, in German, Dünnbrettbohrer. People again who, though they believe themselves progressive, mostly leave a lot of scrap for others to deal with by cutting off the complicated parts. They make things fast and they make things bad. And those are those from which the dictionary is protected by the known special criteria for constructed languages, because we need particular filters. The characters of the language communities and hence their corpora differ essentially. Fay Freak (talk) 23:27, 20 October 2020 (UTC)[reply]
@J3133 I believe it's already more or less the norm for New Latin terms to require three citations, though the votes about that did not pass. At least one New Latin sense that had fewer than three citations has been deleted in fact. A special policy for contemporary Latin is therefore not needed. And yes, engaging Fay Freak in full rant mode is certainly a waste of your time. ←₰-→ Lingo Bingo Dingo (talk) 15:17, 23 October 2020 (UTC)[reply]

Move English Section to top of the page

Currently the 'Content' box is on top of the page. Here are my arguments for why it should be moved:

  • From a user experience design perspective, whenever people go to a website, they expect to see text or an image as the first thing on the page. The English Wikipedia is a common example of that, and any news article is another.
  • If the content box is long, the user actually has to scroll down to see the definitions that they came for: example. This is inconvenient.
  • Pretty much all dictionaries have the definition/pronunciation as the first content under the term. When a user goes to Wiktionary, they likely expect a similar experience to other dictionaries they have visited in the past. An exception to this might be if the experience offered is superior in terms of usability, which I argue it is not, based on my other points.

I therefore suggest that the English section be moved to the top instead, with the content box underneath it. I think this makes sense, given that this is also the English Wiktionary. QuantumWasp (talk) 02:48, 21 October 2020 (UTC)[reply]

This is not an English dictionary, but rather a multilingual-to-English dictionary. Users may be coming here for terms in Swahili or Afrikaans or Quechua. The "contents" box at the top provides a quick means to see whether the desired section is even present on the page.
For regular users who create accounts, we have various other options and gadgets, such as the Tabbed Languages gadget. You might try that, and see if that presents things more to your liking. ‑‑ Eiríkr Útlendi │Tala við mig 04:56, 21 October 2020 (UTC)[reply]
I am inclined to leave the table of contents (TOC) at the top, but I do think this gets at or brings up some things which have been brought up before and which we should consider, which include moving the TOC to the right and allow content to be displayed on the left 'alongside' it for computer users (but probably not mobile users), and maybe moving definitions higher (e.g. above etymologies). (Also, can we reduce the amount of empty white space that exists, e.g. next to headers?) - -sche (discuss) 06:42, 21 October 2020 (UTC)[reply]
Moving the ToC to the right side by default (for those devices that support it) has seemed like a good idea to me from my first year editing. But it doesn't work on phones and may not work even on tablets.
I believe that there are many users of Wiktionary that have no interest in FL content as well as those whose FL interest is limited to translations. If we ever get a mobile phone app, we should probably consider some way of allowing users to limit what has to be navigated so navigation aids like ToC are not necessary. A monolingual English Wiktionary would be one version with an obvious user base. It could even be made available without translations to reduce bandwidth requirements. DCDuring (talk) 14:09, 21 October 2020 (UTC)[reply]
@DCDuring: What about images, links to other Wikimedia projects, etc. that are on the right side? Would they be moved to below the TOC? Some pages have a long TOC, which would move them to below all content of the English entry. J3133 (talk) 15:00, 21 October 2020 (UTC)[reply]
I usually put links to WP, Wikispecies, and Commons at the bottom of the English sections. Images usually appear after the ToC. Images can be in a gallery, which can appear where otherwise text would appear. DCDuring (talk) 02:23, 22 October 2020 (UTC)[reply]
  • I agree that it is offputting and uninviting to users to see a huge TOC or other content irrelevant to them instead of the information that they want. I would support moving the TOC to the side, if it can be done nicely, or perhaps even collapsing large TOCs by default. Generally speaking, I would support efforts to make sure that the definitions relevant to the user's desired language (or English by default), or the start of these at least, are visible on the first page that people see on their device. Mihia (talk) 18:33, 27 October 2020 (UTC)[reply]
Re-reading the thread, I arrive again at the thought that the main concerns would appear to be addressed by the Tabbed Languages gadget. Viewing a single language's entry for any given grapheme (headword spelling) has the potential to greatly reduce the TOC of that page -- say, [[a]] or [[]] or [[ص]]. ‑‑ Eiríkr Útlendi │Tala við mig 19:00, 27 October 2020 (UTC)[reply]
I think that this "gadget" is very unlikely to be discovered by ordinary casual users. Even if someone should stumble across it at Special:Preferences#mw-prefsection-gadgets, there seems to be no information about what it is or what it does. Generally speaking, I believe that options, preferences, gadgets etc. tucked away on subsidiary screens will only be noticed by a small minority of hardcore users. For the rest, it should just "work well" by default. Mihia (talk) 19:22, 27 October 2020 (UTC)[reply]
Apologies for not more clearly stating my thought -- we have the functionality to display to users exactly what they want, in the Tabbed Languages gadget. Perhaps we should find a way to make this gadget's behavior the default for all users? ‑‑ Eiríkr Útlendi │Tala við mig 17:05, 28 October 2020 (UTC)[reply]
I hope one could opt out. DCDuring (talk) 21:17, 28 October 2020 (UTC)[reply]
I will say, I think the move to replace separate "Synonyms", "Antonyms", "Coordinate terms" sections with simple lines directly below each definition is helping to reduce the amount of unnecessary whitespace (around headers, etc), reduce the length of the TOC, and also help with situations where there are a lot of definitions and users have to scroll back and forth between them (or figure out "jump" templates exist) and also are liable to change or reorder a definition without changing the synonyms section, etc. I also grow ever more sympathetic to the idea of moving etymologies down below definitions, since only a very few entries have multiple ety sections. - -sche (discuss) 21:04, 27 October 2020 (UTC)[reply]
A less revolutionary approach to the overlong-etymology section problem is to use show-hide bars or, better, a control like what we use for quotations to hide speculative etymologies, long lists of cognates, and any other material of relatively specialized interest. DCDuring (talk) 21:17, 28 October 2020 (UTC)[reply]
Is there any way we could do that via CSS and/or sitewide JavaScript, rather than having to edit each entry? ‑‑ Eiríkr Útlendi │Tala við mig 22:05, 28 October 2020 (UTC)[reply]
@Eirikr What is the 'that' that you are referring to: one or more of -sche's proposals, mine, or something above? DCDuring (talk) 03:13, 29 October 2020 (UTC)[reply]
@DCDuring: Specific to my comment replying to yours just above: I'm curious if there is any way we could use CSS or JavaScript to implement some kind of hiding of page sub-sections, as you (DCDuring) suggest, without having to edit all entries to add in fold-away templates. ‑‑ Eiríkr Útlendi │Tala við mig 23:43, 29 October 2020 (UTC)[reply]
My technical foo in that regard is negligible, possibly nil or worse. I do know that for the affix-see and quotations controls, the content that is hidden is distinctively marked in wikitext. That is not true of the text that is IMO excessive in etymology sections. I don't know whether CSS allows one to resequence text as -sche would like with respect to the various semantic relations, but it doesn't seem plausible. DCDuring (talk) 16:00, 30 October 2020 (UTC)[reply]

Bot Infestation


A heads up for admins: I just blocked 97 accounts that created or attempted to create bogus user pages in the past couple of months. The block reason: "Unauthorized Bot". These user pages are basically a way to plant links to sites that search engines will find and thus improve the sites' search engine rankings. They're dressed up with just enough randomly selected details to keep people from noticing and deleting the links.

If you see a user page that seems to be someone telling about his or her self, but the name they give doesn't match the user name, or the gender or geographic details don't match, delete the page immediately and block the account. For some reason the bot often seems to be inserting lots of text with links, which the edit filter picks up, but then either doesn't save it, or only saves the fake personal details without the links. Here are some examples from the abuse filter logs:

  • User:JoannaEchevarria:
    I am 37 years old and my name is Edwin Westfall. I life in Frederiksberg C (Denmark).<br><br>My web blog ... [http://(redacted) fume hood for chemical laboratory]
  • User:AbelGuest29:
    Ηello! <br>My name іs Abel and I'm a 21 years old girl from Canada.<br><br>My blog post: here - [http://(redacted)/groups/tips-to-purchase-the-right-makita-drill-for-2020/ just click the following webpage]
  • User:MellisaBurroughs:
    My name is Mellisa аnd I am studying Design and Technology ɑnd Industrial ɑnd Labor Relations аt Immenschlag / Austria.<br><br>my site; [https://(redacted)/ Toronto airport limo]
  • User:GeorgiannaTxh:
    I'm a 43 years old, married and work at the university (Earth Sciences).<br>In my spare time I try to learn Chinese. I have been twicethere and look forward to go there anytime soon. I love to read, preferably on my kindle. I really love to watch NCIS and Family Guy as well as documentaries about anything scientific. I like Volleyball.
  • User:TerrieGlass:
    My name is Mirta (40 years old) and my hobbies are Darts and Roller skating.<br><br>Also visit my webpage ... [https://goo.gl/maps/(redacted) Locksmith-Sarasota-FL-USA -PH +1941(redacted phone number)]
  • User:CyrusMartins6:
    Hi there! :) My name is Cyrus, I'm a student studying Continuing Education and Summer Sessions from Decker Lake, Canada.<br><br>Feel free to surf to my web blog :: [https://scholar.google.com/citations?user=(redacted)&hl=en judi slot pulsa]

In case you're wondering: yes, there really is a user page at Google Scholar with nothing but links to Indonesian gambling sites.

This should give you an idea of what to look for, but there's lots more like this in the logs for Abuse filter 21: really bad AI written by someone with no English skills. Chuck Entz (talk) 08:10, 21 October 2020 (UTC)[reply]

Gonna do that for my Google Scholar. —AryamanA (मुझसे बात करेंयोगदान) 02:44, 27 October 2020 (UTC)[reply]
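
The name check described above (the name given on the page doesn't match the user name) could be sketched roughly as follows. This is only an illustration, not how Abuse filter 21 works: the function names and the "my name is" pattern are assumptions made up for the example, and some of the logged samples appear to mix in look-alike non-Latin letters, which this plain-ASCII pattern would miss without extra normalization.

```python
import re

# Hypothetical helper for illustration only; not part of MediaWiki or AbuseFilter.
# Captures the first word following "my name is" (ASCII letters only).
NAME_PATTERN = re.compile(r"\bmy name is ([A-Za-z]+)", re.IGNORECASE)


def claimed_first_name(page_text):
    """Return the first name the page claims, or None if no claim is found."""
    match = NAME_PATTERN.search(page_text)
    return match.group(1) if match else None


def name_mismatch(username, page_text):
    """True when the page states a name that does not occur in the user name."""
    name = claimed_first_name(page_text)
    if name is None:
        # No stated name, so this particular signal does not apply.
        return False
    return name.lower() not in username.lower()
```

A mismatch is only a hint that the page deserves a look. As the AbelGuest29 example shows, the name can match while the gender or location details do not.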

Should Middle Korean lemmas be lemmatized at isolated or connective forms?


@Quadmix77, B2V22BHARAT, LoutK

Middle Korean orthography was generally surface realization-based. For instance, the Middle Korean word for "flower" was phonemically /kot͡s/. But because the word-final allomorph of /-t͡s/ was [-s], in isolation the word was typically written 곳 (Yale: kwos). When connected to a vowel-initial suffix, the actual phoneme resurfaced and was accordingly transcribed. So "flower-NOM" was written 고지 (Yale: kwo.c-i). The isolated form would be 곳 (Yale: kwos), and the connective and phonemic form (but rarely actually encountered in this specific form) would be (Yale: koch).

We don't have too many MK entries so far, but the lemmatization has been very ad hoc:

MK lemmatizing inconsistencies
English | Isolated | Connective/phonemic
autumn | ᄀᆞᅀᆞᆯ (kozol) | ᄀᆞᅀᆞᆶ (kozolh)
chicken's egg | ᄃᆞᆯᄀᆡ알 (tolk-oy al) | ᄃᆞᆯᄀᆡ앓 (tolk-oy alh)
to love | ᄃᆞᆺ다 (tos-ta) | ᄃᆞᇫ다 (toz-ta)
town | ᄆᆞᅀᆞᆯ (mozol) | ᄆᆞᅀᆞᆶ (mozolh)
object honorific marker | ᄉᆞᆸ (-sop) | ᅀᆞᇦ (-zoW)
pink color | 졀ᄯᅡ빗 (cyelsta pis) | 졀ᄯᅡ빛 (cyelsta pich)
sky-blue color | 텬쳐ᇰ빗 (thyenchyeng pis) | 텬쳐ᇰ빛 (thyenchyeng pich)

The last pair is particularly egregious because the morpheme involved is the same.

Where should the lemmatizations be at?--Karaeng Matoaya (talk) 10:41, 21 October 2020 (UTC)[reply]

For verbs, the situation is even more complicated. Korean verbs/adjectives are conventionally lemmatized, by all dictionaries, together with the suffix 다 (da). 먹다 (meokda, “to eat”), 자다 (jada, “to sleep”), 희다 (huida, “to be white”), etc.
This works without major issue for Korean, but in Middle Korean, /-ta/ hides the actual phonemes of the verb stem. For example in Middle Korean, the stem of 곱다 (Yale: kwop-ta) was /koβ-/ which, according to the regular allomorphy rules governing /β/, surfaced before consonant-initial suffixes as [kop̚] and only before vowel-initial suffixes as [koβ].
So the phonemically faithful and linguistically valid lemmatization of 곱다 (Yale: kwop-ta) would be at 고ᇦ다 (Yale: kwoW-ta), except that 고ᇦ다 is illegal according to the surface realization-based Middle Korean orthography and could never have been attested in that form.--Karaeng Matoaya (talk) 10:51, 21 October 2020 (UTC)[reply]
And just a note to whoever's unaware that Modern Korean standardizes at the connective forms, not the isolated allomorphic forms. So the connective forms would theoretically be preferred if we wanted to stress the connections between Middle and Modern Korean.--Karaeng Matoaya (talk) 11:05, 21 October 2020 (UTC)[reply]
@Quadmix77, what is your motivation for the lemmatization of the last two pairs?
IMO, for nouns the phonemic form should be used (like modern Korean), and for verbs/adjectives the isolated form should be used (also like modern Korean, cf. words with irregular conjugation like 무겁다.) —Suzukaze-c (talk) 02:59, 22 October 2020 (UTC)[reply]
@Suzukaze-c, LoutK: The MK ancestors of Modern irregular verbs aren't really comparable to modern ones, however. 짓다 (jitda) is truly irregular now because there's no consistent phonological process by which /s/ and can alternate, and you can't derive all its conjugations just from the infinitive stem. But Middle Korean 짓다 (Yale: ciz-ta) was actually regular because the allomorphy of /z/ is a phonologically conditioned process, and you could have accurately predicted the realizations of all its conjugated forms just from the infinitive form 지ᅀᅥ (Yale: ciz-e). So it's actually more like e.g. Modern 젖다 (jeotda), which is a regular verb despite the phonologically conditioned allomorphy of /t͡ɕ/ between [t̚] and [d͡ʑ] and is accordingly written phonemically with in Modern Korean orthography.
I think we should aim for consistency here; if we lemmatize Middle Korean at 짓다, we should also lemmatize Middle Korean at 젓다 instead of 젖다, and at 둡다 instead of 둪다 (modern 덮다 (deopda, “to cover”)). So consistency with Modern Korean is lost either way.--Karaeng Matoaya (talk) 08:33, 22 October 2020 (UTC)[reply]

@Karaeng Matoaya: For regular words lemmatized without a suffix, it would be better to lemmatize at the phonemic form, as it seems that's what dictionaries identify as the 옛말 (yenmal) (personally, I don't like this word, not the most useful and intuitive word for this IMO, sort of a vague description). For example, the 옛말 (yenmal) of 가을 (ga'eul, “autumn”) is identified as ᄀᆞᅀᆞᆶ (Yale: kozolh) rather than ᄀᆞᅀᆞᆯ (Yale: kozol) and it does reveal its actual phoneme with no information lost. For cases where the isolated and connective spellings differ, the isolated form could simply be listed in the Alternative forms section.

Perhaps for the words that must be lemmatized with the suffix 다 (Yale: -ta), we could make a compromise and lemmatize the legal form that comes before /-ta/ even if it may not be phonemically faithful and linguistically valid. This way, we can stylistically conform with the standards of lemmatizing current Korean verbs/adjectives. The actual phoneme could be listed and identified in the appropriate sections (Alternative forms, Etymology, Pronunciation, etc.).

So the Middle Korean lemmatization of 곱다 (Yale: kwop-ta) would still be at 곱다 (Yale: kwop-ta) and the stem 고ᇦ (Yale: kwoW-) could be identified in the appropriate sections of 곱다 (Yale: kwop-ta) as already done in ᄃᆞᆺ다 (Yale: tos-ta).

But really, It is obvious that following lemmatization standards of Korean orthography of 분철 (分綴, buncheol) nature does not fit Middle Korean orthography of 연철 (連綴, yeoncheol) nature. It's not like the Korean word for "colour" (bit) has different spellings depending on its position, - (bich-) and (bit), or even (bit). It is always (bit), unlike Middle Korean's (Yale: pich) and (Yale: pis). This has already created inconsistencies in 졀ᄯᅡ빗 (Yale: cyelsta pis) and 텬쳐ᇰ빛 (Yale: thyenchyeng pich) where 졀ᄯᅡ빗 (Yale: cyelsta pis) would ideally be moved to 졀ᄯᅡ빛 (Yale: cyelsta pich) and the same problem with ᄀᆞᅀᆞᆯ (Yale: kozol).

Speaking of Middle Korean lemmatization, I was also wondering about your thoughts on which "version" of Sino-Korean words/readings should be lemmatized for the main Middle Korean entry. Although we don't have much MK entries and even less are of Sino-Korean origin, as Sino-Korean words are also well attested in MK texts, there should be a standard for it as well. For example, the entry ᄍᆞᆼ〮 (Yale: ccó) for the reading of derives from the Dongguk Jeongun from 1448 or an earlier record which was the standard — for the "correct pronunciation" (as opposed to the "incorrect" colloquial pronunciation) — spelling for the character of that early period.

But, I would rather have ᄌᆞ (Yale: co), which is the form recorded in Hunmong Jahoe from 1527, listed as the main entry, and ᄍᆞᆼ〮 (Yale: ccó) listed as its synonym or alternative form. Following this principle, the Middle Korean word for 중국 (中國, jungguk) as the main entry would not be the iconic 듀ᇰ귁〮 (Yale: tyungkwík) from Hunmin Jeongeum, but rather, It would be listed as a synonym or alternative form of 듀ᇰ국 (Yale: tyungkwuk). I think the version from Hunmong Jahoe and its direct descendents should be considered standard even if the Dongguk Jeongun form is attested earlier, as using this older form would create many problems. To start with, using (Yale: -) to represent a silent 종성 (終聲, jongseong, “final consonant”) as shown in ᄍᆞᆼ〮 (Yale: ccó). —LoutK (talk) 03:52, 22 October 2020 (UTC)[reply]

@LoutK, I wholly agree that the Hunmong Jahoe forms should be lemmatized.
I also think we should have standards for the inclusion of Sino-Korean words, because it seems distinctly unhelpful to include every glossed word that is attested in an inherently Sinicized 諺解 / 언해 text like Hunmin jeongeum. Rather, I think a decent criterion is being at least one of the following two:
  • Attested with a meaning different from that typical in Chinese: 즁ᄉᆡᇰ (Yale: cyungsoyng) for 衆生
  • Attested at least once in pure Hangul, not as an attached reading of a provided Chinese character: 차ᇰ (Yale: chang) for , (Yale: chyen) for
With that in mind, I'm not sure ᄍᆞᆼ〮 (Yale: ccó) should be here at all, especially when in the original text it's effectively a ruby gloss of !
What do you think?--Karaeng Matoaya (talk) 08:54, 22 October 2020 (UTC)[reply]
@Karaeng Matoaya: Yes, indeed. I think Hanja readings from Dongguk Jeongun or any "동국정운식 한자음" such as ᄍᆞᆼ〮 (Yale: ccó) should be restricted to the etymology sections of individual Modern Korean Hanja entries or listed as an alternative spelling of attested Hunmong Jahoe form Sino-Korean words and never actually be lemmatized.
Readings from Dongguk Jeongun do make great connections with the Middle Chinese reconstructions for etymological purposes (seems like Samun Seonghwi from 1751 also uses these to etymologically classify readings), and as it pertains to the Modern Korean "Hanja" entry, which does not guarantee to be an actual word but rather a 형태소 (形態素, hyeongtaeso) — therefore classified a "Hanja" and not a noun — they do rightfully belong there as Modern Hanja dictionaries also provide readings of a provided Hanja, basically doing the same thing.
In fact, individual Hanja readings of a provided Hanja from any 언해 (諺解, eonhae) (although I don't include these in Hanja etymology sections), 운서 (韻書, unseo), 자회 (字會, jahoe), and 유합 (類合, yuhap) or any historical Hanja Dictionaries should be restricted to etymology sections of individual "Hanja" entries (or perhaps a separate Middle Korean "Hanja" entry?) as it carries on in the same vein.
So, the criterion for attestation of Sino-Korean words in Middle Korean should be an attestation in pure Hangul like you said, as that verifies its existence in 언문 (諺文, eonmun) whether literary or in 입말 (immal) and not literary Chinese glossed in Middle Korean. This could be lemmatized using the form recorded in Hunmong Jahoe or from other records with the same form where Middle Chinese /ȵ/ is still written as (Yale: z), as their forms have a consistent translation into Modern Korean except for a few exceptions. Any other spelling would be listed as an alternative.
Terms with a meaning different from that typical in Chinese should definitely be lemmatized, as they wouldn't really belong anywhere else, especially if they didn't survive into Modern Korean, similar to some of your "Korean Classical Chinese" entries. This anomaly could be simply mentioned with an explanation in the etymology section.
Also, there should be a consensus on which Hanja form to use when glossing these Sino-Korean words. For example, for the Hanja form of (cheong, “blue”), the orthodox form in Modern Korean is which is the orthodox form found in the Kangxi dictionary, while Hunmong Jahoe lists this as , written before the creation of the Kangxi dictionary. There are already inconsistencies in some Hanja glosses in Modern Korean entries, and to name one.
The problem is that Hunmong jahoe lacks many common characters that might make up a big part of these attested words. So, it may be unreliable to use it as the standard "Hanja" form of Middle Korean. Also, one must consider the fact that different MK texts seem to have no standard "Hanja" form. So, it may be more intuitive and easier to use the orthodox forms of Modern Korean found in the (인명용 한자표 / 人名用 漢字表) even if it may be historically inaccurate.—LoutK (talk) 16:34, 22 October 2020 (UTC)[reply]

@LoutK, Suzukaze-c I have standardized all MK noun entries at the phonemic forms. I think we should still try to decide on what to do with the verb/adjective stems, and verbal suffixes. We have three options as I see it:

Option | Pros | Cons
Lemmatize at phonemic stems | Consistent across verbs, consistent with treatment of nouns, distinguishes all minimal-pair stems | Less familiar for modern readers, would lemmatize at unattested forms
Lemmatize at phonemic stems except for ㅸ (= ㅂ) and △ (= ㅅ) | Familiar for modern readers | Inconsistent, fails to distinguish some minimal pairs
Lemmatize at allophonic forms | Consistent across verbs, would lemmatize at attested forms | Less familiar for modern readers, inconsistent with treatment of nouns, fails to distinguish many minimal pairs

Thoughts?--Karaeng Matoaya (talk) 16:43, 23 October 2020 (UTC)[reply]

As an analogy from a very different but also synthesizing language, Navajo has entries for roots that are not attested. For examples, see stative verb łigai (to be white) and its root -GAII (white; hot), or active verb naalnish (to work; to function) and its root -NISH (to work).
That said, Navajo verb entries are still lemmatized at attested verb forms. One option for Middle Korean would be to lemmatize at attested forms, and also create entries for the underlying root forms, which could better illustrate phonemic shifts and related terms. ‑‑ Eiríkr Útlendi │Tala við mig 17:38, 23 October 2020 (UTC)[reply]
@Eirikr Think I worded that wrong—I didn't mean lemmatizing at the bare stems. In all three options, we lemmatize at the grammatically possible -ta form. What's different is whether we write that form using the morphophonemic orthography of modern Hangul, or a phonetic orthography which was actually used in Middle Korean texts.
So option 1 (modern orthographic principles, phonemic precision) would have:
젖다 (Yale: cecta), 지ᇫ다 (Yale: cizta), 앗다 (Yale: asta) (modern 젖다, 짓다, 앗다)
Option 2 (illogical but reader-friendly):
젖다 (Yale: cecta), 짓다 (Yale: cista), 앗다 (Yale: asta)
Option 3 (fidelity to MK orthography rules):
젓다 (Yale: cesta), 짓다 (Yale: cista), 앗다 (Yale: asta)
Cheers,--Karaeng Matoaya (talk) 19:19, 23 October 2020 (UTC)[reply]
@Karaeng Matoaya: It seems like the NIKL and Korea University use Option 2 as a compromise to this problem.
NIKL identifies the origin of Modern Korean 젖다 (jeotda) as:
Middle Korean 젖다 (Yale: cecta), using the modern orthography.
NIKL identifies the origin of Modern Korean 짓다 (jitda) as:
Middle Korean 짓다 (Yale: cis-ta), using the MK orthography.
NIKL identifies the origin of Modern Korean 곱다 (gopda) as:
Middle Korean 곱다 (Yale: kwop-ta), using the MK orthography.
Using Option 2 would certainly make it easier to link attested MK words in etymology sections of Korean entries to the main MK entry, as one only needs to copy the MK word from their 어원 (語源, eowon) or 옛말 (yenmal) section.
So, ㅸ/△ stem words could be lemmatized using this principle without creating much confusion. However, the phonemic stem and irregularity should definitely be stressed in its appropriate sections to distinguish minimal pairs, for example, stressing the stem 고ᇦ (Yale: kwoW-) for 곱다 (Yale: kwop-ta) and the stem 지ᇫ (Yale: ciz-) for 짓다 (Yale: cis-ta).
But again, for consistency's sake, it would be neater to bring down all non-ㅸ/△ stem words to their MK orthographic form as well, as consistency with nouns is lost anyways. So Option 3 could be the way to go in the end.
It would seem like Option 1 should have solved all of this, but it doesn't seem to be a common practice for MK lemmatization in other dictionaries, and I wouldn't know how to feel if 고ᇦ다 (Yale: kwoW-ta) is lemmatized as the main entry.
Perhaps, the ultimate compromise would be creating the main MK entries using Option 3 for consistency with MK attested forms, and then creating a phonemic form of entry using Option 1. The ideal entry is well exemplified by ᄃᆞᆺ다 (Yale: tos-ta), but in this case, ᄃᆞᇫ다 (Yale: toz-ta) would be created separately to be its phonemic form of entry.
Adding the phonemic form of entries would also be a good way to "catch" non-ㅸ/△ stem words lemmatized in their MK attested form as the main entry, as many MK words listed in etymology sections of Korean entries use the standard of Option 2 as shown in 젖다 (jeotda).—LoutK (talk) 10:22, 24 October 2020 (UTC)[reply]
@LoutK: Your "ultimate compromise" seems like a very good solution. So far the only affected verb would seem to be 깇다 (kichta), which has been moved to 깃다 (kista) with a "phonemic form of" entry like you said. ᄃᆞᇫ다 (tozta) has also been created.--Karaeng Matoaya (talk) 10:39, 24 October 2020 (UTC)[reply]
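
For anyone pairing up Option 1 (phonemic) and Option 3 (attested) spellings, the correspondence discussed in this thread can be sketched as a small lookup over Yale romanization. This is a rough illustration only: the function name is made up, the table covers just the stem-final consonants cited above (c, ch, z, W, ph), and it is not a full account of Middle Korean coda neutralization.

```python
# Stem-final neutralizations before the consonant-initial suffix -ta,
# limited to the pairs cited in this thread (Yale romanization).
SURFACE_CODA = {
    "ch": "s",  # kich-ta -> kis-ta
    "ph": "p",  # 둪다 -> 둡다
    "c": "s",   # cec-ta -> ces-ta
    "z": "s",   # ciz-ta -> cis-ta
    "W": "p",   # kwoW-ta -> kwop-ta
}


def attested_lemma(phonemic_stem):
    """Map a phonemic stem (Option 1) to its attested -ta spelling (Option 3)."""
    # Try two-letter finals first so that "ch" is not mistaken for a final "h".
    for final in sorted(SURFACE_CODA, key=len, reverse=True):
        if phonemic_stem.endswith(final):
            return phonemic_stem[: -len(final)] + SURFACE_CODA[final] + "-ta"
    # Stems whose final consonant is not in the table are left unchanged.
    return phonemic_stem + "-ta"
```

With this table, ciz and toz come out as cis-ta and tos-ta, while a stem such as as is returned unchanged as as-ta. The mapping only runs in one direction, because several phonemic finals collapse to the same attested coda. That is the minimal-pair problem raised for Option 3.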

Important: maintenance operation on October 27


-- Trizek (WMF) (talk) 17:12, 21 October 2020 (UTC)[reply]

tables of contents


I think it would be better if the only text in the tables of contents was the names of the languages. Jumping directly to different sections within a language entry saves a negligible amount of time, and the table of contents becomes something you have to scroll through for many short words. Dngweh2s (talk) 02:07, 22 October 2020 (UTC)[reply]

Have you looked at any entries like [[cat]]? The English section itself is very complex. We do have a good number of users who are only interested in the English section. DCDuring (talk) 02:32, 22 October 2020 (UTC)[reply]

Planning to rename 'terms derived from the PIE root FOO' -> 'terms derived from the Proto-Indo-European root FOO'


@Erutuon, Victar I am planning to bot-rename "PIE" to "Proto-Indo-European" in categories like Category:Yola terms derived from the PIE root *h₂el- (grow). This is for consistency with other categories and to avoid using potentially unfamiliar abbreviations in category names. Benwing2 (talk) 03:14, 22 October 2020 (UTC)[reply]

I will fix up {{PIE root}}, Module:category tree/PIE root cat and Module:auto cat to generate and handle the new names. Benwing2 (talk) 03:15, 22 October 2020 (UTC)[reply]
@Benwing2: That's fine with me. I think {{PIE root|lang|root}} should also be replaced with the more versatile and standardly formatted {{root|lang|ine-pro|*root-}}. @Rua --{{victar|talk}} 04:19, 22 October 2020 (UTC)[reply]
@Victar I agree. I am also planning on replacing Module:category tree/PIE root cat and Module:category tree/root cat with a general poscatboiler handler to handle all 'terms derived from the LANG root FOO' categories. Benwing2 (talk) 04:22, 22 October 2020 (UTC)[reply]
@Rua Sorry, forgot to ping you. Benwing2 (talk) 04:23, 22 October 2020 (UTC)[reply]
@Benwing2: {{root}} is already built and ready to go. --{{victar|talk}} 04:25, 22 October 2020 (UTC)[reply]
@Victar It's all done. Benwing2 (talk) 18:49, 24 October 2020 (UTC)[reply]
@Benwing2, Mahagaja: Except for Proto-Brythonic Ἀργεντοκόξος, where the term is different from everything else we have in that language in being both attested and in a different script (the other attested ones are in the Latin script). It has a |sort= parameter that's causing a module error. I'm mentioning this because proto-language entries aren't typically in non-Latin scripts, so you may not have recognized that it was related. Chuck Entz (talk) 23:06, 24 October 2020 (UTC)[reply]
@Benwing2: Awesome. That's so much better. Thanks, Benwing! --{{victar|talk}} 23:26, 24 October 2020 (UTC)[reply]
@Chuck Entz I added support for sort= in {{root}}. Benwing2 (talk) 23:33, 24 October 2020 (UTC)[reply]
@Benwing2 Does {{PIE root see}} also work for 'Template:FOO root see'? Kutchkutch (talk) 11:22, 26 October 2020 (UTC)[reply]
@Kutchkutch I'll have to look into it. What other languages are you interested in? Benwing2 (talk) 12:49, 26 October 2020 (UTC)[reply]
@Benwing2: Thanks for looking into it. For words in Indo-Aryan languages, the Sanskrit root that a word is derived from may be given in Wiktionary entries (e.g. मेज्चे (mejce)) and in other dictionaries (e.g. {{R:CDIAL}}). So, it might be helpful to have the equivalent of {{PIE root see}} in Sanskrit root entries such as ज्ञा (jñā) without having to go to the PIE ancestor *ǵneh₃-. Perhaps the same idea could be extended to other languages in CAT:Roots by language with descendants. Kutchkutch (talk) 17:15, 26 October 2020 (UTC)[reply]
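
For reference, the title change proposed at the top of this section amounts to a single string substitution on category names. The snippet below is a minimal sketch of that mapping only; it is not the bot code that was actually run, and the function name is made up for illustration.

```python
OLD = "terms derived from the PIE root"
NEW = "terms derived from the Proto-Indo-European root"


def rename_category(title):
    """Return the new-style category title; other titles pass through unchanged."""
    # Replace at most one occurrence so the root and gloss are never touched.
    return title.replace(OLD, NEW, 1)
```

Applied to Category:Yola terms derived from the PIE root *h₂el- (grow), this gives Category:Yola terms derived from the Proto-Indo-European root *h₂el- (grow). The language name, root, and gloss are left as they were.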

Currencies


Should we have entries like Indian rupee or Malaysian ringgit? For currencies that use dollar as their unit, we often allow the specific currencies, like Canadian dollar, Hong Kong dollar and US dollar. — justin(r)leung (t...) | c=› } 06:29, 22 October 2020 (UTC)[reply]

Personally I'm not thrilled about having Canadian dollar, Hong Kong dollar, and US dollar, unless they're needed as translation hubs. I'd be especially opposed to Malaysian ringgit since there is no other flavor of ringgit AFAIK. —Mahāgaja · talk 08:07, 24 October 2020 (UTC)[reply]

부아 archaic or dated — Edit warring


Hi. I'm curious about you guys' opinion on whether 부아 (bua) is an archaic or dated Korean word for 'lung'.

Here are some of the recent internet articles using 부아 (bua) as a lung term.

    • 2003 March, 신지영, 우리말 소리의체계: 국어음운론 연구의 기초를 위하여 [The Sound System of Korean Language: For the Fundamentals of Korean phonology Research], 한국문화사, →ISBN:
      우리말의 경우는 언어학적 의미를 가진 모든 소리들이 하나도 예외 없이 폐에서 내뿜는 기류를 이용하여 만들어지는 부아 날숨 소리이다.
      Urimarui gyeong'uneun eoneohakjeok uimireul gajin modeun sorideuri hanado ye'oe eopsi pyeeseo naeppumneun giryureul iyonghayeo mandeureojineun bua nalsum soriida.
      In the case of Korean, all sounds with linguistic meanings are, without exception, pulmonic egressive sounds, made by utilising airflow coming out from the lungs.
    • 2013 August 2, Sung Ki-ji, 몸에 관한 토박이말 [A native word for the body.]‎[2]:
      예를 들어, 숨 쉬는 기관인 폐에 대해서도 우리말인 ‘허파’, 또는 ‘부아’가 아직 널리 쓰인다. 분한 마음이 울컥 솟아나는 것을 “부아가 치민다.”라고 하는데, 이는 부아 곧 폐가 부풀어 올라 가슴이 꽉 막히도록 화가 가득 찬 느낌을 표현한 말이다.
      Yereul deureo, sum swineun gigwanin pyee daehaeseodo urimarin ‘heopa’, ttoneun ‘bua’ga ajik neolli sseu'inda. Bunhan ma'eumi ulkeok sosananeun geoseul “buaga chiminda.”Rago haneunde, ineun bua got pyega bupureo olla gaseumi kkwak makhidorok hwaga gadeuk chan neukkimeul pyohyeonhan marida.
      For example, the Korean word '허파' or '부아' is still widely used for the lungs, which are respiratory organs. The swelling of resentment is said to be “a 부아 swelling.” This is a phrase that expresses a feeling full of anger in which the lungs swell up and the chest is completely blocked.
    • 2014 April 9, Song Dong-seok, [머니위크]한의사가 쓰는 生生건강법 [[Money Week] How to live healthy, written by a Korean Medicine Doctor.][3]:
      부아는 무엇일까? 부아는 폐를 나타내는 순 우리말이다.
      Buaneun mueosilkka? buaneun pyereul natanaeneun sun urimarida.
      What is 부아? 부아 is a pure Korean word for lungs.
    • 2014 May 28, Lee Kwang-woo, 김해의 선거 … 보골채우는군요 [Elections in Gimhae ... You're provoking.]‎[4]:
      보골채운다'는 말을 아십니까? '보골'은 '허파'의 경남지역 방언입니다. 허파는 부아와 같은 말입니다.
      Bogolchae'unda neun mareul asimnikka? bogol eun heopa ui gyeongnamjiyeok bang'eonimnida. Heopaneun buawa gateun marimnida.
      Do you know the saying, '보골 채운다'? '보골' is the Gyeongnam dialect of '허파'. 허파 is the same word as "부아."
    • 2020 October 15, Kwon O-gil, [생물이야기] 폐의 건강 진단해주는 '폐활량'<1150> [[Biological Story] Lung capacity, which diagnoses the health of the lungs.]‎[5]:
      물론 운동을 하는 사람의 폐활량이 보통 사람보다 많고, 25세에 가장 폐활량이 제일 크다가 점점 줄어들어 60세 정도에 20~30% 감소하고, 늙을수록 점점 준다. 이렇게 부아도 늙는다.
      Mullon undong'eul haneun saramui pyehwallyang'i botong saramboda manko, 25see gajang pyehwallyang'i jeil keudaga jeomjeom jureodeureo 60se jeongdo'e 20~30% gamsohago, neulgeulsurok jeomjeom junda. Ireoke buado neungneunda.
      Of course, the lung capacity of the person who exercises is higher than that of the average person; lung capacity is the largest at the age of 25, and it then gradually decreases as a person ages, with 20-30% of lung capacity lost when one reaches 60 years of age. Like this, the lungs get old.
  • Obsolete: no longer in use; found only in very old texts. Examples: zyxt, yclept (although some dictionaries mark "yclept" as "archaic")
  • Archaic: no longer in general use, but still found in some contemporary texts (eg, the Bible). Examples: thou (in the sense of "you"), œconomy
  • Dated: still in use, but only by older people and considered old-fashioned by younger people. Examples: wireless, groovy, gramophone, gay (in the senses of "bright", "happy", etc)

What do you guys think? Please use Support or Oppose in the comment below. Thanks. B2V22BHARAT (talk) 12:44, 22 October 2020 (UTC)[reply]

Support I'm for Dated, because it's still in use, but mostly by older people and in professional fields, and is considered old-fashioned by younger people. Obviously, 부아 (bua) already has the "lung" meaning; it's just that 허파 (heopa) and 폐 (pye) are more frequently used. In phonetics, 부아 (bua) is used to mean "lung", as in the 2003 citation above.

Therefore, 부아 (bua) as "lung" is not an archaic term. It's just dated in favor of 허파 (heopa) and 폐 (pye). B2V22BHARAT (talk) 12:49, 22 October 2020 (UTC)[reply]

This was already covered in my version as {{lb|ko|archaic|_|except as translation of "pulmonic" in|_|phonology}}. Even that is being quite generous to such a small field.--Karaeng Matoaya (talk) 14:16, 22 October 2020 (UTC)[reply]

I think archaic is correct because 바른손 (bareunson) I never used it before. So Karaeng Matoaya is correct. B2V22BHARAT (talk) 14:55, 22 October 2020 (UTC)[reply]


(this section was originally a separate section that was combined.)

@B2V22BHARAT

User:B2V22BHARAT and I have been edit warring at 부아 (bua) for about two days, the main issue being whether the etymological "lung" definition of this word is obsolete, normal, or anything in between. There have been about twenty reverts between the two of us over a fifty-hour period. I currently support "archaic" and B2V22BHARAT supports "dated".

What follows is my side of the dispute. The standard Korean dictionaries (the National Institute of the Korean Language dictionary and Korea University dictionary) do not use the same obsolete-archaic-dated-rare distinctions we have. The NIKL dictionary has 옛말 (yenmal, “old speech”), which in practice means all attested forms no longer included in the standard language (from fifteenth-century Middle Korean to early twentieth-century usages), and 예스럽게 (yeseureopge, “in an old fashion”), which runs the gamut from effectively obsolete usages like 노릇바치 (noreutbachi) to dated but generally understood forms like 갔니라 (gannira, “indeed, he went”). These dictionaries also often fail to mark words that any Korean would reasonably consider as at least dated as such, and the NIKL in particular is often criticized for prescriptive and conservative tendencies not reflecting the current state of the language.

In any case, the NIKL and the KU only mark "lung" as a "secondary definition" of 부아 (bua), but not as 예스럽게. This is despite the fact that I have personally never heard anyone use this word outside the "anger" meaning, and B2V22BHARAT seems to agree: "I honestly have never used 부아 (bua) as a word for lung when I graduated from middle and high school in Korea, but the dictionary says that 부아 (bua) is synonymous with pulmonary and therefore I only wrote it as it is".

So I asked B2V22BHARAT to find citations for the "lung" usage (not just mentions), and so far they have found only one, excluding a technical compound usage in phonetics. To go down the list of citations in the current entry 부아 (bua):

  1. 예를 들어, 숨 쉬는 기관인 폐에 대해서도 우리말인 ‘허파’, 또는 ‘부아’가 아직 널리 쓰인다. 분한 마음이 울컥 솟아나는 것을 “부아가 치민다.”라고 하는데, 이는 부아 곧 폐가 부풀어 올라 가슴이 꽉 막히도록 화가 가득 찬 느낌을 표현한 말이다.
    For example, with regard to the respiratory organs of the lung, the native Korean words heopa or bua are still widely used. People say bua-ga chiminda when they are struck by a sudden feeling of frustration; this means to be so angry that the bua (that is, the lungs) swell up and stuff the chest.
    This is from an article published by a language purist society, seeking to demonstrate "native" (i.e. non-Chinese, although 부아 (bua) is clearly a loanword anyways) Korean words relating to internal organs. In any case, this is describing the etymology of definition 1 ("anger") and does not constitute usage as "lung". The fact that this needs to be explained to the audience at all already shows that bua is not generally understood by Koreans.
  2. 부아는 무엇일까? 부아는 폐를 나타내는 순 우리말이다.
    What is bua? Bua is a native Korean word referring to the lungs.
    Not a usage. Again, the fact that this needs to be explained to the audience shows that bua is archaic.
  3. '보골채운다'는 말을 아십니까? '보골'은 '허파'의 경남지역 방언입니다. 허파는 '부아'와 같은 말입니다. 부아는 '노엽거나 분한 마음'이란 뜻입니다. 따라서 '보골채운다'는 '화를 돋운다'의 방언인 셈입니다.
    Do you know the expression bogol chaeunda? Bogol is South Gyeongsang dialect for heopa. Heopa is the same thing as bua. Bua means "an angry or frustrated mental state." Thus, bogol chaeunda is dialect for "it makes [me] angry".
    B2V22BHARAT has added this as a citation for the "lung" meaning (excising the very sentence where the writer defines bua from the quotation provided), but clearly the full quote does not support the "lung" definition.
  4. 이렇게 부아도 늙는다.
    Like this, the lungs too get old.
    This is a genuine usage, which is why I conceded that bua is not "obsolete" as I originally said and amended my position to "archaic".

In fact, I would say that B2V22BHARAT finding only one non-mention usage of bua as "lung", despite what appears to be considerable effort, shows in itself that this term is at least archaic and not generally used by Koreans.--Karaeng Matoaya (talk) 13:01, 22 October 2020 (UTC)[reply]

I will note here that the first few pages of hits on Google and Naver for 부아가 (buaga), 부아는 (buaneun), 부아를 (buareul), 부아도 (buado) (the postpositions here differentiating real usage from compounds) are:
  • 부아가 (buaga): Google is all about the "anger" definition except this article, in which a quiz show asked what part of the body bua referred to and people began looking it up. Again showing that this isn't generally used. Naver is all about the "anger" definition, or a Thai restaurant which shut down two months ago.
  • 부아는 (buaneun): Google is mostly articles explaining the etymology of this word, often in connection to the quiz show. Apparently someone thought bua was the diaphragm. Naver is split between the etymology, and the Thai restaurant.
  • 부아를 (buareul): Mostly the anger definition, but there are some results on both Google and Naver talking about the restaurant as well. Nothing about the lung in the first few pages outside another etymology article.
  • 부아도 (buado): Apparently homophonous for a small islet with a magnolia forest, but there are some results relating to the "anger" definition.
"Dated" is too generous for the word.--Karaeng Matoaya (talk) 13:31, 22 October 2020 (UTC)[reply]
And north of the 38, this book by Kim In-ho, a leading North Korean linguist of the late twentieth century, treats the "lung" meaning as an entirely obsolete etymological curiosity.--Karaeng Matoaya (talk) 13:38, 22 October 2020 (UTC)[reply]
The matter seems to be settled (?), but FWIW I support Karaeng Matoaya. —Suzukaze-c (talk) 01:27, 23 October 2020 (UTC)[reply]

Further disputes


@B2V22BHARAT

Subsequent to this dispute, B2V22BHARAT added the following usage note to the entry for the Korean word 염통 (yeomtong, “heart meat”):

염통 (yeomtong), a native korean word for 'heart' is rarely used in isolation, especially outside linguistic works, but is generally encountered in idomatic expressions as the heart of animals.

As they themselves—a native speaker—must clearly be aware, 염통 is not restricted to idiomatic usage (and is obviously never encountered in linguistic works). It is simply a very common Korean word for "heart as culinary ingredient". For random examples off the first page of Google hits, see this article about chicken heart skewers or this online store selling 300 grams of Australian beef heart or this YouTube video about beef heart.

This usage note is actually a passive-aggressive reference to the usage note I added to 부아 (bua, “(idiomatic) anger”), which is actually true:

부아 (bua) is rarely used in isolation, especially outside linguistic works, but is generally encountered in idomatic expressions as the subject of verbs having to do with a burst of emotion such as 치밀다 (chimilda), 돋다 (dotda), 내다 (naeda).

(They have since amended their usage note to "rarely used in isolation" and removed the most conspicuously passive-aggressive parts, but the fact remains that "rarely used in isolation" is also false and the entire usage note is superfluous.)

This addition of a clearly false usage note was accompanied by the edit summary "I'll be watching your edit from now on", which is frankly quite ominous.

I reverted B2V22BHARAT's edit with the following edit summary:

this level of passive aggressiveness is disruptive to the project and unbecoming of you. 염통 as "heart meat" is certainly not restricted to idiomatic constructions and the note is clearly a reference to the 부아 dispute. if you make similar edits of this sort i will bring this to beer parlor again.

To which B2V22BHARAT restored the false statement with the following edit summary:

I don't know why you're angry. Wiktionary is not your own space. The difference of opinion between you and me is natural. Because we're all human. Don't think of editing here by yourself. You and I are in a competition. I will supervise you from now on and so will you. And if you have a problem, talk to Beer Parlor. Don't threaten me.

If B2V22BHARAT is a native speaker, which I believe they are, they are well aware that the widespread usage of 염통 (yeomtong, “heart meat”) in non-idiomatic contexts is not a "difference of opinion" but a matter of fact verifiable by a single visit to a butcher's shop, and "You and I are in a competition. I will supervise you from now on and so will you" is far from a productive attitude for a collaborative project.

I have no desire to enter an edit war again and would like to request arbitration of some sort.--Karaeng Matoaya (talk) 17:11, 23 October 2020 (UTC)[reply]

B2V22BHARAT's aggressive attitude ("Your 뇌피셜 is over. You'll either get banned from Wiktionary. Get out of here.") and slavish hewing to Naver are inappropriate. —Suzukaze-c (talk) 08:03, 24 October 2020 (UTC)[reply]
@B2V22BHARAT, Karaeng Matoaya Wiktionary doesn't really have a formal arbitration process; in cases like this either a compromise is worked out, or one or both sides will end up blocked for a while. Edit warring is not good on the part of either user, and just ends up inflaming the other. This happened once, for example, with two admins, who both ended up de-sysoped for a while. I have had some positive interactions with User:Karaeng Matoaya and haven't interacted personally with User:B2V22BHARAT, and it looks to me like User:B2V22BHARAT's attitude is plainly inappropriate, but I can't verify the correctness of either editor's Korean contributions, and if both have been edit-warring, this reflects badly on both of them. I would second User:Atitarev's suggestion that both editors refrain from touching each other's edits for a while, until things have cooled off a bit. Benwing2 (talk) 18:27, 24 October 2020 (UTC)[reply]
@Benwing2 My impression is that B2V22BHARAT is following me around as they said they would do in their edit summary. I do assume they're mostly acting in good faith, but to recap the last few edit wars/confrontations:
  • [6]: Entry I created yesterday; they inserted plagiarized definitions instead of my definitions and have since been reverted by someone else.
  • [7]: Entry I created yesterday; they inserted definitions which I think are also partly plagiarized (identical order of translations), but it's not harmful.
  • [8]: Entry created by Atitarev with inexact definition (see Talk:서사시 for details). I changed it to a more exact definition—I'll admit that I shouldn't have deleted the "epic" definition entirely—and B2V22BHARAT reverted four hours later. An edit war ensued, with them making translations of cited quotations that suggest to me that they don't fully understand the connotations of epic in English. I eventually rewrote the entry to a compromise solution.
  • [9]: Entry I created yesterday. The Helpful Mystery IP that adds lots of Chinese medical terms added an etymology after a few hours. I wasn't super sure about the etymology and would have raised an issue about it on the talk page, but five hours later B2V22BHARAT changed the etymology to one they cited from an unreliable source, a culture section of a local magazine. They did not seem to understand why this is an unreliable source. As this was part of the flurry of reverts of my edits listed above, I just reverted back to the no-etymology version instead of bringing it to the talk page (as I would have if B2V22BHARAT's edit hadn't been part of this wider trend). An edit war ensued. The etymology issue was resolved with the help of the Helpful Mystery IP in Talk:아니꼽다, who actually provided academic sources for the etymology.
  • [10]: I fixed the etymology template here, and after half an hour B2V22BHARAT came to "fix" something that didn't need to be fixed; he changed the target of the Chinese source to an alternative form instead of the form we lemmatize at. Again, this wasn't actively harmful, so I let it be.
  • [11]: This entry is linked from 비단, another entry where I fixed the etymology template, which I assume is how they found this very obscure entry. They made a misinformed edit which I fixed.
  • [12]: I made a rash edit which they were right to revert, but when I fixed my own edit they began insisting on having 피리 as the target of the soft redirect, when the standard Korean dictionaries all give 필률 instead. This edit suggests that they don't understand how the Naver Korean dictionary is actually formatted. In the middle of this edit war I switched to making edit comments in Korean because I assumed communication would be easier that way. This turned out not to be the case, with them leaving quite rude edit summaries. The edit war ended with a compromise solution.
  • [13]: While not following me per se, B2V22BHARAT announced to me in the Korean edit summary at the end of the above edit war that he would be reverting the etymology here, so it's still intentionally seeking out confrontation. The etymology section links to an Old Korean reconstruction which provides the source for the reconstruction, but they claimed "no source(Reference)", apparently unaware of this fact. When I ported the sources from the reconstruction because he asked me to, they reverted me, and when I restored the sources they asked for, they reverted me again with a condescending edit summary. I have currently restored the sources and the correct etymology.
These are all the edit wars/confrontations from the past fifteen hours or so, and IMO all of them can be traced back to B2V22BHARAT following me around. Perhaps there will be more tomorrow.--Karaeng Matoaya (talk) 19:58, 24 October 2020 (UTC)[reply]

Number and definiteness on (English) determiners


I would like to mark the English determiners for definiteness, number, and countability. Determiners like the and that mark NPs as definite. Most other determiners, including the numbers, mark NPs as indefinite. This would be useful information to include. Furthermore, determiners are picky about the number and countability of the NPs they determine, as seen in the following table.

Number and countability: each determiner determines only the noun types marked x

        sing. count   sing. non-count   plural count   plural non-count
        (button)      (furniture)       (buttons)      (police)
a       x
both                                    x
much                  x
many                                    x              x
all                   x                 x              x
the     x             x                 x              x

What would be a good way (or ways) to include this information?--Brett (talk) 19:49, 22 October 2020 (UTC)[reply]

Arguably it would best fit in one or more of the (non-gloss) definitions of each determiner and/or labels associated with them. If that doesn't seem natural, usage notes are the other possibility. I don't think we need feel compelled to be consistent across entries because there aren't enough of these for users to form specific expectations for how the grammar is presented. DCDuring (talk) 16:26, 23 October 2020 (UTC)[reply]
How about this?--Brett (talk) 18:48, 23 October 2020 (UTC)[reply]
But it has to fit with the existing definitions. In the case of the, for example, does the phenomenon apply to all 10 existing definitions in Etymology 1 or just some of them? If it applies to several of them we may be forced to put all the information under Usage notes. Some of the 10 definitions already seem to have number/(un)countability in the definition, sometimes explicitly. DCDuring (talk) 00:32, 24 October 2020 (UTC)[reply]
In the case of the, definiteness would apply to all 10, but countability/number is limited by some of the definitions. For this, there is an indefinite use, but number and countability are constant. For other determiners, there's more contextual variability. So, usage notes?--Brett (talk) 01:48, 24 October 2020 (UTC)[reply]
Maybe so. If others don't like it, they will probably weigh in when they notice this discussion. DCDuring (talk) 17:35, 24 October 2020 (UTC)[reply]
Usage notes are probably the way to go. Something like the above table could potentially be stuck into a template and inserted into the usage notes section of all the determiners in question. (Note however, that all going with singular non-count nouns like furniture is allowed only in limited circumstances, e.g. all furniture is man-made; usually it would be all the furniture.) Benwing2 (talk) 18:39, 24 October 2020 (UTC)[reply]

Derivations from multi-word terms in non-Latin alphabets


Etymology for borage ends with

Arabic أَبُو العَرَق (ʔabū l-ʕaraq, literally father of sweat)

What is the right way to make this link to the two Arabic lemmas أب and عَرِقَ (ʕariqa) while retaining transliteration and translation all together? Without transliteration I can do

(on Bahr al Ghazal) From Arabic الغزال بحر, literally "gazelle river"

but it would be better to have automatically generated transliteration. And the less I have to interact with Arabic script the fewer mistakes I will make so any template wizards are encouraged to make {{der}} magically do the right thing with a spaced phrase. Vox Sciurorum (talk) 18:54, 23 October 2020 (UTC)[reply]

@Vox Sciurorum: Arabic أَبُو العَرَق (ʔabū al-ʕaraq, literally father of sweat)?--Karaeng Matoaya (talk) 19:10, 23 October 2020 (UTC)[reply]

Re: Chinese "idiom" POS


@恨国党非蠢即坏 has been suggesting that we get rid of "idiom" as a part of speech in Chinese entries, as in Wiktionary:Beer parlour/2020/June#Chinese "idiom" POS, Wiktionary:Beer parlour/2020/March#Part of speech "idiom" and Wiktionary:RFDO#Template:zh-idiom. They have been creating chengyu entries with headers other than "idiom", which does not follow current norms for chengyu (and other kinds of idioms). It is true that it would be useful to have some more information as to how chengyu can be used within a sentence, and we are indeed losing this grammatical information. However, IIRC, the argument for not classifying chengyu as verb, noun, adjective or adverb is that chengyu can sometimes be hard to classify. (Notifying Atitarev, Tooironic, Suzukaze-c, Mar vin kaiser, Geographyinitiative, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly): — justin(r)leung (t...) | c=› } 05:03, 24 October 2020 (UTC)[reply]

IMO those chengyu hard to classify are in the middle of the continuum from verbs to adjectives. There is no such problem for noun chengyu. 恨国党非蠢即坏 (talk) 05:13, 24 October 2020 (UTC)[reply]
My understanding is the chengyu is a special type of idiom in that it must be 4 characters long. But there are other kinds of Chinese idioms too. If you go by the English sense of the term "idiom", yanyu and xiehouyu can be considered idioms too. You can always use multiple classifications, I presume. In fact, I'd advocate for adding yanyu and xiehouyu to the POS marker, where you can also indicate verb, noun, etc. The dog2 (talk) 05:18, 24 October 2020 (UTC)[reply]
In case anyone doesn't know, the category cat:Chinese chengyu or cat:Chinese idioms can be added by writing
|cat=n,cy
|cat=n,id

in {{zh-pron}}. So you need not change the POS to give those categories. 恨国党非蠢即坏 (talk) 06:01, 24 October 2020 (UTC)[reply]

If we do change the Idiom header to Noun, Verb, etc., it would not be sufficient just to have the "idiom" info in {{zh-pron}}, which only does categorization. We should probably label it using {{lb|zh|chengyu}}, {{lb|zh|xiehouyu}}, etc. (or with {{tlb}}) to make it more obvious and useful to readers. — justin(r)leung (t...) | c=› } 06:07, 24 October 2020 (UTC)[reply]
@Tooironic: E.g. 乱臣贼子, 镜花水月. 恨国党非蠢即坏 (talk) 15:13, 24 October 2020 (UTC)[reply]
Choosing the right PoS for a Chinese entry is sometimes a problem, so maybe we should have more examples and proposed handling, e.g. in this format
  1. (Example1) - header: verb/adjective/noun/yanyu (new)/xiehouyu (new); label: (none)/{{lb|xiehouyu}}/{{lb|chengyu}}.
  2. (Example2) - header: verb/adjective/noun/yanyu (new)/xiehouyu (new); label: (none)/{{lb|xiehouyu}}/{{lb|chengyu}}.
Everybody could put their support/oppose against examples. It's hard to decide without concrete examples. New PoS can also be added, if they are really justified and supported by the majority. What do you think? --Anatoli T. (обсудить/вклад) 07:30, 24 October 2020 (UTC)[reply]
As I have stated above. The only problem is that some chengyu lie in the fuzzy area between verbs and adjectives. I think it is completely acceptable to just call them "verbs" or "adjectives". Or else we will need to invent a method to strictly determine whether a Chinese word is a "verb" or an "adjective". 恨国党非蠢即坏 (talk) 15:13, 24 October 2020 (UTC)[reply]
As for those like yanyu and xiehouyu, I suggest putting them like this: {{head|zh|proverb}} {{tlb|zh|yanyu}}. 恨国党非蠢即坏 (talk) 15:19, 24 October 2020 (UTC)[reply]

Oh, I have forgotten my examples. If this is what you have meant:

  1. 罪魁祸首 - header: noun; term-label: chengyu
  2. 信口开河 - header: verb; term-label: chengyu
  3. 美轮美奂 - header: adjective; term-label: chengyu
  4. 猪八戒照镜子——里外不是人 - header: proverb; term-label: xiehouyu
  5. 树活一张皮,人活一口气 - header: proverb; term-label: yanyu
恨国党非蠢即坏 (talk) 16:01, 24 October 2020 (UTC)[reply]
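
For illustration, the first example above might look roughly like this in wikitext. This is only a sketch: it combines the |cat=n,cy parameter of {{zh-pron}} mentioned earlier in this thread with the {{head}} plus {{tlb}} pattern suggested for yanyu. The pinyin, the gloss, and the assumption that a "chengyu" label is available to {{tlb}} are placeholders, not the content of the actual entry.

```
==Chinese==

===Pronunciation===
{{zh-pron
|m=zuìkuíhuòshǒu
|cat=n,cy
}}

===Noun===
{{head|zh|noun}} {{tlb|zh|chengyu}}

# [[chief culprit]] (placeholder gloss)
```

Under this layout the header carries the part of speech, the term label tells readers that the entry is a chengyu, and cat=n,cy only handles categorization.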
Even the classifications "chengyu", "xiehouyu", "suyu", "yanyu", etc. are not easily defined, let alone any finer distinctions. There is little need in classifying them further, especially when you are essentially putting European concepts of part of speech onto the Chinese language. The current system should remain as it is. ---> Tooironic (talk) 22:06, 24 October 2020 (UTC)[reply]
Oh. This is the first time I've heard that nouns, verbs, adjectives, etc. are "European concepts" and do not exist in the Chinese language. 恨国党非蠢即坏 (talk) 04:19, 25 October 2020 (UTC)[reply]
Of course, part of speech is literally a European concept. I am all for using it for the ease of Wiktionary users, but we need to understand its limitations when imposing it on Chinese — for example, most verbs can act as nouns in Chinese, many adjectives can be used as adverbs, what is often considered a noun in European languages is considered a pronoun in Chinese, etc. As regards chengyu, what you describe above is discussed natively in Chinese as usage (用法), not part of speech, e.g. 聯合式 chengyu, that can 作主語, 作賓語, etc. ---> Tooironic (talk) 01:52, 1 November 2020 (UTC)[reply]
Thanks for providing examples. Your approach sounds good and we may do without adding new PoS. We have got by so far. Parts of speech are not necessarily a "European concept" but it's hard to fit Chinese (and some other languages) into them. They make sense within a sentence (they behave differently and have a specific position) but not necessarily when defining individual words, especially single character words where they usually have a broad sense. That's why most Chinese dictionaries do not bother adding part of speech labels onto Chinese words but may add them to the other language for clarity. --Anatoli T. (обсудить/вклад) 04:30, 25 October 2020 (UTC)[reply]
@Atitarev: That's because most Chinese single character words are affixes (or morphemes), not words. Wiktionary has been doing a very poor job in differentiating Chinese single character "affixes" and "words", which was partly the result of a unified Chinese L2. 恨国党非蠢即坏 (talk) 04:40, 25 October 2020 (UTC)[reply]
AFAIK, many Chinese dictionaries also do not distinguish between single character bound morphemes and free morphemes. RcAlex36 (talk) 05:15, 25 October 2020 (UTC)[reply]
@恨国党非蠢即坏: Oh, I get it. It's again someone else's fault if something is missing. You have apparently been poisoned by the negative rhetoric of some newcomers who do little more than complain about this dictionary and its contributors. I dare you to find a published dictionary, paper or electronic, which is able to provide this variety of information about Chinese terms, but I am out of this discussion for now. --Anatoli T. (обсудить/вклад) 06:01, 25 October 2020 (UTC)[reply]
@Atitarev: I am kind of surprised because I seem to have offended you for some reason I don't know. Nor do I know who the "newcomers" you are talking about are or what trouble you have had with them. Are you implying I am one of those "good-for-nothing newcomers"? My opinions are based on my own thoughts, not that of others. 恨国党非蠢即坏 (talk) 06:27, 25 October 2020 (UTC)[reply]
@Atitarev: I think your reaction is a little bit unwarranted. I don't think @恨国党非蠢即坏 is trying to bring up that issue again. There are just things that are admittedly more difficult with a unified Chinese L2, but the benefits definitely outweigh these little things that can be fixed with labelling and better formatting. The issue of bound vs. free morphemes is not always clear-cut; take 鴨 for example. In Mandarin, this word could be free in formal contexts, but in everyday usage, it's mostly bound, with 鴨子 being the most common way of referring to the animal. It also depends on which variety of Mandarin we're looking at. So the matter doesn't go away with separate L2 headers unless we separate every register of every dialect, which is absurd. — justin(r)leung (t...) | c=› } 06:38, 25 October 2020 (UTC)[reply]
@恨国党非蠢即坏, Justinrleung: I may have overreacted but I dislike these "very poor job" comments, which are not true. If anything is wrong/missing, go and add/fix it. No need to blame Wiktionary. What you do is what you get. --Anatoli T. (обсудить/вклад) 08:08, 25 October 2020 (UTC)[reply]
@Justinrleung: I am not a big fan of SOP. My main concern is about the inconsistency of giving POS to some Chinese words but not to the others. 恨国党非蠢即坏 (talk) 18:45, 26 October 2020 (UTC)[reply]
@恨国党非蠢即坏: Did you mean you're not a big fan of POS? I'm a little confused. — justin(r)leung (t...) | c=› } 04:19, 1 November 2020 (UTC)[reply]
@Justinrleung: Stripping all Chinese words of POS is OK to me. Either all Chinese words have POS or none of them has any POS. No such "four characters good, two characters bad" inconsistency. 恨国党非蠢即坏 (talk) 04:55, 1 November 2020 (UTC)[reply]
@Justinrleung: Sorry, only just now did I notice I misspelled it as "SOP". 恨国党非蠢即坏 (talk) 09:00, 1 November 2020 (UTC)[reply]

Who are the contributors to the Wiktionary?


Hello, for several months now it has been possible to find out from which countries contributions are made to the Wikipedia encyclopaedia. The Wikimedia Foundation's development team is currently working to make this data available directly from stats.wikimedia.org. To understand what I am talking about in a concrete way, you can consult table 6 of this page where the number of contributors by country who have added content to the French Wikipedia is reported.

Recently, I proposed that the same data be made available for all projects other than Wikipedia. The ticket has been closed for (legitimate) privacy reasons; each request should be made on a project-by-project basis so that privacy issues are evaluated taking into account the potential benefits of having such data available.

Before we open a new ticket, I would like to know what you think about this and what possible benefits/uses could be derived from such data if they were available for the Wiktionary. Personally, I had in mind above all getting an idea of where the contributions come from throughout the English-speaking world. This would give a numerical idea of which English-speaking areas of the world are under-represented on the Wiktionary because of the small number of contributions. Pamputt (talk) 05:58, 24 October 2020 (UTC)[reply]

I don't want people to know where I'm editing from. My plan is to go through my wiki-life completely anonymously. Candle-holding servant (talk) 22:19, 25 October 2020 (UTC)[reply]
If you look at the table given as an example in my message above, you will see that neither user names nor precise counts (when there are few contributors) are given. Pamputt (talk) 06:44, 26 October 2020 (UTC)[reply]

I have created T266643. Feel free to comment if needed. Pamputt (talk) 09:19, 28 October 2020 (UTC)[reply]

Sounds useful, don't see what privacy issues there could be (if it's country level). Maybe if you live on a small island. @Pamputt: the link "table 6 of this page" does not work. – Jberkel 11:14, 28 October 2020 (UTC)[reply]
the link has been fixed. Pamputt (talk) 11:29, 28 October 2020 (UTC)[reply]
Thanks. Skimming through the tickets, the privacy risk seems to be exposing edit patterns from countries with censorship or other restrictions, which have to be filtered from the dataset. – Jberkel 15:35, 28 October 2020 (UTC)[reply]
Indeed, but I think the solution found for Wikipedia data can be applied to the Wiktionary data; I do not think there is anything Wiktionary-specific compared to Wikipedia for this topic. Pamputt (talk) 14:41, 29 October 2020 (UTC)[reply]

Vandalism in Korean entry


A user named Karaeng Matoaya keeps reverting Korean edits that do not conform to the Standard Korean dictionary and brings in his own headcanon ideas, such as the Jurchen language, and connects them to Korean words. Even when I disclose the source, Karaeng Matoaya keeps reverting my edits with biased sample examples that are obviously not representative of the word's meaning and usage in the Korean dictionary. I don't know what to do about this user. He doesn't like to admit his mistakes and keeps pushing his ideas, which is detrimental to Wiktionary's collaborative project. B2V22BHARAT (talk) 07:49, 24 October 2020 (UTC)[reply]

@B2V22BHARAT: We have no obligation to submit to other dictionaries. —Suzukaze-c (talk) 07:54, 24 October 2020 (UTC)[reply]
@Suzukaze-c: Then what about his headcanon ideas, such as putting Jurchen under the etymology of Korean words? At least I'm bringing ideas backed up by scholars at the 국립국어원 and the Standard Korean Dictionary, and by internet articles written by renowned writers. We don't know who Karaeng Matoaya is and I can't trust him. His edits so far have been disruptive and untrue. If I had not engaged in this matter, 부아 would have been marked as an obsolete term, and the pronunciation header would have been dialectal instead of the standardized one, etc. B2V22BHARAT (talk) 08:06, 24 October 2020 (UTC)[reply]
What Karaeng Matoaya is doing is all speculative, with no reference to Formal Korean Dictionary or historical etymology of Korean words. He's just here trying to create his own etymology built out of his head. Are you really trying to let this guy edit Korean etymology? Of course, some of his sayings are true. But most of them are not formal and official. Very dangerous and disruptive. B2V22BHARAT (talk) 08:13, 24 October 2020 (UTC)[reply]
If Karaeng Matoaya is writing based on reference, I'll admit it. But he doesn't cite the reference!! At all!!! How can you trust him?????? B2V22BHARAT (talk) 08:30, 24 October 2020 (UTC)[reply]
@Karaeng Matoaya, B2V22BHARAT: Hello. There's some nasty edit-warring happening between you two here and there. I don't know enough Korean to decide who is right and who is wrong, and it would take me some time to check all the {{diff}}'s, but can we take a break here and stay away from each other's edits if possible? There's plenty to do here without killing each other. --Anatoli T. (обсудить/вклад) 08:57, 24 October 2020 (UTC)[reply]
That's disruptive editing, not vandalism. Vandalism is always in bad faith. Glades12 (talk) 13:48, 24 October 2020 (UTC)[reply]

Disruptive behavior from User:B2V22BHARAT


Sorry to clog up the beer parlor again, but this has really been tiring me out.

For previous discussion on this, see #부아 archaic or dated — Edit warring, #Further disputes.

Subsequent to his declaration that "you and I are in a competition. I will supervise you from now on and so will you", User:B2V22BHARAT has continued to make antagonistic and disruptive edits of the sort we already saw on 염통 (yeomtong), often accompanied by hostile edit summaries.

  • In 아니꼽다 (anikkopda), he added an unfounded etymology citing the culture section of a local newspaper, which is clearly not a valid linguistic source. This etymology is incompatible with Middle Korean allomorphy, as the final /h/ of (Yale: anh) should resurface when followed by /i/, and also involves the appearance of an unexplained /s/. I removed this etymology, saying as much in the edit summary, and B2V22BHARAT reverted me with a quite rude entry. This to me shows that B2V22BHARAT is totally incapable of dealing with Korean etymology sections, as the allomorphic behavior of h-final nouns is something you learn in the first month if you're learning Middle Korean. (Update: B2V22BHARAT is now edit warring over the etymology and continues to cite culture sections in various local dailies, claiming support from the National Institute of the Korean Language. I can find nothing about such an etymology for this word in the NIKL's dictionary or Q&A section, or in opendict.korean.go.kr.)
  • In 화사하다 (hwasahada), he changed the definitions to ones ported straight from Naver's Korean-English dictionary. My choice of definitions for that entry was intentional, both to avoid potential plagiarism by copying the Naver dictionary and because (if I can flatter myself) I think I have a good grasp of English connotations and "gorgeous" is not the best fit for 화사하다 (hwasahada). Despite being self-admittedly not good in English, B2V22BHARAT has reverted to the Naver definitions apparently out of the desire to revert me.
  • In 서사시 (seosasi), I provided an explicit academic citation of how this word is used in Korean literary studies: as "narrative poetry" which may or may not be of an epic scale. Naver translates this as "epic poetry", but this is based on Aristotle's division in the Poetics of all poetry into epic, lyric, and verse drama, not how the word is actually used today in English with its heroic or divine overtones. B2V22BHARAT is reverting this for some unknown reason.

In all three entries (perhaps with the exception of 화사하다, but definitely in the case of 아니꼽다 (anikkopda) and 서사시 (seosasi)) B2V22BHARAT's edits have been disruptive, and given the edit summaries I see no reason to believe this is being done in good faith.--Karaeng Matoaya (talk) 08:01, 24 October 2020 (UTC)[reply]

@B2V22BHARAT
And kindly refrain from plagiarizing definitions from Naver:

In the last example, a noun is being translated as a verb.--Karaeng Matoaya (talk) 13:00, 24 October 2020 (UTC)[reply]

@Karaeng Matoaya If you think that I made a mistake, then go on and change it. Wiktionary is a collaborative project and not your diary. Don't tell on others about it. Also, it is not plagiarism to simply bring the meaning of the English word.
@B2V22BHARAT You did not "simply bring the meaning of the English word", you even kept the order the Naver definitions are presented in. In any case I cannot "go on and change it" because you will in all likelihood edit-war me.--Karaeng Matoaya (talk) 13:15, 24 October 2020 (UTC)[reply]
@Karaeng Matoaya It's just the English translation of a word, not plagiarism!! And like I said, if you present appropriate reference, I don't do edit-war!! How many times do I have to say it!! You never present your sources. You only show it after I revert it. Why?? You want to brag what you studied about? B2V22BHARAT (talk) 13:17, 24 October 2020 (UTC)[reply]

Automatically opened sections

For mobile users, can we add an option in Special:Preferences (or something like that) to keep the sections hidden whenever a page is loaded rather than automatically showing them all? In longer pages (both entries and forums), I find it extremely irritating to always have to choose between the following:

  • Adding "#[relevant section]" (with underscores, because my browser is stupid and thinks I want to search for something otherwise) to the URL
  • Closing all headers one at a time while slowly scrolling down to find the language or topic I'm looking for. Glades12 (talk) 13:42, 24 October 2020 (UTC)[reply]
I also get annoyed by really long pages. Chinese words were especially annoying on mobile because I had to wait for many pages of images of ancient bone script forms I don't care about to load. (Maybe that's fixed now?) And the forums are still difficult. Vox Sciurorum (talk) 13:35, 27 October 2020 (UTC)[reply]
Just to note, though it may be obvious, that merely keeping sections hidden may still entail their being downloaded. Mihia (talk) 17:50, 27 October 2020 (UTC)[reply]

Add Swedish plural conjugation to the verb template, with a disclaimer that they are archaic.

I quote: "Wiktionary describes usage, it does not prescribe nor proscribe it, and adheres only to its criteria for inclusion, which state that any term or meaning that can be shown to be in sufficiently widespread use may be included. By including or not including a certain term, it by no means accepts or attempts to promote a certain point of view, but is simply documenting, explaining what is or was in use in English or any other language."

These forms are obviously attested; I can find books written in the 1940s that still use them. Modern Swedish is defined as starting in 1523, and so these forms were used for 420 years of this 500 year period. Hence they are a part of the standard language, and although archaic should still be listed in verb templates. A good equivalent to this would be the German genitive case; although it is not used in regular speech, it is still included in the declension tables of nouns.

Currently, the plural forms have to be manually added; for a good example of this see the page gingo, and the basic verb page gå. Having them as regular parts of the verb template would streamline this process a lot. I especially want to point out the past plurals of strong verbs, like "hinna". The past tense singular of this verb is "hann", but the plural is "hunno". This follows a regular pattern and so could easily be added into the template; it uses the same vowel as the supine, which is "hunnit".

I believe that including these forms would help readers of older Swedish literature, since they occur in most books before 1940, and virtually everything before 1920. They were not officially accepted by the Swedish Academy until 1973; and so excluding them from declension templates can only be seen as bias from editors who want the language to be informal. However, as I quoted in the beginning, the goal of Wiktionary is not to prescribe or proscribe. Hence, for documentation purposes, they should be included.

Mårtensås (talk) 23:04, 26 October 2020 (UTC)[reply]

Pinging recently active editors who are native speakers of Swedish: @Robbie SWE, LA2, Jonteemil, Fringilla, Algentem, Knyȝt, Lundgren8, Mike, VulpesVulpes42, @Dreysman, Lou2shi1, Svenji, Liggliluff, Glades12, Tommy Kronkvist, Smiddle ←₰-→ Lingo Bingo Dingo (talk) 09:08, 27 October 2020 (UTC)[reply]
As they occur in all written Swedish before 1950 I’d say it’s common enough to warrant inclusion. However, there are many more archaic verb forms and the question is where to draw the line. If we include plural forms, then it’s natural to include the past subjunctive which is based on the same stem and goes hand in hand with the plural forms, i.e. ginge and finge. If we back up even further then there are also separate plural imperative forms such as sjungom and kommen, and in some cases even present subjunctive komme etc.
The patterns are quite simple:
  • For all verbs, the plural is the same as the infinitive in the present: vi tala
    • The exception is vara: vi äro
  • For weak verbs there is no difference in the past tense: han talade, vi talade.
  • For strong verbs, there is a special plural stem in the past tense, I’ll divide them into two categories:
    • Category 1, stem same as singular past: verbs like bita, ljuga, fara, gråta: jag bet, vi beto; jag ljög, vi ljögo, etc. (ablaut classes 1, 2, 6, 7)
    • Category 2, stem same as supine: verbs like stjäla, hinna: jag stal, vi stulo; jag hann, vi hunno, etc. (ablaut classes 3, 4)
    • Some verbs may have special past plural stems:
      • vara: jag var, vi voro
      • gå: jag gick, vi gingo
      • få: jag fick, vi fingo
      • ge: jag gav, vi gåvo
      • be: jag bad, vi bådo
      • förgäta: jag förgat, vi förgåto
--Lundgren8 (t · c) 09:42, 27 October 2020 (UTC)[reply]
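The patterns listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `kind` labels and the table of special stems are hypothetical and do not correspond to any existing Wiktionary template or module. The table is filled in from the examples given in the list.

```python
# Hypothetical sketch of the plural patterns described above.
# None of these names exist in a real Wiktionary module.

# Verbs listed above as having a special past plural stem.
SPECIAL_PAST_PLURAL = {
    "vara": "voro",
    "gå": "gingo",
    "få": "fingo",
    "ge": "gåvo",
    "be": "bådo",
    "förgäta": "förgåto",
}


def present_plural(infinitive: str) -> str:
    """Present plural equals the infinitive, except for vara (vi äro)."""
    return "äro" if infinitive == "vara" else infinitive


def past_plural(infinitive: str, past_singular: str, supine: str, kind: str) -> str:
    """Return the archaic past plural.

    kind is an assumed label:
      "weak"          -> same as the singular past
      "strong-past"   -> category 1: singular past stem + -o
      "strong-supine" -> category 2: supine stem (minus -it) + -o
    """
    if infinitive in SPECIAL_PAST_PLURAL:
        return SPECIAL_PAST_PLURAL[infinitive]
    if kind == "weak":
        return past_singular
    if kind == "strong-past":
        return past_singular + "o"
    if kind == "strong-supine":
        if not supine.endswith("it"):
            raise ValueError(f"expected a supine in -it, got {supine!r}")
        return supine[: -len("it")] + "o"
    raise ValueError(f"unknown verb kind: {kind!r}")
```

For instance, `past_plural("hinna", "hann", "hunnit", "strong-supine")` follows the category 2 rule, and `past_plural("bita", "bet", "bitit", "strong-past")` follows category 1. A real implementation would still need a per-verb way to suppress unattested or anachronistic forms.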
My standpoint is that if it occurs consistently in works labeled Swedish, it should be included in the dictionary. Because of this I believe the 1st person plural should be included, since it appears consistently, and not just in set phrases, in the first written work of modern Swedish, the Gustaf Vasa New Testament from 1523, and many others from this period. Again, all of these would be labeled as archaic, but there is no reason to exclude them. All that does is make the Dictionary less complete, since these forms were used, and there is ample attestation.
“Consistently” is the keyword for inclusion here; it’s why I don’t advocate for including cases in the noun declension tables; although they appear in the 1523 NT, they are used sporadically and inconsistently, and mostly in set phrases, which can be made as separate pages (man ur huse or i sinom tid). There is not a single work of written Swedish that consistently employs case, whereas there are many thousands which employ the plural forms. As for the 2nd plural form, it’s used in the 1917 Bible translation, and I don’t see the value in excluding it; Wiktionary isn’t paper, and these forms were very frequent and regular for the majority of the history of Swedish literature.
For me, the subjunctive forms like ginge or vore should obviously be included. Vore is still used by most native speakers, and I can find pop songs from the 60s and 70s which use the ones like stode or finge.
This is all for strong verbs, however. I’m not sure they need to be included for weak verbs, since the plural conjugation there is very regular. You use the infinitive, and everything else is the same as the singular (Han har, vi ha; han hade, vi hade).
--Mårtensås (talk) 13:13, 27 October 2020 (UTC)[reply]
I don't think we need to clutter pages with inflection tables for archaic forms. If I see the word är or vore and I want to know what it means, I look it up under är or vore and find a link back to the lemma form. If I want to learn an inflection of vara for my own writing, I only care about modern forms. Vox Sciurorum (talk) 13:33, 27 October 2020 (UTC)[reply]

I agree that the archaic forms should be included in the standard tables for the sake of the completeness of the dictionary. —VulpesVulpes42 (talk) 14:18, 27 October 2020 (UTC)[reply]

Comment: We also include relatively-archaic subjunctive forms in Dutch verb tables, e.g. at zeggen. They don't clutter the table too much at all. Looking at the simplicity of the current verb tables for Swedish, I see no reason why including a few archaic forms should be a problem, but I don't know the details well enough to judge. — Mnemosientje (t · c) 16:58, 27 October 2020 (UTC)[reply]
Based on the above description, I am inclined to support listing these, with appropriate tagging. If there are many obsolete forms which were once common, it might even make sense to separate them into their own separately collapsible/expandable table, i.e. have a conjugation section with two tables in a row, one for modern conjugation and one for obsolete forms. But if there are not many, just incorporating them into the main table with appropriate tags/labels should work. The current situation (across all languages) is a bit haphazard as far as which languages have complete tables and which have stubby ones. - -sche (discuss) 18:34, 27 October 2020 (UTC) typofix 21:28, 27 October 2020 (UTC)[reply]
This is an interesting idea. However, you need to limit it to the forms that were actively used during the 20th century, so you don't go many centuries back. Lundgren8 listed all the relevant cases above. You would need to update 200 articles or so. This is reasonable. The many verbs where the plural form is the same (e.g. arbetade) should not need any edit, so the forms that are needed should be added by means of a named parameter, e.g. "plural=gingo". Possibly, you should also add "subjunctive=ginge" (or let the template deduce it from the given plural).
Above, Mårtensås mentions "1st person plural", but what is this? The 20th century plurals are the same for 1st (we), 2nd (you) and 3rd person (they): Vi gingo, ni gingo, de gingo. There are older forms which differ by person, e.g. "låtom" (let us, a 1st person plural imperative) and "I gån" (you go, a 2nd person plural present tense). I recommend that you don't add these forms to the templates. Unfortunately, these forms were used in the 1917 Bible translation, but were already much antiquated at that time, sounding like the King James' version of the Bible (thou hast ...). --LA2 (talk) 21:23, 27 October 2020 (UTC)[reply]
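The named-parameter idea above (a "plural=" argument, with the subjunctive either given explicitly or deduced from the plural) could look roughly like the following Python sketch. The function name, the parameter dictionary and the output labels are all assumptions made for illustration; the actual template would be written in wikitext or Lua and may work differently.

```python
# Hypothetical sketch: how a verb template might handle optional
# "plural" and "subjunctive" parameters. Not an existing module.

def archaic_forms(params: dict) -> dict:
    """Return the archaic forms to display for one verb entry.

    If no "plural" parameter is given (e.g. for arbetade-type verbs),
    nothing is added, so existing entries need no edit.
    """
    forms = {}
    plural = params.get("plural")
    subjunctive = params.get("subjunctive")

    if plural:
        forms["past plural"] = plural
        # Deduce the past subjunctive from the plural (gingo -> ginge)
        # unless the editor supplied one explicitly.
        if subjunctive is None and plural.endswith("o"):
            subjunctive = plural[:-1] + "e"

    if subjunctive:
        forms["past subjunctive"] = subjunctive
    return forms
```

With this shape, an entry passing only `{"plural": "gingo"}` would get both gingo and the deduced ginge, while an entry passing nothing is left unchanged. An explicit "subjunctive" value always wins over the deduced one.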
@LA2
The 1st person plural I’m talking about is the -om/-e ending used in the 1523 NT and KXII Bible. For example in Psalm 78:3 (“The wij hördt hafwe och wetom, och wåra fäder oß förtäldt hafwa.”; “That we have heard and know...”)
But I agree, this should not be included in the regular templates. Mårtensås (talk) 22:22, 27 October 2020 (UTC)[reply]
I think ca.wikt has a good example of this, all automated in a single template. See e.g. ca:aconseguir; the first conjugation table includes modern forms, including dialectal ones considered "standard" (the "notes" column), the second is for colloquial and "non-standard" forms, and the third is for periphrastic forms. Ultimateria (talk) 05:46, 28 October 2020 (UTC)[reply]

I'm not convinced that they should clutter the conjugation tables though. Don't get me wrong, they are valuable in a historical context, but I for one have never – ever – stumbled across these forms in my everyday life (well, except for Håkan Hellström's songs, but he's from Gothenburg so it could just be a dialectal thing). Can they be included under usage notes or as alternative forms in non-lemma entries? Just seems like we're starting to merge Modern Swedish and Contemporary Swedish which I'm not sure is the way to go. --Robbie SWE (talk) 11:26, 28 October 2020 (UTC)[reply]

@Robbie SWE I've encountered teenagers writing poetry in 2020 using these forms, and I encounter them often in songs and books. You not hearing something in your everyday life is not a reason to exclude it from the Dictionary. Furthermore, the label "Swedish" on Wiktionary is the language spoken in Sweden from 1523 onward, and so all forms in this language should be included. Otherwise, we'd have to remove thousands of entries. Read the quote I posted; if you can't be bothered to scroll, I'll post it here: "Wiktionary is not an arbiter of what is good English; correct English, acceptable English, suitable English, or even grammatical. Wiktionary describes usage, it does not prescribe nor proscribe it, and adheres only to its criteria for inclusion, which state that any term or meaning that can be shown to be in sufficiently widespread use may be included. By including or not including a certain term, it by no means accepts or attempts to promote a certain point of view, but is simply documenting, explaining what is or was in use in English or any other language." These forms have been in use for the majority of the period of Swedish literature, and hence should be included. The forms not being used in casual speech by uneducated teenagers in 2020 does not mean they should be excluded, otherwise we'd have to remove all ancient languages like Old Norse (which became Swedish) or Latin (which became French, Spanish etc). We'd also have to remove the German genitive case, and I'm sure there are many other examples in languages I am not very familiar with. Final point: the forms are well attested, and so should be included, regardless of whether you like them or not. Mårtensås (talk) 18:07, 28 October 2020 (UTC)[reply]
Hold on a second there Mårtensås, why the aggressive tone? I did not in any way, shape or form say that those forms shouldn't be included, and I also did not comment on this project's mission at all. So please, read my comment again before jumping to any conclusions. With that said, in all honesty, do you see these forms in newspapers, TV, radio, social media or contemporary literature on a daily basis? They're still extremely rare. Regardless, I do believe that they should be included. Just not sure the conjugation tables are the right place. --Robbie SWE (talk) 20:53, 28 October 2020 (UTC)[reply]
Sorry, I wasn't trying to be aggressive.
No, I don't see them in much contemporary content, and I think a big reason for this is that there is a general trend within the Swedish language today to be as informal and every-day as possible, which is why we see even large publications and advertising agencies using forms like "va" or "sa", trying to reflect spoken (Stockholm) slang as closely as possible.
But, my point is that these forms, for the vast majority of the history of Swedish (all the way up until 1950!), have been regular conjugations of verbs. They match the criteria for inclusion in Wiktionary, which is widespread attestation, but right now, most of these regular forms are nowhere on the site, and even if they are added, they're hard to find and inaccessible. Mårtensås (talk) 21:27, 28 October 2020 (UTC)[reply]
It will be funny to see archaic forms on modern verbs. You need an option to switch on and off. Just my two cents. --Anatoli T. (обсудить/вклад) 21:58, 28 October 2020 (UTC)[reply]
This is a good point by Atitarev. Neologisms and newer words should not show anachronistic forms. We do not have to discuss how common they are in contemporary Swedish; they are virtually non-existent today outside of some fixed expressions, having been phased out in most newspapers in the 40s and also slowly in literature starting in the beginning of the 20th century. Nils Holgersson (1906) famously had no plural forms in its dialogue for instance and many authors in the first half of the century didn’t use them. Nevertheless they were common in written material up to 1950. --Lundgren8 (t · c) 10:31, 29 October 2020 (UTC)[reply]

From reading this, to me there doesn't seem to be any real resistance against including plural and subjunctive forms for the verbs where they are applicable, such as gå (Jag gick, de gingo, om jag ginge), skola (Jag skall, de skola), bära (Jag bar, de buro) etc. The question more seems to be whether they should be in separate conjugation tables, or in the currently existing ones. If we include the first and second person forms (Vi burom, I buren; Vi skolom, I skolen), which I believe we should, since they are also attested, I think they should be in separate tables. I guess the question is how these should be technically implemented, which seems more like a "Grease Pit" discussion. Can I get an agreement on this from the other participants in this discussion? Mårtensås (talk) 12:23, 29 October 2020 (UTC)[reply]

I support adding e.g. gingo and ginge, even to the current template, or to a separate one. Personally, I could do without the -om, -en forms etc.; I think too many forms would clutter the table; we also have things like the pres. subj. vare, old pres. sing. like hörer, old imp. like gack, or even äst for 2nd person, and many of our verbs outside of the most frequent ones probably have few attestations of these forms anyway. For instance, hoppa is a common verb, but attested hoppom is difficult to find. If we include these many forms, a separate table is probably needed, but it’s very easy to yield anachronistic or unattested examples if it’s done automatically. --Lundgren8 (t · c) 13:59, 29 October 2020 (UTC)[reply]
Many years ago a similar discussion was held on sv.wikt and draft tables were created, see e.g. sv:Mall:sv-verb-er-test3. I don’t remember exactly what happened, but I think the discussion got stuck in exactly these two problems (1) how many forms to include, and (2) anachronistic or unattested forms. By limiting the tables to the ”1940 forms” only, i.e. the plural and the past subj., we avoid these problems. --Lundgren8 (t · c) 13:59, 29 October 2020 (UTC)[reply]
I agree with Lundgren8 regarding which forms should be given leeway and I also share the concern that allowing all the forms might give rise to an anachronistic influx in the conjugation tables which will cause more harm than good. Nonetheless, I'm starting to come around to the idea that maybe – and that's a big maybe – they can be included in the conjugation tables if we had a function that hid them automatically unless the user actively makes a choice to view them. DEX (the Romanian online dictionary) has a function where you can choose to see incorrect or rare forms of verbs. They appear in another colour, alongside the regular forms, if the user actively chooses that option. If this can be implemented here, then I think we've found a practical solution. --Robbie SWE (talk) 14:53, 29 October 2020 (UTC)[reply]

@Robbie SWE, Mårtensås: I see now that this parameter has been added to the template, but it is not mentioned (afaik) in the documentation how to override it for anachronistic forms. --Lundgren8 (t · c) 15:12, 29 April 2021 (UTC)[reply]

"Pronunciation spelling"

May I encourage anyone who is interested to comment on Wiktionary:Votes/2020-10/Use_of_"pronunciation_spelling"_label. There have not been any comments so far; I'm not sure if this means that no one has reviewed it or just that no one has seen any problem with it. Anyway, I would prefer to have some other people look at it before the vote starts. Thanks. Mihia (talk) 17:47, 27 October 2020 (UTC)[reply]

Vulgar Latin pronunciation: keep or delete?

User:Ser be etre shi deleted the support for Vulgar Latin pronunciations in Module:la-pronunc on the grounds that it is fundamentally broken and unfixable. I don't necessarily disagree, but I feel it needs a discussion here on the Beer Parlour, particularly as it caused about 360 module errors since none of the pages that used the |vul= parameter were changed. As a result I reverted the change. What do people think? I think the reasoning for deleting this support is that (a) Vulgar Latin pronunciation varied from place to place; (b) it's not our place to be including reconstructed pronunciations of this sort in Latin entries. BTW the particular representation of "Vulgar Latin" used here seems to represent approximately Proto-Western Romance, not including the ancestor of Italian, Romanian or Sardinian, cf. /eːβˈri.e.taːs/, [eβˈre.e.das] for ēbrietās, showing lowering of short i and u (which didn't happen in Proto-Sardinian and only of i in Eastern Romance) and voicing of intervocalic t (which didn't happen in the ancestor of any of Italian, Romanian or Sardinian). Benwing2 (talk) 14:36, 30 October 2020 (UTC)[reply]

BTW if it is decided to delete this support, I will use a bot to remove occurrences of |vul=. Benwing2 (talk) 14:38, 30 October 2020 (UTC)[reply]
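As a rough idea of what such a bot pass might do to the wikitext of each page, here is a minimal Python sketch. It assumes the parameter appears inside {{la-IPA|...}} calls with no nested templates in the arguments; the function name is hypothetical, and a real bot would go through a MediaWiki client library and handle more edge cases.

```python
import re

# Matches a {{la-IPA|...}} call whose arguments contain no nested
# braces. This restriction is an assumption made for the sketch.
LA_IPA_CALL = re.compile(r"\{\{la-IPA((?:\|[^{}|]*)*)\}\}")


def strip_vul_param(wikitext: str) -> str:
    """Remove any |vul=... argument from {{la-IPA}} calls in wikitext."""

    def rebuild(match: re.Match) -> str:
        # group(1) looks like "|arg1|vul=1"; the first split item is empty.
        args = match.group(1).split("|")[1:]
        kept = [a for a in args if a.split("=", 1)[0].strip() != "vul"]
        return "{{la-IPA" + "".join("|" + a for a in kept) + "}}"

    return LA_IPA_CALL.sub(rebuild, wikitext)
```

Text outside {{la-IPA}} calls is left untouched, so a page that mentions "vul=" elsewhere would not be altered by this pass.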
@Benwing2 It's not so much that it's "broken and unfixable": in the talk page (of the Lua module!, this is where the discussions are happening for some reason instead of Talk:la-IPA), I described it as a "temporary measure" as we switch to the DERom's Proto-Romance model. I apologize that those various ~360 entries broke due to their vul= argument! The idea was to rewrite the DERom output from scratch, while also likely adding a space for Plautus-era pronunciations (probably as an optional further argument). The existing output is not Proto-Western Romance, but something quite odd that still has Classical Latin nasalization alongside a betacism absent in modern French/Catalan/Portuguese... In the discussion I linked to, Brutal Russian described it (jokingly) as looking like someone's conlang. Pinging @Brutal Russian, @Ain92 and @Excelsius for their interest in this topic, too.--Ser be etre shi (talk) 16:05, 30 October 2020 (UTC)[reply]
Yes, I'm writing a separate module for Proto-Romance, whose pronunciation will be in-line with the one shown in this article, which largely follows the reconstruction in DÉRom (Dictionnaire Étymologique Roman) and various corroborating sources.--Excelsius (talk) 19:24, 30 October 2020 (UTC)[reply]
@Benwing2, Ser be etre shi While the current transcription is certainly incorrect, as I mention in the talk page it can easily be converted into a probable historical pronunciation of Latin, such as that at Pompeii. I don't think it should be deleted, just converted and added to. The point is that "Vulgar Latin" confuses proper and attested Latin with theoretical reconstructions that don't claim to form a separate linguistic system from that of Latin. The most obvious place for the latter to exist is not as mainspace Latin entries, but under "Reconstruction:", currently referred to as "Vulgar Latin". Under the most comprehensive approach that reconstructed Proto-Romance would differ minimally from Classical Latin in phonemic terms, and maximally in the way these phonemes are strung together. Most importantly there would presumably be no phonetic transcriptions of these PRmc entries, because they are only abstractions - hell, we don't even have phonetic transcriptions of medieval Romance varieties, and these are actually attested linguistic systems. Therefore switching away from the Vulgar moniker should only really involve changing some names, and the whole purpose of it would be maintaining methodological correctness and avoiding confusing the uninitiated, even when the word would theoretically be the same, and could be assigned the same phonetic transcriptions as a mainspace entry. I wanted to add "involve removing pronunciation sections from reconstructed entries", but these can be added when chronologically appropriate and thus serve to bridge the gap between Late Latin and PRmc. Any attempts to transcribe Early Medieval Romance would have to have their own module.
Segueing from this, I can probably handle the transcription part, but have just about zero coding skills, so I'd really appreciate it if someone can change the various template labels currently referring to "Vulgar Latin" to "Proto-Romance", as well as make it possible to disable the Ecclesiastical pronunciation!, which doesn't belong at all with some words. Come to think of it, it should also properly be renamed "Italian Ecclesiastical". We could provisionally leave the "VL." argument as an alternative to "prmc", or bot-replace all instances of it (which would be many). "vul=" however will need to go altogether and be replaced with e.g. "camp=" for "Campanian Latin" from in and around the bay of Naples. I would also like to elaborate the "Plautine" or classical-conservative pronunciation, but from the previous request it's apparent that I don't know how to disable a transcription, even less add a whole new one. I'm also not sure if "Plautine" is the best label for it, since it incorrectly ties it to a particular author. "Old/Archaic Latin" is definitely too multivalent, confusing and thus to be avoided. "Middle Republican" would be chronologically perfect (264–133 BC according to Britannica), or we could put in a reference to "Roman" and "Conservative" somewhere. Maybe rename "Classical" into "Early Imperial" while we're at it, since many people are under the misconception that "Classical" stands in a diglossic relationship to "Vulgar" instead of referring to a period of literary production.
I also just found the "roa" language code with a couple of pages and the absolutely most ill-informed collection of aliases I've seen on wiktionary yet, something that beautifully exemplifies the problems we're dealing with. It isn't currently recognised in templates, and I don't know if we could do anything with it. Brutal Russian (talk) 00:20, 31 October 2020 (UTC)[reply]
The reason I'm making a separate module is that I found it a bit taxing, actually, to just modify the output from the Latin module until it's correct for Proto-Rom. Even on the phonemic level there were numerous differences, though that naturally depends on what scheme you subscribe to.
A separate module would also allow us to put a reconstructed transcription on entries like this without, as you mentioned, unnecessary Ecclesiastical (and Classical) ones automatically being added.--Excelsius (talk) 03:53, 31 October 2020 (UTC)[reply]
@Brutal Russian My original secret plan was, in fact, to surprise you with a reply to your request to make the Classical output (at the module talk page), in which I'd tell you it was now possible! :D That is, before you got to request it again to me. Alas, I haven't made more changes to the code yet. The option's coming very soon. As for the rest, I basically agree with all your corrections to what I said. In particular, I would love to have "Middle Republican" (I simply prefer it to "Roman Conservative") and "Early Imperial" too, especially the latter instead of "Classical" for the reason you mention. I interpret you as saying the current "Vulgar Latin" output in Reconstruction:Latin entries could be saved by changing its label to "Pompeii(-an?)" and making it produce appropriate Pompeiian output. Is my interpretation correct?--Ser be etre shi (talk) 08:22, 1 November 2020 (UTC)[reply]
@Ser be etre shi Yes, that is what I have in mind, maybe with a name that allows for inclusion of Herculaneum and Puteoli among others, e.g. 'Gulf of Naples'. J.N. Adams calls it 'Campanian' Latin. We could actually leave the current "Vulgar" arguments as aliases for something more appropriate or straight up make it display an error (xD) under the label template to encourage people to opt for better terminology. "Proto-Romance" would go into etymology templates - is there a way to search for all occurrences inside templates and modules? If you mean my Ecclesiastical request, even the mention of your original secret plan makes me feel fuzzy inside! Brutal Russian (talk) 09:44, 1 November 2020 (UTC)[reply]

Implementing "Removing Old English entries with wynns"

Anyone object if I start implementing "Removing Old English entries with wynns", see WT:Votes? Option 1 passed and option 2 didn't, so it's clear the thing to do is implement option 1. The way I'll do this is by looking through each of the wynn-containing entries to make sure there's no useful information on the page, and rescuing any such information by moving it to the corresponding w-containing entry. Only then will I delete the wynn-containing entries. I'll also fix Module:languages/data3/a so that wynn in a link is automatically normalized to w. Benwing2 (talk) 14:52, 30 October 2020 (UTC)[reply]
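The link normalization mentioned above amounts to a simple character substitution. The Python sketch below only illustrates the idea; the real change would live in the Lua language data (Module:languages/data3/a), whose exact format is not shown here, and the function name is hypothetical.

```python
# Hypothetical sketch of normalizing wynn to w in link targets.
# The actual Wiktionary implementation is Lua language data, not this.

WYNN_TO_W = str.maketrans({"ƿ": "w", "Ƿ": "W"})


def normalize_wynn(title: str) -> str:
    """Map wynn (ƿ, Ƿ) to w/W so links resolve to the w-spelled entry."""
    return title.translate(WYNN_TO_W)
```

A link written with wynn would then point at the corresponding w-containing entry, which is why the wynn-spelled pages themselves can be deleted once any useful content has been moved.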

👍 --{{victar|talk}} 16:18, 31 October 2020 (UTC)[reply]
This is done. The only entry that contained useful information rather than just pointing to the corresponding w entry was ƿ itself. Benwing2 (talk) 02:54, 1 November 2020 (UTC)[reply]

man'yogana for t:ja-readings

Some kanji have readings that are only for man'yogana and different from all their later On'yomi and Kun'yomi. These readings are worth covering and we can have a Category:Japanese man'yogana generated from this. Besides, some of these readings survived as Kan'yoon or Nanori. A t:ja-readings with man'yogana readings easily explains the source of these Kan'yoon and Nanori.

Examples: 将(む), 麻(ま)give up. -- Huhu9001 (talk) 06:45, 31 October 2020 (UTC)[reply]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Suzukaze-c, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233): -- Huhu9001 (talk) 08:00, 2 November 2020 (UTC)[reply]

I'm opposed to the Old Japanese words being lemmatized, but I support each Man'yōgana being lemmatized, because some of the Man'yōgana readings do not apply to modern Japanese and have a different phonological structure from modern Japanese. If individual Man'yōgana are added to Japanese entries, they will contaminate the non-Old Japanese territories.--荒巻モロゾフ (talk) 09:53, 2 November 2020 (UTC)[reply]
They should be Old Japanese entries. — TAKASUGI Shinji (talk) 11:15, 2 November 2020 (UTC)[reply]
  • I'm generally supportive of the idea of including man'yōgana as a type of reading in {{ja-readings}} output. That said, I am concerned about the potential for mistakes.
In the examples above, the ま (ma) reading for 麻 may indeed be 慣用音 (kan'yōon, customary-use sound, often considered to derive originally from a misreading), but oddly this is in fact closer to the expected reading based on the Middle Chinese pronunciation. ま (ma) is also the more common reading for this character, and this is the only 音読み (on'yomi, Chinese-derived pronunciation) listed for this character in my dead-tree copy of Shogakukan's Kokugo Dai Jiten. The "official" reading め (me) for the 呉音 (goon, literally “Wu sound”, the older layer of borrowed Middle Chinese) is harder to explain. The shift to the plosive ば (ba) for the 漢音 (kan'on, literally “Han sound”, a younger layer of borrowed Middle Chinese) is not terribly unusual, and is mirrored too in Chinese by one of the Min Nan pronunciations, .
→ Consequently, I don't think there is any need to call this out as man'yōgana -- rather, an explanation for the aberrant め (me) reading would seem to be more useful.
Also, I am unfamiliar with 将 used in any way as phonetics for the syllable /mu/, and indeed it is missing from the man'yōgana table over at Wikipedia. This character appears in 漢文 (kanbun, Literary Chinese text) contexts prepended to verbs in the Classical Chinese future-tense or volitional sense of "will, be going to", which is then rendered in 漢文訓読 (kanbun kundoku, literally “Chinese text meaning reading”, reading out a Chinese text in Japanese) as the Old and Classical Japanese volitional / suppositional suffix む (mu).
→ However, this does not make 将 a man'yōgana, and indicating that 将 is a man'yōgana for む (mu) would be a mistake.
‑‑ Eiríkr Útlendi │Tala við mig 19:36, 2 November 2020 (UTC)[reply]