Wiktionary:Beer parlour/2021/August

From Wiktionary, the free dictionary

temporary right to move pages without redirect

I would like the temporary extended mover right for one week. I will be moving pages like Sanskrit परिच्छेदासः (paricchedāsaḥ), which is a Vedic Sanskrit form of a term that occurs only in Classical Sanskrit, to [[user:Svartava/...]] (so that it is not in mainspace). See google books:परिच्छेदासः: the word is completely unattested. The Google hits for the term are from Wiktionary and other Wiktionary-based sites. Similarly, I will move some non-lemma inflections like डालाभ्याम् (ḍālābhyām), which are possible but completely unattested. They are shown on the main page डाल (ḍāla) (in the declension table), so anyone who searches for this form after it has been moved will be guided to the main page and can see which form it is. There are a lot of such pages needing to be cleaned up. Related link: https://en.wiktionary.org/wiki/Wiktionary:Requests_for_deletion/Non-English#कदाय,_कदासः,_कदेभिः,_कदेभ्यः,_कदेभ्यस्,_खेदासः,_खेदेभिः,_राज्ञीभ्याम्,_राज्ञ्याम्,_राज्ञ्यौ,_शुख Also informing user:Bhagadatta. Svārtava 08:23, 1 August 2021 (UTC)[reply]

@Svartava: see Wiktionary:Whitelist, expires at 23:16, 9 August 2021 Kutchkutch (talk) 23:22, 2 August 2021 (UTC)[reply]
Where was this policy approved? While one could sanely exclude परिच्छेदासः (paricchedāsaḥ) on the basis that no one composes good Vedic Sanskrit using later Sanskrit vocabulary, the only argument I can see for automatically excluding words like डालाभ्याम् (ḍālābhyām) is that no one uses the lemma in compositions nowadays. There are plenty of words around that don't occur in Google Books; I have found Google Books generally useless for much of the Pali that I am recording. Instead, we have RfV for inflected forms like डालाभ्याम् (ḍālābhyām). It's also been argued that we should allow perfectly possible word forms like this to be generated by bots. I don't like that being done, but we should follow proper procedure. The absence of this form is surely just an accidental gap. --RichardW57m (talk) 14:47, 3 August 2021 (UTC)[reply]
@RichardW57m This page does no good. The form is already on the main page; we don't need it. For example, have a look at User:Svartava/कृपाभ्याम् (instrumental/dative/ablative dual of कृपा (mercy), i.e. 2 mercies). Moving this is, at least, approved by User:Bhagadatta. (Note that the Google and/or Books results are just examples of how feminine ā-stem terms decline; कृपा, being a short two-letter word, is convenient to use.) — Svārtava04:29, 4 August 2021 (UTC)[reply]
@Svartava: I don't like the addition of inflected forms without a reason. Reasons for creation include missatisfied links (orange links are an option only available to registered users), their being alternative citation forms (many treat Pali nominative singulars as citation forms - there's a massive backlog there), and homography with lemmas. They also make sense for organising quotations evidencing their existence. They may also need a separate entry for the pronunciation to be recorded adequately. However, despite my dislike of the unnecessary entries, I still accept that we should follow due process when removing entries. --RichardW57m (talk) 09:44, 4 August 2021 (UTC)[reply]
@RichardW57m I mentioned clearly what I will do. Two administrators have given their approval. The process would actually take a lot of time, so I am just moving these without wasting any more time. You can consider this equivalent to {{speedy}}. — Svārtava10:51, 4 August 2021 (UTC)[reply]
Administrators have no special status in determining which pages to delete. Their job is to implement the consensus of the community. If there is any uncertainty about whether there is such a consensus, it is safer not to delete the page, but instead flag it RFD or RFV.
@Svartava, Sodhaksh: I actually consider it vandalism. The only saving grace is that the pages are still accessible. They can still act at least in parts as decoys for forms in other languages. I agree with Benwing2 that these are not cases for {{speedy}}. How will we ever know whether there was consensus to delete these pages? @Kiril kovachev. I'd be happier with the process if Sodhaksh publicly conceded that he should not have created these entries.--RichardW57m (talk) 12:41, 4 August 2021 (UTC)[reply]
An example of a decoy is Pali अनेन (anena). I thought it was helpful when I created it, because stemming algorithms will not recover the lemma from this form. It turns out that it hides the instrumental singular of three Sanskrit words - two corresponding pronouns and the noun अन (ana, breath). The page will ultimately be needed for a lemma, because that word is also a Sanskrit lemma in the MW dictionary. I'm torn between leaving the page as an example decoy, and adding the soft links to the Sanskrit lemmas to the page. --RichardW57m (talk) 12:41, 4 August 2021 (UTC)[reply]
@RichardW57m Why is it that I was mentioned here? I am sorry, I don't think I'm competent enough to add anything meaningful, unless there's a way I can be useful to you. I sadly don't know anything about Sanskrit. What is a decoy? Why is Pali अनेन (anena) a decoy, and what's the significance of that? Sorry for my ignorance. Kiril kovachev (talk) 17:02, 4 August 2021 (UTC)[reply]
@Kiril kovachev: You asked for permission to run a bot adding inflected forms. Svartava was asking for (and has been granted) permission to semi-remove manually added grammatical inflected forms with no alternative forms which a simple test finds to be unattested. The page अनेन, which only lists Pali words, is a decoy because the obvious search tools will not find Sanskrit अनेन - they will report the Pali word instead. One has to explicitly look for a link to अनेन to find the Sanskrit words, which appear in declension tables. --RichardW57m (talk) 12:58, 5 August 2021 (UTC)[reply]

RichardW57m, SodhakSH I am a bit confused. What is the official Wiki policy for non-lemma forms? Do we keep those that have any attestation? For example the Sanskrit अनेन has a usage in the Bhagavad Gita - see here. Do we keep it? Asking just so that I don't create unnecessary pages in the future. Rishabhbhat (talk) 15:24, 6 August 2021 (UTC)[reply]

@Rishabhbhat Thanks for asking. Until now I haven't read any 'policy' on this, but unattested forms are pretty useless and unnecessary, e.g. user:svartava/मूर्खताभ्याम्. anena is a very well attested form, so it should be there. All you should avoid creating is unattested forms. We kept कृच्छ्रे for now, but not all of its forms are attested (in literature, not dictionaries), such as the feminine forms and the dual ones. — Svārtava01:38, 7 August 2021 (UTC)[reply]
@Rishabhbhat, svartava: There is no policy. We seem not to even have a policy for what to do when an inflected form fails an RfV and has its entry or page deleted! Should it be removed from the inflection table? --RichardW57m (talk) 14:07, 9 August 2021 (UTC)[reply]
The concern for 'policy' is reassuring, but I don't feel that the policy on deletion is being followed. --RichardW57m (talk) 14:07, 9 August 2021 (UTC)[reply]
@RichardW57m IMHO we should just speedy-delete impossible forms (e.g. Vedic forms of classical words) and the ones that are just ridiculous (e.g. कृपाभ्याम् - with/for/from two mercies?!?).
But we should keep those that are perfectly possible even if they are not attested, so that anyone searching for an inflection should find exactly what it means.
</opinion>
Rishabhbhat (talk) 04:52, 11 August 2021 (UTC)[reply]
Speedy-deletion is fine for Vedic-only forms of non-Vedic words if we can be sure that they are non-Vedic. Can we? Which set of dictionaries gives us complete coverage of Vedic words? --RichardW57m (talk) 09:10, 17 August 2021 (UTC)[reply]
Is "with two mercies" ridiculous? Very few nouns in English cannot be used in the plural - are there many nouns in Sanskrit that can only be used in the singular? If so, |n=s or the like should be applied to their declension tables. --RichardW57m (talk) 09:10, 17 August 2021 (UTC)[reply]
'With two mercies' is logically improbable and unlikely to ever be used. I don't think there are any rules forbidding its usage, though. However, even I have previously attempted to add as many noun forms as possible - but most were deleted. I gave up after that. Rishabhbhat (talk) 05:15, 19 August 2021 (UTC)[reply]
It does occur in the general sense in English, e.g. "God restored both Ruth and Naomi lives by repaying them with two mercies for each of their troubles (Ruth 2:1–23)", and I get the feeling that that idiom also occurs in Sanskrit. I actually got 4000-odd raw Google hits, although in some contexts (games especially), English 'mercy' seems to have a technical meaning I can only guess at. (It seems to be a specialisation or development of one of our recorded senses.)
@Rishabhbhat: What was the purpose of creating entries for these case forms? Were you just worried that they might come to be shadowed by another page? --RichardW57m (talk) 09:05, 19 August 2021 (UTC)[reply]
Sorry, I don't quite understand what you are referring to. If you are talking about the ones I had created and were moved/deleted, my aim was for each and every form of each word to be clearly defined for what it is - as I mentioned during the discussion for कृच्छ्रे (which was kept). So that anyone coming across the term while reading/researching can easily find it out. Svartava/Sodhak disagrees.
For कृपाभ्याम् I don't get any results on any of the three major search engines except for declension tables. The phrase may exist in English. But the entry can be kept; it's totally harmless.
Anyway, I get your point. So should I run a bot to mass-create all possible forms? I have a feeling this has been proposed before, but still.
--Rishabhbhat (talk) 09:47, 19 August 2021 (UTC)[reply]
I agree with the view that it suffices to find the form in an inflection table. (I use desktops, so I may be missing an issue with mobiles.) Given that, I view such mass-creation as counter-productive. It would hide some Pali and (ultimately) Prakrit inflected forms (and probably others as well), and so entries for them would have to be added. --RichardW57m (talk) 10:37, 19 August 2021 (UTC)[reply]

permanent right to move pages without redirect

@Bhagadatta Any objection(s) to:

diff: can you make the moving right permanent and
for a period of 7 days at WT:Whitelist

? Kutchkutch (talk) 18:03, 7 August 2021 (UTC)[reply]

@Kutchkutch: I'll defer to the direction of any other administrator regarding this. -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 01:06, 9 August 2021 (UTC)[reply]

Automatic rhymes

So I think it's due time to propose automatic rhymes once again.

The major current problem with rhymes is the fact that one has to add a {{rhymes}} template to the page in question, while also having to manually add it to the appropriate Rhymes: page. This results in those of us editing rhymes being unhappy with the double work, and many others not doing it at all.

Now, I can see two solutions for this problem:

  1. Move all rhymes to categories, which will be automatically populated by pages using {{rhymes}}
  2. Send a formal request to the MediaWiki developers to make the Rhymes: namespace work like a category.

What are others' thoughts on this? Notifying @Rua, Dan Polansky, DTLHS, -sche, Atitarev, Ruakh, Erutuon, Vininn126, Fenakhay, Shumkichi. Thadh (talk) 23:24, 2 August 2021 (UTC)[reply]

I remember some people mentioning the problem with basing it on {{IPA}}, which is that languages treat rhymes differently. So maybe what users can do is create the various rhyme pages, which are then entered into the {{rhymes}} template as they already are. The biggest change is that by doing this, that page would automatically update with the new word. I support the automation of rhymes. Vininn126 (talk) 23:19, 2 August 2021 (UTC)[reply]
There's no need to modify the {{IPA}} template since {{rhymes}} already has the information needed to categorize into a particular rhymes category. The scope of this proposal should be made clear: to utilize the existing information contained in {{rhymes}}, or to automatically generate rhymes directly from the IPA string. DTLHS (talk) 00:57, 3 August 2021 (UTC)[reply]
I think it's easiest to go for the former, while leaving the latter to individual language communities. Thadh (talk) 06:25, 3 August 2021 (UTC)[reply]
Hard agree. It feels like the tools are already there, we just need to optimize them. Vininn126 (talk) 11:52, 3 August 2021 (UTC)[reply]
Moving rhymes to categories makes sense to me. —RuakhTALK 04:21, 3 August 2021 (UTC)[reply]
Agreed. Ultimateria (talk) 16:48, 4 August 2021 (UTC)[reply]
I support the idea, but there's at least one issue I can think of; we'd need some way to group the entries by syllable count, as many rhyme pages currently do, and I'm not entirely sure if the category infrastructure MediaWiki has lets us do that. — surjection??15:42, 5 August 2021 (UTC)[reply]
And if it doesn't, I don't really think it's a deal breaker either way. — surjection??15:57, 5 August 2021 (UTC)[reply]
Couldn't we just make something like {{dialectboiler}} with a parameter for the number of syllables? That seemed like the most reasonable way to me. Thadh (talk) 16:08, 5 August 2021 (UTC)[reply]
That could work. An entry such as word with {{rhymes|en|ɜː(ɹ)d}} would end up in Rhymes:English/ɜː(ɹ)d (or Category:Rhymes:English/ɜː(ɹ)d), and with some kind of additional parameter such as {{rhymes|en|ɜː(ɹ)d|syllables=1}}, also in Rhymes:English/ɜː(ɹ)d/1 syllable (or Category:Rhymes:English/ɜː(ɹ)d/1 syllable). — surjection??18:31, 5 August 2021 (UTC)[reply]
I just now understood what you meant. Yes, that seems like a good idea to me, doing something like lemmas vs POS categories. Thadh (talk) 19:42, 5 August 2021 (UTC)[reply]
This is exactly the sort of thing I was thinking. Vininn126 (talk) 20:17, 5 August 2021 (UTC)[reply]
I've written an initial version of Module:User:Surjection/category tree/poscatboiler/data/rhymes (to eventually merge into Module:category tree/poscatboiler/data/rhymes) for the new rhyme categories. The format is as I described: Category:Rhymes:English, Category:Rhymes:English/əʊθ, Category:Rhymes:English/əʊθ/1 syllable, Category:Rhymes:English/əʊθ/2 syllables. The last two would be a subcategory of the second category, which is itself under the first. It doesn't however support "intermediate" categories such as Rhymes:English/əʊ-. Maybe those should remain as kind of "indexes", maybe just as categories that are added manually, or do we need them at all? — surjection??11:43, 6 August 2021 (UTC)[reply]
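For readers following the naming scheme described above, here is a minimal Python sketch of how an entry's rhyme and optional syllable count could map to category names and their parents. It is only an illustration: the actual implementation is a Lua module, and the function names `rhyme_categories` and `parent_category` are hypothetical. It assumes the rhyme string itself contains no "/".

```python
def rhyme_categories(language, rhyme, syllables=None):
    """Return the category names an entry would be placed in.

    Hypothetical helper: mirrors the naming scheme from the discussion,
    not the real Lua module.
    """
    base = f"Category:Rhymes:{language}/{rhyme}"
    categories = [base]
    if syllables is not None:
        unit = "syllable" if syllables == 1 else "syllables"
        categories.append(f"{base}/{syllables} {unit}")
    return categories


def parent_category(category):
    """Return the parent category, or None for the language-level root.

    Assumes "/" only ever separates hierarchy levels.
    """
    prefix, sep, _ = category.rpartition("/")
    return prefix if sep else None


if __name__ == "__main__":
    for name in rhyme_categories("English", "əʊθ", syllables=2):
        print(name, "<-", parent_category(name))
```

Intermediate index categories such as Rhymes:English/əʊ- are deliberately not modelled here, matching the limitation noted above.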
@Surjection Maybe those should stay as some sort of index - after all it would be nice to have an easier way to organize them. Vininn126 (talk) 19:15, 7 August 2021 (UTC)[reply]
@Rua, Dan Polansky, DTLHS, -sche, Atitarev, Ruakh, Erutuon, Fenakhay, Shumkichi, Thadh Thoughts on this? Vininn126 (talk) 11:34, 9 August 2021 (UTC)[reply]
The way you group things in categories is by using a sort key. The template could build the sort key from a number representing syllable count followed by the entry name, i.e. multisyllabic would have "5multisyllabic" and syllabic would have "3syllabic". Since the system adds a subheader each time the first character changes, you would have "1" followed by all the monosyllables, etc. Allowing for more than 9 syllables would complicate things, but it wouldn't be that hard to make it work. Chuck Entz (talk) 04:48, 6 August 2021 (UTC)[reply]
That is another option, but it only allows single digits, so I'd argue using subcategories is a better option (and it allows grouping by the first letter as usual). — surjection??09:38, 6 August 2021 (UTC)[reply]
I rather like User:Chuck Entz's solution since it avoids splitting the rhymes into separate per-syllable-count pages. The number of words with > 9 syllables is small (e.g. out of 37,972 pages in Category:Italian terms with IPA pronunciation, there are only 10 with >= 10 syllables) so I wouldn't worry about distorting the solution to accommodate them; we could e.g. put all words with > 9 syllables under the ">" character or similar. The issue with splitting into per-syllable-count pages is that it makes it less convenient to view the rhymes, particularly for the many rhymes where the number of entries is relatively small. Benwing2 (talk) 03:58, 8 August 2021 (UTC)[reply]
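The sort-key approach, including the ">" bucket for words with more than nine syllables, can be sketched in Python as follows. This is an illustration under stated assumptions, not the template's actual code: the function names are hypothetical, and plain code-point ordering stands in for whatever collation the wiki really uses.

```python
from itertools import groupby


def rhyme_sort_key(entry, syllables):
    """Build a sort key: one syllable-count character, then the entry name.

    Counts above 9 share the ">" bucket, which sorts after the digits
    in code-point order.
    """
    if syllables < 1:
        raise ValueError("syllable count must be at least 1")
    prefix = str(syllables) if syllables <= 9 else ">"
    return prefix + entry


def group_by_subheader(entries):
    """Group (entry, syllables) pairs under the first character of their key,
    imitating the subheader a category page starts whenever that character changes.
    """
    keyed = sorted((rhyme_sort_key(name, count), name) for name, count in entries)
    return {
        header: [name for _, name in group]
        for header, group in groupby(keyed, key=lambda pair: pair[0][0])
    }


if __name__ == "__main__":
    sample = [("multisyllabic", 5), ("syllabic", 3), ("cat", 1), ("bat", 1)]
    print(group_by_subheader(sample))
```

The trade-off raised in the thread is visible here: all entries stay on one page grouped by syllable count, but grouping by first letter is lost.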
You can put them into both the parent category and into the individual syllable categories. DTLHS (talk) 04:01, 8 August 2021 (UTC)[reply]
That is exactly what I had in mind. It'd put the entry under the rhyme category and the rhyme/syllable count category if available. — surjection??16:27, 8 August 2021 (UTC)[reply]
Good point, that would work. And I agree with the general idea of this thread that moving from the current Rhymes: pages to categories would be a good idea. Even pages like Rhymes:English/æ- linking to Rhymes:English/æb, Rhymes:English/æbd, etc could be reproduced as categories if we want (e.g. "rhymes in æ-" containing "rhymes in æb", "rhymes in æbd", etc, or whatever naming scheme is used). As discussed elsewhere, we could also use categories for anagrams. - -sche (discuss) 01:54, 10 August 2021 (UTC)[reply]
FWIW, I have now implemented this in userspace; Module:User:Surjection/rhymes (updated module), User:Surjection/Template:rhymes/documentation (updated template documentation). The poscatboiler code is already live, so these two are all we "need" to implement this change now. — surjection??15:34, 13 August 2021 (UTC)[reply]
I am for deploying it. Vininn126 (talk) 10:18, 16 August 2021 (UTC)[reply]
I am now deploying it — surjection??17:20, 16 August 2021 (UTC)[reply]
I think the categories should be "Finnish rhymes/..." rather than "Rhymes:Finnish". DTLHS (talk) 17:31, 16 August 2021 (UTC)[reply]
That would've conflicted with the existing Category:X rhymes that links to the old (and arguably now obsolete) rhyme index pages. — surjection??17:44, 16 August 2021 (UTC)[reply]
If they're now obsolete you need to clean them up / delete them. DTLHS (talk) 02:33, 17 August 2021 (UTC)[reply]
Cleaning those categories up isn't enough. We'd need to decide what to do with the entire Rhymes namespace - it'll probably have to be subjected to a vote, much like with Index. — surjection??09:43, 17 August 2021 (UTC)[reply]
(Wiktionary:Beer parlour/2021/August#Retiring Rhymes:) — surjection??10:02, 17 August 2021 (UTC)[reply]
I think we should change the link the rhymes template provides so that it links to the category rather than the rhyme page. Vininn126 (talk) 08:30, 19 August 2021 (UTC)[reply]
I agree, but I think that was the objective of the discussion below. Thadh (talk) 08:36, 19 August 2021 (UTC)[reply]
Very late: For the record, the idea of using categories is impractical for Czech rhyme pages since most entries in them are using redlinks. --Dan Polansky (talk) 21:55, 12 August 2022 (UTC)[reply]

Delay of the 2021 Board of Trustees election

We are reaching out to you today regarding the 2021 Wikimedia Foundation Board of Trustees election. This election was due to open on August 4th. Due to some technical issues with SecurePoll, the election must be delayed by two weeks. This means we plan to launch the election on August 18th, which is the day after Wikimania concludes.

For information on the technical issues, you can see the Phabricator ticket.

We are truly sorry for this delay and hope that we will get back on schedule on August 18th. We are in touch with the Elections Committee and the candidates to coordinate next steps. We will update the Board election Talk page and Telegram channel as we know more. Best, JKoerner (WMF) (talk) 22:11, 4 August 2021 (UTC)[reply]

Call for Candidates for the Movement Charter Drafting Committee

Movement Strategy announces the Call for Candidates for the Movement Charter Drafting Committee. The Call opens August 2, 2021 and closes September 1, 2021.

The Committee is expected to represent diversity in the Movement. Diversity includes gender, language, geography, and experience. This comprises participation in projects, affiliates, and the Wikimedia Foundation.

English fluency is not required to become a member. If needed, translation and interpretation support is provided. Members will receive an allowance to offset participation costs. It is US$100 every two months.

We are looking for people who have some of the following skills:

  • Know how to write collaboratively. (demonstrated experience is a plus)
  • Are ready to find compromises.
  • Focus on inclusion and diversity.
  • Have knowledge of community consultations.
  • Have intercultural communication experience.
  • Have governance or organization experience in non-profits or communities.
  • Have experience negotiating with different parties.

The Committee is expected to start with 15 people. If there are 20 or more candidates, a mixed election and selection process will happen. If there are 19 or fewer candidates, then the process of selection without election takes place.

Will you help move Wikimedia forward in this important role? Submit your candidacy here. Please contact strategy2030(_AT_)wikimedia.org with questions. Best, JKoerner (WMF) (talk) 22:12, 4 August 2021 (UTC)[reply]

Internationalism

Earlier discussion: Wiktionary:Beer parlour/2016/January#Internationalisms in etymologies

I think it'd be a good idea to have a {{internationalism}} template and Category:Internationalisms by language to mark terms as internationalisms in etymology sections. This would be particularly useful for the plethora of terms that are part of the so-called international scientific vocabulary, which is still gaining new terms mostly constructed of Latinate and Greek elements. It's in theory possible to determine the language in which they were coined for at least some of them, but there are also plenty where that task is much harder, if not practically impossible.

Naturally though, I'd say that if {{internationalism}} were to exist, its usage would be disallowed in cases where the immediate source language is known. After all, those are just borrowings rather than internationalisms. The problem is that you sometimes simply cannot know the exact chain of languages a word went through.

In order to prepare for this, we'd also need a better definition under Appendix:Glossary#internationalism. The existing definition doesn't really exclude Wanderwörter. I'd argue that internationalisms specifically have to be words that have spread in the modern age, as internationalisms in the sense I see couldn't really have existed in a world before languages were as connected as they are now (or were a couple hundred years ago or so). Another possible prerequisite is that the word has been adapted into the target language, such as by the mostly regular but language-specific processes that govern how Latinate and Greek components are adapted over. As an example, a word like Latin positiō would have once upon a time been adapted into Finnish as positsiooni, after the German and/or Swedish models, but in modern language it's positio. This also applied to other words; postpositio was once postpositsiooni.

Hungarian already appears to have a template, {{hu-int}}, for this sort of purpose. — surjection??15:21, 5 August 2021 (UTC)[reply]

I support this. Vininn126 (talk) 13:59, 6 August 2021 (UTC)[reply]
I think this is a possible solution to a real issue but needs to be thought through carefully. I ran into this issue a lot when creating entries for Russian internationalisms like канонизи́ровать (kanonizírovatʹ, to canonize), дисквалифици́ровать (diskvalificírovatʹ, to disqualify) and кассацио́нный (kassaciónnyj, cassation (relational)). My solution was to assume these entries came from German unless there was some evidence to the contrary (e.g. no such corresponding word in German, or the definitions didn't match), and write something like "Probably borrowed from {{affix|ru|kanonisieren|-овать|lang1=de}}". Sometimes I just said "Ultimately borrowed from " followed by a Latin or Greek term. But both of these solutions are questionable and subject to a good deal of guessing. What I'm concerned about with something like {{internationalism}} is that people will use it lazily and promiscuously to avoid actual etymological investigations, and it will end up being more or less meaningless. As it is, it's not clear to me that an etymology that says nothing but "Internationalism" actually contributes much of anything over just leaving the etymology unspecified. It would be better to include a sample of corresponding terms in other languages and list the underlying Latin or Greek terms that make up the word, which at least provides some context. Benwing2 (talk) 03:50, 8 August 2021 (UTC)[reply]
Therein lies the rub. I don't think it's possible to just discourage people from using it lazily, even by having the documentation contain a text in big red all-caps saying "add an ultimate origin (Latin, Greek) or at least comparisons whenever you use this template!" But if one thinks about it, right now most languages simply have no etymologies whatsoever for these so-called internationalisms, so even just an "internationalism" would be better than nothing at all. It'd also add the entry to the appropriate category and thus give people an easy avenue to find etymologies to improve by adding details to. — surjection??17:02, 8 August 2021 (UTC)[reply]

Deleting "Hangul syllable" entries

All entries that consist only of {{ko-syllable-hangul}}, like (gwael) and (myak), should be deleted by bot. All etymology sections containing them should also be deleted by bot and the etymologies renumbered, if possible. This has been done manually since a few months ago by the two Korean editors here, but a more drastic solution seems preferable.

For those unfamiliar with Korean, these are the equivalent of nonsense sequences of Latin alphabet letters like "swrg" or "gwerq". For some computer-related reason they have been assigned separate Unicode characters, but they are not in any sense Korean words any more than "wetw" is an English word. As Korean is written in an alphabet, the composition and theoretical pronunciation of these syllables is highly transparent.
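To illustrate how transparent the composition is, here is a short Python sketch that splits any precomposed Hangul syllable into its conjoining jamo by arithmetic alone. The constants (19 initial consonants, 21 vowels, 28 finals including "no final", block starting at U+AC00) come from the Unicode Standard's Hangul syllable algorithm; the function name `decompose` is hypothetical and chosen only for this example.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no final"
S_COUNT = L_COUNT * V_COUNT * T_COUNT   # 19 * 21 * 28 precomposed syllables


def decompose(syllable):
    """Split one precomposed Hangul syllable into conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:  # tail index 0 means the syllable has no final consonant
        jamo += chr(T_BASE + tail)
    return jamo


if __name__ == "__main__":
    print(S_COUNT)
    # The arithmetic result agrees with the standard library's NFD normalization.
    print(decompose("한") == unicodedata.normalize("NFD", "한"))
```

Because every block is the product of three independent choices, the set is fixed in size and fully enumerable, which is the sense in which the entries could be generated mechanically.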

I am unsure who exactly these entries benefit or are intended for. The only possible demographic I can think of is people who 1) have zero knowledge of Korean, but 2) decide to look up Korean characters anyhow, and 3) not just any characters but ones that are not actually words in the language. This seems unlikely to be a large group of people.

This has the following effects (in increasing degree of severity):

  • It clogs up Category:Korean lemmas with "words" that are not only not lemmas but not words at all, and in many cases never actually used in the language.
  • It wastes the time of editors who are manually creating these because an automated bot could make hundreds upon thousands of them within ten minutes.
  • The pronunciation section is misleading because the phonetic pronunciation of Korean syllables varies depending on its position within the word.
  • It makes it appear as if there are actual entries for words like (kwol, (colloquial) quality) and (hing, onomatopoeia especially commonly used by young women). This confuses and disappoints readers, who are obviously going to be looking for a definition of the word they have just encountered, only to be faced with this non-entry.--Tibidibi (talk) 15:48, 5 August 2021 (UTC)[reply]
Support Suzukaze-c (talk) 00:52, 6 August 2021 (UTC)[reply]
Support Benwing2 (talk) 06:35, 6 August 2021 (UTC)[reply]
This proposal is overwhelmingly dishonest:
  • They are not nonsense words like 'swrg', 'gwerq' or 'wetw'. The English analogy would be nonsense words like 'thung' or 'gwet'.
  • I believe their character-like nature is a design feature. Remember that they were designed for an environment where writing was done using Kanji.
  • Tools for manipulation of decomposable Korean characters are not as well-known, so there may be some usage there. Another possible usage group is people without access to a Korean script renderer.
  • Putting them in Category:Korean lemmas looks like a bug in {{ko-syllable-hangul}}; that should be fixed.
  • There are exactly 11,172 of them. Perhaps it would be more productive for a bot to complete the set.
  • It would seem that the pronunciation section is in need of some work.
Furthermore, WT:CFI allows "Characters used in ideographic or phonetic writing such as 字 or ʃ."
Oppose --RichardW57m (talk) 12:05, 6 August 2021 (UTC)[reply]
  • @RichardW57m How exactly are (ppyon) or (myak) like "thung" or "gwet"? These are outright impossible syllables in Korean. If you ask Korean speakers to come up with nonsense syllables that still sound like they might be words in the language (the same way you could get English speakers to come up with "thung" or "gwet"), they will never come up with (ppyon) or (myak). These are bizarre combinations to anyone with the slightest knowledge of Korean, fully equivalent in absurdity to "swrg", "gwerq", or "wetw".
    Please explain why. What constraint on words do they violate? Or is it just that they are not phonetic writings of isolated syllables? For example, word-final 'k' is rare, but it does occur. Or doesn't the suffix (nyeok) sound Korean? --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]
  • @RichardW57 /Cja/ is not common in standard Korean, and the sequence (mya) in particular is extremely rare (only three entries in the standard dictionary, excluding redirects to standard forms and recent and uncommon loans, and all of them rare words). The final (k) is similarly uncommon. This is not remotely like English /θʌ-/ or /-ʌŋ/, both extremely common sequences in the language. (ppyon) should not even be worth discussing, but the final (nj) appears only in 앉다 (anda) and 얹다 (eonda) and their derivatives, whereas the iotized vowel (yo) does not occur in verbal stems. Perhaps they are not "impossible" in a theoretical sense, but they are certainly combinations so improbable as to be functionally impossible to imagine in the language. And as a native speaker, I can assure you that they are no less bizarre than e.g. "pswm" (which, by the logic you seem to suggest, could also be a plausible English word given the existence of pseudoscience and cwm).--Tibidibi (talk) 06:43, 9 August 2021 (UTC)[reply]
  • I see nothing particularly "character-like" about Hangul syllables other than the fact that they are written in syllabic blocks. Every letter combines predictably, and there are no particular ligatures that demand special treatment. If anyone has learned the alphabet, they will be able to read all 11,172 Hangul Syllables without any problem whatsoever.
    What about writing them? Kerning and sizing look potentially complicated. One could say much the same about the classical Mongolian script - but they insist on teaching it as CV syllables! (There is a small amount of ligaturing in Mongolian, and the Unicode Consortium was beaten down to accepting a horrible and actually undefined phonetic encoding, so copy-typing is nightmarish.) --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]
  • A Korean-English dictionary should not cater to a demographic so inept in the basics of the script as to find the (minimal) changes in the curve or proportional size of the letters "complicated". I don't claim to know anything about the Mongol script, but if it's anything like Manchu, the impact of ligatures is significantly larger than in Korean (where, I repeat, ligatures may as well not exist). Also, Koreans (unlike Manchus or Mongols) do not teach or learn Hangul as a syllabary, so that seems irrelevant.--Tibidibi (talk) 06:43, 9 August 2021 (UTC)[reply]
  • The purpose of a Korean-English dictionary should not be to service "people without access to a Korean script renderer", nor to help people who cannot even read the alphabet. In any case, people who lack access to a Korean script renderer and people who cannot read the alphabet seem extremely unlikely demographics to look up random syllables that are not words in the language and which they are unlikely to encounter in Korean text.
  • I don't see your point in noting that there are 11,172 Hangul Syllables encoded in Unicode.
    You spoke of hundreds of thousands of syllable blocks, giving the impression that there were an enormous number of precomposed syllable blocks. My point is that there is a fixed number of them, and there won't be any more. The current number of Korean lemmas and syllable blocks is 35,778.
  • 11,172 already is "an enormous number of precomposed syllable blocks"! And note that my point there is not the exact number of existing syllable blocks, but that they are a waste of effort if manually created.
  • The pronunciation section is an irremediable issue because Hangul syllables do not actually have one fixed pronunciation, even phonemically. Some syllables have long vowels, some syllables have orthographically unmarked tensing, some syllables (depending on dialect) have high pitch. (geu) in isolation is pronounced [kɨ˨] in Middle Korean, [kɯ] in modern Seoul, and [kə] in Busan. How would you remedy this?
    Giving them all in a collapsed subsection is the obvious solution. Are all these applicable alternatives being given for words? --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]
  • As a matter of fact, yes. All Standard Korean variation in vowel length and tensing is marked by {{ko-IPA}}, including in different words written with the same Hangul syllabic block.
And "giving them all in a collapsed subsection" is a non-solution. For instance, any syllable might have either a short or long vowel; depending on the morphology, any syllable with a lenis initial might actually be pronounced with a fortis initial not marked in the spelling. Should every syllable block entry therefore have a parenthetical vowel length mark, or note that the initial might be fortis in some collapsible box? This is not a simple matter of dialectal or synchronic distinctions in realization, but a problem caused by the fact that Standard Korean has phonemic distinctions that the modern script does not consistently express. Hence any attempt to assign a definitive pronunciation to a syllabic block is misguided.--Tibidibi (talk) 06:43, 9 August 2021 (UTC)[reply]
  • These are not "characters used in ideographic or phonetic writing". These are compounds of such characters, combined in fully predictable fashion. The two examples given in WT:CFI itself show the failure of the argument; is a logogramic character, and ʃ (ʃ) is a single letter in an alphabet. They are equivalent to individual Hangul consonant and vowel letters, not the syllables.
    Sounds like the multilingual letter î. Is not that a predictable combination? --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]
  • It is worse because the entry is utterly valueless to a language learner. Anyone who has learned Korean for more than two days will know the "information" contained therein.--Tibidibi (talk) 06:43, 9 August 2021 (UTC)[reply]
  • I agree with User:Tibidibi here. It is a shame that Unicode decided to waste so much space in the BMP with these encoded syllables. That's 11,000+ code points out of 65,536 that could have been used for something better. AFAIK Korean is the only language that gets such treatment, and it has encouraged misguided proposals like the Tamil All Character Encoding that seek to emulate this for other languages. I think the decision to do this was made due to a desire to maintain round-trip compatibility with some now-long-obsolete Korean-language multibyte encoding, but whatever. Benwing2 (talk) 23:59, 7 August 2021 (UTC)[reply]
    It's worse than that - it was a purely political decision to get past Korean objections in the ISO process. However, these Tamil encodings do represent how some people's minds work. Tamil children in Canada apparently do have trouble conceiving of CV combinations as consonant plus vowel. --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]
    Most Latin script languages get such treatment, in the form of precomposed letters. There's also a fair bit of BMP squandered on precomposed letters, and there's the dead waste of Arabic presentation forms. --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]
  • I agree about the Arabic presentation forms; totally useless. Precomposed letters are a bit different; there are fewer of them and only ones that actually are in use are defined. Also in some languages, precomposed letters are treated as single entities for sorting, and there were many existing encodings at the time e.g. ISO-8859-1 that contained them. As for Tamil, I strongly suspect that "this is how people's minds work" is not a legitimate argument; the history of writing shows that people in *all* languages think more naturally in terms of syllables or CV sequences than in terms of separate consonants and vowels. Benwing2 (talk) 03:23, 8 August 2021 (UTC)[reply]
Support It's about time; those entries really just take up space and don't provide much benefit. Just because they have a Unicode codepoint for some reason doesn't mean that we should have an entry for all of them. If anything, they should be under the multilingual header anyways. AG202 (talk) 13:55, 6 August 2021 (UTC)[reply]
So restore them as multilingual? Presumably it's the ones that coincide with words and morphemes that are the resource drain - the nonsense syllables surely chiefly cost for backup operations. "Remember, Wiktionary is not a paper dictionary." --RichardW57m (talk) 13:23, 9 August 2021 (UTC)[reply]
Support Austronesier (talk) 14:05, 6 August 2021 (UTC)[reply]
Support — Fenakhay (تكلم معاي · ما ساهمت) 23:23, 6 August 2021 (UTC)[reply]

Another important point I missed is that {{character info}} already contains the entire information in {{ko-defn-hangul}}. So bot-removing all these non-entries will not even result in a loss of information as long as there are other etymologies for the word.--Tibidibi (talk) 06:46, 9 August 2021 (UTC)[reply]

So are you suggesting that someone looking for a decomposition should use a sandbox? I don't see how else one would get access to the decomposition. Of course, {{character info}} could always be optimised to omit Hangul syllable blocks, could it not? I suppose if you can find a word containing the syllable, one can always interpret the transliteration, can't one? --RichardW57m (talk) 13:23, 9 August 2021 (UTC)[reply]
Support Omgtw15 (talk) 03:53, 11 August 2021 (UTC)[reply]

Implementation

@Suzukaze-c, Benwing2 A week has passed with overwhelming support (8 Support to 1 Oppose), including both of the two regular Korean editors. Could a bot operation be done for this?

I believe there are four tasks at hand:

  1. Delete all entries where the only headword template is {{ko-syllable-hangul}}. I assume this is the simplest. Examples: (gwael) and (myak).
  2. Delete Etymology 1 where the headword template is {{ko-syllable-hangul}} and there are multiple other etymologies, and renumber the Etymology sections accordingly. This is probably trickier for the bot. Examples: (eung) and (gong).
  3. Delete Etymology 1 where the headword template is {{ko-syllable-hangul}} and there is only one other etymology, and reorder the header hierarchy. This also sounds potentially tricky for the bot. Examples: (jing) and (kkeom).
  4. Some entries are extremely poorly formatted and cannot be fixed automatically. Examples: (eop). As apparently the only regular native Korean editor left, I will fix these manually.
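The four task types above could be told apart mechanically before any bot run. The following is only a minimal sketch, not the bot that was actually used: the function names `korean_section` and `classify` are hypothetical, and it assumes that syllable-only entries carry nothing but a "Syllable" header and that multi-etymology entries use numbered `===Etymology N===` headers. Anything that does not fit those assumptions is sent to category 4 for manual review.

```python
import re

TEMPLATE = "{{ko-syllable-hangul"  # headword template named in the thread


def korean_section(wikitext: str) -> str:
    """Return the body of the ==Korean== section, or '' if there is none."""
    m = re.search(r"^==Korean==\s*$(.*?)(?=^==[^=]|\Z)", wikitext, re.M | re.S)
    return m.group(1) if m else ""


def classify(wikitext: str) -> int:
    """Return 1-4 for the task types listed above, or 0 if nothing to do.

    Heuristic only (an assumption of this sketch, not stated policy):
    unexpected layouts fall through to category 4.
    """
    sec = korean_section(wikitext)
    if TEMPLATE not in sec:
        return 0
    # Split the section at numbered etymology headers.
    parts = re.split(r"^===Etymology \d+===[ \t]*$", sec, flags=re.M)
    n_etyms = len(parts) - 1
    if n_etyms == 0:
        # No numbered etymologies: deletable only if "Syllable" is the sole header.
        headers = set(re.findall(r"^={3,}\s*(.+?)\s*={3,}[ \t]*$", sec, re.M))
        return 1 if headers <= {"Syllable"} else 4
    in_first = TEMPLATE in parts[1]
    elsewhere = any(TEMPLATE in p for p in parts[:1] + parts[2:])
    if n_etyms >= 2 and in_first and not elsewhere:
        # Three or more etymologies: renumber (2); exactly two: flatten headers (3).
        return 2 if n_etyms >= 3 else 3
    return 4
```

Running `classify` over a dump of the affected pages would give rough counts for each category, which is the figure the request above says is still unknown.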

If category (1) can all be deleted by this weekend (hopefully this should be quite easy), it might turn out that (2), (3), (4) are few enough to be manually addressed. Right now I'm not sure exactly how many entries belong to categories (2), (3), and (4). It seems like the clear majority of the 1,215 entries that need fixing belong to category (1).--Tibidibi (talk) 13:21, 13 August 2021 (UTC)[reply]

Also pinging @Erutuon as a botter.--Tibidibi (talk) 13:22, 13 August 2021 (UTC)[reply]
  • Comment:
I keep seeing statements in various contexts here at EN WIKT where the poster is already quite familiar with the subject matter at hand (such as hangul in this thread, or the relationship between Ingrian and Proto-Finnic in another recent thread). The people making such statements seem to have lost sight of the fact that our EN WIKT readership can only be safely assumed to understand written English.
Case in point: before I learned hangul, I thought that each of the composed hangul glyphs represented an independent syllable in some complicated fashion, as a very large syllabary. I had no clear idea that each glyph is just a composite of individual letters. Were I to try to parse a Korean text in such a state of ignorance, I might select a single "syllable" (the selectable glyphs making up a hangul text) and try to look that up. I would have no way of decomposing the glyph into the individual jamo (letters).
Other users with a similar lack of familiarity with hangul might do the same thing.
Considering 1) the lack of any apparent harm from having these entries, and 2) the demonstrable gain in usability from having these entries (at least, for those just learning about written Korean), I lean strongly towards keeping these.
@Tibidibi, the above thread has gotten chopped up in strange ways and various statements are missing any sig, but I think you are the strongest proponent for deletion. Can you better articulate what harm we suffer because these entries exist? ‑‑ Eiríkr Útlendi │Tala við mig 21:38, 13 August 2021 (UTC)[reply]
@Eirikr, the key issue is the same as you raised a while ago when you criticized somebody for making an entry with no definition but {{rfdef}}. A large number of these non-entries are actual words, some actually quite common in the language, for which no entry has been made. When readers click a bluelink, or search for a keyword and find that there are results, they expect to see definitions. They do not expect to see what effectively amounts to a redlink dressed up as an entry. In fact, these are actively worse than the definition-less entries you criticized, because the latter at least had pronunciations and parts of speech.
For sections where there are real entries for words, the issue is the simple waste of space. Why should readers have to go through an entire header section just to find information which is so basic that it is already duplicated in {{character info}}?
It is true that these could be of marginal utility to readers who do not know Hangul at all. But I find it highly unlikely that any such people would be trying to parse Korean text in the first place. In addition, I do not think it is appropriate to cater to a userbase that does not know something taught on the first day of Korean class! In that sense, I find this very different from the relationship between Ingrian and Proto-Finnic; a fluent Ingrian speaker could go their entire life without ever having heard of Proto-Finnic, while the nature of Hangul blocks is something that all Korean learners will know from Week 1, and even a large number of people who have never actually studied the language itself.
To give another analogy, I find this equivalent to adding the Usage note that "The capitalized form of "word" is "Word". In English, initial capital letters are used at the beginning of sentences, to mark proper nouns, and sometimes for emphasis. The fully capitalized form of "word" is "WORD". In English, full capital letters are used in abbreviations, or in Internet slang to mark shouting." to every single English entry. This would also be useful to English learners, but clearly this is overkill. And the composition of Hangul syllables is far, far more basic and simple than English capitalization rules!--Tibidibi (talk) 01:28, 14 August 2021 (UTC)[reply]
  • @Tibidibi: "I do not think it is appropriate to cater to a userbase that does not know something taught on the first day of Korean class!" --> What if someone never had a Korean class? Indeed, before I had studied any Korean, I seriously did not know that a single "syllable" glyph was composed of multiple simpler jamo. You assume too much about our readership, I think.
One key difference between Korean syllable blocks and arbitrary strings of English letters is that English letters are still individually selectable. An English-reading person (our very audience) can tell that there are individual characters in any such string of Latin letters. Or indeed Cyrillic letters, or Greek letters, etc. as the encoding (on most modern systems) still allows a user to select individual letters. An English-reading person cannot select an individual jamo from within a composed syllable block -- they can only select the whole block. If we have no entries for composed syllable blocks, users have no means of looking up these individually selectable textual units. We have entries for other individually selectable textual units in other scripts, so why not for Korean?
Re: "waste of space", Wiktionary is not paper, so that's not really much of a concern. ‑‑ Eiríkr Útlendi │Tala við mig 04:30, 14 August 2021 (UTC)[reply]
@Eirikr You cannot select individual letters in Hindi, Thai, Tibetan or any other abugida language; you can only select syllables. Yet we don't enter every Devanagari etc. syllable into Wiktionary. The only real difference I can see between Korean and an abugida language is that Korean syllables have special treatment in Unicode, which (as pointed out by User:RichardW57), was a purely political decision to get the Korean delegation on board. Benwing2 (talk) 04:50, 14 August 2021 (UTC)[reply]
@Benwing2, Eirikr, Tibidibi: That remark on selectability is untrue, and as a policy statement merits banning. What is true is that it is difficult to select a letter without its accompanying combining marks. For Thai, script-independent editing generally treats spacing marks as letters, and Thais like it that way. An attempt to prevent selection of Thai letters by changing the Unicode classification was met with howls of protest, and the change was rescinded. Until recently, selection by Unicode default graphemes was generally available even for Devanagari - it may still be generally available where one can find the editor's customisation menu. --RichardW57 (talk) 15:59, 15 August 2021 (UTC)[reply]
@Benwing2: The Thai graphic unit is the syllable, which includes the final consonant and even consonants beyond (การันต์ (gaa-ran)) it.
@Benwing2: We ought to allow what people consider as letters. There is a list of strongly backed candidates for letter status in the file NamedSequences.txt in the Unicode Character Database - it includes all the possible Tamil syllables. You'll notice that it also includes the subjoined Khmer consonants. --RichardW57 (talk) 15:59, 15 August 2021 (UTC)[reply]
@RichardW57 I don't know why you're accusing people of being "overwhelmingly dishonest" and suggesting that I be banned merely for making a statement you disagree with, but it reflects very badly on you. Benwing2 (talk) 19:28, 15 August 2021 (UTC)[reply]
@Benwing2 I used the term dishonesty because what was presented wasn't quite a lie. Perhaps I was lily-livered when I suggested that someone should merely be banned if they tried to make it difficult for people to edit text in their own language, e.g. by deliberately preventing the selection of parts smaller than an orthographic syllable - 6 characters in an orthographic syllable is not unusual in some languages. I find it horrifying that there are people who regularly edit text in their own language in (Latin) transliteration rather than in their own script. --RichardW57 (talk) 20:09, 15 August 2021 (UTC)[reply]
The fact remains we don't include all syllables in all abugida languages, and I would be strongly opposed to doing that. Whether Unicode includes a particular list of syllables in their database is irrelevant for Wiktionary's decisions. Benwing2 (talk) 19:28, 15 August 2021 (UTC)[reply]
@Benwing2 I would have hoped that we would at least ask why Unicode has done what it has done. I see we do at least allow the whole of the Welsh alphabet - or is the toleration of letters like ph an oversight? --RichardW57 (talk) 20:09, 15 August 2021 (UTC)[reply]
That's... already been explained to you how that's different. In Welsh, "ph" is a letter in the alphabet, similar to or in Korean, or gb in Yorùbá, and are actual relevant lemmas and have an actual usage. That's an entirely separate thing from allowing random combinations like 쫹 in Korean which don't exist whatsoever in the language. Please please don't derail like that, and if you don't have that much experience in the language, please give the space to others that do and know what they're talking about. AG202 (talk) 03:06, 16 August 2021 (UTC)[reply]
  • @Benwing2: (after browser failure ate my previous attempt at replying)
For Hindi, Thai, and Tibetan, it is true that a user can only select a combination of [CONSONANT] + [VOWEL]. However, I find that I can use Backspace to remove any additional vowel diacritic and thereby get just the consonant (with its inherent vowel).
For instance, I can select Hindi या (yā), and then after copy-pasting into the search bar (or text editor, etc.), I can press Backspace to yield just the consonant glyph य (ya). Likewise for Thai and Tibetan.
However, for Korean, pressing Backspace deletes the entire syllable -- it is not possible to decompose a Korean syllable glyph into any portion of its constituent characters (at least, not on Windows, not without specialized tools). Take, for example, (sip) -- pressing Backspace just deletes the entire glyph, leaving me with nothing. ‑‑ Eiríkr Útlendi │Tala við mig 19:04, 20 August 2021 (UTC)[reply]
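For readers wondering what "decomposing" a syllable block involves: the 11,172 precomposed syllables are laid out arithmetically in Unicode (19 initials × 21 vowels × 28 finals, counting "no final"), so the constituent letters can be recovered with integer division. The sketch below illustrates that arithmetic; the function name `decompose` and the sample syllable 한 are chosen here for illustration and do not come from the thread.

```python
import unicodedata

S_BASE = 0xAC00                            # first precomposed syllable, 가
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28     # initials, vowels, finals (incl. none)
S_COUNT = L_COUNT * V_COUNT * T_COUNT      # 19 * 21 * 28 = 11172


def decompose(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial = index // (V_COUNT * T_COUNT)
    vowel = (index % (V_COUNT * T_COUNT)) // T_COUNT
    final = index % T_COUNT
    jamo = chr(0x1100 + initial) + chr(0x1161 + vowel)
    if final:                              # final 0 means "no final consonant"
        jamo += chr(0x11A7 + final)
    return jamo


# List the Unicode names of the letters inside one block.
for letter in decompose("\ud55c"):         # U+D55C, the syllable 한
    print(f"U+{ord(letter):04X}", unicodedata.name(letter))
```

The same result is available without any arithmetic through `unicodedata.normalize("NFD", ...)`, which is one of the "specialized tools" a reader without knowledge of the script would need.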
@Eirikr If people do not know anything about Korean, they are not the intended audience of the Korean entries on Wiktionary. Every entry for every language on Wiktionary makes certain assumptions about the readership—again, like the capitalization in European languages. Explaining the composition of Hangul syllables is like having a usage note on every German noun that German nouns are always capitalized, or giving a "combining form before suffixes" for every language written in Arabic script.
In any case, @Erutuon has enabled {{character info}} (which already contains all relevant information) even for nonexistent pages, so the issue seems to have been resolved; what we're now left with are the superfluous separate etymology sections.
The space we're talking about is the length and visual formatting of a single page. In that context, it is indeed a concern that redundant and (for the vast majority of readers) irrelevant information always takes up the most prominent part of the page.--Tibidibi (talk) 04:53, 14 August 2021 (UTC)[reply]
@Eirikr, Tibidibi Another issue that should be mentioned is that these syllables are being categorized in CAT:Korean lemmas, which clogs the category with non-words. Benwing2 (talk) 18:21, 14 August 2021 (UTC)[reply]
@Tibidibi: For which a sane answer, given above, was to classify them as something else.
  • @Tibidibi: (after browser failure ate my previous attempt at replying)
I cannot agree with your contention that "If people do not know anything about Korean, they are not the intended audience of the Korean entries on Wiktionary." Our targeted audience is English-language readers. We cannot assume knowledge past there. By insisting that readers already have knowledge of other languages, we erect barriers to learning, and we make Wiktionary harder to use.
If someone stumbles across a piece of text in a language they don't know, and they come to Wiktionary to look it up, they should be able to find something. I think this is even more important for languages written in scripts other than the Latin alphabet.
Thank you for clarifying what you meant by "space". I share your concerns about data duplication on an individual entry page, and about usability / readability of a page's layout. As a means of simplification, I think @Erutuon's new templates are a very good move: editors no longer have to concern themselves with entries for individual Korean syllable glyphs, and readers will now find something when searching for the smallest selectable portion of a Korean string. This is a positive improvement in my view, and a net gain in usability and discoverability for very little effort or impact on the editor community.
@Erutuon, would it be possible to extend your templates to cover all non-Latin scripts?
I can understand and sympathize with editor concerns about proposals that demand additional work from editors. I am in favor of methods to improve usability and discoverability that also entail minimal impact on editors. Conversely, I cannot understand editor opposition to methods for improved usability and discoverability, when those methods have little impact on editor work.
To be clear, with the new templates, I have no opposition to the removal of the individual syllable-glyph entry pages. I am concerned with how we construct entries, and ensuring that we are not assuming too much specialist knowledge -- it is easy to lose sight of the difficulties that beginner learners face, the more intimately familiar we become with the languages we work on. I do not want Wiktionary's default state to become exclusionist, not towards terms, but towards users. ‑‑ Eiríkr Útlendi │Tala við mig 19:04, 20 August 2021 (UTC)[reply]

With all the mentions of {{character info}}, I thought, why not show it on nonexistent mainspace pages with only one character in the title? So I added it to MediaWiki:Newarticletext and MediaWiki:Noarticletext. So now people can see the letters that Korean syllables are made up of, even if there isn't an entry. If this is a bad idea, it's easy to revert. — Eru·tuon 02:46, 14 August 2021 (UTC)[reply]

It's an excellent idea. --RichardW57 (talk) 15:59, 15 August 2021 (UTC)[reply]

@Eirikr: What templates do you mean in "extend your templates to cover all non-Latin scripts"? If you're referring to {{character info}}, it generates information on any single code point, regardless of script. — Eru·tuon 19:27, 20 August 2021 (UTC)[reply]

@Erutuon, Eirikr: I think Eirikr is referring to the message (?) MediaWiki:Newarticletext, and its sister, having overlooked that the change already applies to any new entry consisting of a single Unicode scalar value. --RichardW57 (talk) 21:06, 20 August 2021 (UTC)[reply]
  • (after edit conflict) @Erutuon: Apologies for my apparent confusion. I meant, whatever it was you did so that the character info shows up, even on non-existent pages like . Perhaps that already works for all scripts, and nothing more is needed?
From Richard's comment, it seems that this already supports all scripts, so everything is copacetic. ‑‑ Eiríkr Útlendi │Tala við mig 21:11, 20 August 2021 (UTC)[reply]
@Eirikr: That's right. It supports single code points of any script. Probably nonexistent pages should also show information on grapheme clusters consisting of multiple code points like पा (DEVANAGARI LETTER PA, DEVANAGARI VOWEL SIGN AA), because (as someone pointed out, maybe you) sometimes, like in this text box, people can't select the individual code points to paste them into Wiktionary. Right now we don't have a module with grapheme cluster code, but I could get it or try to write it. — Eru·tuon 22:48, 20 August 2021 (UTC)[reply]
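As a rough illustration of the grapheme-cluster idea just mentioned, the sketch below lists the code point names inside a pasted string and groups each base character with the combining marks that follow it. This is a simplification and not the module under discussion: real grapheme cluster segmentation follows Unicode's text segmentation rules, which cover more cases than the "general category starts with M" test assumed here, and both function names are hypothetical.

```python
import unicodedata


def code_point_names(text: str) -> list[str]:
    """Name every code point in text, falling back to U+XXXX if unnamed."""
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in text]


def rough_clusters(text: str) -> list[str]:
    """Group each base character with any following combining marks.

    Approximation only: a character whose general category starts with
    "M" (Mn, Mc, Me) is attached to the preceding cluster.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# The Devanagari example from the comment above: U+092A followed by U+093E.
for cluster in rough_clusters("\u092a\u093e"):
    print(cluster, code_point_names(cluster))
```

A page title that forms a single cluster under this test but contains several code points would be a candidate for showing character information on each code point in turn.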

@Benwing2 another week has passed, now with 8 Support to 2 Oppose. Would it be possible to go forward with this?

I have done some experimentation, and it appears that type 1—entries where the only headword template is {{ko-syllable-hangul}}—are actually quite few in number. Would it be possible to bot-remove types 2 and 3—entries with {{ko-syllable-hangul}} as Etymology 1, along with other Etymologies—as well, so as to cut down on the number of entries to be manually fixed?

These are the kind of edits that need to be made, consisting of four parts: moving {{ko-symbol-nav}} to the top of the page just below {{character info}}; removing the rest of Etymology 1; reordering etymologies accordingly; and reordering the Pronunciation header in single-etymology entries to below the Etymology section.--Tibidibi (talk) 15:37, 19 August 2021 (UTC)[reply]
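Two of the four parts described above, removing Etymology 1 and reordering the remaining etymologies, can be sketched as a text transformation on the Korean section. This is an illustration under stated assumptions rather than the bot code itself: the function name `drop_first_etymology` is hypothetical, it expects numbered `===Etymology N===` headers with the syllable template inside Etymology 1, and it deliberately leaves out moving {{ko-symbol-nav}} and repositioning the Pronunciation header.

```python
import re

ETYM = re.compile(r"^===Etymology (\d+)===[ \t]*$", re.M)
TEMPLATE = "{{ko-syllable-hangul"


def drop_first_etymology(section: str) -> str:
    """Remove Etymology 1 (the syllable stub) and renumber what remains."""
    heads = list(ETYM.finditer(section))
    if len(heads) < 2:
        raise ValueError("expected at least two numbered etymology sections")
    first_block = section[heads[0].start():heads[1].start()]
    if TEMPLATE not in first_block:
        raise ValueError("Etymology 1 is not the syllable stub")
    rest = section[:heads[0].start()] + section[heads[1].start():]
    if len(heads) == 2:
        # One etymology left: use an unnumbered header and promote the
        # headers nested under it (level 4 and deeper) by one level.
        rest = ETYM.sub("===Etymology===", rest)
        rest = re.sub(r"^=(={3,}[^=\n]+={3,})=[ \t]*$", r"\1", rest, flags=re.M)
    else:
        # Several etymologies left: shift every number down by one.
        rest = ETYM.sub(lambda m: f"===Etymology {int(m.group(1)) - 1}===", rest)
    return rest
```

Raising an error on unexpected input, instead of guessing, keeps badly formatted entries out of the automated run so they can be fixed by hand.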

@LittleWhole Suzukaze-c (talk) 23:51, 26 October 2021 (UTC)[reply]
@Suzukaze-c Huge thanks for the heads up, I don't go off of the main dictionary part of Wiktionary much so I had no idea about this. LittleWhole (talk | ) 21:41, 28 October 2021 (UTC)[reply]

List of syllables

hastemplate:"ko-syllable-hangul" -"etymology 2"

휴먀먜쁴삐머쁘흐퀘큐킈퀴극릉혜쿠료크뾰흘름쿼륵쀼릐몌르뿌갂뻬쀠윤갇갗셔냬휘갛힝뾔갚같힌긔셰쿄갖갃뼤힉뾵갘힕페삫뿁갆릇힡갅흩갋힢갊힠갌갎릅뿋힙갍뿀릎흪뾸뿄뾼뿂뾺렬삠뾲뿉뿃뾽힋먬먫륹릁흕흨삔뾷먷뿅뾻뾿뿊먝먶먦힜힒삙삜삣힞힍힗먣먢먴먥먮뾴뾹힎먲먤릃힊힟흟티른릍삗삪삛흣뾳삝삞힏힑흚삒삡삟뿈힛힚릈힣뿇뿆삦삖릆뾱륶흢삤힖먪퇴몯쀄먩삧뾶릌릋샤흗먭힔먯먡삑삚륽릀삓삩먱먠먞먨릏릊삨삕흧먳먰먟뿨뾾륷릂흝삢흒흖흜흫먧먵힓싀륺륿륻륾흤흓뼉흦흛흞빠쾨긍뼝뼌워뫄뼋긴킬뫃솨먘클갯뎌몬갬갠퓨뼣숴킵뺴죠뼚훼뺘뼐뼢뼞뼠뼡뼒졔끠뼛뼊뼏뼟뽜먑킨즤먓킥뻐뼜뼑뼙뽀뫂뎨렵먐갤먗픠뫁몾킷쳬갵큼몹먈킽쿅쿟쿕쿗닝갣먚큿듸먙쿈쁜쿙닣킦쿌쿔듀먛큉킄큇킿쿋펵센삏쁠쁙쁶몼쟈됴싱쁑갷갲볘늑먁킻킅퀭쿝쿚펼삍쁳삂쁃몪굉닥먕킁큄퀻퀵킾퀑폅쁨쁹쁦찌드갶닏몺먖몄큭킴킫큽쁸삉쁷몿뫀갴먔쿜쿞쿛앆먒첼킃퀜퀃퀌쁚쁗몀훠쁞닛섀먇쳈큋퀙궈쵸펻삎쁺쁻쁩쉬닌쳑큳킂큍퀩퀳퀍퀄쁼쀽쁍쁖큅킆퀀퀗톄펫쁝삄멷퀼큏퀠쿽퀫퀨븨뛰몃쳏킼퀮퀯삅닞닿닜즈얶빔쁵쁕몋첸킺큘싶솧쁛큊큣퀰퀒엊쁟쁭쁧쁓큌퀖퀏퀓쿤쳋프츼얻쁄쁏쁌갳닫몆겅걱큠킇퀟곥댸퓌튜촤랴붜쌔곪큔큥킐펨덤닼햬챼펩쁔쎄긱괉몁쳄걷큑큨뀨 끄괈곩폏넏쁒닚쿡틔휵쿵섹쀟뵤버첻킝킌킘큎냑뮤킙펭싣펙믜펟녈쀘쀗뽈킛곭푸폊폋엳쳘큗킞큦쿳쿧풰깅펲냔닽쳎탇탛쵀펶뽸괵뜨몂큧쿰꿔뱌쳡킟큡폍솅넊넣쀈딩괋괊햐쀆냘쌰씌긃괗괠쿱릭쀔쯰낀섿쀋쀌놰텨솓쀙뽁첶쿻쑤킉쀅늗뱨쏘쀡쀻넁뽇괧툐쪼쮸뒈뗘띄핳셉넡냠넇냫흳괟냽튀쯔돠럐뭬쭤꾜쐐덩슥빌팜셱넢넠힁쳗킏퍠츄츠휺펳꼬꾸뛔뚜뤼흭댜좨뀌떄뚸뷰쒀므휸쳦탗뻑핟덛흼뚀쐬솆멕쓔뽐뽄긂퓐캬쬐쩌꺄꺠꿰트탸콰쫴쬬꼐뤄늡뻗뻠냳냡냗덥힇쒸픋괩핏쳨훟퉈챠쭈퉤쫘픙퓝멓덯늫흴탖푀톼켸픅딨딧괨냰솇쀰냭썌쑈줴똬뮈뽛맇퇘쪠뜌뗴뵈롸뢔괳퓔홒홐릳뻘솋멤멛멈끿휷휶핃쮀쨔뙈뙤걀컈땨쀵봬쎼낂딙턔뫠쩨붸셲멯렌풔낁픓딚퓒퓙쀱쀨멩쎠뤠끊졌퐤퐈쮜쨰끾렉괦훋칢뱃밷뺙셷쀧냼냴뻼몜릞냅픒뿧뿓룘룔틱랙뺻굫뻳뺭몓몡퓚겐뼁먿뭊뺵뿯룙릫곈뻉갼뼇횃뻄뺠뺟렜릥뺼몧셨뇄픡홷씼뮥샨릗텐횸뽿쀃뽣펃튽뿩젿뛙뜧뮦읜냂뛮굔픻곊뽷뢨땽뜎욚뾓뾯뢣귁깞얬컫톈뾛햆꿿뤯뗶

From those syllables, entry content other than "Syllable"

as determined with https://en.wiktionary.org/w/index.php?title=Module:sandbox&oldid=64532174

	["갼"] = "9",
	["걀"] = "====See also====\n* {{ko-l|달걀||egg}}\n",
	["굫"] = "# {{ko-defn-hangul|ㄱ|ㅛ|ㅎ|rrt=gyoh|yr=kyoh}}",
	["긂"] = "Used in the noun form of {{l|ko|동글다}}, {{l|ko|둥글다}}, {{l|ko|영글다}}, etc.",
	["뇄"] = "Used in the past-tense forms of {{l|ko|되뇌다}}.",
	["렜"] = "Used in the past-tense forms of {{l|ko|누렇다}}, {{l|ko|둥그렇다}}, {{l|ko|설레다}}, {{l|ko|퍼렇다}}, etc.",
	["뢨"] = "Used in the past-tense forms of {{l|ko|사뢰다}} and {{l|ko|아뢰다}}.",
	["셨"] = "Used in the past-tense forms of verbs and adjectives when {{l|ko|시}} is followed by {{l|ko|었}}.",
	["씌"] = "===See also===\n* {{ko-inline|씌다|ssuida}}",
	["얬"] = "Used in the past-tense forms of {{l|ko|뽀얗다}} and {{l|ko|하얗다}}.",
	["읜"] = "# {{ko-defn-hangul|ㅇ|ㅢ |ㄴ}}",
	["칢"] = "Used in the noun form of {{l|ko|거칠다}}.",
	["컫"] = "Used in the verb {{l|ko|일컫다}}.",
	["휘"] = "{{ko-IPA|l=y}}",

Nothing worth keeping here. These entries can be deleted. —Suzukaze-c (talk) 02:50, 5 November 2021 (UTC)[reply]

@Erutuon, Surjection — could you guys help with deletion as admins? —Suzukaze-c (talk) 02:58, 5 November 2021 (UTC)[reply]
@Suzukaze-c: Sorry for the delay; done at last. — Eru·tuon 18:01, 18 November 2021 (UTC)[reply]
@Erutuon: Great, thank you! —Suzukaze-c (talk) 01:13, 19 November 2021 (UTC)[reply]

Vote to prioritize definitions in entry layout

I'm looking for feedback on a vote that's long overdue: Wiktionary:Votes/2021-08/Prioritizing definitions. Ultimateria (talk) 17:35, 5 August 2021 (UTC)[reply]

Feedback:
  1. I'm not sure about the treatment of Pronunciations. My personal feeling is that they are as important as, if not more than, the definitions, especially for learning a foreign language. Also, in most dictionaries, they come before definitions.
  2. Is there an estimate about the workload if the vote passes? That would include the time to develop, test, run, and maintain the bot(s), I presume. How will irregular entry layouts be dealt with? I feel that some discussion about implementation would be helpful for votes to make their decisions.
--Frigoris (talk) 17:52, 5 August 2021 (UTC)[reply]
Some pronunciation sections are very large and may take up a significant part of the screen estate, so having them before definitions is not a good idea unless there's some way to reduce them down in size for all languages. Placing the pronunciations any higher may on the other hand cause issues if there are multiple parts of speech under the same entry, because you might get something like
==Language==

===Noun===

====Synonyms====

===Pronunciation===

===Verb===

====Synonyms====
which is not very intuitive. — surjection?? 18:02, 5 August 2021 (UTC)[reply]
Let us take the Chinese entries as an example. For Western learners the pronunciations are possibly more important than definitions, since the script can be far less phonetic than their own familiar ones. The creators of {{zh-pron}} have put a lot into balancing space economy with information content; there are foldable components that unhide the rich information upon click.
At least with Chinese, it would've been way less intuitive if definitions come before pronunciation, since for multi-reading terms, a reading can govern a related cluster of meanings that as a whole is fairly separated from the other reading(s). --Frigoris (talk) 19:08, 5 August 2021 (UTC)[reply]
On (shuǐ), {{zh-pron}} takes up more than a single page's worth of screen estate on mobile with default settings. If our plan is to have definitions (the most important part in dictionaries) first, that is simply not ideal in its current form. — surjection?? 19:12, 5 August 2021 (UTC)[reply]
If we confine ourselves to just changing the level/location of the Pronunciation headers, the alternative would be either to invade the space right above the next language header, Japanese, causing a large break in front of it, or for multi-reading Chinese terms to be internally sparsened by the possibly large {{zh-pron}}s. The solution, as it seems to me, would have been a better way to present the zh-prons for compactness so that the problem is minimized, rather than shifting them around elsewhere. --Frigoris (talk) 19:20, 5 August 2021 (UTC)[reply]
Right-floating has absolutely zero impact on mobile. It'll still take up the same amount of space as it used to. Having pronunciation info right at the bottom is not ideal, but I'd argue it's less bad than having a massive block of pronunciation info drown out the definitions on a page to the point users get tired and don't even bother. The alternative is making all large pronunciation sections (more than half a page long on mobile) collapsible. — surjection?? 19:35, 5 August 2021 (UTC)[reply]
Do we have to have the headings that take up an entire line? Most paper and online dictionaries make much better use of space by putting various things into paragraphs: dog, noun, /dOg/, a barking animal; .... Equinox 18:05, 5 August 2021 (UTC)[reply]
This is a very good point. Maybe someone could compile an example entry also for this. Allahverdi Verdizade (talk) 11:26, 6 August 2021 (UTC)[reply]
Yes, I think we should discuss this (even if only in a separate vote); our current setup has a lot of excess whitespace, especially on mobile, because a whole line is devoted to "Noun" and then a whole nother line to "dog (plural dogs)". But figuring out how to keep the table of contents coherent and usable if we change that is an issue... - -sche (discuss) 19:15, 6 August 2021 (UTC)[reply]
It would be good to have some example pages that demonstrate the new layout. DTLHS (talk) 18:52, 5 August 2021 (UTC)[reply]
@DTLHS: I've made a longer English example here and a shorter Spanish example here. Ultimateria (talk) 01:43, 6 August 2021 (UTC)[reply]
Thanks. Probably the translations / derived terms should go under the POS instead of the etymology? DTLHS (talk) 03:10, 6 August 2021 (UTC)[reply]
@DTLHS: I considered it but it puts etymologies really low in long English entries. I think it's important to put etymology before those headers because it relates to the word directly, and lists of other words do not. You can see at the English mockup that the most relevant information for the term itself is found between Pronunciation and Etymology, then you have all the rest, then you have Entry 2. The new structure just needs some extra part-of-speech labels. Ultimateria (talk) 17:03, 7 August 2021 (UTC)[reply]
@Ultimateria: Other headers like Synonyms, Antonyms, Hyperonyms,... also divert away from the main entry. Not sure of the distinction you're making between these and the Derived and Related terms to justify placing the Etymology section in between there? Sitaron (talk) 17:47, 7 August 2021 (UTC)[reply]
@Sitaron: It is distinct; synonyms and antonyms can help you understand a definition, but knowing e.g. that "breastplate" and "license plate" are terms derived from "plate" doesn't inform your understanding of the term "plate". Plus, the current trend is to nest synonyms et al under definitions, which could work for translations but not for derived/related terms or descendants. Ultimateria (talk) 22:20, 7 August 2021 (UTC)[reply]
OK, but putting derived terms and translations as a subsection of the etymology section makes no sense- these have no relationship to each other. Furthermore it increases the burden of disambiguating the derived terms with labels where previously they were automatically associated with a particular part of speech. DTLHS (talk) 02:31, 8 August 2021 (UTC)[reply]
They wouldn't be subsections of Etymology, they would also be L3 headers (in single-Entry pages). I think extra disambiguation is a small price to pay to cut right to the definitions while keeping etymologies relatively high up. Ultimateria (talk) 06:09, 8 August 2021 (UTC)[reply]
  • This is why ninjawords was invented. Maybe we could put a banner on the front page like if you just want definitions for English, without all the etymology, pronunciation, translations and other foreign-language crap, use NinjaWords. Not sure if faff or bullshit would be a better term than crap, though...... Queenofnortheast (talk) 19:03, 5 August 2021 (UTC)[reply]
  • I can't say I'm really psyched about any changes to the headings order, I prefer the way it is now, but I guess that's to be decided by the vote. Thadh (talk) 19:47, 5 August 2021 (UTC)[reply]

I don’t see any reason to change the order of information, since, as recognized, the importance varies between languages, and also between reconstructed and attested languages. It does not make much sense to put, as in the vote, alternative forms and alternative reconstructions first and then etymologize very far away. And where is “reconstruction notes”? Oh, that’s even farther up, with another mass of things in between, and oddly “glyph origin” is farther up still, though it is the etymology of the sign.

Wiktionary does not make good use of horizontal space. Nowadays 4K monitors are standard, often multiple, so I see a lot of white space on the right. If Wiktionary really wants to take a step forward then it may switch to using multiple columns. Anyway I wanted to have IPA beside the transcription schemes in Arabic and Persian entries – better though switchable inflection tables –, as employing even any room in the vertical space for pronunciation sections is wasteful and inefficient.

But I doubt the bottability of any proposal. It’s all a waste of manhours, for the order being ever debatable. Fay Freak (talk) 21:34, 5 August 2021 (UTC)[reply]

  • Support moving etymologies below (but above translations, as per @Geographyinitiative). Oppose moving pronunciations so far below, per @Frigoris, @AG202. At the least they should be above Conjugation, even if we do bring them below the definitions.
Alternatively, languages with simple IPA templates could have them on the headline template and have a separate header for audio files and phonological trivia at the bottom, like fr.wikt does. This allows true "Pronunciation" headers to be reserved for languages that actually need them, like Chinese.--Tibidibi (talk) 00:49, 6 August 2021 (UTC)[reply]
Why are you presupposing that etymologies aren’t written for multiple parts of speech? Many things would have to be written anew. Fay Freak (talk) 00:55, 6 August 2021 (UTC)[reply]
Thanks for the tag! Yes, like I said in Wiktionary talk:Votes/2021-08/Prioritizing definitions, the change for the Pronunciation section doesn't make that much sense to me, and if anything I'd prefer the fr.wikt solution if any change has to be made. Support the change to etymologies as long as it's above translations (though I do appreciate them at the top as a linguist), but oppose the change to pronunciation unless something else is proposed. AG202 (talk) 02:05, 6 August 2021 (UTC)[reply]

Update: After reviewing the feedback here, the talk page, and on Discord, I've kept the Pronunciation section at the top of the entry, as the only section before part of speech headers (and renamed the vote accordingly). I recommend seeing the new order in action at User:Ultimateria/alt entry layout, and where the change is less stark at User:Ultimateria/alt entry layout 2. The focus of the vote is now to continue grouping pages by etymology while moving the etymologies themselves out of the way of definitions. You can see in my first mockup that definitions are in clearer focus and that definitions, derived and related terms, and translations form more cohesive sections. Ultimateria (talk) 03:35, 6 August 2021 (UTC)[reply]

I will vote yes on the proposal the way it is formulated now. Allahverdi Verdizade (talk) 11:26, 6 August 2021 (UTC)[reply]
I don’t see the advantages in the examples. One isn’t even prioritizing definitions since pronunciations come first. Even for languages where pronunciation is frequently unguessable, I don’t see why this should be so. I can also know English words and their meanings without knowing how to pronounce them. Or Chinese characters. It could be that one is only interested in writing. In fact I have never spoken English in my life, only written it. A great way to learn languages is treating all like dead ones, like one learned Latin. But here you are even prioritizing the recently controverted pronunciation sections of Latin. And there are more headings than before (Entry N). Fay Freak (talk) 04:28, 8 August 2021 (UTC)[reply]

Should we not explicitly allow for pronunciation and etymology to be shared by all the entries on a page? Perhaps some general words can be found. For Lao-script Pali, it is definitely the case that what have different etymologies may have different sets of alternative forms. --RichardW57m (talk) 12:44, 6 August 2021 (UTC)[reply]

Pages with just one entry have one Etymology (L3) section, and pages with multiple entries need an Etymology (L4) for each entry. Pronunciation is a L3 header above definitions by default, and pages with multiple entries may have L4 Pronunciations under each entry if they don't share pronunciations. As for Alternative forms, they will come shortly after definitions, which is already an option. The parts about Etymology and Alternative forms are already covered in my proposed changes to WT:ELE, but I can add a sentence to the Pronunciation section about what to do in multi-entry pages. Ultimateria (talk) 16:30, 6 August 2021 (UTC)[reply]
Okay, I've added it. Ultimateria (talk) 17:20, 6 August 2021 (UTC)[reply]
I could only find it in the complex layout example. Apart from that, I can only find pronunciation as part of a numbered entry section. --RichardW57 (talk) 02:26, 8 August 2021 (UTC)[reply]
Alternative form sections can be quite long - one for each of about 30 scripts for Sanskrit. That gets unwieldy if there are half a dozen entries. --RichardW57 (talk) 02:26, 8 August 2021 (UTC)[reply]

Personally I am fine with requiring the alternative forms be placed below the definition but I somewhat like the current system with etymologies placed above. This is consistent with many print dictionaries that I remember consulting in the past, which tend to list entries somewhat like this: headword /pronunciation/ [etymology] 1. definition 2. definition 3. definition etc. Benwing2 (talk) 03:30, 8 August 2021 (UTC)[reply]

I think we need to step back and look at the big picture: Mediawiki is designed around headers. That means that if you want to divide things into sections, you have to stick a big fat piece of text at the top of each section, with the biggest, fattest piece of text first. In other words, organization is always prioritized before content.
I'm not sure how easy it would be to implement, but I think we should concentrate on minimizing the headers and shifting as much as possible to footers. The tricky part, of course, is designing the appearance so you know what section you're in without a grand megalith at the top to make it obvious. Chuck Entz (talk) 05:31, 8 August 2021 (UTC)[reply]
What do you mean by "footers"? Ultimateria (talk) 06:13, 8 August 2021 (UTC)[reply]
A section footer: something that shows that you're at the bottom of the Noun, or Pronunciation, or Etymology 1 section. Chuck Entz (talk) 06:51, 8 August 2021 (UTC)[reply]

Wonderfool again: splitting a, o

Wonderfool is trying to "fix" the Lua memory errors by splitting pages like a, o into other-languages sections. This is introducing its own errors. We should decide now whether to revert these changes. Benwing2 (talk) 23:44, 7 August 2021 (UTC)[reply]

I have undone them and he has posted here: Wiktionary:Grease_pit#Worst_attempt_ever_at_saving_Wiktionary. Equinox 23:49, 7 August 2021 (UTC)[reply]

Read-only tomorrow

Hello!

A maintenance operation will be performed tomorrow, Tuesday 10th August, at 05:00 UTC.

It will impact 17 wikis, and is supposed to last a few minutes. During this time, saving edits will not be possible. For more details about the operation, please check on Phabricator.

A banner will be displayed 30 minutes before the operation.

Please help make your community aware of this. Thank you! SGrabarczuk (WMF) 02:02, 9 August 2021 (UTC)[reply]

Quotation mark standardization

I was directed to comment here prior to creating a WT:VOTES regarding style and policy. I have been informed that standardizing quotation mark style was previously proposed, with voting for straight, and voting for curly, but both proposals failed. I think this concern is worth looking into again. I was previously unaware that the two styles existed simultaneously on Wiktionary, and that there had been discussion/votes on the matter; I'm sure many were similarly left out.

While inquiring about variations in quotation mark display, I was directed to view Wiktionary:Style guide#Quotation marks. I propose that Wiktionary should adopt a standard style of quotation mark usage, either straight or curly, but not both. For a comparative reference, see w:MOS:STRAIGHT. I find the mixture of quotation mark styles distracting to read/edit. Wiktionary should adopt one style of usage for uniformity, with allowance for agreed upon exceptions as necessary. — CJDOS, Sheridan, OR (talk) 18:51, 12 August 2021 (UTC)[reply]

Seems pretty pointless to me. If there is a change, it probably should come from inside the WT community, anyway. But I'm just a single user with a single account, so am not speaking on behalf of the community. Wubble You (talk) 15:39, 13 August 2021 (UTC)[reply]
I disagree with what it says at w:MOS:STRAIGHT, i.e. "Use straight quotation marks, not curly", as far as it might be suggested to apply to this project. Curly quotation marks and apostrophes are more professional-looking. We should in the long term aspire to using them everywhere, with sufficient software support. (Please don't bother to comment that I have not used them in this post.) Mihia (talk) 22:47, 17 August 2021 (UTC)[reply]
I would agree with that, Mihia, if they were as crisp as straight quotation marks/apostrophes. On a 15.5 inch notebook computer screen at 100% zoom, curly marks are very difficult to see in normal article text (maybe not as much of a problem with larger text). But, that's not the conversation I started—I'm only asking for consistency—decide on one style, and make it uniform across Wiktionary, because having a mix is distracting in both displayed and source text. — CJDOS, Sheridan, OR (talk) 05:25, 20 August 2021 (UTC)[reply]
How is this policy enforced on WP? Is there a quote checking bot to replace typographical quotes?–Jberkel 23:28, 20 August 2021 (UTC)[reply]
I don't know specifically which bots perform this task, but yes, bots will and do replace curly marks with straight marks in the course of other Wikipedia edits. I don't believe the bots exclusively seek out curly marks, as there are circumstances that allow for them per w:MOS:CONFORM. I usually only notice when editors have performed the corrections, whether entirely manual, or assisted. — CJDOS, Sheridan, OR (talk) 05:28, 24 August 2021 (UTC)[reply]
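For illustration only, the kind of replacement described above can be sketched in a few lines of Python. This is a minimal sketch and not the code of any actual Wikipedia bot: the names `CURLY_TO_STRAIGHT` and `straighten_quotes` are hypothetical, and a real bot would also need to skip the exceptions allowed under w:MOS:CONFORM, which this sketch does not attempt.

```python
# Hypothetical sketch: map curly quotation marks and apostrophes
# to their straight ASCII equivalents. Not taken from any real bot.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark, also used as apostrophe
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(CURLY_TO_STRAIGHT)


if __name__ == "__main__":
    print(straighten_quotes("\u201cdon\u2019t\u201d"))
```

Standardizing in the other direction (straight to curly) would be harder to automate, since a straight mark does not say whether it opens or closes a quotation.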

See the trees in the etyma forests

We already categorize affixes by their distinct types using |id=, as in Category:Middle English words suffixed with -ly (adverbial).

In contrast we have |from= in definition lines using {{given name}}. It causes the categories Category:Arabic given names from Coptic and Category:Arabic male given names from Coptic to appear in Category:Arabic terms derived from Coptic.

The categorization I reproach not, but it is ironical and irreconcilable to vote to prioritize definitions but then include etymologies in definition lines. Indeed it cannot be avoided sometimes for high-falutin figurative senses, but what concerns me is the duplication. Shouldn’t |from= be deprecated? Its historical ground is, as I understand it, not more than Wiktionary shirking from inclusion of foreign-language content in the unwoke 2000s, to just mention the origin language instead of the origin term like popular outmoded paper dictionaries. But now the same item is in both Category:Arabic male given names from Coptic and Category:Arabic terms derived from Coptic; this shouldn’t be. I would like to see the actual number of Coptic loans in Arabic, Russian loans in English and so on, by reason that names of people and settlements do not work and are not to be viewed like generic vocabulary.

I conclude that we should go farther and have a flag in etymology templates – {{bor}}, {{inh}} etc., but also requesting ones such as {{rfe}} –, to mark if a term is a toponym or an anthroponym. This would make any category Category:Requests for etymologies in langname entries much more workable. For the etymology of place-names is a different field than the etymology of common nouns and verbs. For Category:Requests for clarification of definitions by language there may be different groups; for instance old units of measure are often given inexactly. One has to categorize them by {{rfdef}} at best under a particular category of underdefined measures so someone can go through all of them and resolve them with the help of metrological material. Fay Freak (talk) 21:07, 13 August 2021 (UTC)[reply]

Clarify what web pages count as "permanently recorded" for WT:ATTEST

There's been a lot of discussion about this over the years (e.g. 1, 2, 3), which ultimately hasn't led to much clarification at all on WT:ATTEST. WT:ATTEST specifies that Usenet is acceptable but is silent with respect to web pages. It should be specific as to which web pages count as permanently recorded for the purpose of this policy and which don't.

As for what should count, I think that anything that can be stuck into the Internet Archive or WebCite should count. A 2012 vote to mention something about WebCite failed with a tie, 7 to 7. It has been alleged that Usenet articles are more durable than web pages on the Internet Archive, but I agree with arguments that the opposite is the case. The broader issue here is that in 2021, Usenet is vastly less popular than the Web, so the first clear written uses of new terms are going to appear on the Web. If we wait for words like sniddy, currently undergoing an RFV, to appear on Usenet or in print, we'll be waiting a long time before we can include words whose usage was clearly established years ago. —Kodiologist (talk) 15:21, 16 August 2021 (UTC)[reply]

A few observations.
  • Part of the reason for the three durable uses rule is to weed out things that a professional editor would not let by. Yes, I discriminate against lower registers. Tweeting lulz three times doesn't make it a word even if the Internet archive catches it. The online marketing department's latest coinage can be lost in the dustbin of history with no harm done.
  • There are words that pop up regularly on Twitter, in newspaper comment sections, and in similar contexts that we do want to keep. This is especially true of words that professional journalists and editors avoid because they are politically unsound. I don't have a formula to apply. We could be more liberal with "clearly widespread use."
  • I do not consider modern Usenet a good source. I would put a cutoff date around 2000-2010 as so many NNTP servers shut down and access became inconsistent.
  • Some web sites break all their URLs every few years, independent of whether they throw away old content. Some keep long term stable URLs. Some of those with long term stable URLs are on the Internet archive.
  • We might allow a stable web site to provide one of the uses without saying all three uses can be online, say a long term stable web site that allows archiving. In practice this is unlikely to matter very often.
Vox Sciurorum (talk) 17:27, 16 August 2021 (UTC)[reply]
Honestly, though WT:ATTEST exists, in practice, a bunch of people just use news articles online, and rightfully so, especially with more and more newspapers & magazines moving to online only, like Complex Magazine. Wayback Machine should be included, as the problem of websites being possibly removed is:
  • Extremely rare
  • Usenet, which is archived on Google has the same potential issue where someone could report something and ~theoretically~ have it removed
  • In the rare case where a website is removed, an RFV could just be filed for it.
It's really not that big of a deal, and if the website's real goal is to document all words in all languages, then it shouldn't be this restrictive in the digital age. Re: Tweets, informal registers and slang are recorded on this website, being marked accordingly, especially terms from Usenet that are way more restrictive & exclusive than some words that are being sent to RFV all the time (see how far we had to go with Mickey Mouse ring). So I don't see the big deal with looking at tweets, and if there's a concern about consistency, then the number of citations required could just be increased. AG202 (talk) 20:34, 16 August 2021 (UTC)[reply]
The problem with tweets is that I can go create a source to cite a word I think exists, whether it's legitimate or not. Not that many trolls are familiar with our CFI, but it could still be a problem. I could decide that lulz is a verb now and make three accounts to tweet "I lulzed at this video." P.S. We should still define lulz; I don't think anyone's claiming otherwise. Ultimateria (talk) 16:43, 18 August 2021 (UTC)[reply]
I see the issue with that, but that's why I think the number of citations could be increased and there could be other requirements. Tweets are used all the time in tons of different linguistic studies on vocabulary and the words that people use, so honestly it's a bit weird that Wiktionary has that bar in the first place. When a word that could be used on Twitter over ten thousand times is barred from being included in Wiktionary, but a word that has only <10 uses in a fringe community on Usenet from decades ago can be included, is that not an obvious glaring discrepancy? AG202 (talk) 00:52, 19 August 2021 (UTC)[reply]
Good point; I'm not a fan of Usenet cites either. Ultimateria (talk) 17:24, 22 August 2021 (UTC)[reply]
archive.org: "How can I exclude or remove my site's pages from the Wayback Machine?   You can send an email request for us to review to info(ad)archive.org with the URL (web address) in the text of your message." So it isn't durably archived in general.
(The @ had to be replaced as an error appeared:
"Warning: Your edit appears to contain an e-mail address. Posting e-mail addresses here is not recommended; they will be viewable publicly, exposing them to the risk of spam.
If you have an e-mail address assigned to your Wikimedia account, you may use the link [[Special:EmailUser/{{subst:REVISIONUSER}}]] (copy the bracketed text below) to refer other users to it. Note however that only users who themselves have an e-mail address set can use this link.
If you understand the risks and wish to save the edit anyway, you may proceed again."
But when trying to save anyway, a captcha reappeared and then this message only popped up again.)
--18:08, 16 August 2021 (UTC) — This unsigned comment was added by 2003:DE:3720:3785:D99D:E2B0:ACB4:E34 (talk).
I feel otherwise about words like lulz. If you come across some arcane word in a scientific journal, it's likely to be a transparent compound or defined nearby. If not, it's likely to be a word that we can't help with: defining colubrid or Oligocene doesn't explain much to a random reader, who would need Wikipedia to gain any understanding. We could do better, but ultimately you don't need a dictionary, you need an encyclopedia, or in many cases, a textbook.
On the other hand, words like lulz are not transparent, not defined nearby, but can be usefully defined by us. If they do saturate the culture, like gay, it can be useful to have an early record. Even if it's just used for a shorter period of time, we function as a source for someone wondering what it means to have been pwned on Facebook or some game chat.--Prosfilaes (talk) 04:30, 17 August 2021 (UTC)[reply]
Just to basically repeat what I said at RFV. IMSO, we do need to modernise the attestation rules (or interpretation thereof) so as to allow some types of Internet content beyond the moribund Usenet, while at the same time having sufficiently strong requirements to avoid opening the floodgates to vast amounts of made-up crap, extreme ephemera, bad English etc., such as is found in Urban Dicktionary. Mihia (talk) 22:19, 17 August 2021 (UTC)[reply]
OP may not have seen the discussion Wiktionary:Beer parlour/2021/February § Durable archiving of half a year ago. Rather than giving “durable” a fake definition, I am still for allowing things “consistently appearing on the internet”, as I formulated it. I also said “a word needs only as much to be attested as it is specifically claimed to be used”.
Basically it has been recognized that for terms that reoccur on the internet, it is not that important that a page gets lost if later one gets as many pages with similar usage. An RFV discussion would be a snapshot investigation that at a given date an internet word was seen, more than as made-up crap or hoaxical. The admins of German Wiktionary deal with it in about the same manner, it appears, when they deleted Robberich “male seal”, it being too much of a neologism to bear. And on French Wiktionary one just needs to afford proof at all. I believe on Russian Wiktionary they just quote stuff on the internet without even linking it because links in the quotations there are forbidden (not too strange if one knows how slyly their compatriots advertise), but they can’t have durability or use-mention criteria like the Anglos because it would exclude the bulk of Russian mat, and apparently they fare well therewith. Fay Freak (talk) 17:22, 18 August 2021 (UTC)[reply]
We seem to be, in the course of the recent months, discontented weekly by the hitherto written attestation criteria. Faster than I had thought though, I have managed to write a new version of the central vexation in the CFI, attempting to do justice to the last insights (you may of course say they are mostly mine, that is it then why some formulations have a multi-layered purpose you do not immediately discern).
For the teleology of the attestation criteria (because all this attestation tiff is to lead to a target) I have also taken recourse to Wiktionary:Beer parlour/2019/May § Cites in different languages, and Talk:Moscow. See also Talk:glownigger and Wiktionary:Beer parlour/2020/February § What is durably archived?. Fay Freak (talk) 18:42, 18 August 2021 (UTC)[reply]

On the Talk page for WT:ATTEST, I have been complaining for months that the WT:ATTEST sentence (the WT:ATTEST requirement is summed up in one sentence) has a separate subsection for every phrase and word in the sentence EXCEPT the words "permanently recorded media" which are the EXACT words that need a subsection. WT:ATTEST literally has a subsection for explaining what "a year" means, but doesn't lift a finger on the PRM/durably archived requirement. It is a cruel barrier to entry to leave the site like this. Wiktionary is not a fiefdom. --Geographyinitiative (talk) 19:08, 18 August 2021 (UTC)[reply]

I completely agree: ‘permanently recorded media’ needs a clearer definition, and I would suggest that a web link that is 10 or more years old should be considered to be permanently recorded, since it’s unlikely that the hosts of the site will suddenly decide to remove it (see the discussion about ‘Russian’=‘mammary intercourse’ on the TeaRoom). Overlordnat1 (talk) 09:35, 23 August 2021 (UTC)[reply]
From the point of view of starting now, is there any way to tell, generally speaking, how long a web page has been in existence? I don't personally agree that web pages more than 10 years old should be considered permanent. For our purposes, I think it should be assumed that any web page, however long it has been in existence, can disappear, change, or be moved to a different URL at any time, without notice. Mihia (talk) 22:29, 23 August 2021 (UTC)[reply]
Yes - look for it on the Wayback Machine at www.archive.org. That will tell you when snapshots were taken. --RichardW57 (talk) 23:55, 24 August 2021 (UTC)[reply]
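For anyone who wants to script that lookup, here is a minimal Python sketch. It assumes the Wayback Machine availability API at `https://archive.org/wayback/available`, which, as far as I understand it, returns the snapshot closest to a requested timestamp as JSON. The helper names `availability_url` and `closest_snapshot` are hypothetical, and the sample response is illustrative rather than a live result. The network request itself is left out, so only URL building and response parsing are shown.

```python
import json
from urllib.parse import urlencode

# Availability API endpoint (assumed here; check the archive.org docs).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is YYYYMMDD (optionally with hhmmss)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


# Illustrative response body, not fetched from the live service.
SAMPLE_RESPONSE = json.dumps({
    "url": "example.com",
    "archived_snapshots": {
        "closest": {
            "status": "200",
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
        }
    },
})

if __name__ == "__main__":
    print(availability_url("example.com", "19960101"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

Asking for a very early timestamp should return the earliest snapshot near it. That only shows when the page was first captured, which is a lower bound on its age, not proof of when it was first published.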

I think part of the issue is that many people use the "permanently archived" criterion for reasons that have nothing to do with permanence: for making sure that editors can't just fake their own citations for a hoax or neologism, for screening out misspellings and protoneologisms, etc. Personally, I would like to allow online magazines such as Huffington Post and Breitbart, which include a lot of the terms we are missing. Kiwima (talk) 22:58, 31 August 2021 (UTC)[reply]

Well, could we develop a whitelist of websites that are allowed? It would be good to make some progress on this issue now. Mihia (talk) 21:11, 1 September 2021 (UTC)[reply]
We could start small by saying that a major news-ish site can provide one of the three citations if the URL being cited has remained valid with substantially the same content for at least a decade. This would apply only to the newsy part of the article, not comments or framing. Vox Sciurorum (talk) 17:53, 12 September 2021 (UTC)[reply]
I agree: let's start small and make some progress on this perennial issue. As I see it, there are two separate issues with sourcing from websites. The first is quality control, which may be necessary in order to limit inclusion of bad English, made-up nonsense and other general crap. This can be achieved, at least initially, by restricting to named sites, such as, as you say, major news sites. The second is durability of the source. The fact that a web page has existed for ten years does not prevent it from disappearing tomorrow. Additionally, a website or (especially) web page that has existed for only a short period of time can be high-quality as well as a long-lived one, and the existing "spanning at least a year" rule would prevent us from using solely very new or ephemeral content to support an entry. Therefore, I don't see a good reason for the "decade" rule. I think we should allow content from web sources of "sufficient quality" however recent, or however recently they have existed at a particular URL, and deal with the "durability" issue by archive or screenshots. (The only slight concern I have with the latter is that they are easily fakeable. Btw, how reliably permanent is the/a internet archive? As permanent as Wiktionary itself?) Mihia (talk)
The main objection in the past hasn't been the permanence of the Internet Archive itself: it's been an established institution for over 2 decades. The issue is with the permanence of the archived pages: unless the policy has changed recently, it simply takes a request from the current owner of a domain to have its archived pages removed, at least as far as public access is concerned. Chuck Entz (talk) 14:26, 13 September 2021 (UTC)[reply]
@Chuck Entz @Mihia This has been brought up a few times, but since I've started asking and from the discussions I've seen, no one has really brought up how often that actually happens. I've never heard of or seen Wayback Machine actually delete pages en masse, even when the page incriminates people directly. As I mentioned above:
"Wayback Machine should be included, as the problem of websites being possibly removed is:
  • Extremely rare
  • Usenet, which is archived on Google has the same potential issue where someone could report something and ~theoretically~ have it removed
  • In the rare case where a website is removed, an RFV could just be filed for it.
It's really not that big of a deal, and if the website's real goal is to document all words in all languages, then it shouldn't be this restrictive in the digital age." And honestly, the policy has de facto changed with how people often treat citations anyways; it's just a matter of institutionalizing it. AG202 (talk) 14:39, 13 September 2021 (UTC)[reply]

Kodi's proposal

I've written up some proposed changes to the policy and started a vote at Wiktionary:Votes/2021-09/Clarify archiving policy for attestation. Kodiologist (talk) 21:05, 18 September 2021 (UTC)[reply]

@Kodiologist: That’s full of redundancy or things unworthy of rules still, one does not see where the recommendations end and the hard rules are, there are truisms like “internet sources are not required to be formally published” but still missing the point that it is not the individual web-pages per se that count but the mass, the “whole” picture of them, and the needed picture depends on whether it is an “internet word” or what occurrence for a word is claimed. You continue to ignore existing proposals which have resolved that: Mine is even bold in this discussion: Why no assessment of it? Fay Freak (talk) 21:44, 18 September 2021 (UTC)[reply]
It's 100% hard rules except the clause that says "recommended". I don't think the quote is a truism because previous debates on this topic have shown a desire to exclude some sources as too informal. If you have any questions about why I proposed some given policy point or decided against another, I'm happy to clarify. I hope you don't take my choices as a snub. —Kodiologist (talk) 23:05, 18 September 2021 (UTC)[reply]
And I don't know which parts you think are redundant. I was careful to be concise. —Kodiologist (talk) 23:11, 18 September 2021 (UTC)[reply]
Well, what is redundant is the whole sentence “Internet sources are not required to be formally published, because informal language is within the remit of Wiktionary.” It’s like a party manifesto where you show what you want but not how you are going to get it. If you are trying to say that things having been on the internet is good, that is a clumsy way to say it, although an interesting way to talk around it. It is however easy to object that individual web-pages now fulfil all attestation just because of that one Internet Archive (which is the CFI’s single point of failure; imagine what happens on Wiktionary if the Internet Archive goes down, if only for a week, and nobody understands what happened). Otherwise I am not sure what the rule propositions are; there are a lot of question marks (does “must be accessible to future readers” mean words can go unattested after websites go away, especially its archive, even after their existence was well assessed?). For the time being it is better than the current exclusionary annoyance, but you still have not answered what you think of my proposal. Fay Freak (talk) 02:11, 19 September 2021 (UTC)[reply]
I think "Internet sources are not required to be formally published" is more precise and easier to understand than "things having been on the internet is good", because e.g. "good" is vague without further clarification, and I'm not proposing every source that's ever been on the Internet is an acceptable source: I'm saying the source has to be archived. A page that was never archived and has now disappeared isn't an acceptable source. Besides, "things having been on the internet is good" isn't idiomatic verb usage in English; we would say "things that are on the internet are good" or simply "things on the internet are good".
The Internet Archive seems appropriate as a single point of failure because the Internet Archive people have put a great deal of money and effort into internally duplicating their archived data and ensuring it remains accessible for a long time. I don't think there's much to gain by e.g. making our own additional archive of each cited page, if that's what you're thinking. Still, somebody could write a bot to do that if they had the interest and the disk space.
"does 'must be accessible to future readers' mean words can go unattested after websites go away, especially its archive, even after their existence was well assessed?" — No. There are no time limits or grandfather clauses that apply to WT:ATTEST, and I'm not proposing to add any.
I don't think your proposal would do the job, chiefly because it doesn't clarify what counts as "permanently recorded", which is the problem that I'm trying to solve. —Kodiologist (talk) 11:54, 19 September 2021 (UTC)[reply]
@Kodiologist: You are not removing the idea of durability or archivedness or accessibility whatever you want to call it. It’s the same idea while your text works on the premise that the terms are not, or defines accessibility as archivedness; in the climax you introduce a fiction to make the indefiniteness less relevant.
I clarified the idea of permanent record once as “an institutional guarantee”. Some have made a whole philosophy of institutions. You are convinced the Internet Archive is one such institution. I in my proposal solved the problem, not this definition but the problem of excessive non-inclusion and excessive expenditures in trying to quote durably and having discussions, and the loss of repute lying in the absurdities of these, moreover the cognitive dissonance of not acknowledging the internet while trying to find all by virtue of it, by a new sufficient requirement consisting in circumscribing the needed presence of a word on the internet. Which is not a Google search either because search engines are bad institutions but emphasizes that a lexical item can be, and in the event of dispute is (unless manifest) substantiated (”darlegen und beweisen”). Nothing against how people naturally understand what should be in a dictionary: we just need something the better or smarter people who know what should be in a dictionary can refer to those whose concepts are fuzzy, something so the system is not gamed. To describe the topic so people do not post off-topic (they often do for their various passions which seek restriction). At this actual problem my proposal is better—you but patched a problematic definition, but not all our inclusion problems come from it. Fay Freak (talk) 01:30, 24 September 2021 (UTC)[reply]
Sorry man, I have trouble understanding you, and this message is particularly hard for me. For example, in "works on the premise that the terms are not", I can't figure out what you're saying that the terms are not, and in "in the climax you introduce a fiction", I don't know what climax or legal fiction you're referring to. If you're writing your messages in German and then translating into English, then perhaps your translation is too literal. —Kodiologist (t) 02:17, 24 September 2021 (UTC)[reply]
@Kodiologist: I read it again for you; the trouble is caused because your own formulation is troubling, to anyone who habitually reads or drafts legal instruments: Your determination of the indeterminate concept of durability etc. by defining that the Internet Archive makes something durable etc. is a fiction in the very technical sense, and the climax of the passage. I say “etc.” because durability or archivedness or accessibility is all the same idea here. Your text works on the premise that the terms do not represent the same idea (as some could think so) but ultimately treats them as synonymous. Everything else until before the words “do not quote …” is no actual content. You just inform the reader, very badly, that it is all the same and that additionally the Internet Archive counts. What you yourself believed your text to mean I don’t know. Probably you didn’t understand your own text either, just as you didn’t understand mine. (Right, like not understanding the consequences of one’s acts one might not understand the objective meaning of what one is saying. And some text supposed to rule indefinitely many never quite means what its author intended. One only finds the objectivized will of the lawmaker; in German one actually says “objektivierte Wille des Gesetzgebers”.)
I never translate my English from German, but legal dogmatics is relatively rarefied in English-speaking jurisprudence. While you find the term indeterminate legal concept in many a book, a large share is from Germans or other continentals or relating to the European Union. So it is necessary to translate German terms to which there is no equal usage in English, but it does not mean I have translated from German, I have transferred and adapted—a particulari ad universale non valet consequentia.
And because people are automatically averse to the foreign, I say explicitly that I claim it obvious that it is not better to follow common-law approaches of amassing contradictory essays of case law just because this is English Wiktionary. You are not using clear terms but pretend to dispense with the lacking clarity nonetheless. Instead Wiktionary should be more conscious of how determined its concepts are.
And for the same reason, I stress that my transferral is not wrong just because etymologically transferral can be equated to translation. I wrote very good foreign English, and if you counter by arguing that only idiomatic usage is good usage then you commit the naturalistic fallacy. Fay Freak (talk) 03:50, 24 September 2021 (UTC)[reply]
This is still hard to understand, and also vaguely insulting. I'm afraid I don't have anything more to contribute to this conversation. —Kodiologist (t) 15:56, 24 September 2021 (UTC)[reply]
I will be voting against that proposal, though there are some liberalizations that I would support. Vox Sciurorum (talk) 22:19, 18 September 2021 (UTC)[reply]
@Vox Sciurorum: While Kodiologist wasn’t trying to make a snub to me, Wiktionary collectively is. What do you think of my liberalization draft? Surely you have thought something? (Hello, is somebody listening out there? Am I retarded and why? We can also vote both at the same time. But not if nobody is listening, and my better proposal may be a reason to vote oppose on the other.) Fay Freak (talk) 02:11, 19 September 2021 (UTC)[reply]
@Fay Freak: I don't really have any insight to bring, but I think your idea that "for terms that reoccur on the internet, it is not that important that a page gets lost if later one gets as many pages with similar usage. A RFV discussion would be a snapshot investigation that at a given date an internet word was seen, more than as made-up crap or hoaxical" is an interesting one and would be worth exploring, though I'm concerned by the inherent "instability" of such a system. That said, entropy means that nothing in this universe is eternal, and that no external archive - nor Wiktionary - will exist forever, so we might as well embrace the instability. This reads like bad philosophy, but I hope it makes at least a bit of sense. PUC21:39, 23 September 2021 (UTC)[reply]

Request for comment notification

Here is a link to a RFC on Meta concerning all Wikimedia projects. Lionel Scheepmans (talk) 22:55, 16 August 2021 (UTC)[reply]

Retiring Rhymes:

As Wiktionary:Beer parlour/2021/August#Automatic rhymes has been implemented, rhymes are now categorized automatically from the {{rhymes}} template and its parameters. As a result, the current Rhymes: namespace we have has become largely obsolete, and it's probably time to retire it (and its associated X rhymes categories) much like we are going to do with Index. There are three issues that need to be dealt with before this can be done:

  1. Many of the existing rhyme indexes have syllable counts that could be migrated over to the new template by adding the appropriate |s= parameters.
  2. The rhyme pages have some extra information, such as the pronunciation of the rhyme itself and other miscellaneous notes. Are these worth keeping?
  3. The existing mid-level rhyme indexes such as Rhymes:English/ɔː-. A corresponding Category:English rhymes/ɔː- also exists, so maybe these could easily be converted by making them umbrella categories for the other rhymes? In that case, should something like Category:Rhymes:English/ɔː(ɹ) even be directly under Category:Rhymes:English or only under Category:Rhymes:English/ɔː- instead? (Otherwise these mid-level categories probably wouldn't serve much of a purpose)
  4. (After X rhymes are gone, do we want to rename the new Rhymes:X/ etc. categories to X rhymes/etc.? This should be relatively straightforward, albeit laborious for bots, from a technical standpoint)

surjection??09:53, 17 August 2021 (UTC)[reply]

I very much think we ought to keep the Rhymes namespace pages. The category system is in my opinion too clunky for rhymes; the lists take too much space with all those headers and the words aren't ordered by syllable count. Adding extra parameters to the rhymes template is also additional work and the subcategories it would create are a pain to navigate compared to having everything on the same page. Also, automation has (or had) been partly implemented already, as by adding rhyme words to rhyme pages with the automated tool the rhymes template was also automatically added to the word's entry. I'm not sure that tool is currently working, but if not, it can be fixed. I have created a lot of Rhymes: pages, mostly for Icelandic, and the way I do it is usually to add all the words I can think of and/or find in word lists and put them directly on the Rhymes page. This means adding a lot of words or word forms that don't have a Wiktionary entry yet, and this is a great way to find new words to put into Wiktionary. Categories can only be used for existing entries. It's a lot of work to actually make all those entries, so they can't all be made immediately, but having a rhyme list mostly complete already is very helpful, and it's also very nice to have lists like that with red links for stuff that we will want to add at some point. Rhyme pages also have the potential to include other helpful information that cannot be included on Category pages, such as pointing out words that have multiple pronunciations and giving other per-word information, as well as linking to other relevant Rhyme pages for those specific words. – Krun (talk) 11:48, 17 August 2021 (UTC)[reply]
"the words aren't ordered by syllable count" is not exactly true, as there are subgroupings by syllable count (see Category:Rhymes:Finnish/ulɑpːɑ for an example). It's just that {{rhymes}} needs the syllable count in order to do this categorization. The fact is that the Rhymes namespace in its current form is not maintainable, and the Index namespace is proof enough of this. If there is a list that must be maintained separately from the entries, it's always going to fall out of date in one way or another. While it is true that rhyme pages can serve as a means to find entries to be created, that's not really much of a reason to keep them around - you can just have word lists in userspace or whatnot (and that won't even require you to figure out the rhymes). — surjection??12:14, 17 August 2021 (UTC)[reply]
IMO, the best approach would be
  1. Get all existing links from the rhyme pages and fill in the |s= parameters accordingly
  2. Move all existing pages from Rhymes: to the Appendix, as with the Index
  3. The main pages (Rhymes:English) would be kept if they have useful information besides a list, as Appendix:English rhymes or similar
  4. Create the intermediate categories and in their current form just place them under Category:Rhymes:English etc., probably with a * so that they show up at the beginning
  5. Move pronunciation notes etc. to the categories (which can have extra content)
  6. Word-specific notes: nearly all of these are (one pronunciation) or similar. These aren't needed as the entry already goes through the pronunciations if there are multiple.
  7. Make {{rhymes}} link to the correct category instead of the Rhymes page.
  8. Drop the Rhymes namespace
Any comments? — surjection??10:08, 19 August 2021 (UTC)[reply]
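For illustration only, step 1 of the list above might be sketched along these lines. This is Python written for this thread, not an existing bot. It assumes that a rhymes page groups its words under headings such as "Two syllables" with bulleted [[links]], and that entries call {{rhymes|lang|rhyme}} without a syllable count; the function names and the heading layout are hypothetical.

```python
import re

# Assumed heading words on a rhymes page; real pages may use other wording.
SYLLABLE_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

HEADING_RE = re.compile(r"^=+\s*(\w+) syllables?\s*=+$", re.IGNORECASE)
LINK_RE = re.compile(r"\[\[([^\]|#]+)")
RHYMES_RE = re.compile(r"\{\{rhymes\|([^{}]*)\}\}")


def syllable_counts(rhymes_page_text):
    """Map each linked word on a rhymes page to the syllable count of its section."""
    counts = {}
    current = None
    for line in rhymes_page_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("="):
            # Any heading ends the previous section; only syllable headings set a count.
            heading = HEADING_RE.match(stripped)
            current = SYLLABLE_WORDS.get(heading.group(1).lower()) if heading else None
            continue
        if current is not None and stripped.startswith("*"):
            for word in LINK_RE.findall(stripped):
                counts[word.strip()] = current
    return counts


def add_syllable_param(entry_text, count):
    """Append |s=<count> to {{rhymes}} calls that do not already carry a count."""
    def fix(match):
        params = match.group(1)
        # Conservatively skip calls that already have s= (or a numbered sN=).
        if re.search(r"(^|\|)s\d*=", params):
            return match.group(0)
        return "{{rhymes|" + params + "|s=" + str(count) + "}}"
    return RHYMES_RE.sub(fix, entry_text)
```

A real bot job would still have to fetch and save pages, and would need manual review for words with several pronunciations, since one count per entry is an oversimplification.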
Looks good to me! Do you think you could write a bot program for adding the |s= parameters? I'm also wondering if we need a vote for this (since the Index deletion had one)... Thadh (talk) 18:10, 22 August 2021 (UTC)[reply]
I also agree with the changes. Maybe we should implement them before deciding the fate of the namespace. Ultimateria (talk) 21:54, 22 August 2021 (UTC)[reply]
Yes, we'll need a vote. I'll do a bot job at some point, but that doesn't have to happen before the vote, as the rhyme pages would be kept in an appendix before they'd be deleted either way. — surjection??09:58, 23 August 2021 (UTC)[reply]
Agreed. Vininn126 (talk) 10:56, 23 August 2021 (UTC)[reply]
@Surjection: I like the idea of getting rid of the Rhymes namespace, though there are two big points that have to be considered:
  1. Is it technically possible to add extra qualifiers next to an entry in a category? This would solve the point brought up by Krun. See for instance Rhymes:German/eːən where there are multiple entries with the qualifier "(some dialects)". And if it is not technically possible to display such information in Category:Rhymes:German/eːən, what should be the policy? Include such regional rhymes in the categories?
  2. For the rhyme categories to work, the IPA transcription used in {{rhymes}} would have to be standardized and written down somewhere. See my topic here, particularly the duplicate categories Category:Rhymes:German/ɪŋn̩ and Category:Rhymes:German/ɪŋən. Fytcha (talk) 13:18, 10 November 2021 (UTC)[reply]
1. No. The best you can do is add qualifiers under the entry itself.
2. Sure, but that needs to be done by language.
surjection??13:39, 10 November 2021 (UTC)[reply]
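The duplicate-category problem in point 2 can be illustrated with a small sketch. It is Python, purely for illustration; the single rewrite rule used here (a consonant carrying the syllabic mark U+0329 is treated as schwa plus that consonant) is an assumption made up for this example, not an agreed standard for German or any other language, and the function names are hypothetical.

```python
import unicodedata
from collections import defaultdict

SYLLABIC_MARK = "\u0329"  # COMBINING VERTICAL LINE BELOW, as in syllabic n
SCHWA = "\u0259"


def normalize_rhyme(rhyme):
    """Rewrite syllabic consonants as schwa + consonant (assumed convention)."""
    out = []
    for ch in unicodedata.normalize("NFD", rhyme):
        if ch == SYLLABIC_MARK and out:
            consonant = out.pop()
            out.append(SCHWA + consonant)
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


def find_duplicates(rhymes):
    """Group rhyme strings that collapse to the same normalized form."""
    groups = defaultdict(list)
    for rhyme in rhymes:
        groups[normalize_rhyme(rhyme)].append(rhyme)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Run over the list of existing category names for one language, this would flag pairs like the two German categories mentioned above for a human to merge; which spelling wins is exactly the per-language decision that still has to be written down.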
As someone who has entered almost all of the existing English rhymes, I think this is an interesting idea but am wary of automation. What I have seen done in the past, such as the templates for sorting rhymes into alphabetical order, tends to break things, which then have to be cleaned up manually, defeating the object.
My questions and concerns:
  • Is the idea here to automatically add a rhyme to a rhymes page and/or category by adding a tag to the entry for the word itself?
  • Would the existing rhymes just be migrated?
  • Would it still be possible to add rhymes wholesale (as I currently do), or would they have to be added word by word? If the latter, then this would be my main objection. That will simply take too long to do, unless someone could write a bot to convert a rhymes page.
  • What would we do with rhymes that are currently redlinks? Options as I see them would be to add the missing entries (laborious), remove them (highly undesirable, as that entry might never be added) or to flag these up in the "Requested entries" (laborious again, but a bot would be useful here).
  • How would we deal with so-called "partial rhymes", ones that sound the same at the end but are stressed at different numbers of syllables from the end?
  • What would happen to rhymes that link to Wikipedia? Not all rhymes are Wiktionary material, but that does not necessarily mean they shouldn't be included.
  • Someone else has already mentioned qualifiers; these are especially needed to distinguish senses (for example, of heteronyms such as "tear", "wind" and "wound") and regional variations ("erase", "tomato", "vase")
How would other regional variations be handled? The English rhymes cover US and UK English, and these do not always tally: "chicken" rhymes with "thicken" in US English but not in UK English (where it rhymes with "stick in").
So although automation is an interesting (and potentially labour-saving) idea, I think there is too much risk of getting things wrong or losing content unless this is very carefully thought out. — Paul G (talk) 07:21, 11 April 2022 (UTC)[reply]
Late: The idea of using categories instead of the Rhyme pages is impractical for Czech rhyme pages since most entries in them are redlinks. The redlinks for non-existing entries are especially for inflected forms but also for lemmas. Rhyme pages make it possible to create a useful rhyming guide even without creating the mainspace entries. The situation is expected to be similar for some other inflected languages. This was raised back in Wiktionary:Beer parlour/2013/December#Rhymes categories again; was the previous discussion just ignored? Therefore, I oppose the proposal. Even assuming no redlink problem, Category:Rhymes:Finnish/ulɑpːɑ is an inferior presentation, which does not show the rhymes for different syllable counts on one page but rather hides them in subcategories. The redlink problem was also raised by Krun; why do the supporters who posted after Krun stay silent on the redlink problem? Does not concern English, other languages be damned, or what is the reasoning, if any?
Moreover, mainspace entries without a rhyme template also get missed. Many Czech mainspace entries lack IPA and a rhyme template; English is better covered, but from what I remember from some reports IPA coverage is far from universal, and so probably is rhyme template coverage. To wit, starting from Czech kočka, we get to Category:Rhymes:Czech/otʃka which only has one entry, whereas Rhymes:Czech/otʃka has multiple entries, some redlink and some bluelink. This is going to be very typical for Czech since I focused on creating the rhyme pages and not on placing the rhyme templates in mainspace. I do not know the situation in other languages, but no one produced any report on how big a loss results from abandoning the Rhyme pages in favor of mainspace entries with rhyme template and auto-categories: for Czech, the first approximation is that it amounts to nuking the rather significant content of Rhyme pages and replacing it with almost nothing. Even for English, there is likely to be considerable loss, although probably not as large as for Czech. --Dan Polansky (talk) 23:27, 12 August 2022 (UTC)[reply]
Let me modify the stance a bit: a switch to categories done on a per-language basis would be fine. If a language does not have rhyme pages but it has rhyme categories, the {{rhymes}} template could link to the categories and not have a redlink for the rhymes. But again, for languages in which a large portion of the rhyme-page entries are redlinks, a switch would amount to effectively dropping the content, which is obviously undesirable. --Dan Polansky (talk) 16:27, 3 September 2022 (UTC)[reply]

Categorizing words by topic

I've been looking around the topic part of the category tree and although having all words neatly categorized intrinsically appeals to me, I'm having trouble orienting myself and I can't really perceive any clear purpose with which the current architecture was constructed. I thought therefore maybe it's good to have a general discussion about categorization by topic. (As far as I can tell, the last time something like this was discussed was 14 years ago: Categories, semantic and contextual.)

For example, look at the word cat, my first association is animal. There is an appendix page for common animals but no comprehensive list of animals. It's not that I'm sure there should be. It would make sense, but I don't know if it would help anyone. On the other hand there is a list of people with 10,000 entries. I find it hard to justify the one and not the other. (Strangely, cat is in people, but not in animals.) I think there is a real tension here between what is logical and what is useful.

One existing guideline (Wiktionary:Categorization) is: "entries in topical categories should rarely—if ever—be put into a more narrow category and also a more general category". I can see that it isn't desirable to have cat included in all the categories animals, chordates, vertebrates, mammals, carnivores, felids, and cats. It would bloat the whole thing unnecessarily. On the other hand, having cat only included in cats in my view defeats the purpose of categorization because you can't easily find it by simply following the tree. You have to know that cats are chordates for instance, which I'm sure not everyone does. (One way I can imagine this categorization by topic might be useful, is that it allows you to find a word you can't remember the sound of—you'd find it in the same category that contains dog. That won't work in this case.) One approach could be to place cat in those categories they're commonly associated with, which for me would be animals, pets, felids (besides those categories which result from specialized meanings), but this is awfully subjective and I'm not sure how to proceed.

Just looking through the tree what strikes me most is the fragmentary, idiosyncratic nature of it: categories never seem to contain what I expect them to. So in household I find a number of synonyms for cohabitant and little else. Senses doesn't have feelings as basic as hungry, thirsty, warm, cold, but emotions contains everything under the sun. Fear is included in emotions and its subcategory fear, but anger is only in the subcategory anger. I think there need to be criteria to decide which categories should be comprehensive and which should be narrow.

So, before making any big changes, what I would like to hear from those who use this categorization by topic is: Why do you find it useful? What exactly do you expect from it? And from those who implement it: What is your approach? —caoimhinoc (talk) 17:40, 17 August 2021 (UTC)[reply]

From Wikipedia, Commons, and Wiktionary alike, my experience is that some contributors get so fascinated that they spend all their time rearranging the categories, creating ever smaller, ever narrower subcategories. This often makes it harder to find things, not easier. The problem is that we have no clear view of how categories should be used, and so we can't tell whether subcategorization has gone too far. I like to have things in categories, but rather broad ones, and categories that exist on many language editions. I try when I can to cross-connect (on Wikidata) categories on English, French, Swedish and Russian Wiktionary. In my opinion, it would be good to stop at mammals and avoid its subcategories. But that's my opinion and not a neutral fact, so I can't assert or enforce it. en:Mammals has 141 entries and 20 subcategories, of which en:Even-toed ungulates has 7 subcategories! These subcategories might be biologically correct, but they are not well suited for a dictionary.
Which words are included in or missing from a category is also completely random. Are all mammals found under mammals? All weaving terms are certainly not found under weaving. And there are no tools that I know of which can provide hints as to which words could be included where. I could imagine a tool that takes all words currently in category:en:Weaving, searches for them in Wikipedia, and finds other words that tend to appear in the same articles. That might work for weaving, but will it work for mammals? --LA2 (talk) 21:16, 21 September 2021 (UTC)[reply]
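As a rough sketch of the tool imagined above (Python, for illustration only): the input `articles` is a hypothetical, already-extracted mapping from article title to the words found in that article, so fetching and tokenizing Wikipedia text is left out, and the threshold `min_articles` is an arbitrary assumption.

```python
from collections import Counter


def suggest_candidates(category_words, articles, min_articles=2):
    """Rank words that share articles with known category members.

    category_words: words already in the category.
    articles: dict mapping article title -> iterable of words in that article.
    Returns (word, number_of_shared_articles) pairs, most frequent first.
    """
    known = set(category_words)
    scores = Counter()
    for words in articles.values():
        words = set(words)
        if words & known:
            # Every non-member word in this article gets one vote.
            scores.update(words - known)
    return [(word, n) for word, n in scores.most_common() if n >= min_articles]
```

Plain co-occurrence counts like these would also surface very common words, so some filtering against a stop list or general word frequencies would be needed, and whether the approach carries over from weaving to mammals remains the open question.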

Entries in non-canonical scripts

I propose that alternative-script entries should be created only when a quotation is provided. If there be no quotation in such entries, they should fail attestability. I am saying this in light of the creation of Sanskrit entries in Brahmi, Prakrit entries in Devanagari, and Pali entries in what not, all in non-canonical scripts. It is high time we set a limit. ·~ dictátor·mundꟾ 18:47, 17 August 2021 (UTC)[reply]

How do we define which scripts are canonical and which are non-canonical? According to Module:languages/data2, the following scripts are canonical for Sanskrit: {"Deva", "Bali", "as-Beng", "Beng", "Bhks", "Brah", "Gran", "Gujr", "Guru", "Java", "Khar", "Khmr", "Knda", "Lana", "Laoo", "Mlym", "Modi", "Mymr", "Newa", "Orya", "Saur", "Shrd", "Sidd", "Sinh", "Taml", "Telu", "Thai", "Tibt", "Tirh"}. —Mahāgaja · talk 21:52, 17 August 2021 (UTC)[reply]
WP: "Sanskrit does not have an attested native script: from around the turn of the 1st-millennium CE, it has been written in various Brahmic scripts, and in the modern era most commonly in Devanagari." The last part could support the view that only Devanagari is "canonical" for Sanskrit.
German WP says (in translation): "For some centuries Sanskrit has been written mainly in Devanagari script, though occasionally also in local scripts." Here it is the first part that supports that view, and even more strongly.
Later German WP adds (in translation): "In scholarship, either Latin transliteration or Devanāgarī is used for the transcription and reproduction of whole texts and longer excerpts." That claims that in scholarship only Devanagari or Latin transcription is used. --22:09, 17 August 2021 (UTC)
The established method for this matter is to raise RfV on the forms you strongly doubt. For Prakrit, I think we should make Devanagari the canonical script. Do we have valid Brahmi quotations for the Prakrit words we are seeing? Only one of the quotations looks verifiable to me. The rest look to me as though they have been transliterated into Brahmi for the purpose of Wiktionary. But, @Inqilābī, do you really want an RfV war? Quotations are not always required for obvious word-meaning pairs - you are proposing to reject attestation by clearly widespread use. --RichardW57 (talk) 00:44, 18 August 2021 (UTC)[reply]
@Inqilābī, Mahagaja We need to define the 'canonical' writing system and what it implies. We should also set out what to do if a word doesn't occur in the canonical writing system. I've encountered a very few Pali words which seem unlikely to turn up in Roman script. --RichardW57 (talk) 01:01, 18 August 2021 (UTC)[reply]
Generalizing gup.
In Serbo-Croatian, a spelling in any script counts for all, and Cyrillic and Latin should be created both unless not even Štokavian. Likewise, but differently, I see no reason to special-case Azerbaijani Cyrillic script, which was abandoned in 1991 except of course in Russia itself. And I wouldn’t want the absurd situation where an Azerbaijani word can only be had in Cyrillic because muh durable quotations are only found in those Soviet books. It is also very notable how many Classical Azerbaijani texts are only accessible in Latin transcription—but they are created with the Latin form as the lemma, in disregard of the fact that they were only ever used in Arabic-writing time. At least we have the word! Which accords with the rules because remember, the words are documented, their spelling is secondary and must not have been used to be added. We have words that aren’t spelt and only attested in audio (the more “well-documented on the internet” a language is, the more there are such words in periphery dialects durabilized in audio records). How lovely that RichardW57 doesn’t have to check all those scripts being used for a Pali word! He couldn’t know. It’s enough that the word is used and the spelling could reasonably be used.
Also there often isn’t an alternative script. Diachronically, Arabic script (in a now modified form, which complicates the matter) is the main script of Azerbaijani by length of use, while Latin is synchronically the main script. But uh, fewer people could write back then, so how do you count? Anyway even today it is about half and half, with speakers in Azerbaijan itself and in Iran, the former writing in Latin and the latter in Arabic script. For Modern South Arabian there is no script at all: scholars write in Latin script, with varying dainty diacritics, and the speakers themselves in Arabic, but only if you tell them to do so; hence the spelling cannot matter so much but has to be normalized. Fay Freak (talk) 01:29, 18 August 2021 (UTC)[reply]
@Fay Freak, Mahagaja: The design I was told was in place for Pali is that the bulk of the information was to be held in the entry, with the other forms of the lemma serving as 'soft redirects' to it. Thus the senses, the etymology and derivatives would be recorded at a single place, rather than being duplicated. For user convenience, I now give subsidiary lemmas a memory-jogging summary gloss, just as I do for inflected forms. Now, I have interpreted this as allowing script-specific, or rather spelling-specific as I adapted to the multiplicity of writing systems, matters to be recorded at the appropriate subsidiary lemma. This includes the inflection, though I suppose one might want to do it the other way round for Sanskrit nouns. --RichardW57m (talk) 14:08, 18 August 2021 (UTC)[reply]
My spreading of entries into other scripts without fresh quotations is driven by the principle that blue links should not misdirect unregistered users - which will include registered users where logging in is impossible or unwise. There is already a level of checking done by
  1. the generation of the other forms by {{pi-alt}} and
  2. the existence of the form in some writing system of the other script.
It should be noted that {{pi-alt}} generally undergenerates rather than overgenerates. The set of extant sets of spelling rules is not as well known as I would like, and obviously a consistent set of transliteration rules should be applied. The only overgeneration I am aware of is in the 'New Shan' handling of -ss-, and I shall deal with that. Notifying @Inqilābī, Fay Freak, Octahedron80, Mahagaja, Kutchkutch:. --RichardW57m (talk) 14:08, 18 August 2021 (UTC)[reply]
@Octahedron80: It was Old Shan, and I've fixed it. It now just returns the normal Burmese form if it can't generate the Old Shan form. --RichardW57m (talk) 13:47, 19 August 2021 (UTC)[reply]
What I intended to say is that a language entry should not be created in multiple scripts. My quotation criterion might be a bit silly inasmuch as the word might not be attested in the canonical script; in that case we might as well provide a quote in a non-canonical script at the main entry itself. That we can decide later, but my main concern is the creation of non-canonical script entries, which serves no purpose to us other than the perceived advantage that a reader could search in any script to find the term. Also note that Sanskrit, Pali, and Prakrit, being dead languages, should not be compared to modern languages like Serbo-Croatian or Azerbaijani that can use multiple scripts for any term. Incidentally, it’s funny how human editors are creating tons of useless alternative-script entries in dead tongues; if this has to be done then it should at least be a bot task, right? Anyway, the idea of having entries in multiple scripts for dead/ancient languages is very, very ridiculous, as well as space-wasting and time-wasting. Can we get a bot to mass-delete them all, please? ·~ dictátor·mundꟾ 14:32, 18 August 2021 (UTC)[reply]
Oppose If it did go through, I would suggest that those left on Wiktionary put dead languages in the Roman script, where they are most useful to L1 English speakers. --RichardW57m (talk) 15:45, 18 August 2021 (UTC)[reply]
The faithful Tamil script writing of Sanskrit is quite recent. (Mind you, I'm not sure how compatible it is with modern script renderers - could be fun!) --RichardW57m (talk) 15:45, 18 August 2021 (UTC)[reply]
Searching is indeed a good reason to keep the multiple scripts. I don't know how well it works with Ancient Egyptian though. --RichardW57m (talk) 15:45, 18 August 2021 (UTC)[reply]
Doing all of multiscript by bots would actually be difficult - {{pi-alt}} and {{sa-alt}} do not generate exactly the set of forms found. They currently do not do it for the Burmese flavour of Burmese-script Pali, and they definitely don't do it for Pali in the Tai Tham script. Do you even know how to write Sanskrit in the Tai Tham script? I don't. It will be interesting to see how syllable division is done for preposed vowels in Thai-script Sanskrit - I know there are multiple ways of doing it for Thai-script Pali. What are the rules for the choice between homorganic nasals as consonant letters and as anusvara? There seem to be some idiosyncratic spellings. The per-script quotations are likely to need doing by hand even if you could bottify the rest of the process. --RichardW57m (talk) 15:45, 18 August 2021 (UTC)[reply]
Since this is the English wiktionary with a clear focus on the use of English as the language of the dictionary, may we also implement the {{en-alt}} that shows the spelling in the Deseret and Shavian scripts, which are well supported by Unicode? I feel that Deseret and Shavian forms of English words, upon meeting the CfI, should also be included. --Frigoris (talk) 16:13, 22 August 2021 (UTC)[reply]
Are there words that could meet the CfI in those scripts? w:Shavian alphabet and w:Deseret alphabet say likely not.--Prosfilaes (talk) 05:40, 23 August 2021 (UTC)[reply]
According to the two WP articles, the Shavian script was used in the 1962 edition of Androcles and the Lion, 8 issues of the journal Shaw-script published "[b]etween 1963 and 1965", as well as editions of literary works published in 2012–13. It seems the medium (durably archived print books) and time span (spanning at least a year) are well-met. As for the Deseret script, the article says it was used in 1854–1877 in a few newspapers, books, dictionaries, as well as epigraphy. So again, as long as someone can get hold of these materials and provide citations, I can't see why the word forms aren't likely to pass the CfI. --Frigoris (talk) 08:09, 23 August 2021 (UTC) I should add that I meant the actually attested word-forms in these scripts, rather than the countless hypothetical forms that can be generated for English words. --Frigoris (talk) 08:12, 23 August 2021 (UTC)[reply]
The independence requirement of CFI could be interpreted to cause problems. Indeed, one could argue that CFI was intended to rule out less extensive spelling reform enthusiasms. --RichardW57m (talk) 12:37, 23 August 2021 (UTC)[reply]
  • @RichardW57: You said that ‘searching is indeed a good reason to keep the multiple scripts’. However, you can very easily search a non-canonical script term without the need for any entry. For this very reason, the proposal to have romanised Sanskrit entries failed. There has also been a recent discussion to get rid of Gothic romanisations. Thus, there are no other good reasons to keep such entries, as our search results are either excellent or pretty good at the least. The purpose of ‘searching’ having been explained to be defeated, I think you would now be inclined to agree with the principle of one language - one script for dead languages. And all quotations in non-canonical scripts can be added in the main entry itself. Maybe a vote is required to establish this rule. ·~ dictátor·mundꟾ 21:06, 2 September 2021 (UTC)[reply]
    @Inqilābī:: So why can't I find ເທວານໍ, the genitive plural in the Lao script with implicit vowels of deva? I can find the form without implicit vowels, ເທວານັງ (devānaṃ) because I have entered the declension of the corresponding lemma ເທວະ (deva). Searching for Roman script forms is generally privileged because the transliterations are given for the inflected forms - but that wouldn't help you find the alternative Latin script form devānaŋ. --RichardW57 (talk) 22:27, 2 September 2021 (UTC)[reply]
    @RichardW57: But readers would search the lemma form, right? Searching for inflected forms is never recommended. ·~ dictátor·mundꟾ 07:04, 3 September 2021 (UTC)[reply]
    @Inqilābī:: No! The lemma form isn't always obvious. And retyping just the end is not always straightforward - one might not have the keyboard to hand, and one might not be able to read the script very well. In this case, a stemming algorithm will come up with the possibilities ເທາານ, ເທວານາ, ເທວ, ເທວາ, ເທວານັນ຺ຕ຺ and, if it knows that present participles aren't always there as subsidiary lemmas, ເທວານັຕິ and ເທວານາຕິ. --RichardW57 (talk) 07:37, 3 September 2021 (UTC)[reply]
    @RichardW57: I wonder how many people even read Wiktionary (especially to study non-Latin script languages). We are a dictionary in the making, far from being complete: so the user-friendliness of this site could always be dealt with later on. Be that as it may, those who are our presumed regular readers, would consider seriously searching the term they are looking for. All of them must know Roman, or else they would not be consulting en.Wiki in the first place. Knowing Roman solves all the perceived challenges, I guess. ·~ dictátor·mundꟾ 18:19, 3 September 2021 (UTC)[reply]
    @Inqilābī:: Searching for Sanskrit putri and anena fails; the search process finds entries putri and anena, and looks no further. Finding Sanskrit deva only works if one follows offered English and Pali deva, and then follows an entry in those entries to the Sanskrit word. --RichardW57 (talk) 06:36, 3 September 2021 (UTC)[reply]
    @RichardW57: When the romanisation’s form happens to be the same as that of a full-fledged entry, then it’s recommended to put an asterisk before the term: so try this and this and behold the search result. Maybe we should mention this technique somewhere in order to assist our readers. ·~ dictátor·mundꟾ 07:04, 3 September 2021 (UTC)[reply]
    @Inqilābī:: Nah, Wiktionary is only meant for the cognoscenti. Yes, of course it should be clearly documented. Who've got the privilege? --RichardW57 (talk) 07:37, 3 September 2021 (UTC)[reply]

Macedonian dialectal forms


If I or other users add more dialectal Macedonian words in future, how can we ensure that users who don't know any better won't try to generate automatic IPA transcriptions for them and inflectional tables using the modules and templates designed for standard Macedonian? This would be totally inappropriate (much like adding standard Italian templates to Friulian, Neapolitan, Sicilian or Sardinian entries) since many dialects have their own phoneme inventory, their own phonetic realizations and allophones, their own stress rules, their own inflectional categories (e.g. the dialect of Kumanovo has an accusative that's alive and well), their own inflectional endings (e.g. -ав in Kumanovo vs. -ат in the standard for 3PL present) and so on. In essence, each dialect is an independent linguistic system that requires the same attention as what Wiktionary would treat as a separate language, but most users would probably be unaware of this, possibly giving precedence to conventional classifications and political ideology over linguistic facts, and try to fit dialectal words into standard moulds, especially users who do not speak Macedonian well or at all but still contribute freely, as discussed in 2016, and make all sorts of mistakes when documenting the standard, let alone obscure rural dialectal words.

Ideally, we would program templates and modules for each individual dialect and knowledgeable users would apply them before misguided users can interfere, but I do not have the knowledge to guide such programming, and most Macedonian dialects are rather poorly studied, in addition to being endangered, so I don't think that the necessary information would even be available in most cases. Even so, it would be a shame for me to not add dialectal words whose meaning and grammatical category I know (the minimum criteria for inclusion). That brings us to the initial problem of the risk of inapposite additions to bare dialectal entries.

I do not intend to add dialectal Macedonian words in the near future, but I believe that this is something that should be addressed properly well in advance. Martin123xyz (talk) 10:59, 18 August 2021 (UTC)[reply]

We can’t—as “the necessary information is not even available in most cases”. But one who does not speak Macedonian well is not even likely to add so dialectal a Macedonian word. But still it makes sense to have parameters to switch to dialectal inflection. Fay Freak (talk) 13:18, 18 August 2021 (UTC)[reply]
So in the absence of phonological and grammatical information, do you think it would be all right for me to add, for example, "ѓузентија" as a dialectal word from Berovo meaning "hallway" even though I don't know how exactly it's pronounced and declined, whereas I have a source for the spelling and the meaning? Martin123xyz (talk) 13:41, 18 August 2021 (UTC)[reply]
I would suggest having a template that would be included in a "Usage notes" section, with standard wording saying something along the lines of "This is a dialectal word. Pronunciation and inflection may be quite different from those of standard Macedonian."
As for the likelihood of this being a problem: comparative references are notorious for dredging up obscure dialectal or historic forms that no one has ever heard of but that nicely illustrate patterns of sound or semantic change. There are plenty of non-fluent editors who will spot these in reference works and not have the background to see them as anything but ordinary Macedonian. Chuck Entz (talk) 14:07, 18 August 2021 (UTC)[reply]
Yes, as this is already very helpful. Not knowing the stress is not by itself a reason to omit the word, the more so as stress generally isn’t even distinctive in this language. In other languages (Serbo-Croatian, Cushitic, Chadic) tones are often simply not written. Often a word is also found only in written sources from a time that did not yet mark stresses or tones; there are such historical Slovene dictionaries on fran.si. Sometimes the vocalization of a medieval Arabic word is a kind of guess (e.g. فِلَسْقِيَّة (filasqiyya); I at least let the inexperienced reader partake in my Sprachgefühl). But the gist is transmitted this way. If you are sure about the page titles, that’s already great. Dan Polansky constantly created Czech entries without any inflection information, which is why the bulk of our Czech entries lack inflections, so you should realize how grandiose it is that you generally give them, bar some borderline cases where the interpretation can differ. Fay Freak (talk) 14:12, 18 August 2021 (UTC)[reply]
Macedonian is a WT:WDL, which requires 3 cites. I guess the 3 needed cites could at least show a bit of the inflection?
(That there possibly aren't 3 cites for several dialectal terms is a different topic and a problem which exists for English as well: There aren't many dialect authors like Tim Bobbin AKA John Collier or William Barnes.) --2003:DE:3720:3758:3071:FA86:5FE7:D727 10:39, 19 August 2021 (UTC)[reply]
Three citations would show at most three forms, whereas in standard Macedonian, nouns have up to 11, adjectives 16, and verbs much more. We could not guess the inflection: only native speakers of the dialects could provide all the forms, if not on Wiktionary, as informants interviewed by dialectologists who publish sources that people like me can use. Anything else would be beneath basic quality standards. Martin123xyz (talk) 06:32, 20 August 2021 (UTC)[reply]
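The idea raised above of a parameter to switch to dialectal inflection, combined with the suggested usage note, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function and table names are hypothetical, the only data are the two 3PL present endings quoted in this thread, and any dialect without recorded data falls back to the standard usage-note wording instead of borrowing standard Macedonian forms.

```python
from typing import Optional

# 3PL present endings quoted in this thread; no other dialect data is assumed.
THIRD_PLURAL_PRESENT = {"standard": "ат", "Kumanovo": "ав"}

# Wording suggested in this thread for a "Usage notes" template.
USAGE_NOTE = ("This is a dialectal word. Pronunciation and inflection may be "
              "quite different from those of standard Macedonian.")


def third_plural_ending(dialect: str) -> Optional[str]:
    """Return the recorded ending, or None when nothing is known for the dialect."""
    return THIRD_PLURAL_PRESENT.get(dialect)


def inflection_or_note(dialect: str) -> str:
    """Never guess: show a recorded ending, otherwise only the usage note."""
    ending = third_plural_ending(dialect)
    if ending is None:
        return USAGE_NOTE
    return f"3PL present ending: -{ending}"
```

The point of the design is that an unknown dialect (Berovo, say) yields the note rather than a form silently produced by the standard-language rules.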

Nullifying our previous templates vote


Per yesterday's decision to unblock Victar, the holding not to use the new {{bor+}} and {{inh+}} templates stands. I obviously think that was the wrong decision, in part because I believe the templates did not require a vote in the first place. Thus, I would like to start a vote to nullify Wiktionary:Votes/2021-04/Creation_of_Template:inh+_and_Template:bor+ and hold, once and for all, that the templates could have been created freely, without any ballots being cast. Thoughts? Imetsia (talk) 14:10, 18 August 2021 (UTC)[reply]

Please do so. I once asked somewhere if that vote could be moved to a different mainspace so as to make that vote null and void, but that idea fell through. It’s now the best idea to have another vote to nullify the last vote, during this time of divided opinion. Go ahead. ·~ dictátor·mundꟾ 14:56, 18 August 2021 (UTC)[reply]
Okay, the vote has been created: Wiktionary:Votes/2021-08/Nullifying the previous templates vote. Any feedback is welcomed. Imetsia (talk) 15:38, 20 August 2021 (UTC)[reply]
I think it is a very bad idea to create votes to nullify previous votes so soon after the previous vote. Should we just keep on having votes until your side wins one? Should we then stop or should we then allow more votes? An interval of a year seems like the bare minimum between a vote and a vote to reverse it.
In this case the vote was not long ago, had a full discussion, and a large number of participants. I don't see any justification for such a vote at this time. DCDuring (talk) 16:02, 20 August 2021 (UTC)[reply]
I agree with DCDuring, and not because of my own thoughts on the matter. Thadh (talk) 16:21, 20 August 2021 (UTC)[reply]
I don't understand what it means to "nullify" the vote. Please phrase the question so it is clear what the status quo is and what will change if the vote passes by the requisite margin. Vox Sciurorum (talk) 16:27, 20 August 2021 (UTC)[reply]
To nullify the vote means to hold that it carries no force, and that it should have never been created to begin with. But I'll try to refine the wording to avoid any possible confusion. Imetsia (talk) 18:07, 20 August 2021 (UTC)[reply]
This vote does not so much "reverse" the previous one, as much as it simply holds a vote was never necessary. In other words, this isn't an attempt to get a new supermajority together to "support" the creation of the templates. Rather, regardless of how you feel about the templates on their merits, this new vote simply says that they could have been created without any "ballots being cast." This vote is curative, as it corrects the previous error of creating the initial vote in the first place. It is not a new referendum on whether our users agree with the templates on their merits. Imetsia (talk) 18:07, 20 August 2021 (UTC)[reply]
We have to be careful, because this whole mess is really about perceptions and wiki-politics; the technicalities of rules and votes are secondary. We need to be sure we're clear about everything, so as to deal with the perceptions and give the wiki-politics a chance to work themselves out. The main problem has been that wording has been vague and hasn't clearly explained what was being decided, so everyone has made their own judgements on what the outcome meant. Simply nullifying something without specifying what has been nullified means that people will have different ideas about the effect of the nullification.
First of all, the ability to create the template was never in question: unless there's a vote explicitly forbidding the creation of a template or the template has been legitimately deleted previously, anyone can do so. The real matter to be decided is deployment. There are two possible issues:
  1. Are editors allowed to deploy the template (where and when are also part of this)
  2. Are editors required to deploy it (here again, where and when matter)
Second, what was the status quo re: those two issues (especially the first one). Was there an existing consensus, or wording in an approved vote that prohibited deployment of the template, or of the wording provided by the template, and if so, where and when?
The perception before the vote was created seems to have been that the answer to the question in my second point might have been "yes"... sort of. The vote was meant to make it clear that that was no longer the case. Instead, it just muddled things up more.
The wiki-politics comes in due to resentment at being asked to spend a great deal of time and effort to discuss and vote on something, then having the people who asked ignore the vote and do the very thing that permission was asked for without consensus to do so. This may not have been intentionally an insult to the community and the principle of consensus, but it certainly was read that way by many.
I don't have time to go into this further at the moment, but I'll come back to this later. Chuck Entz (talk) 19:24, 20 August 2021 (UTC)[reply]
That's why I think the real solution would be to tackle the issues I have outlined here. PUC 21:16, 20 August 2021 (UTC)[reply]
I will once again quote @Thadh's (see diff): "some have brought up that a new template's creation shouldn't need any vote, but I'd argue that since this template is one in a series of arguably most used templates, any creation of a template that takes over a part of or even the whole function of {{bor}} or {{inh}} should get a vote, which it did in this case, and would have even without the initiative of the template's advocates."
I share this opinion: it's true that we don't usually require consensus before creating a template, but most templates are uncontroversial to begin with. These ones are a particular case. So I'd say this was a good vote to have, and I intend to vote Oppose. PUC 21:16, 20 August 2021 (UTC)[reply]
@PUC: Just for clarity, when you say "this was a good vote to have" (my emphasis), are you referring to the original April vote or the one that I've proposed? -- This unsigned comment was added by Imetsia (talk • contribs).
@Imetsia: Sorry for being unclear: I meant the original April vote. PUC 22:07, 20 August 2021 (UTC)[reply]
I'm not particularly convinced by Thadh's argument. The template is "one of a series of... most used templates," but that's a distinction without a difference. And secondly, the new templates in no way "take[] over... the function of" the traditional templates. They are simply another option that users can elect to use. Imetsia (talk) 22:31, 20 August 2021 (UTC)[reply]
But why disallow their replacement by {{bor}} and {{inh}}? If you want to use {{bor+}} to save keystrokes, fine, but why revert someone willing to go through the pain of replacing it with "Borrowed from {{bor}}", as you (and others) have been doing? I've seen no cogent argument for doing so, and it means we now "have two pairs of competing, virtually identical templates, but no way to enforce one or the other and achieve consistency in the long term" (I'm quoting myself). That's what I think should be clarified. PUC 22:49, 20 August 2021 (UTC)[reply]
The person "willing to go through the pain of replacing it with" the old format was systematically ensuring that the new bor+ and inh+ templates were put out of operation. I've always thought the templates are legitimate, but one user alone had taken it into his own hands to de facto prevent their usage. It takes us to a situation where the templates might just as well not exist if they can never really be put to use, just because one user obstructs their deployment.
And to the second point, I've never argued consistency (although I realize this was the original argument offered by some others). And at some point, I don't really care about how "consistent" our formatting is, provided that we have a reasonable amount of uniformity with headings, labels, etc. I also don't see the two pairs of templates as "competing." To my way of thinking, you can use one or the other. They're not fundamentally in conflict. There is no race to see how widespread the templates can become, or if one template can overtake the other. If anything, the templates are complementary - again, you can choose the old ones or the new ones. I don't mind that others might continue to implement the traditional templates. Imetsia (talk) 23:12, 20 August 2021 (UTC)[reply]
That's where we disagree. I think consistency in formatting should be pursued for its own sake, because there's nothing to gain from a proliferation of different ways of doing things.
"It takes us to a situation where the templates might just as well not exist if they can never really be put to use": not really. The templates still exist and can be used in new instances by people wishing to save keystrokes; the goal of the templates' defenders is achieved, and once they've used them in an entry they can forget about them. But at the same time, people wanting to achieve consistency should, in my opinion, be free to replace them with {{bor}} and {{inh}} when they encounter them; we could even imagine a periodic bot replacement. Where would the harm be in that? PUC 23:31, 20 August 2021 (UTC)[reply]
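For illustration, the kind of periodic bot replacement imagined above could be sketched as below. This is only a sketch under stated assumptions: the function name is hypothetical, the pattern handles only calls whose arguments contain no nested templates, and the lead-in wording ("Borrowed from", "Inherited from") is taken from this discussion rather than from the templates' actual code.

```python
import re

# Lead-in text per template, as described in this discussion (assumed wording).
LEAD_IN = {"bor": "Borrowed from", "inh": "Inherited from"}

# Matches {{bor+|...}} or {{inh+|...}} whose arguments contain no nested braces.
PLUS_CALL = re.compile(r"\{\{(bor|inh)\+\|([^{}]*)\}\}")


def replace_plus_templates(wikitext: str) -> str:
    """Rewrite {{bor+|...}} as 'Borrowed from {{bor|...}}' (likewise for inh+)."""
    def expand(match: re.Match) -> str:
        name, args = match.group(1), match.group(2)
        return LEAD_IN[name] + " {{" + name + "|" + args + "}}"

    return PLUS_CALL.sub(expand, wikitext)
```

Calls to the plain {{bor}} and {{inh}} templates are left untouched, so running the function twice gives the same result as running it once.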
Consistency, but at what cost? Do we have site-wide edit-wars? I, personally, would rather we have rules of engagement similar to our treatment of pondian variation: if someone switches a "US standard spelling of" entry to be the main one, I revert it, and if someone switches a "UK standard spelling of" entry to the main entry, the UK admins revert it. If an entry doesn't have an inh or bor template, the choice as to which one is added should not be subject to second-guessing. Systematic switching should not be allowed. The problem is that there needs to be consensus for it to work. If one person or group decides to unilaterally impose something, another person or group is likely to try to impose the opposite, and we have chaos. Chuck Entz (talk) 00:42, 21 August 2021 (UTC)[reply]
@Chuck Entz: There's no reason we should have edit wars if we have a clear policy about the status of these templates, and I've already proposed a workable one above: 1) the use of {{bor+}} and {{inh+}} would be tolerated in new instances (i.e. when it has an impact on the users' end); 2) they could not be used to replace the old templates when these are already preceded by "Borrowed from"/"Inherited from" (i.e. when it has no impact on the users' end); 3) on the contrary, they could be replaced by the old templates by whoever should wish to do so. They would be in a state of permanent semi-deprecation, if you want. And I'm yet to see a cogent reason to oppose this, aside from a weird attachment to the new templates themselves. As Imetsia said above, "There is no race to see how widespread the templates can become, or if one template can overtake the other"; so why should it matter that the new ones are systematically being replaced by the old ones, as long as one can use them when they want to save some typing? PUC 12:02, 28 August 2021 (UTC)[reply]
I think that's a good suggestion, maybe it could be generalized. I've seen some users systematically shorten templates (eg replacing {{gloss}} with {{gl}}). That's actually more typing (removing characters), and to what end? The template system is already cryptic enough as it is. – Jberkel 12:37, 28 August 2021 (UTC)[reply]
The UK/US spelling is, imo, not comparable: choosing one spelling over the other has an impact on the users' end. Choosing {{bor+}} over {{bor}} or the reverse has none (apart from the fact that the new ones create a link to the glossary, something which I'm not convinced is necessary). PUC 12:02, 28 August 2021 (UTC)[reply]
@Imetsia: Do you think creating a template {{head+}} which gives the parameters |3= and |4= for gender and head respectively would also not need a vote? Thadh (talk) 00:01, 21 August 2021 (UTC)[reply]
@Thadh: No, it would not need a vote. If we really dislike such a proposed template, we can nominate it to RFD, where we can then vote to delete it. Victar tried to do that with bor+ and inh+, but that effort failed. Imetsia (talk) 00:18, 21 August 2021 (UTC)[reply]
The question that needs to be answered ASAP is what will happen if this vote fails? (Which it is likely to, just based on Wiktionary bureaucracy and the energy that has been going around.) Will editors be or not be allowed to use {{bor+}} & {{inh+}}? Why is this is just not decided upon by the administrators? If a vote isn't needed for a template, there was a problematic person, and the vote was unclear, why was the vote not stopped before it took place? Just from a bystander's perspective, seeing the initial vote and the drama before and after it, there's so much that could've been avoided, and right now, it just seems like it's continuing. There needs to be a serious talk amongst higher-ups about this, as I just foresee this next vote turning into another bloodbath without actually changing anything. AG202 (talk) 21:55, 20 August 2021 (UTC)[reply]
This new vote as I read it has two regulatory effects: 1. Quashing the previous vote 2. Assessing that the previous template creation and deployment was legitimate. I don’t know why this is mixed. Maybe so central a template needs a vote, as PUC said, so there may be two votes. Before I edited, the second point was formulated even more broadly, or worse, as legitimizing “the two templates” (i.e. before, it could be interpreted as legitimizing the templates in their present form, or as a new vote on these templates). However the distinction may be artificial and assessing that the templates could be created and deployed without vote is tantamount to a vote for these templates, so we are back in the beginning. On the other hand, there is no need to quash the previous vote if it is replaced with a new one on the same matter. Fay Freak (talk) 22:10, 20 August 2021 (UTC)[reply]
As I've explained before, it's preferable to have a nullification rather than a revote as a matter of principle. The templates should not have required a vote to be created and deployed. If we do a revote, we essentially give credence to the idea that the original vote was legitimate. As a matter of last resort, I would support a revote. But other remedies like this one are available and preferable. Secondly, I think there's a fine line when it comes to whether the "distinction [is] artificial." A user could dislike the templates but, on principle, vote "support" here just because they don't believe the templates should have been voted on to begin with. As I've said before, I dislike the inh+ template. But I will work to keep it in place in the interests of fairness and principle. Imetsia (talk) 22:31, 20 August 2021 (UTC)[reply]
If this vote fails, the templates will not be allowed. The recent decision to unblock Victar is effectively a ruling that the templates are, for the time being, illegitimate. At the least, it means that the templates' usage can and will be undone by Victar; thus de facto putting them out of operation. (This is a crude summary, but read my longer comments at my post linked all the way up above). This is the "problematic person" you're talking about, but some colleagues in admin circles have disagreed. Sadly, there's not much to do about that (except an appeal of the unblock, which I support but I don't want to appear too divisive/vindictive). Imetsia (talk) 22:31, 20 August 2021 (UTC)[reply]
Thank you, I did read the essay, and while I will probably be voting in support of the templates, I still think that there's a need for admin to make sure that this doesn't happen again. Also, while appealing the unblock may appear too divisive/vindictive, this "re-vote" will as well with how it's proceeding. AG202 (talk) 17:51, 21 August 2021 (UTC)[reply]
I agree. It's principles over anything else. A better approach would be to look for some compromise (perhaps vote on just {{bor+}}, which seemed less problematic), instead of nullifying/appealing everything. – Jberkel 18:05, 21 August 2021 (UTC)[reply]
AG202 asked, 'Why is this is just not decided upon by the administrators?'. The answer is that their job is to implement the consensus of the community, not to decide its policy. --RichardW57 (talk) 09:21, 21 August 2021 (UTC)[reply]
Not saying that the administrators should decide the policy (as the policy is already set for creating templates), but that those rules should be enforced and someone should step in if there are issues. If the vote shouldn't have happened in the first place, someone should have stepped in to stop it. And regarding consensus on the prior vote, Imetsia talked about that in their essay. AG202 (talk) 17:52, 21 August 2021 (UTC)[reply]
What is the point of having 4 identical templates? After all, you can delete the template {{bor+}} and the template {{inh+}}. And "Borrowing from" and "Inherited from" should be transferred to the template {{bor}} and the template {{inh}}. Gnosandes (talk) 10:54, 21 August 2021 (UTC)[reply]

I don't understand why everyone here seems to think that the existence or deletion of these templates is their respective hill to die on. The arguments that have been brought forth so far against adding "Inherited from..." or "Borrowed from..." are a sign either of linguistic ignorance (no, it's not obvious whether an Old French term is borrowed or inherited from Latin) or of egocentric arrogance (no, it's not obvious to everyone that this Old High German term is inherited from Proto-West Germanic) or of both. On the other hand, it would not be that difficult or tiresome to type those few extra letters if those templates weren't there. BUT: There is a viable compromise that's been proposed several times: Add a parameter to the bor/inh-templates that creates the text (BTW the parameter should be short, preferably single-letter). And while you're at it, bring the other etymology templates in line so that new editors won't have to face this incomprehensible mess. I'll use those templates as long as they're there and please don't contact me about this on my talk page further on. --Akletos (talk) 14:27, 23 August 2021 (UTC)[reply]
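The compromise described above, a single short parameter that switches the lead-in text on, could look roughly like this. It is a sketch only: the function, the flag name `t` and the plain "language term" output are hypothetical stand-ins, not the behaviour of the real {{bor}} and {{inh}} templates.

```python
# Lead-in wording taken from this discussion; the flag name is hypothetical.
LEAD_IN = {"bor": "Borrowed from", "inh": "Inherited from"}


def render_etymology(kind: str, language: str, term: str, t: bool = False) -> str:
    """Render 'Language term', optionally preceded by the lead-in text."""
    if kind not in LEAD_IN:
        raise ValueError(f"unsupported template: {kind}")
    core = f"{language} {term}"
    return f"{LEAD_IN[kind]} {core}" if t else core
```

With the flag off the output is unchanged, so existing uses would not be affected; with it on, one template per relation covers both styles.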

As things go, at some point this has stopped being about templates, inheritance chains or saved keystrokes. – Jberkel 19:27, 26 August 2021 (UTC)[reply]

Request new etymology-only languages codes part 2


Previous request: Wiktionary:Beer_parlour/2020/August#Request new etymology-only languages codes

Request 1: new etymology-only language code for Semende/Semendo isolect of Central Malay, possible code pse-smn

Request 2: opinion about Betawi Ora and Betawi Udik isolects (merge or delete, see previous discussion)

Rex AurōrumDisputātiō 23:32, 18 August 2021 (UTC)[reply]

Sandhi forms


Is there any policy on the recording of sandhi forms that are written like whole words? For example, Sanskrit नरो (naro) is a sandhi form of नरः (naraḥ) and may be delimited by spaces, but looking it up finds only a Pali nominative singular and a Hindi vocative plural. I am unaware of any record on Wiktionary of it as a Sanskrit form. In Roman script Pali, the sequence '-ṃ ca' is often written as '-ñ ca', and final anusvara can be written as 'm' plus space before vowels. (For Pali, the Indic scripts generally omit the space and merge the syllables, though horrible things happen with punctuation and Pali quotative ti.) --RichardW57m (talk) 09:56, 19 August 2021 (UTC)[reply]

As far as I know, such entries are allowed. If we don't have any yet, it's just because no one has bothered creating them. They could either be labeled {{alt form of}} or we could create a new template {{sandhi form of}} for them. —Mahāgaja · talk 10:07, 19 August 2021 (UTC)[reply]
Or use {{form of|...|sandhi form}} or {{inflection of|...|sandhi form of|...}}. The latter won't play well with the list formatting (;), but we would mostly be handling inflected forms. --RichardW57m (talk) 12:05, 19 August 2021 (UTC)[reply]
Combining form?
Rishabhbhat (talk) 10:22, 19 August 2021 (UTC)[reply]
Less easy to use, but makes sense for consonant stems, where there is extensive neutralisation. One would need to check the language- and script-specific rules on hyphens in entries. I think we will hit spelling issues with any quotations for the effects of -n l-. The last I heard, strict Unicode (and Harfbuzz, whence Chrome) couldn't handle it for Devanagari fonts that use a half form! (The candrabindu goes above the wrong consonant.) The Windows rendering system does handle it. (I've raised an accepted bug report on the lack of documentation in 'Microsoft Typography'). The SINHALA candrabindu will be fun. --RichardW57m (talk) 12:05, 19 August 2021 (UTC)[reply]
If you meant, "{{combining form of}}?", I would say that the forms I asked about aren't combining forms as we normally use the term. --RichardW57m (talk) 12:17, 19 August 2021 (UTC)[reply]
Isn't sandhi exactly that? नरो would never appear on its own, only when with another word in a sentence. Sandhi quite literally means, and I quote Wiktionary: junction, connection, combination. Similarly कालो, which occurs as a sandhi form of कालः with certain words, never on its own.
Not quite. A combining form does not remain a word on its own, but becomes part of a greater word. --RichardW57 (talk) 20:47, 20 August 2021 (UTC)[reply]
Leaving aside how it will be done, if we do this, we will have to do it for every word, no? There are several possible sandhi forms for each word ending, depending upon the next character. Rishabhbhat (talk) 10:14, 20 August 2021 (UTC)[reply]
Yes, pretty much as almost(?) every Welsh word has a table of mutations. For inflected nouns, I was envisaging a simple list of the stand-alone sandhi forms, generated by the declension tables. A visitor can then find these forms by a simple search. Forms that share an orthographic syllable with the next word will be harder, in various ways. --RichardW57 (talk) 20:47, 20 August 2021 (UTC)[reply]
Words ending in vowels generally don't have separate stand-alone sandhi forms. --RichardW57 (talk) 20:49, 20 August 2021 (UTC)[reply]
I can make a module to generate all possible sandhi forms if you wish, in a table. Let me know later if it is okay. (I have significant coding experience) Rishabhbhat (talk) 15:08, 21 August 2021 (UTC)[reply]
@Rishabhbhat: I suggest that you have two domains of application - single words and inflection templates. If I were doing it for noun inflections, I would want to process the collection of inflections while held as a Lua table of word forms and attributes (e.g. footnotes). --RichardW57m (talk) 12:29, 23 August 2021 (UTC)[reply]
Maybe we could create subpages for Sanskrit words, like नर/sandhi, that show the sandhi forms for all words in its inflection tables. Rishabhbhat (talk) 05:45, 24 August 2021 (UTC)[reply]
I think the 'stand-alone' forms could be listed in one line, or at least, one paragraph. The user can work out which number and case they apply to. --RichardW57 (talk) 07:56, 24 August 2021 (UTC)[reply]
Do you mean to say that for words like भानु, we shouldn't have भान्व्? Bcoz it does combine like that. Rishabhbhat (talk) 10:40, 25 August 2021 (UTC)[reply]
It may require significant UI/UX work to decide how to present the possibly large number of sandhi forms and their associated contexts neatly. Also, sandhi is not just about endings. It goes across word boundaries, so the very idea of "sandhi form of a word" may not always apply neatly. For example, any form in -a may in principle become -e-, -o-, or -ar- etc. after sandhi. It would be difficult to say which "sandhi form" the syllable with the -e- belongs to. --Frigoris (talk) 16:03, 22 August 2021 (UTC)[reply]
@Frigoris: Yep. That is why I restricted my query to where the word boundary corresponds to a syllabary boundary. Unicode standards have the bizarre concept that word boundaries can't be within a character. I suspect even the conversion of visarga to repha could be hard to resolve, especially in the Malayalam and Khmer scripts. Just deleting from the end doesn't necessarily yield a form of the first word. --RichardW57m (talk) 12:29, 23 August 2021 (UTC)[reply]
Well @RichardW57m, in some languages the word boundary could be the character, as in French constructs like a-t-il? where the t is the boundary. I don't think this is specific to any encoding scheme, being a language/script-specific feature. In the case of Sanskrit, there's another irksome thing: the initial consonant can be changed by sandhi too. A form that starts in one of the certain consonants and ends in the visarga can in principle have a large number of sandhi combinations. --Frigoris (talk) 15:43, 23 August 2021 (UTC)[reply]
For declension tables, this might be done with a CSS :hover for each individual form, displaying all possible sandhi forms of it. Hovering on नरः in the table for नर can show/list नरो, नरस्, नरश्, नर etc. Rishabhbhat (talk) 10:38, 25 August 2021 (UTC)[reply]
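
A minimal sketch of the kind of generator discussed above, written in Python rather than the Lua an actual Wiktionary module would use. It covers only Devanagari words ending in -aḥ and returns the stand-alone forms listed for नरः in this thread. The function name and the labels are hypothetical, and the contexts noted in the comments are a rough summary, not a full statement of the sandhi rules.

```python
# Sketch only: stand-alone sandhi forms of a Devanagari word ending in -aḥ.
# All names here are hypothetical; this is not an existing Wiktionary module.

VISARGA = "\u0903"  # ः
VIRAMA = "\u094D"   # ्
SIGN_O = "\u094B"   # ो
SA = "\u0938"       # स
SHA = "\u0936"      # श


def is_consonant(ch):
    """True for the Devanagari consonant letters क (U+0915) to ह (U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def standalone_sandhi_forms(word):
    """Return the stand-alone sandhi forms of a word ending in -aḥ.

    The word must end in a bare consonant (inherent a) plus visarga,
    as in नरः; anything else is rejected rather than guessed at.
    """
    if len(word) < 2 or not word.endswith(VISARGA) or not is_consonant(word[-2]):
        raise ValueError("expected a Devanagari word ending in -aḥ")
    stem = word[:-1]  # the word without its visarga
    return {
        "-o": stem + SIGN_O,            # roughly: before voiced consonants and a-
        "-s": stem + SA + VIRAMA,       # roughly: before t, th
        "-ś": stem + SHA + VIRAMA,      # roughly: before c, ch
        "visarga dropped": stem,        # roughly: before vowels other than a-
    }


if __name__ == "__main__":
    for label, form in standalone_sandhi_forms("नरः").items():
        print(label, form)
```

For inflection templates, the same function could be mapped over the table of word forms before display. Forms that share an orthographic syllable with the following word, and changes to the initial consonant of the next word, are deliberately left out.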

Universal Code of Conduct - Enforcement draft guidelines review

Hello all,

The Universal Code of Conduct Phase 2 drafting committee would like comments about the enforcement draft guidelines for the Universal Code of Conduct (UCoC). This review period is planned to begin 17 August 2021.

Community and staff members collaborated to develop these draft guidelines based on consultations, discussions, and research. These guidelines are not final but you can help move the progress forward. Provide comments about these guidelines by 17 October 2021. The committee will be revising the guidelines based upon community input.

Everyone may share comments in a number of places. Facilitators welcome comments in any language on the draft review talk page or by email. Comments can also be shared on talk pages of translations, at local discussions, or during round-table discussions and conversation hours.

There are planned live discussions about the UCoC enforcement draft guidelines:

The facilitation team supporting this review period hopes to reach a large number of communities. Having a shared understanding is important. If you do not see a conversation happening in your community, please organize a discussion. Facilitators can assist you in setting up the conversations.

Discussions will be summarized and presented to the drafting committee every two weeks. The summaries will be published here.

Please let me know if you have any questions about the draft, the community consultation or the UCoC.

Many thanks! --Ravan (WMF) (talk) 15:44, 19 August 2021 (UTC)[reply]

  • The draft anticipates the possibility that "no local structure exists to address a complaint" which is obviously the simplest solution for Wiktionary. Global sysops can drop in and delete exonyms, offensive quotations, derogatory senses, and the like as they are noticed. Vox Sciurorum (talk) 16:05, 19 August 2021 (UTC)[reply]
    I think jew, welsh and Eskimo are safe, though the definition of the common noun 'Eskimo' should mention rather than use the proper noun. The use of the language name 'Welsh' may be protected by the infeasibility of coming up with something else. --RichardW57m (talk) 10:45, 25 August 2021 (UTC)[reply]
  • I've parsed conduct as not including content, but that doesn't seem to be the case. – Jberkel 11:34, 25 August 2021 (UTC)[reply]

Support for Middle Bulgarian

Do we have any support for Middle Bulgarian? We have for Old Bulgarian / Old Church Slavonic (ISO cu) and Modern Bulgarian (ISO bg), but as far as I can find, nothing about Middle Bulgarian. Bogdan (talk) 19:05, 20 August 2021 (UTC)[reply]

Not from my side, since it is not clear what it would encompass. What’s its corpus and what not? Before its eventual introduction one has to solve first the issue about the naming of “Old Church Slavonic”, which is often not that old, Wiktionary:Beer parlour/2019/September § I want to add Church Slavonic terms, see also Wiktionary:Beer parlour/2017/March § Old Russian vs. Old East Slavic; Old Slovak, Old Ukrainian, Old Belarusian, etc., as in the example of лиликъ (lilikŭ) sometimes new enough to be borrowed from Old Ottoman. Is this word Middle Bulgarian? You might not add clarity by splitting two unclear languages into three. Fay Freak (talk) 19:48, 20 August 2021 (UTC)[reply]
Sure, it's difficult to tell these languages apart because the changes were gradual and often there was a conservatism in spelling.
I met this problem with some Romanian words that have etymologies that are past the Old Church Slavonic phase of Bulgarian (for instance, because the nasal vowels were lost), so they were described as being borrowed from Middle Bulgarian. How should I describe them in the etymology section? Bogdan (talk) 18:29, 21 August 2021 (UTC)[reply]
@Bogdan: Middle Bulgarian is most often treated as a form (recension) of Church Slavonic by scholars in the field, although there is confusion around the subject, particularly for the reason that there is no clearly-defined corpus of either Church Slavonic or Middle Bulgarian texts distinct from each other (see William Veder’s “The Trouble with Middle Bulgarian” for some discussion of these issues). As things currently stand, we include later Church Slavonic under the Old Church Slavonic header, so if we were to treat Middle Bulgarian as a variant of Church Slavonic, it would also fall under the OCS header. In that case we should probably introduce Middle Bulgarian as an etymology-only language with its own code, but without separate entries from Church Slavonic. However, as User:Fay Freak notes, we still have unresolved questions about how to treat the OCS header and New Church Slavonic in general, so there is certainly room to organize things differently if a better solution presents itself.
For now I will add Middle Bulgarian as an etymology-only language so you can at least use it in etymology sections. We’ll have to use a two-part code because the ISO doesn’t list it; would cu-bgm be acceptable? — Vorziblix (talk · contribs) 15:49, 25 August 2021 (UTC)[reply]
Yes, an etymology-only language would be great, since this is what I need it for (borrowings in Romanian). From what I understand, in Middle Bulgarian texts, they often used the Old Church Slavonic spelling, even though the underlying phonetics had changed. Bogdan (talk) 18:57, 25 August 2021 (UTC)[reply]
Great. The code has been added! — Vorziblix (talk · contribs) 20:54, 25 August 2021 (UTC)[reply]

Universal Code of Conduct: Enforcement draft guidelines review

The Universal Code of Conduct Phase 2 drafting committee would like comments about the enforcement draft guidelines for the Universal Code of Conduct (UCoC). This review period is planned to begin 17 August 2021.

Community and staff members collaborated to develop these draft guidelines based on consultations, discussions, and research. These guidelines are not final but you can help move the progress forward. Provide comments about these guidelines by 17 October 2021. The committee will be revising the guidelines based upon community input.

Everyone may share comments in a number of places. Facilitators welcome comments in any language on the draft review talk page or by email. Comments can also be shared on talk pages of translations, at local discussions, or during round-table discussions and conversation hours.

There are planned live discussions about the UCoC enforcement draft guidelines:

Wikimania 2021 session (recorded 16 August)
Conversation hours - 24 August, 31 August, 7 September @ 03:00 UTC & 14:00 UTC
Roundtable calls - 18 September @ 03:00 UTC & 15:00 UTC

The facilitation team supporting this review period hopes to reach a large number of communities. Having a shared understanding is important. If you do not see a conversation happening in your community, please organize a discussion. Facilitators can assist you in setting up the conversations.

Discussions will be summarized and presented to the drafting committee every two weeks. The summaries will be published here.

The full announcement and translations can be found here.

Please let me know if you have any questions. Best, JKoerner (WMF) (talk) 23:15, 20 August 2021 (UTC)[reply]

Voting opens for the Wikimedia Foundation Board of Trustees

Voting for the 2021 Board of Trustees election is now open. Candidates from the community were asked to submit their candidacy. After a three-week-long Call for Candidates, there are 19 candidates for the 2021 election.

The Wikimedia Foundation Board of Trustees oversees the Wikimedia Foundation's operations. The Board wants to improve their competences and diversity as a team. They have shared the areas of expertise that they are hoping to cover with new trustees.

The Wikimedia movement has the opportunity to select candidates who have the qualities to best serve the needs of the movement for the next several years. The Board is expected to select the four most voted candidates to serve as trustees. This term starts in September and lasts for three years. Learn more about the Board of Trustees in this short video.

Vote now until August 31.

Below is some useful information about the election process.

Learn more about the candidates
Candidates from across the movement have submitted their candidatures. Learn about each candidate to inform your vote. The community submitted questions for the candidates to answer during the campaign. Candidates answered the list of community questions collated by the Elections Committee on Meta.

Vote
Voting for the 2021 Board of Trustees election opened on 18 August 2021 and closes on 31 August 2021. The Elections Committee chose Single Transferable Vote for the voting system. The benefit of this is voters can rank their choices in order of preference. Learn more about voting requirements, how to vote, and frequently asked questions about voting.

Please help in the selection of those people who best fit the needs of the movement at this time. Vote and spread the word so more people can vote for candidates. Those selected will help guide the Wikimedia Foundation and support the needs of the movement over the next few years.

The full announcement and translations are available here. Please reach out with any questions. Best, JKoerner (WMF) (talk) 23:16, 20 August 2021 (UTC)[reply]

Voting for this ends in a few hours. (Some of the candidates are of the "Don't push things on communities that don't want them" type, thankfully.) --Yair rand (talk) 19:21, 31 August 2021 (UTC)[reply]

order of definitions

I recently updated the order of a set of five definitions for a word. I moved obsolete meanings to the end of the list then moved jargon secondary to the everyday meaning. If this makes sense to those that can carry the torch from here, go for it. — This unsigned comment was added by 198.2.80.218 (talk).

  • I prefer chronological order when there is a clear evolution of meaning over long periods. Vox Sciurorum (talk) 13:31, 21 August 2021 (UTC)[reply]
  • I understand the argument for chronological order. However, for practical purposes I believe that it is offputting and unhelpful to our readers to put obsolete senses first. I advocate "common modern senses first + logical organisation". I believe that the etymology section is the better place to explain sense development; whether long ety sections should be placed first is another conversation. Mihia (talk) 22:29, 21 August 2021 (UTC)[reply]
  • I completely understand the desire to put most commonly used and modern senses nearer the top - as that makes sense from a usage perspective; however, most dictionaries do it the opposite way - the way explained above by Vox Sciurorum, which shows sense evolution (a very interesting feature imo), and which I also seem to favour. I think it is not out of the realm of reason to assume that most users coming to Wiktionary will already be familiar enough with how most other dictionaries order their senses, so it shouldn't be a shock to them if we have it that way. But if we decide on the other, I can live with that too :) Leasnam (talk) 22:30, 21 August 2021 (UTC)[reply]
  • There are good reasons for doing it either way, and no general agreement on which to choose. As a result, it's unlikely that we ever will have a formal policy either way. I would advise not making systematic changes to the order for their own sake, out of respect for those who disagree. If you're making extensive changes for other reasons that involve the ordering of senses or if you're adding senses, there's some room for arranging things the way you want to. Chuck Entz (talk) 23:50, 21 August 2021 (UTC)[reply]
  • Also, as has been suggested before, both by me and I think by others, in the most ideal of ideal worlds each sense would have a "first attested" date attached, by which senses could be sortable. The "by commonness / in logical order" ordering, which can hardly be automated, would be editor-created, and then chronological order could be user-selectable, though the way in which this would work with sub-senses would need thinking through. I guess at the moment this is a fairly remote aspiration. Mihia (talk) 00:48, 22 August 2021 (UTC)[reply]
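
A rough sketch of the "first attested" idea above, in Python. It assumes each sense carries an optional year of first attestation; the editor-created order is the default, and chronological order is an opt-in view. The class, field, and function names are hypothetical, and sub-senses are not handled.

```python
# Sketch only: editor-ordered senses with an optional chronological view.
# Names are hypothetical; sub-senses are deliberately not modelled.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sense:
    gloss: str
    first_attested: Optional[int] = None  # year of first attestation, if known


def order_senses(senses: List[Sense], chronological: bool = False) -> List[Sense]:
    """Return senses in editor order, or sorted by first attestation.

    Senses without a date are kept in their relative editor order and
    placed after all dated senses (sorted() is stable).
    """
    if not chronological:
        return list(senses)
    return sorted(
        senses,
        key=lambda s: (s.first_attested is None, s.first_attested or 0),
    )


if __name__ == "__main__":
    # Invented example data, for illustration only.
    entry = [
        Sense("everyday modern sense", 1850),
        Sense("jargon sense", 1920),
        Sense("obsolete sense", 1400),
        Sense("undated sense"),
    ]
    for sense in order_senses(entry, chronological=True):
        print(sense.first_attested, sense.gloss)
```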

French, Tagalog, Portuguese given names categories include Japanese names for no sourceable reason

See this discussion. Briefly, most Japanese (and seemingly only Japanese) given names appear to populate a handful of other languages' given name categories (almost always presented as "borrowed from Japanese"). There are zero supporting sources for these inclusions, it is inconsistent with every other cultural name, and they all seem to have been added by a couple IPs over a few days in 2018 and sporadically in 2020 without discussion. The logical extension of this is pretty clearly untenable: adding L2s for every language for every single name would completely negate the purpose of categories and make all given name pages hundreds of thousands of bytes. I propose we get rid of these L2s entirely unless actual, broad, cultural usage can be established. JoelleJay (talk) 18:32, 22 August 2021 (UTC)[reply]

Pinging participants in prior discussion: Lambiam, PUC, Mahagaja, Mnemosientje, koavf, -sche, J3133, Tooironic, Andrew Sheedy. JoelleJay (talk) 18:38, 22 August 2021 (UTC)[reply]

Perhaps we should limit CFI for proper names to "clearly widespread use". Naming 3 babies "X Æ A-12" should not be enough to justify its inclusion. Vox Sciurorum (talk) 19:31, 22 August 2021 (UTC)[reply]

Vox Sciurorum, we don't even have attestation any French people have these Japanese names, which to me would indicate grounds for immediate removal. But I've been reverted and told to come here to get consensus, so here I am. JoelleJay (talk) 23:06, 23 August 2021 (UTC)[reply]
The Wikipedia article Basque surnames starts with this definition: “Basque surnames are surnames with Basque-language origins or a long, identifiable tradition in the Basque Country.” While not immediately usable in precisely the form “X-language origin or tradition in X-country” – Baxter has a Middle English origin, and we do not use geographic criteria – I think this is the right spirit. It explains why the surname Iturri is categorized on Wikipedia as a Basque-language surname, and not, in spite of its bearers Carlos Iturri and Simón Iturri Patiño, as a Spanish-language surname. Also, the English surname Vanderbilt comes from a Dutch-language surname, which however is spelled van der Bilt; IMO Vanderbilt should not have an L2 of “Dutch” regardless of its origin. Smithing a criterion is not trivial, but if workable I expect it will be accepted by the community.  --Lambiam 06:45, 23 August 2021 (UTC)[reply]
Lambiam, maybe I'm not understanding wiktionary PAGs, but shouldn't all these L2s be attested, or is that only necessary for standalone entries? Can someone just add L2s indiscriminately or is there guidance somewhere on when an L2 is warranted? JoelleJay (talk) 23:06, 23 August 2021 (UTC)[reply]
Every entry must be under some L2 heading. This should be a language we recognize; see Index:All languages and Category:All languages. There is also an L2 heading Translingual for items that cannot be assigned to any specific language, such as $$$ or SARS-CoV-2. Each entry must be attestable; see Criteria for inclusion. These criteria give no guidance, however, on which language(s) to assign to a specific entry. The mere fact that term X is used in a text written in language Y does not necessarily mean X belongs to the lexicon of Y. There is an essay on code-switching giving some hints on how to discern uses that belong to a different language, but they are not helpful in the case of given names or surnames. Take the surname “Bernoulli”. We list it currently under the L2 heading of English. While there is no problem in attesting the use of the name in English texts, there is also no problem in attesting the use of the name in French or German texts. The Bernoulli family was from Baseldytsch-speaking Basel in Switzerland, originating from Antwerp where Brabantian was spoken, and the surname is still common in Basel.[1] IMO, the assignment of the L2 heading of English to this surname is no better than calling it Swahili. A point can be made for listing the variant “Bernouilli” as, specifically, French, being an obsolete French spelling (pronounced /bɛʁnuji/) of the name “Bernoulli”.  --Lambiam 07:00, 24 August 2021 (UTC)[reply]
Lambiam, thank you for your patience! So if someone at some rollback-requiring time in the past added to, e.g., the name Kenzo the L2 heading of Swahili and thereby introduced Category:Swahili male given names and the 9+ other categories that go along with it (Category:Swahili lemmas, Category:Swahili uncountable nouns, etc.), there is no way to remove it without generating consensus on that page or here? There is no PAG support for removing obviously inappropriate, inconsistent, unattested L2s if the edits went unnoticed? Could they have been reverted if someone had noticed them at the time? What is stopping people from adding all 42,000 possible categories to every given name?? JoelleJay (talk) 19:56, 24 August 2021 (UTC)[reply]
At the English Wiktionary we operate a lot less by policies and guidelines than over at the English Wikipedia and offer much less room for wikilawyering, but we do have page deletion guidelines with a list of possible reasons for speedy deletion, such as “complete rubbish”. If someone adds, say, Kadzo as a Swahili surname, I can’t easily tell whether this is vandalism, a mistake, or actually a valid Swahili surname. When I have a serious reason to doubt the validity of an entry, the best course for me is usually to signal the issue and ask for more input. Often, others are able to find the verification I could not find. If an entry cannot be verified, it should be deleted. That is not an issue of consensus. Consensus is required for entries that can be verified and are not speediable, yet are thought by a user not to meet the inclusion criteria. Any discussion is then, typically, about the interpretation of these criteria when applied to the challenged term. There are areas where these criteria are known to be unclear, but if someone goes on a spree of adding dubious entries while hiding behind the argument that there is no rule against it, the most likely outcome is they’ll soon find themselves blocked and their changes rolled back.  --Lambiam 23:07, 24 August 2021 (UTC)[reply]
Ah I see, I'm definitely approaching this from a Wikipedia perspective! I was instructed by Robbie SWE to get consensus here (diff) to delete them, so it's frustrating that on the one hand I am reverted based on policy but on the other, despite there being no logical reason these L2s added in quick bursts by IPs should exist, there also doesn't seem to be anything in policy that would prohibit them (other than utter lack of CFI attestation) if the changes went unnoticed for years. Then again, if WT doesn't have the PAG policing of WP I see nothing stopping me from going on a POINTy spree adding all the other possible L2s...(kidding). JoelleJay (talk) 02:27, 25 August 2021 (UTC)[reply]
Only if you manage to keep your misconduct below the radar, which, I think, is equally true for point-provers at our sister projects.  --Lambiam 12:40, 25 August 2021 (UTC)[reply]
Despite carefully reading the prior discussion from January, I just couldn't discern if we'd decided that deleting these sections in other languages would be the best course of action. I agree that some very language-specific names might not deserve their own sections in various languages, but I fear that we risk setting ourselves up for a constant flood of edit wars. That's the reason why I suggested bringing it up in the Beer Parlour. --Robbie SWE (talk) 09:01, 25 August 2021 (UTC)[reply]
There was no concretely formulated proposal in the discussion of last January. An earlier proposal failed strongly, I think due to the proposed criterion not being considered workable. Shall we work here on formulating a new proposal?  --Lambiam 12:40, 25 August 2021 (UTC)[reply]
I read the prior proposal, and the end result of doing nothing to limit L2s just really isn't feasible if we want given name categories to be even remotely usable. We acknowledge other types of words are etymologically "from" a particular place and don't categorize them as lemmas in every other language; why should it be different for names? If a single, even entirely hypothetical instance of someone in France being given a particular foreign-borrowed name means that name is now "French", why have given names by language categories at all? The Insee database of the nearly 700,000 name-instances in France (excluding Mayotte) since 1900 is very easy to check for validation of widespread use. Going through category:French female given names, I see Akiko has been used only 34 times in 120 years, Aoi 23 times, Atsuko 0 times. At least 20,348 unique names have appeared 20 or more times just in France; considering all Francophone countries we'd be getting orders of magnitude more unique names etymologically "from" more unique languages. These databases exist for many countries, but even if unavailable the burden should absolutely be on the editor trying to add a "borrowed from X" L2 to a given entry to demonstrate widespread use. JoelleJay (talk) 20:29, 25 August 2021 (UTC)[reply]
There may be a case to be made here for classing given names in a particular script as "Translingual", and adding entries in any specific language iff that name can be shown to be used by speakers of that language. Curious what others think. ‑‑ Eiríkr Útlendi │Tala við mig 23:53, 25 August 2021 (UTC)[reply]
I'd be fine with that. In previous discussions on Translingual proper nouns we've had some objections because the pronunciation of Translingual terms differs according to the language context in which they are spoken. Thus Translingual taxonomic name entries have no pronunciation section. I don't think that is satisfactory for names of persons. I think users would expect to have pronunciation sections for proper names. Should that be how speakers of a given language actually pronounce a name or how most bearers of the name living in that language area hope others will pronounce it? My grandfather (or an immigration officer) simplified Dühring to During for his neighbors and customers, with a pronunciation change following the change in orthography. DCDuring (talk) 03:00, 26 August 2021 (UTC)[reply]
Perhaps the immigration officer was anti-Dühring ;).  --Lambiam 09:37, 26 August 2021 (UTC)[reply]
Doesn't this risk only affecting non-Western names? I mean, here in Sweden, we've adopted loads of Anglo-American names and naturalised them, making them "typically" Swedish (Liam is currently number 5 on our top list of the most popular baby names for boys 2021). But what about Swedes named Padma, Sunita, Ahmed or Muhammed? Is someone going to come along and delete their hypothetical Swedish sections, justifying their actions with the statement "not a Swedish name"? I feel that this issue is far too complex for a Translingual solution. --Robbie SWE (talk) 08:13, 26 August 2021 (UTC)[reply]
I suppose we can agree that it would be most unreasonable to list the name Tiphaigne de la Roche as an English surname. Three independent instances of use can be attested in English texts that are permanently recorded and span decades,[2][3][4] so a naive appeal to our CFI is not sufficient to exclude it. Comparable issues exist with the names of dishes; is huevos revueltos[5][6][7] an English term, like huevos rancheros? The cited uses of huevos revueltos are in a context of ordering breakfast in a Spanish-speaking country, and the term is simply the Spanish for “scrambled eggs”, so these are instances of code-switching. The uses of Tiphaigne de la Roche all refer to a French individual; this is IMO similar to code-switching: the French name is used because there is no English alternative, but it remains French. If someone were to add Tiphaigne de la Roche as an English surname, I’d have no hesitation to nominate it for deletion.
    The question then is, when does the use of an originally foreign proper noun stop being foreign and become “naturalized”? Next to the usual requirement of three attestations, I think that we can require that purported attestations for a given L2, say Zulu, of an originally foreign (non-Zulu) name refer to at least three different individuals using the name while living in a Zulu-speaking area. Also, cites of names of first-generation immigrants from a non-Zulu-speaking area to a Zulu-speaking area should not count for attestation purposes. This seems a workable criterion to me that, as far as I see, will root out most of the dubious L2 assignments.  --Lambiam 11:48, 26 August 2021 (UTC)[reply]
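
The criterion suggested above can be sketched as a simple check, here in Python. This is only an illustration under stated assumptions: each citation is reduced to the individual it names and two yes/no judgments that an editor would have to make by hand. The class, field, and function names are hypothetical.

```python
# Sketch only: the suggested attestation check for an originally foreign name.
# All names are hypothetical; the two boolean fields stand in for editorial
# judgments that cannot be automated.
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Citation:
    bearer: str                       # the individual the cite refers to
    lived_in_language_area: bool      # bore the name while living in the area
    first_generation_immigrant: bool  # such cites do not count


def meets_naturalization_criterion(citations: Iterable[Citation], minimum: int = 3) -> bool:
    """True if qualifying cites refer to at least `minimum` different individuals."""
    qualifying_bearers = {
        c.bearer
        for c in citations
        if c.lived_in_language_area and not c.first_generation_immigrant
    }
    return len(qualifying_bearers) >= minimum


if __name__ == "__main__":
    # Invented example data, for illustration only.
    cites = [
        Citation("person A", True, False),
        Citation("person A", True, False),  # same individual, counted once
        Citation("person B", True, False),
        Citation("person C", True, True),   # first-generation immigrant
    ]
    print(meets_naturalization_criterion(cites))
```

Counting distinct individuals rather than cites means that three quotations about one famous bearer would not be enough.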
I am not convinced an originally non-Zulu name appearing three times among people living in a Zulu-speaking area would make such a name "Zulu". Are all missionaries now considered members of the culture in which they proselytized? I think there would justifiably be a lot of pushback if we started claiming "Piet" is a native Zulu name as well as Boer. It's effectively erasing ethnolinguistic heritage. If "Piet" became a very common name bestowed among Zulu-speakers that would be a different story. JoelleJay (talk) 17:52, 26 August 2021 (UTC)[reply]
Were these missionaries born there? If not, they are already excluded by my suggested rule. Obviously, Willem is not a "native" Afrikaans name either in the sense that the name originated in South Africa – unlike many Afrikaans-speaking individuals named Willem who were born there. Are we erasing heritage by listing it under the heading Afrikaans?  --Lambiam 12:36, 27 August 2021 (UTC)[reply]
Πέτρος is a Greek name; just because it's been spelled in Latin script, why does that make it a Boer name? We list some form of Πέτρος in many, many languages; at what point does it become a native name? Ethnically, Piet is not a Germanic name; it is the adopted Greek name of a man from modern-day Israel.--Prosfilaes (talk) 16:14, 27 August 2021 (UTC)[reply]
Actually, the nickname (reportedly) bestowed upon that Aramaic-speaking fisherman by an Aramaic-speaking carpenter was כֵּיפָא. Πέτρος is a masculinized translation. The ring name of Dwayne Johnson is given in Russian as Скала, a direct translation of "The Rock". If The Rock keeps performing miracles and Russia wins the new Cold War, our great great grandchildren may be named after Saint Scala.  --Lambiam 21:55, 27 August 2021 (UTC)[reply]
Robbie SWE, this is kind of embarrassing, but the whole reason I came across these entries is because I play/create a lot of name quizzes on Sporcle and thought Wiktionary would have a more comprehensive category of [language] names than Wikipedia. I am extremely familiar with name usage in various countries so seeing tons of Japanese names that I had never encountered in French/Portuguese quizzes or national databases was immediately bewildering, especially because it was just Japanese names. If other languages had had this problem too then I would've been like, "oh well, Wiktionary etymology categories are total garbage, I'll use something else", but it really stood out as an error that hadn't been fixed rather than something protected by consensus. This was reaffirmed when I saw a) they were all added by apparently single-purpose IPs in short succession, and b) that outside of Japanese this wasn't a problem with other languages' name entries, and furthermore these Japanese names weren't appearing in categories outside of English, French, Portuguese, and Tagalog. JoelleJay (talk) 21:22, 26 August 2021 (UTC)[reply]
@JoelleJay, it's not embarrassing at all – you're free to use Wiktionary however you see fit. In all honesty, it did raise my eyebrows when these Japanese names were added – I remember considering a revert but decided not to because I knew that an edit war was sure to ensue. I personally believe that names should receive a special, made-to-measure category which isn't language-sensitive – not really Translingual, just something in-between, maybe more bound to name origin than specific language categories. I must however say that this attestation discussion is a bit puzzling – if I find more than 3 permanent residents in the white pages bearing a name, does that suffice? Or does it have to be news articles, obituaries or marriage certificates to attest that these names have become somewhat naturalised? It's a slippery slope... --Robbie SWE (talk) 09:10, 27 August 2021 (UTC)[reply]
Robbie SWE, I agree that names should have a separate treatment here since their "usage" is of an entirely different nature than that of even other proper nouns. Regarding your white pages comment, there are at least 34,480 unique names that have appeared for babies born in France in the last 120 years; it would also be trivial to find three instances of many of these names in French newspaper archives. If that is the threshold the community wants to use, then ok, but this would utterly abolish any etymological, categorical purpose of our name categories and we'd basically be hosting dozens of near-identical directories (languages used in many countries, or those of countries with large immigrant populations, will end up having extensive overlap of name instances) of nearly every possible name. JoelleJay (talk) 17:50, 28 August 2021 (UTC)[reply]
The issue of what language names are is thorny and comes up recurringly; (semi-)related discussions include this (2013 Info Desk), this (2018 BP), and this (2019 BP). If a name isn't given to French people, but only found in French texts that mention Japanese people, it can be handled like English Aleksei. But if 'Japanese' (or e.g. 'Arabic'...) names are given to French babies, whether by "ethnically French" parents picking 'exotic' names or by the children or grandchildren (etc) of immigrants, rarely but attestably just like some obscure "native French" (e.g. attested since Middle French, but only rarely) names, or new names converted from words (English examples: Joy, Honey, Mercedes) or invented from scratch, then excluding some of those kinds of names based on ideas of what counts as part of true French culture rather than just what exists in the language is untenable, IMO. I'm intrigued by the idea of having Translingual sections, but ... pronunciation and inflected forms differ between languages (look at e.g. Petrus or Brian), and transliterations are source- and target- language dependent, e.g. in the Russian Wikipedia Matthias Grünewald's name is Маттиас, while Matthias Castrén's name is Матиас, in Ukrainian Matthias Grünewald is Маттіас and Matthias Castrén is Ма́тіас, in Macedonian Matthias Grünewald is Матијас. - -sche (discuss) 22:13, 29 August 2021 (UTC)[reply]
Is there a way we could continue to allow entries in each language, but limit categorization so that the categories are actually useful? Like by categorizing based on etymology? Andrew Sheedy (talk) 22:41, 29 August 2021 (UTC)[reply]
We do have Category:English given names from English, Category:French given names from French, et al, if someone wants to make a concerted effort to add "from=" information to names. - -sche (discuss) 23:29, 29 August 2021 (UTC)[reply]
Since AFAIK the only language that has been indiscriminately added to other languages' given name categories is Japanese, and there only to a handful of languages, wouldn't it be much easier and more consistent to just codify the standards we've apparently been using all along and boot out the clearly not widespread names? If we decide a name being given once or thrice or twenty times to a Francophone-born baby at any point in history is enough to make it "French" then the given names category will cease to have any categorical function. Names that make it in the top 1000 in any given year would be a quantifiable, objective cutoff. Even the top 500 is a decent cross-section of the diverse names actually being used in France -- from 2020 these include Jade, Louise, Chloé, Giulia, Nour, Joy, Aïcha, Jenna, Fatima, Cataleya, Kenza, Khadija, Anouk, Anastasia, Lily-Rose, Kayla, Paola, Ruby, Ashley, Wendy, Elif, Carmen, Oumou, Swann, and Hailey. JoelleJay (talk) 03:31, 30 August 2021 (UTC)[reply]
Eliminating entries so that we have neater categories is putting the cart before the horse. We have categories to store the entries that are useful, not the other way around. If we could provide consistent pronunciation sections, I'd see value in including every name in every language they've been pronounced in. Even without, I see no reason not to be generous rather than limited. The top 1000 in any year is available for only a tiny number of countries (and countries don't map neatly to languages) and a tiny number of recent years, so I don't see the usefulness of that rule.--Prosfilaes (talk) 05:23, 30 August 2021 (UTC)[reply]
Prosfilaes, why have categories at all then? And how would we be able to determine the pronunciation of any of these names? The French etc. L2s in the Japanese entries only say "borrowed from Japanese"; how is it useful to know that one child with that name hypothetically could have been born in a francophone country (because, again, none of these L2s are attested and for many of them not one baby has been born with that name in a country speaking that language)? How is it possibly useful to have 40,000 categories for every single name?? JoelleJay (talk) 21:57, 30 August 2021 (UTC)[reply]
I'm speaking to how to handle things that we include : if a name doesn't meet WT:ATTEST, it can't be included at all, no matter how it's labelled or categorized, and it's moot to discuss it here because it should just be RFVed and deleted on that basis. Regarding categories: we already have categories for French names derived from French. That the more general category of all French names is currently incomplete and mostly contains ones from French and Japanese is something to address by making it more complete. - -sche (discuss) 22:33, 30 August 2021 (UTC)[reply]
So in most of these cases the name can easily be attested, it's the inclusion of French etc. L2s that isn't (these were added indiscriminately by IPs). And I'm not seeing a category for French names derived from French (which would be ideal) for the etymologically-French names. If that exists it should be properly autopopulated somehow. JoelleJay (talk) 19:06, 1 September 2021 (UTC)[reply]
We have categories to group related names. We would determine the pronunciation of names the same as any other words; preferably with direct audio evidence. How is it Wiktionary's job to keep track of who was born where? If they're RFVed, we won't keep unattested names. That doesn't mean we need to limit it to the Top 500 names or whatever arbitrary line..--Prosfilaes (talk) 23:51, 30 August 2021 (UTC)[reply]
Prosfilaes, I tried to get rid of the unattested L2s (not the whole name entry) but was reverted and told to come here... Again, these L2s were added in bulk by some IPs 3 years ago with no attestation; how else do we demonstrate a word is not widely and commonly used in a language if not through databases documenting its lack of usage in that language? How is anyone supposed to show these names aren't related in the way they are grouped? And re: pronunciation: most of these names are functionally non-existent in France/French-speaking countries so you won't find audio evidence. Some of them may have never been spoken by a French/Portuguese/Tagalog speaker ever, let alone recorded... And, I'm still confused how it's at all workable to add 10+ categories for every language to these names; Kumiko is already stretching my screen with 40 cats (although only two of them are Japanese cats and none of them are "[romanizations of] Japanese female given names"). What possible use is there for this name, pronounced the Japanese way, to appear in category:Spanish terms with IPA pronunciation? JoelleJay (talk) 19:06, 1 September 2021 (UTC)[reply]
You don't just delete entries, you take them to WT:RFV. "Widely and commonly used" is not the standard; the standard for keeping words is 3 uses under the rules of WT:CFI. Kumiko shows hundreds of French works using it at HathiTrust, including works like Les geishas, ou, Le monde des fleurs et des saules Robert Guillain (1988).
You care about the categories; I don't. No amount of fuss about the categories will shift me at all.--Prosfilaes (talk) 21:57, 1 September 2021 (UTC)[reply]
You have to take L2 additions to RFV? I am not talking about the name itself, I am talking about the sections for French etc. proper nouns that were added to existing Japanese entries. And my impression was that words that are just translated, or are only mentioned in the context of their native language, were not considered new "borrowed" words in other languages. Like, a few articles in Hausa discussing Marxist geography do not make "Karl" a Hausa given name, that would be ridiculous. Tagalog newspapers commenting on the inauguration and later assassination of Moïse obviously don't make "Jovenel" a Tagalog name either. This is not how the given names entries and categories have ever operated, otherwise we would be seeing more than just Japanese names scattered around. If the categories - -sche mentioned above exist that would be a reasonable workaround, but AFAICT they do not. JoelleJay (talk) 19:39, 2 September 2021 (UTC)[reply]
What do you mean by L2 additions? Any new sense is supposed to be RFDed or RFVed.
I've been in many discussions here about when a word is borrowed or when it's merely code-switching--e.g. tovarish, Moskva, Schultüte--and it's definitely something that needs to be discussed. If someone learns that "Karl Marx ya rubuta game da kwaminisanci." ("Karl Marx wrote about communism." according to Google Translate), I don't see why kwaminisanci is Hausa but Karl Marx isn't; they're both concepts a monolingual Hausa speaker will handle the same way. Is Quetzalcoatl an English word?
I don't think that a name having an entry under a language heading in Wiktionary should mean it is a given name in that language. I'm not concerned about that, and don't see why this dictionary should be. If anything, it's an Appendix problem, not a mainspace problem.--Prosfilaes (talk) 23:01, 4 September 2021 (UTC)[reply]

Ok, I guess I will send some of these senses to RfV, although I suspect that it would be trivially easy to find instances of French media discussing some Japanese people with a particular name or even a few instances of Japanese immigrants giving the name to their French-born children. I think Vox Sciurorum's comment earlier regarding "clearly widespread use" is relevant here since basically every given name in any language will have appeared 3+ times in CfI-compatible sources in an inherently "used" capacity without any ambiguity as to "definition", whereas a borrowed common noun has the hurdle of needing to be defined in the recipient language first.

Your point about borrowing is interesting and I think there's a lot of philosophical/linguistic discussion to be had there. I believe there is a distinction between a proper noun/term that refers to a single entity ("Quetzalcoatl", "Karl Marx"), which could easily appear in a native language dictionary untranslated, and any component words/terms ("coatl", "Karl"), which would not appear on their own. I also feel untranslatable (usually proper) nouns in general occupy a different space cross-language-wise from nouns that could be or are translatable, and that even their existence in a native dictionary doesn't necessarily make them a [language] word, especially if their only sense is in referring to a specific thing (the person "Karl Marx") or under a specific context (the name "Karl" as used by other German people discussed in [language] texts).

Whenever a new language heading is added to a given name entry, around ten new categories (including "given names in [language]" and "[language] [n]-syllable words") are automatically populated with the name. I don't really understand what you mean by "I'm not concerned about that, and don't see why this dictionary should be. If anything, it's an Appendix problem, not a mainspace problem." JoelleJay (talk) 21:50, 6 September 2021 (UTC)[reply]

I completely agree that the current given name problem needs a solution. Most of the currently questionable ones have been added by IPs (presumably the same editor) over the years. — surjection??21:55, 6 September 2021 (UTC)[reply]

Incorrect information in Macedonian entries

Could someone generate a list of all the Macedonian entries to which the template {{mk-IPA}} has been added to generate a phonetic transcription by users who do not have Macedonian listed as a native language in their Babel box? Since the module was created, many non-speakers have been adding it to words which the module cannot handle without a manual respelling, probably assuming that it was fully automatic. I have spotted many incorrect transcriptions and have corrected them, but there are probably many others which I have not yet come across.

If possible, I would also like to see a list of entries to which a declension or conjugation table has been added (a template starting with {{mk-decl}} or {{mk-conj}}) by non-native speakers, since I have come across incorrect inflections too, e.g. at презаспие (prezaspie), which I fixed today after creating a new template that could accommodate this verb.

Perhaps @TheDaveRoss could help since he generated a report for me in 2016?

Martin123xyz (talk) 20:02, 22 August 2021 (UTC)[reply]
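
The report requested above could be approximated along the following lines. This is only a minimal sketch in Python under stated assumptions: the input records (entry title, the editor who added the template, and the entry wikitext) and the set of native speakers are hypothetical and would have to be gathered from page histories and Babel boxes by other means, and the function names are invented for illustration. It flags every call to {{mk-IPA}} or to a template whose name starts with mk-decl or mk-conj, and notes whether the call has any arguments, since a bare {{mk-IPA}} has no manual respelling.

```python
import re

# Matches calls such as {{mk-IPA}}, {{mk-IPA|...}}, {{mk-decl...|...}} and {{mk-conj...|...}}.
TEMPLATE_RE = re.compile(
    r"\{\{\s*(mk-IPA|mk-decl[^|}]*|mk-conj[^|}]*)\s*(\|[^{}]*)?\}\}"
)


def templates_used(wikitext):
    """Return (template_name, has_arguments) for each matching template call."""
    return [
        (m.group(1).strip(), m.group(2) is not None)
        for m in TEMPLATE_RE.finditer(wikitext)
    ]


def entries_to_review(records, native_speakers):
    """List entries whose templates were added by someone not in native_speakers.

    `records` is a hypothetical iterable of dicts with the keys
    "title", "editor" and "wikitext".
    """
    flagged = []
    for rec in records:
        if rec["editor"] in native_speakers:
            continue  # additions by native speakers are not part of the report
        calls = templates_used(rec["wikitext"])
        if calls:
            flagged.append((rec["title"], rec["editor"], calls))
    return flagged
```

A call with has_arguments set to False is the case most likely to need checking by hand, because the transcription was then generated without any respelling.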

A whole lot of Old English sound files

(Notifying Benwing2, Leasnam, Lambiam, Urszag, Hundwine, Mnemosientje, The Editor's Apprentice): and @Mahagaja. User:Rafcki has been uploading Old English sound files to Commons and then adding them to our Old English entries en masse. At best, these sound to my ear a bit amateurish. At worst, they have a tendency to accent both syllables when the IPA shows the first syllable accented, and treat some vowels as long when the IPA doesn't- not to mention slipping into modern English vowels now and then. Probably the worst is at Old English heafod:

which sounds to me almost like [ˈhɛ͜oː.ˈvɒd].

What, if anything, should we do about this? Chuck Entz (talk) 22:37, 22 August 2021 (UTC)[reply]

Hi there,
I was alerted that you were talking about me. No hard feelings or anything, however I think some of your points I can address here. The quality of the microphone may be an issue with some of my pronunciations, as I am a native English speaker, and I definitely do say "heafod" with a front "ae" sound. Likewise, the microphone I am using, which is not of good quality, may be the issue. Also, my stress is definitely a little off. I also speak a bit of Polish, so that may be where it comes from.
As for the other vowel quality sounds, I really can't explain to you why I seem to be slipping into "English vowel" sounds, might be your ears picking up on something I am not.
As for how I say these sounds, I am really trying my best here, but this is the point I take most issue on. I have been deeply self-studying Old English for the past three-ish years. Reading everything from blogs, to academic books, and I can confidently say that there is no set opinion on how Old English was spoken. Take the issue of diphthongs: some believe that they were pronounced the way they were spelt, others believe that they had a schwa at the end of them. Thus, there is no consistent agreement, although most seem to favor the former over the latter.
I think I spent the last four hours making these pronunciations, and if y'all can make better ones, then by all means please do. However, I just made these so folk can hear what can be one dude's take on how Old English, the language of my foreskin, was spoken.
Also, I am completely fine with re-recording myself over again if you would like that. I do agree on the stress part, and I apologize for that. (This unsigned comment was added by Rafcki (talk • contribs) at 00:03, 23 August 2021 (UTC).)[reply]
The language of your foreskin??! JoelleJay (talk) 00:26, 23 August 2021 (UTC)[reply]
Yes, you were alerted- intentionally. Linking to someone's user page in a signed message does that. This isn't a question of whether you meant well, but whether the results are something we want in our Old English entries. If I thought you were committing vandalism, I would have already reverted all of your edits (though your talk-page edit made a good point and was quite helpful, so I would have left that one). Indeed, I don't consider myself qualified to single-handedly pronounce judgment on your efforts- that's why I brought it here.
Generally we prefer to limit sound files to fluent (preferably native) speakers. In the case of a language that's been dead for a thousand years, that's obviously not possible, but that makes accuracy all the more important- there's no native speakers to compare against or to spot any errors. Chuck Entz (talk) 01:19, 23 August 2021 (UTC)[reply]
These pronunciations are just plain unhelpful. They betray a very strong accent, and make little effort to produce even some of the most basic phonemic distinctions in OE, like vowel length, correctly. I think Rafcki should keep the language of his foreskin to himself, and the audio files should be removed from all entries. —Μετάknowledgediscuss/deeds 01:57, 23 August 2021 (UTC)[reply]
Yes, please remove them all. --{{victar|talk}} 04:33, 23 August 2021 (UTC)[reply]
I agree, please have them deleted. And check your autocorrect settings or learn how to spell forebears. —Mahāgaja · talk 06:33, 23 August 2021 (UTC)[reply]
I also agree that many of these recordings are off the mark and should be removed. Every diphthong turns out disyllabic-sounding, actual disyllables have separate stress on the final syllable that in some cases is stronger than on the main syllable. In hine, the medial n is lengthened, and in hire, the vowel is lax, but long. About lax: OE short vowels were probably lax, but the i in wit is lower than cardinal [e], which is unlikely. There are some like git, min and þu that do the job, so they could actually stay. –Austronesier (talk) 08:31, 23 August 2021 (UTC)[reply]
Rafcki: As an uninvolved editor, I was just glancing over this discussion. I will assume that you intended to use a descriptive of your forerunners, rather than the anatomical word you chose. I ask that you change the questionable word in your reply, or at least strike that part of the text. Used in the context as it currently appears, I don't think the crude language helps a Foundation project. Thank you for your consideration. — CJDOS, Sheridan, OR (talk) 08:36, 25 August 2021 (UTC)[reply]

I agree in substance with this issue. The recordings needed improvement. But I feel uneasy about the way this issue was treated. User:Rafcki clearly put in a lot of effort and with a bit of guidance could have become a good contributor. Instead they got what reads like harsh criticism from a group and mockery for a both funny and innocent mistake. From what I can see this was uncalled for and I wouldn't be surprised if they don't return. For me this didn't live up to the maxims 'Be helpful' and 'Be considerate' in Help:Interacting_with_other_users. I know everyone has much to do, but slightly kinder wording would have made a big difference. —caoimhinoc (talk) 18:59, 30 August 2021 (UTC)[reply]

Affixoids, do they exist – not yet recognized on Wiktionary (explicit language)

We currently do not have the part-of-speech categories “prefixoid” or “suffixoid”. This makes en.Wiktionary’s categorization of terms disagree with native lexicography and with itself, across entries.

Talk is of German scheiß-, hunds-, German mords- and the like; if you search for “affixoid” or “prefixoid” you find this term in descriptions of other Germanic languages modern and historical, terms surprisingly not invented by German professors according to Wiktionary on prefixoid. Now it is easy to see that in scheißegal (totally unimportant) “scheiß” is an adverb. In German Legt beim Fahren euer scheiß Handy weg (Put away your bumbaclot phone when driving) it is then an adjective. For in the English sentences, discussed isolated, you damn sure see this way! dead is an adverb, goddamn is an adjective (also an adverb like fucking?).

However editors made a “prefix” out of scheiß-, while I analyzed it with its living etymological connections as adverb under scheiße, at least in the entry for scheißegal, unlike in scheißekalt, which differs again from scheißkalt. todernst is often written dead-serious in English, where dead is defined as adverb. I cannot recommend creating dead- with a hyphen either, no matter whether it is attested with spacing, hyphen or written together, nor the resulting term. todernst and dead-serious cannot be a compound or derivation, because dead is an adverb, isn’t it? The spelling does not change the POS, am I right, it’s the bloody same word?

And if it is not derived nor compounded then it is SOP – do you follow? So the kin of hundsgemein needs to be deleted, and the English equivalent terms too, notwithstanding WT:COALMINE, which aims at compounds which are SOP, while this isn’t even a compound or affixation. The contents of Category:German words prefixed with scheiß- have to be removed, in so far as the terms come from this prefixoid. I say this because forsooth there are two etymologies for Scheißkerl: One is a real compound (meaning somebody with shitty character), one is just the SOP prefixoid (meaning a man of pure manhood, or not even anything added at all like fucking only expresses intense emotion), because prescriptivists write the prefixoids together with the words they are attached to. Additional exercises are found at Wiktionary:Requests for verification/Non-English#scheiße, where a narrow IP demanded spellings for a thing that cannot be concluded from spelling and is a policy question, if you miss entropy to be sure what I am talking about. In short German dictionaries include Scheiß- and scheiß-, so de:arsch- and de:scheiß-, “Präfixoide”, however we are a multilingual dictionary so I opt for a way wherein the languages harmonize.

We could of course have arsch- because of with arschkalt and the like dictionary users search it. (fehl- is more like historical a prefixoid, now a full prefix — although for German we have the pesky idea of “separable prefixes” to which it belongs; verbs with these are surely not generally SOP.) Still however this is just a convention as this all not attested with hyphen at the end, a convention for unilingual word lists which we make an alternative to, and somehow one could have all shapes, as which such a thing could be looked up, at multiple pages (the tyranny of page titles strikes again). {{alternative analysis of}}?

The spelling for the prefixoid of hundsgemein opens a problem because of the Fugenelement (“interfix”) it contains. Now it is not ruled out that man creates hunds (adverb) as it is very defensible, as scheiße (adverb) in the adverb, but we also might write it to Hund (noun) saying in a gloss that it is also used as a kind of intensifier and reh teh teh and indicate in the context label ({{lb}}) that it is only written together and with -s-. A good example for this is Persian خر (xar) (where the resulting terms are often not SOP because of not-to-be-guessed meanings.) Fay Freak (talk) 22:24, 24 August 2021 (UTC)[reply]

"We currently do not have the part-of-speech categories “prefixoid”":
Prefixoids as well as some neoklassische Formative (neoclassical formatives) or Konfixe (e.g. bio-) are categorised as prefixes. (Some other neoklassische Formative or Konfixe are categorised as suffixes.)
WT uses some grammar terms loosely, like "Derived terms" is used for both derived terms (derivatives, derivates) and compound terms (compounds). And in this case, Duden online too calls scheiß-, Scheiß-, hunde-/Hunde- prefixes.
"not invented by German professors according to Wiktionary on prefixoid":
Doesn't matter, though according to WP he lived and studied (for some time) in Germany and later was a professor.
"Now it is easy to see that in scheißegal (“totally unimportant”) “scheiß” is an adverb.":
No. scheißegal is one term. It doesn't attest *scheiß (like Langzeitstudie which is lang +‎ Zeit +‎ Studie or Langzeit- +‎ Studie [capital L is questionable but used by WT] does not attest *Langzeit), and could for example also be a compound Scheiße/Scheiß + egal (and a slight change of meaning could be compared with Haupt (head) and haupt- (main)).
"In German Legt beim Fahren euer scheiß Handy weg (Put away your bumbaclot phone when driving) it [= “scheiß”] is then an adjective.":
Standard/prescribed spelling is Scheißhandy, cp. “Scheiß-” in Duden online (see also “scheiß-” in Duden online).
"I analyzed it":
How it is (descriptively), or how it should be (wishful thinking, prescriptively, cp. diff)?
dead (adv.) vs. dead-:
If dead does exist, terms with dead- (like dead-serious) could simply be compounds, and SOP as "hyphenated compounds". However, as for German it was tried in the other way: based on scheißekalt, it was tried to create scheiße (adverb). But this doesn't work: cp. above with Langzeitstudie and possible formation as Scheiße +‎ kalt.
"The spelling does not change the POS":
It can. E.g. in Irren ist menschlich (i.e. das Irren ist menschlich) and irren ist menschlich (i.e. zu irren ist menschlich) the different interpretations and hence different spellings change the POS of Irren/irren. And in WT there are also things like man vs. -man.
"And if it is not derived nor compounded then it is SOP":
child or Kind is neither derived nor compounded but primitive and not SOP.
For formations like Scheißhandy (whether analysed as prefix scheiß- + Handy or prefixoid scheiß- + Handy or compound Scheiße/Scheiß + Handy), they first of all are single terms. And the usual argument is that not all people know how to split it and some will look it up as a single term. (And it's also not spelled with a hyphen, so the "hyphenated SOP" part doesn't apply as well.)
"We could of course have arsch- because":
It's either arschkalt = Arsch + kalt or arsch- + kalt, but not *arsch (adv.) + kalt, if the adverb is extracted from arschkalt and doesn't occur alone.
"creates hunds (adverb) as it is very defensible":
WT:RFVN, it's probably not attested. And even if it were, it might still rather be hundsgemein = Hund + -s + gemein or hunds- + gemein.
--20:22, 25 August 2021 (UTC)
scheißegal is two terms, and it does not attest scheiß-. Where is the hyphen? I only see scheiß therein.
Kind is not multi-word, todernst and dead-serious are. But WT:COALMINE does not say that entries shall be deleted based on whether they are one word or multiple, so you are deflecting from the argument. Scheißkerl (noun+noun) and Scheißkerl (prefixoid+noun) are not the same, but one has to weigh the intended analysis, which question you fail to see, working on language like a bot.
Not going to answer to your graphocentrist ramblings much. The statement that the spelling changes the part of speech is revealing enough. rambling and revealing is not a noun based on how it is spelled, but Irren was a noun before it was spelled, nor are todernst and Scheißhandy one word because they are written together. What is a word, huh? And why does it matter? As you claim to see what matters, tell us what matters here! The spelling? But that’s prescriptivist.
Of course I expanded how it is but you refuse to see how language is and continue to try to prescribe its nature from its written form, arguing from the prescribed spelling “Scheißhandy” about what parts of speech there are, bizarrely not seeing that you are the prescriptivist here. I am not convinced you aren’t just a bugged computer, following some prescribed grammar, hence always this graphocentrist partisanship. Fay Freak (talk) 22:05, 25 August 2021 (UTC)[reply]

criteria for kyujitai and shinjitai

User:H2NCH2COOH "Added unofficial kyujitai forms that share same components as hyogai kanji to conversion list; Added extended shinjitai conversion" to mod:ja/data/kyu, which generates automatic kyujitai forms. IIRC there is no policy or guideline deciding what can be qualified as kyujitai or shinjitai on Wiktionary. I guess a discussion is needed to establish a standard, in order to avoid confusion. -- Huhu9001 (talk) 09:53, 25 August 2021 (UTC)[reply]

Agree with the need for a standard, but I have no opinion on details, although I am inclined to use JIS standards and the behavior of comprehensive Japanese fonts (Adobe's fonts, etc.) as a guide. —Suzukaze-c (talk) 20:52, 26 August 2021 (UTC)[reply]
My standard for a kyujitai-shinjitai pair is as follows: 1) If a component has different forms in joyo kanji list and hyogai kanji list, the hyogai kanji form shall be the kyujitai form; 2) both forms are encoded in JIS; and 3) the shinjitai or extended shinjitai form is used extensively. Still, there is an issue over adding extended shinjitai pairs, because that would disable the conversion of the standard forms (which is why criterion 3 exists). --H2NCH2COOH (Talk) 14:07, 28 August 2021 (UTC)[reply]
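The three criteria above can be sketched as a simple filter. This is a minimal illustration in Python, not the actual logic of mod:ja/data/kyu: it assumes each criterion has already been checked by hand or against an external list and is supplied as a boolean, and all class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePair:
    kyujitai: str
    shinjitai: str
    # Criterion 1: the kyujitai form uses the component shape found in the hyogai kanji list.
    kyujitai_uses_hyogai_component_form: bool
    # Criterion 2: both forms are encoded in JIS.
    both_encoded_in_jis: bool
    # Criterion 3: the shinjitai (or extended shinjitai) form is used extensively.
    shinjitai_used_extensively: bool


def is_valid_pair(pair):
    """A pair is accepted only if all three criteria hold."""
    return (
        pair.kyujitai_uses_hyogai_component_form
        and pair.both_encoded_in_jis
        and pair.shinjitai_used_extensively
    )


def build_conversion_table(pairs):
    """Map each accepted shinjitai form to its kyujitai form."""
    return {p.shinjitai: p.kyujitai for p in pairs if is_valid_pair(p)}
```

Keeping the criteria as separate fields makes it possible to record why a given pair was rejected, which may help if the standard is later put to a wider discussion.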
By the way, I believe it is necessary to redirect entries of Japanese glyph variants (kyujitai and variant kanji) to their standard forms like what we have done for the Chinese language. Except for a few cases, providing the same things under different entries and keeping them updated is just a waste of time (and few if any editors are doing so). Moreover, the existing kyujitai-shinjitai comparison template {{ja-kanji forms}} does not indicate whether a simplified form is extended shinjitai (this information purely concerning Japanese kanji is however tagged in the template used in "Translingual" sections), which can be confusing and misleading. --H2NCH2COOH (Talk) 10:28, 19 September 2021 (UTC)[reply]
@H2NCH2COOH: I think we're already doing that, using the {{ja-gv}} template. See the 大將軍#Japanese or 來#Japanese entries for a couple examples. ‑‑ Eiríkr Útlendi │Tala við mig 19:18, 24 September 2021 (UTC)[reply]
Yes I know that. But the template is not quite widely used, and more often than not, even readings sections of kanji entries are redundant. --H2NCH2COOH (Talk) 06:21, 26 September 2021 (UTC)[reply]

Sore-loser rule

I've created a new vote for the adoption of a sore-loser policy: Wiktionary:Votes/2021-09/Sore-loser rule. --{{victar|talk}} 18:56, 26 August 2021 (UTC)[reply]

  • There is a pair of rules in many legislative bodies saying (1) a matter can only be brought up once in a session, (2) a member who voted in the majority can move for reconsideration to allow a second vote. Vox Sciurorum (talk) 19:01, 26 August 2021 (UTC)[reply]
    It's a shame that we need a policy. It seems like a basic principle of polite group behavior. DCDuring (talk) 19:03, 26 August 2021 (UTC)[reply]
    I think both these votes are stupid and we ought to be able to come to a decision through maturer ways. I would appreciate a structural way to deal with disagreements like this though, possibly involving at least a few uninvolved admins that didn't participate in the discussion to come to a conclusion - something like our own Wiktionary court. Thadh (talk) 21:49, 26 August 2021 (UTC)[reply]
    I agree in full (except for the first seven words of your statement). I would back establishing our own version of an arbitration committee like the one our big sister Wikipedia has. But that itself would probably require a vote, or at least some further discussion here at the BP. Imetsia (talk) 23:39, 26 August 2021 (UTC)[reply]
While I think the policy would be good, I'd personally suggest changing the name of the vote, as it doesn't sound very welcoming and more off-putting, and people can easily be turned off from it or vote against it. Also, there need to be more specifics as to what the exact policy suggestion will be, where it will be listed, what type of votes it applies to, and more. AG202 (talk) 20:44, 26 August 2021 (UTC)[reply]
Next it will be contended what the same “matter of dispute” is. 🤡
Just only make good votes. Can’t vote about it. Fay Freak (talk) 20:50, 26 August 2021 (UTC)[reply]
@AG202: See Sore-loser law. If you have any suggestions to the wording of the vote, please post your suggested edits. --{{victar|talk}} 21:34, 26 August 2021 (UTC)[reply]
I'm aware of the sore-loser laws, but still think with the heightened energy at the moment, that changing it to something else might be better, but that's just my own opinion. In terms of suggestions, it does need to be explicit where this policy will go on Wiktionary, how to define matters of disputes, whether or not there's an appeals or overwrite process (ex: if a bad policy were to be passed for some reason and it needed a revote due to new circumstances), the types of votes it'll be applied to (ex: what if there's an admin that's approved that needs to be removed within 6 months?), etc. A lot of them are (somewhat) hypotheticals, but knowing this website from the time I've been on here, they'd come up at some point, so it really needs to be thought and planned out thoroughly before it's brought to a formal vote. AG202 (talk) 02:57, 27 August 2021 (UTC)[reply]
A titling that tells the deeper truth is better than slathering the name of a concept in an equivocation, dodge or complex phrase. "doesn't sound very welcoming and more off-putting" is what changes shell-shock to PTSD, toilet paper to bathroom tissue, and the like. Orwellian doubletalk. (from George Carlin PBUH) It's a sore-loser policy and let it be that. It's good, meaning-conveying, English. From reading the title you know: the policy proposal is "sore-losers can get f'd". --Geographyinitiative (talk) 02:16, 27 August 2021 (UTC)[reply]
Ummmm I didn't think it's that serious, but thanks? Said myself that I think a similar policy would be good, but with the current environment and the back-and-forths, it seemed a bit pointed. Suggesting a title change that definitely doesn't have to happen isn't Orwellian. No need to go that far, really. AG202 (talk) 02:53, 27 August 2021 (UTC)[reply]
I obviously am biased on this issue, disagree with the proposed policy, plan to vote against it, and will wait for that occasion to explain my position in full. In the meanwhile, a couple points of clarification. Do you intend this vote to have retroactive power? If so, I think that should be stated clearly on the vote page itself. And second, who gets to decide what the "disputed matter" is? The templates-vote controversy was in large part based on your insistence that the new vote was essentially a reversal of a prior vote, but other users have disagreed. I don't think you've explained how to resolve such disagreements as they may come in the future. Imetsia (talk) 23:22, 26 August 2021 (UTC)[reply]
At the moment there is no threshold for creating a vote proposal, just like there is no threshold for adding a new lemma. But unlike entries, which can be proposed for deletion, there is no way to keep a vote proposal from going forward to an active vote, however ill-considered the proposal may be. Might it help if we require a certain number of endorsers before a vote proposal becomes active? (Endorsing a vote proposal does not mean that the endorser favours acceptance, but merely that they think it is helpful to the Wiktionary project if we get a ruling on it.)  --Lambiam 14:00, 30 August 2021 (UTC)[reply]
A somewhat original idea, but I would support it. An analogue is that of "granting certiorari" (which only requires 4/9 to go through), although our task would be of much lesser value/prestige. Imetsia (talk) 00:18, 31 August 2021 (UTC)[reply]
I meant this for votes in general, and not specifically votes for revisiting earlier decisions. Above, someone wrote, “If the vote shouldn't have happened in the first place, someone should have stepped in to stop it.” But no one, also not admins, have the right to stop a vote from going through. Elsewhere I have expressed the wish that the sore-loser vote be not held, but afaik only the proposer can grant that wish.  --Lambiam 10:31, 1 September 2021 (UTC)[reply]
@Lambiam: It's not entirely true: see Wiktionary:Beer parlour/2018/May § Vote: Proficiency as a prerequisite for contribution. PUC10:36, 1 September 2021 (UTC)[reply]
So perhaps Imetsia can grant my wish 🥺. [🤣]  --Lambiam 11:04, 1 September 2021 (UTC)[reply]

New template for CJK character evolution, feedback requested

I created Template:sinica Template:R:zh:SinicaGlyph and added it at 水#References_2. I think this database is very useful for character evolution, but as someone who has only taken one course on Daoism over a decade ago, my Chinese is not too hot. I want to get feedback from other users who are more knowledgeable about CJK characters to see if this is a good source, useful for our entries, etc. before I add it elsewhere. Thoughts? —Justin (koavf)TCM 20:24, 26 August 2021 (UTC)[reply]

It's a good resource, although we already present a lot of that in English in the entry. The name of the template is extremely ambiguous, and should be changed before it gets used anywhere else. —Μετάknowledgediscuss/deeds 03:07, 27 August 2021 (UTC)[reply]
I would suggest something like "R:zh:SinicaGlyph" so you can see at a glance what it's about. Chuck Entz (talk) 03:44, 27 August 2021 (UTC)[reply]
Renamed. —Justin (koavf)TCM 15:38, 31 August 2021 (UTC)[reply]

Wikipedia's policies

Occasionally, I've seen people reference WP:POINT or WP:CANVASS, and I've pointed to WP:PRINCIPLE before. But obviously policies on notability, etc. have no place here on Wiktionary. So do we have a standard for which WP policies we hold to, and to what extent they're enforceable? Imetsia (talk) 18:31, 29 August 2021 (UTC)[reply]

Those pages aren't policies. If you read them, you'll see that they explicitly state that. They are simply guidelines for polite behaviour, and we expect basic politeness just as much as our fellow editors at Wikipedia (but generally don't waste time writing our own essays about it). —Μετάknowledgediscuss/deeds 20:25, 29 August 2021 (UTC)[reply]

IPA for reconstructed languages

What is the current feeling on adding IPA to reconstructed protolanguages? If I added IPA to Proto-Semitic entries, using {{a}} to indicate that the pronunciation is reconstructed (and whose reconstruction it is), would this be widely seen as acceptable? —Μετάknowledgediscuss/deeds 03:58, 30 August 2021 (UTC)[reply]

There isn't a blanket consensus. There seems to be a consensus that Proto-Germanic entries include pronunciation info, and consensus that Proto-Indo-European entries do not. Personally, I'm opposed to including pronunciation info for reconstructed languages, because the orthography used for a reconstruction should already be phonemic, making pronunciation info in IPA redundant. —Mahāgaja · talk 06:51, 30 August 2021 (UTC)[reply]
My understanding is we don't know the phonetic value of some Proto-Indo-European consonants, making IPA impossible. Like trying to reconstruct Arabic pronunciation from Turkish borrowings where all three h-like sounds (ه ح خ) merged and the stops were mostly lost. Vox Sciurorum (talk) 15:36, 30 August 2021 (UTC)[reply]
I would be in favor of universally removing pronunciations from all reconstructed entries. --{{victar|talk}} 18:23, 30 August 2021 (UTC)[reply]
I wouldn't support that, since every transcription is different, and it would require recognising and knowing the used transcription in order to understand what phonemic values are implied. However, I wouldn't be too disappointed if others want it gone. Thadh (talk) 21:13, 30 August 2021 (UTC)[reply]
@Mahagaja: Concerning your statement "the orthography used for a reconstruction should already be phonemic, making pronunciation info in IPA redundant": have you considered that it may be phonemic, but unintuitive? For example, Proto-Semitic *š is widely thought to be */s/, but you probably wouldn't have guessed that! —Μετάknowledgediscuss/deeds 21:18, 30 August 2021 (UTC)[reply]
I support it. In general, as long as the pronunciations and/or entries (as appropriate) are marked as reconstructed, I support having them where there are references to support them (either specifically or in the form of general rules). BTW, Proto-Algonquian is another example where the conventional orthography is not intuitively phonemic (x and ç are certainly not /x/ and /ç/, they are probably /s/ and /l/). - -sche (discuss) 22:19, 30 August 2021 (UTC)[reply]
I'd say if the letters conventionally used in reconstruction are counterintuitive, that can be explained on the About page. WT:ASEM does say that *š stands for /s/, and *s stands for /t͡s/, although the consonant phoneme table above the statement contradicts it, calling *s an alveolar fricative and *š a palatal fricative. —Mahāgaja · talk 06:05, 31 August 2021 (UTC)[reply]
Proto-Semitic is still a good example of where giving the (usual) correspondences somewhere would clarify the notation. In some cases, giving the orthography of the reflexes would also help, as pronunciation seems to have changed drastically during recorded history in some cases. After all, there's still the fallback position that a reconstruction is an expression of what the regular reflexes would be. --RichardW57m (talk) 08:59, 31 August 2021 (UTC)[reply]
Some established notations are indeed counterintuitive, e.g. for Proto-Austronesian, we still use the Dyen notation, which followed the typewriter-friendly principle of making minimal use of special characters and diacritics, but which is in parts quite arbitrary (e.g. *z most likely was an affricate in the alveolar-palatal area, *j is usually interpreted as a palatalized velar stop).
IPA is however not an ideal tool to solve this, because often reconstructions are less fine-grained than IPA (e.g. the non-low central vowel of Proto-Austronesian is conventionally spelled *e, with many scholars in recent publications spelling it *ə as we do here, but actually we can only say that it ranged between *ə or *ɨ), or still a mystery (such as the actual value of *j). Proto-Algonkian *θ is another example.
I agree with Mahāgaja that the About page is the best place to explain these things, but ideally, it should be only one click away from the reconstructed entry, similar to the key that appears next to "IPA(key)". –Austronesier (talk) 10:27, 31 August 2021 (UTC)[reply]

Morphology section

The Wiktionary:Etymology guidelines state that "Analyses of surface forms are of value, but do not replace and should not be confused with an account of historical development." Currently, the advice is to include this information in the Etymology section, called out as a "Surface analysis," such as "Surface analysis astro- + -logy". I've also seen several alternate forms of this phrasing, such as "equivalent to" or "morphologically."

To avoid confusion and edit wars, it would be helpful if this information were contained in its own Morphology section.

A coworker and I have been working to extract this morphological information, currently available at [8]. The morphologies are not all currently perfect, but we could bot-edit these into articles to provide a starting point on which we could improve.

We would also set up templates for morphologies, such as

{{morphology|en|unsurprisingly|un-|surprise|-ing|-ly}}

(the exact format of the template is still TBD).

Here are some examples of what the section might look like:

Morphology (for article unsurprisingly)

un- surprise -ing -ly

Morphology (for article endings)

end -ing -s

Morphology (for article unlockable)

un- lock -able

Morphology (for article running)

run -ing

Morphology (for article conversation)

converse -ation
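
As a rough illustration only, the proposed template call and the rendered examples above could be connected along these lines. This is a minimal Python sketch that assumes the `{{morphology|lang|word|part|...}}` layout shown earlier, which the proposal itself says is still TBD. The function names `parse_morphology`, `classify`, and `render` are hypothetical and are not part of any existing Wiktionary module or template.

```python
import re


def parse_morphology(call: str) -> dict:
    """Split a hypothetical {{morphology|lang|word|part|...}} call into fields."""
    match = re.fullmatch(r"\{\{morphology\|(.+)\}\}", call.strip())
    if match is None:
        raise ValueError(f"not a morphology template call: {call!r}")
    # Assumed field order: language code, headword, then the morphemes.
    lang, word, *parts = match.group(1).split("|")
    return {"lang": lang, "word": word, "parts": parts}


def classify(part: str) -> str:
    """Guess the morpheme type from hyphen placement, as {{affix}} does informally."""
    if part.startswith("-") and part.endswith("-"):
        return "interfix"
    if part.endswith("-"):
        return "prefix"
    if part.startswith("-"):
        return "suffix"
    return "root"


def render(parsed: dict) -> str:
    """Produce the space-separated line used in the example sections."""
    return " ".join(parsed["parts"])


parsed = parse_morphology("{{morphology|en|unsurprisingly|un-|surprise|-ing|-ly}}")
print(render(parsed))
print([classify(p) for p in parsed["parts"]])
```

Relying on hyphen placement keeps the morpheme type recoverable without extra parameters, which is the property the examples above already depend on.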

My question: What is the process for adding a new section and creating new templates for it?

Jon the Geek (talk) 17:15, 30 August 2021 (UTC)[reply]

Template Considerations

Templates like {{affix}} do useful categorisation. You should not lose that information. --RichardW57m (talk) 09:08, 31 August 2021 (UTC)[reply]

I think the proposal is not to remove any information that is currently there, but to add a new section with new templates focusing on surface analysis. I agree that any new templates should be as informative as possible, but at least the example given shows hyphens being used informatively, similar to the {{affix}} template. Jonathanbratt (talk) 15:21, 31 August 2021 (UTC)[reply]

I don't think there should be a new section for this, morphology as separate from {{affix}}. We could create a standard template for linking the elements (and providing the text?) of "Surface analysis x+y+z" (under whatever name and wording we agree on), if we don't have one already, so people who want to extract that information could do so easily. But for human readers, our entries already push definitions quite far down, and I'm wary of putting anything else near the etymology section (where it would make sense for this to go) that would push them down even further; the section also seems guaranteed to have very little content (just a string of morphemes), so I don't think it needs its own section in the TOC. It makes more sense to me to handle it in Etymology much like we already do, but with its own specific template to standardize it and make it bot-readable / extractable. - -sche (discuss) 20:31, 31 August 2021 (UTC)[reply]

Something within the Etymology section would make sense to me. Mostly I wanted to semantically separate it in some way since it IS different from the history of the word. There was a question a while back from Jonathanbratt I think trying to find out what the wording should be for the "Surface analysis" stuff, and that had a bit of its own argument about whether it's appropriate to include that at all. I thought this section could be a good compromise. If we added it as a new section, it wouldn't have to go above the definition, for what it's worth. Jon the Geek (talk) 12:46, 1 September 2021 (UTC)[reply]

Fictional characters as proper nouns

I see Sherlock Holmes and Darth Vader have "proper noun" sections, but User:Overlordnat1 suggests turning the proper noun sense of Sam Spade into a mention in the etymology. How do people feel about this? A consistent guideline would be nice. None Shall Revert (talk) 11:19, 31 August 2021 (UTC)[reply]

I wouldn’t mind sections on both Sam Spade, the character, and on the generic term derived from it like for some other entries but the generic sense should be mentioned as that’s what makes it dictionary-worthy. Also problematic is that the definition relates to the character but the quotes relate to the more general sense, so it would be good if the entry and our guidelines were more consistent, undoubtedly. Overlordnat1 (talk) 14:03, 31 August 2021 (UTC)[reply]
I'd be fine if we only had a noun section, as we don't include names of individual persons per se. Pedia links and mention in the Etymology should make the connection. It could be that we need a template analogous to {{&lit}} (say, {{&propnoun}}) to make it clear that we are aware of the proper name, but exclude it as a matter of policy. DCDuring (talk) 15:11, 31 August 2021 (UTC)[reply]
Re: "we don't include names of individual persons per se": plentifully contradicted in CAT:en:Individuals. --Dan Polansky (talk) 08:34, 3 September 2022 (UTC)[reply]
Very late: Keep proper noun senses: they are the primary meanings and it seems bizarre to me to exclude them. We don't even need the common noun senses and cover them in the proper noun senses: "Fictional character noted for characteristics X, Y and Z". --Dan Polansky (talk) 08:34, 3 September 2022 (UTC)[reply]

Stipulation on the templates vote

Per Wiktionary talk:Votes/2021-08/Nullifying the previous templates vote § No consensus result?, here's a stipulation I'd propose to settle the templates situation amicably:

  • The {{bor+}} template will be in operation, and anyone will be able to use it.
  • No one can replace an instance of the traditional {{bor}} template with {{bor+}}.
  • Usage of {{bor+}} can be undone selectively (i.e. if it would allow including additional information, make the wording clearer, etc.), but not systematically.
  • The {{inh+}} template will either (a) be deleted or (b) continue to exist but no one will be able to use it.
  • There will be a moratorium on further changes to the inh, bor, bor+, and inh+ templates for another (a) 3 months, (b) 6 months, or (c) 1 year.
  • Optionally, if other users would like, we can abort the ongoing vote about the templates and resolve the situation with this stipulation.

I think this would be a good omnibus compromise solution. What do other users think? Imetsia (talk) 14:16, 31 August 2021 (UTC)[reply]

What about just having community-decided usage? It seems the templates are mostly favoured by the Indo-Aryan editing communities on Wiktionary, I would agree with every community deciding separately whether to use these templates or not. That way, I won't have to worry about new editors using the template(s) in languages that don't need them. Thadh (talk) 15:01, 31 August 2021 (UTC)[reply]
@Victar's main area of interest seems to be Indo-Iranian etymology, so I'm not sure that would resolve the conflict. They view the etymologies as their territory, while the Indo-Aryan community sees them as theirs. That's what this is really about. Chuck Entz (talk) 15:15, 31 August 2021 (UTC)[reply]
While I'm okay with this, imho replacing an instance of the traditional {{bor}} template with {{bor+}} should be legal. Two tigers cannot live on one mountain — if {{bor+}} can't replace {{bor}}, {{bor}} will replace it. It is sad that all this compromising is being done for just one user who refuses to do any compromises. Tbh the new vote was also created for the same user. This cunning man tricks editors to create votes for something that doesn't need it because he wants to see it failed. Answer, O sarvavirodhī, was there any previous policy forbidding the etymology text and future templates? Don't answer off topic by citing the borrowing vote, I'm tired of that. It never said that etymology text or any other template for displaying it isn't allowed. Svārtava215:48, 1 September 2021 (UTC)[reply]
What you just said is the opposite of a compromise. I would like to remind you that the vote did fail and that most people that voted support didn't support systematic replacement of {{bor}} by {{bor+}}. What we're trying now is to give the supporters of a shorthand template the possibility to use that shorthand, not to make {{bor+}} the sole standard template, which it definitely shouldn't be. Thadh (talk) 16:10, 1 September 2021 (UTC)[reply]
Not even the most dyed-in-the-wool template-opponent can argue that the vote "failed." It got 20 support votes against 10 oppose votes, but you and others continue to read the vote through a mirror. And even if you're sold on the idea that the vote didn't technically reach the requisite threshold, note that it did not fail. Its result was "no consensus," not "failed". Imetsia (talk) 21:36, 1 September 2021 (UTC)[reply]
I'm sorry for the imprecise wording, what I meant to say is it didn't pass. Thadh (talk) 19:35, 2 September 2021 (UTC)[reply]
I agree that we should be able to systematically replace bor with bor+ (or, alternatively, simply include "Borrowed from" back into the original template). However, I just want to come to an amicable halfway solution for the time being, and the rest can be discussed and voted on further in the future.
I likewise agree that it's sad we have to do this just to placate one individual who, by sheer force of will, was able to (1) con another user into creating the first vote, (2) convince so many to shift the burden of proof for the existence and operation of the templates, (3) lure an admin to unblock him for abusive editing, and is now (4) getting us to bend the knee and agree to a half-measure. How some users can continue to tolerate, nay, actively defend and promote, that behavior is beyond me. It's bad faith through and through, and administrative action should be taken. Imetsia (talk) 21:23, 1 September 2021 (UTC)[reply]
@Imetsia: That all sounds very amenable to me. My only further preference is that Inqilābī's edit to include a link be undone, but we can hash that out later. I think the vote should just play out, regardless. User:Svartava2 however is still actively reverting and inserting {{bor+}} and {{inh+}} in entries. --{{victar|talk}} 18:58, 2 September 2021 (UTC)[reply]
  • Are "Inherited from {{inh}}" and "{{inh+}}" meant to be equivalent? The first use of 'inherited' can be taken to be normal lexicographic usage, whereas the link in the second one implies that the glossary meaning is to be understood, and by the glossary meaning, Modern English blood is not inherited from Middle English blood, for there has been an irregular sound change between the two. (The definition in the glossary was added by User:Mnemosientje on 24 February 2019.) --RichardW57m (talk) 15:36, 9 September 2021 (UTC)[reply]
    The glossary definition of inherited is misleading when it says "derive through regular sound change". We do not require inherited words to exhibit completely regular sound changes. If we did require regular sound changes, then not only would blood not be inherited, but for example words with Middle English /u/ like but would not be inherited in dialects where they irregularly developed /ʌ/ or the like (but would be inherited in dialects that retained /ʊ/). But everyone would probably consider those words inherited in all dialects of English. Apparently the definition needs some clarification. — Eru·tuon 19:17, 9 September 2021 (UTC)[reply]
    Indeed, there are quite a lot of such words, but you've got the second example wrong, and we don't have an entry for any of the Middle English forms! I deliberately chose a very clear word - it is not obvious to me whether it is head or bead which is irregular. --RichardW57 (talk) 04:37, 10 September 2021 (UTC)[reply]
  • I do not agree to such a stipulation. As I said earlier you do not need a vote to create a template and I do not want to dignify this vote to set any sort of precedent. In any case my current interest is in Italian and other Romance languages rather than Indo-Aryan. Victar seems to think he owns *all* languages even those he doesn’t work on, which is nuts. I would be more willing to agree to stipulations on a community by community basis but Victar seems to have no interest in this, and if he won’t compromise neither will I. Benwing (talk) 19:47, 9 September 2021 (UTC)[reply]
    Oppose per Benwing2, “if [Victar] won’t compromise neither will I”; as it is this is happening just for 1 single user which is “wholly rediculous” —Svārtava203:10, 10 September 2021 (UTC)[reply]
    @Benwing: The above is *literally* a compromise myself and others are agreeing to. --{{victar|talk}} 15:58, 10 September 2021 (UTC)[reply]

I propose the following: decisions should be made per community. I will agree to free use of the templates in the Romance languages where I edit, and no use in the Iranian languages where Victar edits. I cannot respond at length now as I am traveling but I will note the spirit of the advisory vote concerning these templates was clearly in favor. Benwing2 (talk) 17:08, 10 September 2021 (UTC)[reply]

@Benwing2: So your proposal is basically "I do whatever I want in the areas I edit". To that I say, see Chuck's reply above. --{{victar|talk}} 17:36, 10 September 2021 (UTC)[reply]
@Victar You are sneaky and underhanded as always. Imetsia well summed up your trickery above. I think going community by community is a reasonable "compromise"; obviously you think you own everything on Wiktionary so you don't like this. I am suggesting that communities of related languages, as defined by the editors of those languages, should make the decision as to what templates they will use. I don't presume to dictate what happens e.g. in East Asian languages, where I don't edit, and neither should you dictate what goes on in languages you don't edit, period. Benwing2 (talk) 07:02, 12 September 2021 (UTC)[reply]
And here come the ad hominem attacks, wonderful. --{{victar|talk}} 07:12, 12 September 2021 (UTC)[reply]
@Benwing2, Imetsia: I agree to the choice being left to the communities, and as a member of the Italo-Dalmatian community (the proud sole editor of Corsican, although it's been a while), I would ask {{inh+}} not be used, and I'm okay with using {{bor+}} for Latin borrowings, provided everyone else editing Italian languages (and preferably also Sardinian) agree to that. The reasoning behind not using the inherited template is that the borrowings directly from Latin are, in my experience, far more uncommon in common words than inheritance. Thadh (talk) 09:22, 12 September 2021 (UTC)[reply]
@Thadh: Okay, that sounds good to me. But anyways we would be agreeing sitewide to condition 5 (it wouldn't make sense only to adopt this within the Italian-languages community). And otherwise, for the Italian languages we agree to conditions 1, 3, and 4. If we're agreeing to this thing community-by-community, I'd like to have more flexibility with condition 2. Maybe something like "Template substitutions from {{bor}} to {{bor+}} will be permitted selectively (e.g. when bundled with other, productive edits to a page) but not systematically." This is not an absolute must for me, and we can negotiate around it.
Two more tidbits. The bor+ template would be most useful for me for borrowings from English to Italian, not Latin to Italian. And secondly, I think it would be beneficial to make more of these policy decisions community-by-community, although we currently lack the infrastructure for doing so. Imetsia (talk) 15:21, 12 September 2021 (UTC)[reply]
@Imetsia: The last point you make is a very interesting one, and I would very much support creating something like Mini-Beer Parlours per community (if we can define these).
re. what you said about systematic replacement, I wouldn't oppose systematic replacement per community after a consensus has been reached. So, for instance, if we decide within the Italo-Dalmatian community to use {{bor+}} before any instance of borrowing (see the next paragraph), I think it's justifiable to replace the {{bor}} templates. The problem is with languages without a strong community (mostly LDLs), and I would argue that these should be left alone, unless someone has very strong feelings towards this issue, and they can give arguments why to use the +-templates. I could imagine some creoles would want that.
Now, to address your final comment, regarding English borrowings: I think whether to use the template in those cases should be decided taking two things into account, 1) The frequency of borrowing; I think we can manage with just typing out "From" or "Borrowed from" when there's a borrowing from Chinese in Italian, or something like that, but for English and (Old) French, I could agree to using the templates as a shorthand; and 2) The preference of the editor: Some editors still prefer to just use "From" in borrowings among unambiguous relations, including English/Italian. Since one of the major motivations behind the templates were borrowings/inheritances from Sanskrit and Latin, I think those should be tied down, but other relations should be decided by the editor, and handled in the same way any preference-based markup is handled right now, i.e. not replaced unless there's a good reason to or if really contributing to the entry and the language on their whole.
I think if most of us can agree to the aforementioned, then we can leave the fighting about what to do with Indo-Iranian languages to Victar and the others, which is far less populous a group than we now have discussing this issue. Thadh (talk) 16:04, 12 September 2021 (UTC)[reply]
Ok, I support this. If I understand you correctly, we're okay with mass-replacements from, e.g., English to Italian but not, e.g. Latin to Italian. If that interpretation is correct, then I would agree to these terms.
Just to reformat it all plainly, we're agreeing to the following in the Italo-Dalmatian community:
  • The {{bor+}} template will be in operation, and anyone will be able to use it.
  • Template substitutions from {{bor}} to {{bor+}} will typically be allowed at the discretion of the editor.
  • Usage of {{bor+}} can be undone selectively (i.e. if it would allow including additional information, make the wording clearer, etc.), but not systematically.
  • The {{inh+}} template will continue to exist but be put out of operation.
  • There will be a moratorium on further changes to the inh, bor, bor+, and inh+ templates for another 6 months.
(Notifying Benwing2, GianWiki, Metaknowledge, SemperBlotto, Ultimateria, Jberkel, Sartma): Pinging members of the Italian working group so we can ratify this just within the Italian-languages community. Imetsia (talk) 18:42, 12 September 2021 (UTC)[reply]
@Imetsia: I think you've either typoed or misunderstood a few things: 1) We're okay with mass-replacements from, e.g., Latin to Italian, but not from English to Italian, since the latter is not a borrowing that is widely confusing (The former may be, if we agree on that, and thus arguably needs the glossary link). We're okay with usage of both in any case, as long as the editor does further constructive work on the entry. 2) The {{inh+}} will be put out of operation, not the regular {{inh}} template. With the rest of these things I'm completely okay. Thadh (talk) 18:54, 12 September 2021 (UTC)[reply]
@Thadh: Yes, I both typoed and misunderstood a few things. But yes, I still agree to the stipulation with the inclusion of the clarification you just provided. Imetsia (talk) 19:04, 12 September 2021 (UTC)[reply]
I'm fine with all instances of the + templates, but I'll leave it to the bots to add them. I Support Imetsia's proposal. Ultimateria (talk) 19:11, 12 September 2021 (UTC)[reply]
@Thadh, Imetsia For the most part I support Imetsia's latest proposal but I don't see why we need to avoid using {{inh+}} for Latin to Italian inheritances. (Forgive me if I've missed the points in favor of avoiding {{inh+}}; I've been very busy lately with RL stuff and am just trying to catch up now.) In general it's far from obvious whether a given derivation of a term from Latin to Italian is an inheritance or a borrowing, and both are quite frequent, so I think it makes sense to allow use of both templates to specify this explicitly. If we allow {{bor+}} but not {{inh+}}, how is a new user of Wiktionary supposed to just "know" that an etymology reading "From Latin foobar" without explicitly saying "Borrowed from" is meant to be an inheritance? For that matter, a lot of "From Latin foobar"'s are implemented using {{der}}, which doesn't indicate whether something is an inheritance or a borrowing, so you can't actually make the inference that an unmarked derivation is an inheritance. So I would be opposed to placing additional restrictions on {{inh+}} that aren't placed on {{bor+}}. However, I don't have an issue with not using {{bor+}} for English to Italian borrowings because, as others have pointed out, those are necessarily borrowings and cannot be inheritances. I would add that the exact same issues come up with all the major Romance languages, so whatever we agree concerning Italian should apply to the others as well. The other language groups where similar issues arise are Indo-Aryan languages vis-a-vis Sanskrit and spoken Arabic languages vis-a-vis Classical Arabic, so {{bor+}} and {{inh+}} should be allowed in these cases as well. So I would argue the following:
  1. For major Romance languages (which for me means the top 6: French, Spanish, Italian, Portuguese, Romanian, Catalan), use of {{bor+}} and {{inh+}}, including automated substitutions from {{bor}} and {{inh}}, should be allowed for derivations from Latin. I take no position on minor Romance languages like Corsican or Dalmatian; I don't know enough about them to know whether the inheritance/borrowing confusion issue is major. These substitutions, or new uses of {{bor+}}/{{inh+}}, can be undone selectively if there's a good reason to, but not en masse without further discussion.
  2. I would also argue the same thing should apply to the Indo-Aryan languages vis-a-vis Sanskrit. I don't edit much in the Indo-Aryan space (although I have worked on Hindi and in fact I created all the Hindi verb/noun/adjective inflection modules) but from what I can tell, all or almost all the regular Indo-Aryan editors are in favor of {{bor+}} and {{inh+}} for Sanskrit derivations for the same reason I want to use them for Latin derivations.
  3. I would further argue the same thing should apply to the spoken Arabic languages vis-a-vis Classical or Modern Standard Arabic, but I don't feel so strongly about this because (a) I don't edit much in these languages, (b) the community of editors working in the spoken Arabic languages is almost certainly small, (c) a lot of these languages don't even have standard written forms. Benwing2 (talk) 06:16, 13 September 2021 (UTC)[reply]
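A rough sketch of the kind of automated substitution point 1 describes, limited to the six listed Romance languages and to derivations from Latin. This is only an illustration: the function name and regular expression are hypothetical, only the plain `la` source code is handled (etymology-only Latin codes are ignored), and it assumes, as the thread implies, that the `+` variants supply the leading "Inherited from" / "Borrowed from" wording themselves, so a leading "From " is dropped.

```python
import re

# Language codes for the six "major" Romance languages named in point 1
# (Catalan, Spanish, French, Italian, Portuguese, Romanian).
MAJOR_ROMANCE = ("ca", "es", "fr", "it", "pt", "ro")

# Hypothetical pattern: "From {{bor|xx|la|" or "From {{inh|xx|la|".
# {{der}} is deliberately not matched, because it does not say whether
# the term is a borrowing or an inheritance.
_PATTERN = re.compile(
    r"\bFrom \{\{(bor|inh)\|(" + "|".join(MAJOR_ROMANCE) + r")\|la\|"
)

def add_plus_templates(etymology: str) -> str:
    """Rewrite 'From {{inh|it|la|...}}' as '{{inh+|it|la|...}}'."""
    return _PATTERN.sub(r"{{\g<1>+|\g<2>|la|", etymology)
```

Anything outside that narrow pattern (minor Romance languages, non-Latin sources, {{der}}) is left untouched, which matches the proposal's "take no position" stance on those cases.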
@Benwing: So, definitely, one decision should be made for the bigger and the smaller language, because - speaking from my own experience with Corsican - the etymologies are almost identical to Italian. On the other hand, maybe the mother-daughter relation between Latin and, say, French is slightly different than that between Latin and Italian. I can only speak from my own experience, so I will address Italian (since I worked with Corsican): We have already divided Latin into different etym-only codes, and so borrowings from a language whose relation to its daughter is unambiguous is only with Classical Latin (in principle), and these borrowings are rare. It's a whole cleanup we probably need to do: CAT:Terms inherited from Late Latin isn't empty, and neither is CAT:Terms inherited from Late Latin and even CAT:Terms inherited from New Latin has one entry in it! I think the inheritances from Classical Latin aren't numerous, and as such I think we should resolve this by labeling the type of Latin where the language has originated (I'll be honest, I didn't think of this at the time of my original message, and I am myself guilty of using incorrect labeling). So, if we fix the labels, the relations between the mother-daughter language will become unambiguous, and as such no glossary link will be needed. This hopefully also answers your other question about {{der}}: It's not really an argument for using a specific template, it's just bad etymologies which we need to fix. Thadh (talk) 10:49, 13 September 2021 (UTC)[reply]
I, for one, totally agree that {{inh+}} should not be avoided/deleted/unused. There should not be any restrictions on it. Svārtava2 08:54, 13 September 2021 (UTC)[reply]
@Svartava2: The template may be useful in Indo-Iranian languages, but I don't think that's true in Romance languages; however, this will not likely affect you in any way. Thadh (talk) 10:49, 13 September 2021 (UTC)[reply]
As someone who works in both Sanskrit and Latin, I can tell you that it is no more true in PII. --{{victar|talk}} 03:17, 14 September 2021 (UTC)[reply]
@Thadh I'm sorry but I'm having difficulty following your arguments above concerning Classical Latin vs. other languages. Can you restate them? I will tell you that a few months ago I spent over two weeks, several hours a day, converting about 2,000 uses of {{etyl}} in Spanish to use either {{bor}} or {{inh}}. The majority of them were derivations from Latin, and it took a LOT of work (that's an understatement) to figure out for all of them whether they were borrowings or inheritances; and I count myself more or less an expert in the historical linguistics of Romance languages. In many cases there was simply no way for me to determine which template to use, so I left it at {{der}}. And not counting these cases, there are over 5,000 remaining cases where {{der}} is being used, again the majority of them coming from Latin. So I don't buy your argument that we "just need to fix" the uses of {{der}}; this just isn't practical. Furthermore, regardless of whether it is theoretically unambiguous for many varieties of Latin whether a given derivation is an inheritance or borrowing (e.g. presumably all derivations from "Vulgar Latin" are inheritances, and all derivations from Medieval Latin are borrowings; and I will note that derivations from any of Classical Latin, Late Latin and Ecclesiastical Latin can be either borrowings or inheritances), the average user of Wiktionary etymologies will have absolutely no clue about this, so it will be highly useful to state this explicitly. In short, I'm completely baffled by why you are arguing so strongly for suppressing useful information in etymologies (i.e. whether a given derivation is a borrowing or inheritance). Remember that Wiktionary is (a) intended for average users, not experts, and (b) not a paper dictionary, so there is no restriction on size. Benwing2 (talk) 06:14, 14 September 2021 (UTC)[reply]
@Benwing: You see, in my experience, the vast majority of Wiktionary users are people with some kind of linguistic knowledge, or at least a vague understanding of how language works. Whether we like that or not, it's just how it is; other people would rather go to Glosbe or Google Translate. Now, if you see two languages, of which one is ancestral to the other, the natural - again, in my experience, and that of many others - conclusion is that the daughter term is inherited from the mother term; whether the user knows the term "inherited" or can define it doesn't really matter, what matters is - they know the concept. Now, when a term is instead borrowed from such a language, most of us will then write "Borrowed from" - the glossary link makes sense if you think our readers may not know the term - even though I think they do - but, hey, I'm okay with making this extra clear.
Now, if I don't know if a term is borrowed or inherited, I, clearly not entirely unreasonably, will write that in the etymology: Something like, "Either borrowed or inherited from", followed by the {{der}} template. That's because this gives a clear view on what we know and don't know. Now, you're asking: "[Why would we] suppre[ss] useful information in etymologies[?]" and my answer is simple: Because it's not. If, and I am convinced that is the case, inheritance is implied when the term "From" is given, and a completely akin source word is given (so, without any added morphemes present, and the source not being a root), stating that the derivation is inheritance is not "useful information", it is additional information, without any positive value, and I do think we can take all the help we can get to shorten etymologies to a readable size, and even one additional word and a bluelink may be enough to prevent that. Thadh (talk) 19:39, 14 September 2021 (UTC)[reply]
@Thadh I can't respond in depth right now, need to go to sleep, I've been crazy busy with work. But you haven't responded to my point that there are over 5000 uses of {{der}} in Spanish, which read formatted as "From ..." and definitely do NOT imply inheritance. Benwing2 (talk) 08:54, 16 September 2021 (UTC)[reply]
Change that! We have that power, we are the editors! If it's not known if it's borrowed or inherited, say that, if it is known, fix it. Thadh (talk) 08:56, 16 September 2021 (UTC)[reply]
So what's the comprehended word to use when we don't know whether the word was borrowed, merely modified as to its constitution, or inherited? --RichardW57 (talk) 11:34, 20 September 2021 (UTC)[reply]
@RichardW57: It's up to you, but I would go with "Either borrowed or inherited from" or "Ultimately from"; it really depends on the context, do you have a specific example in mind? Thadh (talk) 12:15, 20 September 2021 (UTC)[reply]
@Thadh: Yes, Pali katthūrikā (musk). Because most Pali resources lack it, I eventually decided that it was borrowed from Sanskrit rather than inherited from something Wikipedia refuses to call Sanskrit. Your suggested phrases don't distinguish between not knowing the route and simply not telling the route. --RichardW57 (talk) 13:21, 20 September 2021 (UTC)[reply]
@RichardW57: If you want to be more specific (e.g. there has been research on the topic and the source isn't certain), the usual wording is something like: "{{unk|Uncertain}}. Probably borrowed from {{der|sa|कस्तूरिका}}, otherwise inherited from {{der|sa|कस्तूरिका}}." In any case, the template {{unk}} is specifically for etymologies where the origin is unknown/uncertain/disputed. Thadh (talk) 13:27, 20 September 2021 (UTC)[reply]
To not presume a Romance language word is by default inherited from Latin instead of borrowed is simply obtuse and pedantic. --{{victar|talk}} 00:56, 17 September 2021 (UTC)[reply]
@Thadh You are still not understanding. Whether you want it to be the case that "From ..." implies inheritance, it obviously does not, given how many people indiscriminately use "From ..." with {{der}} (see the recent discussion concerning User:Donnanz). You are complaining about the insertion of a single word; since we don't seem to be getting anywhere, we may have to "agree to disagree", which means I will insert the word when I want, and you can go ahead and not insert it when you want. Benwing2 (talk) 07:13, 20 September 2021 (UTC)[reply]
Again, it obviously does to any reader that isn't overthinking the text. You want to save a few characters when typing the etymology, and the thing is, do you really need a separate template just to save a few keystrokes (or, to be precise, 15 characters)? If you want to use "Inherited from" written out because you think the situation requires it - fine by me, but nobody needs a glossary link unless the term is actually written, and in most situations using the term explicitly is not welcome, it only makes the text longer and throws an unknown term at you. Thadh (talk) 12:15, 20 September 2021 (UTC)[reply]

Template:desctree and module errors

While {{desctree}} can save a lot of typing, it has the inherent flaw that it depends on the content of other entries without any indication in those entries that it does so. I've gotten really tired of fixing module errors triggered by perfectly acceptable changes to the linked entries. I created an abuse filter to remind people to check for {{desctree}} links to an entry when they remove a Descendants section, but I'm not sure it's effective enough to be worth the annoyance- I'm still fixing {{desctree}} module errors.

While I'm not a big fan in general of using module errors to enforce good practice (it's like setting off a fire alarm when someone forgets to flush), this is far worse: the person who causes the problem is completely unaware of the module error, so it doesn't change their behavior at all. Instead, it clutters up CAT:E and replaces content with an alarming error message until a third party with the necessary knowledge (usually me) takes the time to fix it.

I would like to propose that the template's behavior be changed so that it no longer throws a module error, but instead displays as if it were {{desc}} and adds a maintenance category- maybe something like "Desctree linking to missing [language of target entry] Descendants section". Chuck Entz (talk) 15:02, 31 August 2021 (UTC)[reply]
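
The proposed fallback could look roughly like the sketch below. It is written in Python purely for illustration and is not the actual module code; `render_desctree`, the `Rendered` type and the `descendants_by_entry` lookup are hypothetical stand-ins for however the module reads the target entry.

```python
from dataclasses import dataclass, field

@dataclass
class Rendered:
    text: str
    categories: list[str] = field(default_factory=list)

def render_desctree(lang_name, term, descendants_by_entry):
    """Render a descendant link, expanding the target's Descendants if present.

    descendants_by_entry is a hypothetical mapping from (language, term) to
    the list of descendant lines found in that entry's Descendants section.
    """
    link = f"{lang_name}: {term}"
    descendants = descendants_by_entry.get((lang_name, term))
    if descendants:
        # Normal case: show the link followed by the target's descendants.
        tree = "\n".join(f"  {line}" for line in descendants)
        return Rendered(f"{link}\n{tree}")
    # Proposed fallback: no error; display like a plain {{desc}} link and
    # flag the page with a maintenance category instead.
    category = f"Desctree linking to missing {lang_name} Descendants section"
    return Rendered(link, [category])
```

The point of the design is that the page keeps displaying something sensible, and the cleanup work moves from CAT:E to a dedicated maintenance category.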

That sounds like a very good idea. —Mahāgaja · talk 15:22, 31 August 2021 (UTC)[reply]
Make it so. Vox Sciurorum (talk) 16:31, 9 September 2021 (UTC)[reply]

Is Northern Ireland a country, a constituent country or a province – and does it have counties or traditional counties?

There are famously six counties in Northern Ireland. Two of them (Derry and Down) are in CAT:en:Counties of Northern Ireland, three of them (Antrim, Armagh and Fermanagh) are in CAT:en:Traditional counties of Northern Ireland, and one of them (Tyrone) is in neither of those but is in CAT:en:Places in Northern Ireland. The difference? If you type {{place|en|county|c/Northern Ireland}} (using c/ for "country"), it categorizes into CAT:en:Counties of Northern Ireland; if you type {{place|en|county|cc/Northern Ireland}} (using cc/ for "constituent country"), it categorizes into CAT:en:Traditional counties of Northern Ireland; and if you type {{place|en|county|p/Northern Ireland}} (using p/ for "province"), it categorizes into CAT:en:Places in Northern Ireland. This behavior should really be regularized, and also, the redundancy between the Counties category and the Traditional counties category should be eliminated, since all counties of NI are traditional counties – they haven't been politically real since 1973. —Mahāgaja · talk 16:48, 31 August 2021 (UTC)[reply]
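
The inconsistency described above can be modelled as a small lookup, shown here next to one possible regularized version. This is only a sketch of the observed behavior in Python, not how the {{place}} module is implemented; the function names are made up for illustration, and picking "Traditional counties" as the single target is an assumption based on the reasoning above.

```python
# Category each placetype prefix currently produces for a county of
# Northern Ireland, as described in the paragraph above.
OBSERVED_NI_COUNTY_CATEGORY = {
    "c": "Counties of Northern Ireland",               # c/ = country
    "cc": "Traditional counties of Northern Ireland",  # cc/ = constituent country
    "p": "Places in Northern Ireland",                 # p/ = province
}

def split_holonym(holonym: str) -> tuple[str, str]:
    """Split 'cc/Northern Ireland' into ('cc', 'Northern Ireland')."""
    prefix, _, name = holonym.partition("/")
    return prefix, name

def observed_category(holonym: str) -> str:
    """Category a county entry currently gets, which depends on the prefix."""
    prefix, name = split_holonym(holonym)
    if name != "Northern Ireland":
        raise ValueError("this sketch only models Northern Ireland")
    return OBSERVED_NI_COUNTY_CATEGORY[prefix]

def regularized_category(holonym: str) -> str:
    """One possible fix: the same category whatever prefix the editor chose."""
    _prefix, name = split_holonym(holonym)
    if name != "Northern Ireland":
        raise ValueError("this sketch only models Northern Ireland")
    return "Traditional counties of Northern Ireland"
```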

Why would it be “a province”? And why are we distinguishing between countries and states? The former word is completely meaningless, while the latter has a well-established definition, and I don’t see why Northern Ireland would not fall under it. Even the Isle of Man is a state, island state. Then we can have constituent state, and I don’t see why {{place}} special-cases “England, Scotland, and Wales”.
“Region” is a geographical-instead-of-political concept. So if someone calls Northern Ireland “a region”, it does not designate the state but the territory it is on.
“Territory”, in another sense, “dependent territories”, are apparently areas that lack statehood and aren’t counted as parts of a state.
“Provinces” are parts of a state that aren’t states themselves. E.g. of the subjects of the Russian Federation only those called “republic” are states, the others are provinces. Fay Freak (talk) 20:12, 3 September 2021 (UTC)[reply]
It is often described as a province; see w:Northern Ireland#Descriptions. It's also described as a region or a country. I don't think it's ever described as a state. And if you consider the UK a state, then NI is a part of a state that isn't a state itself. —Mahāgaja · talk 07:25, 4 September 2021 (UTC)[reply]
But “is” not what it is “described as” if manaman only juggles words without ascribing a particular meaning to it. Why is Northern Ireland not a helicopter? Because [description of a helicopter follows, to which Northern Ireland does not fit.] While you shun to have as clear an idea of “a state” etc. to argue that Northern Ireland is or is not.
On the first try I found “Nordirland, das als teilautonomer Gliedstaat …”. At another place not even Scotland is a Gliedstaat. So staatsrechtlich definition (now try to translate that adjective) may be disputed.
However there are of course legal instruments where Northern Ireland is “a state”, due to being of a separate legal system. For example for this reason alone Northern Ireland is a state according to Art. 22 (1) of the Rome I Regulation, according to the wording of the German version Staat, and Slovene and Croatian država, Polish państwo etc., while the English version uses country and French pays while not using state / état before except in member state / État membre. One just needs to know with which conceptualization one operates. The private international law definition is clear. You see here that country and state are synonymous; the definition of state clumsily given on Wiktionary as “any sovereign polity; a national or city-state government”, with the word “sovereign” making nothing clear. Wiktionary has country as “a nation state” (hence the UK was a member state while “a country”, not an inherent distinction), so just a special case of “state”. The Rome I regulation could have used “state” as well, but did not because of the sense “A political division of a federation …” being more expected for English state without qualifier. In the German, Polish, Croatian/Slovene version Article 22(1) is superfluous because German, Poles, Croats and Slovenes know what a state is. The actual definition of state, which is relevant in most practical matters, is not found on en.Wiktionary.
What is so unusual anyway that common usage of a legal term is vagary?
For clarity we have to stop using the common language. This is not a paradox in legal scholarship. But it is impeccable that South Ossetia, Transnistria etc. are, the Islamic State for a time were states, de jure states. (The term “de jure state” you are used to is a made-up SOP term of Wikipedia, like so many terms invented by Wikipedias, based on an international law definition of “state” which requires diplomatic recognition. But then there is that which only has de facto requirements. Practically only the latter applies. Judgments from all such states may be enforced according to §§ 328, 722, 723 ZPO. Yes, even Moldova may recognize Transnistria to that extent. The FRG doesn’t though, due to lacking reciprocity, § 328 (1) Nr. 5, not because “it is not a state (it recognizes)”, see Wietzorek Anerkennung und Vollstreckbarerklärung von Entscheidungen im Verhältnis zur selbsternannten Pridnestrowischen Moldauischen Republik, WiRO 2017, pp. 135 seqq. (paywall of course)). Fay Freak (talk) 20:51, 4 September 2021 (UTC)[reply]
For consistency, I moved the two which were in "counties" to "traditional counties", per Mahagaja's statement above that they're all only traditional counties anymore. As for the other question, "region" seems like the blandest, most neutral description, although "province" could also work. Northern Ireland doesn't seem to be a "constituent country" per se because in the past Ireland was a constituent country but now part is independent, so Northern Ireland is a..."part of a formerly-constituent country". (Likewise, as Wikipedia goes into some detail about, it's not a "country".) - -sche (discuss) 00:16, 12 September 2021 (UTC)[reply]