Wiktionary:Beer parlour/2016/September

{{also}} template

Hello -- I noticed that of the c. 495 thousand entries which differ from other entries only in diacritic marks or capitalisation, only c. 172 thousand have {{also}} templates. Would it be worthwhile for me to add these to the remainder? Also, some dozens of thousands which do have these templates are missing a subset of the items in their respective congruence classes. Would it also be worthwhile to complete the arguments for these templates? An example is gort and ğort. Apologies for coming here with such a fiddly question. Isomorphyc (talk) 01:56, 1 September 2016 (UTC)[reply]

Yes, if you're confident you can de-diacritize and classify them correctly. DTLHS (talk) 01:59, 1 September 2016 (UTC)[reply]
If it can be done conveniently (and correctly), yes.
Some users, especially but not only, in English-speaking countries are not facile with diacritics, eg, me. More importantly, I don't think anonymous users have access to the means we offer to overcome keyboard limitations.
IMO the most important part of the task is to make sure that on all entries that use only the no-diacritic Roman character set {{also}} includes all the entries that use diacritics that correspond to the plain entries. English, Latin, and "Translingual" are the only languages that matter to me. A smaller subset would be only lemma entries.
I'm sure there are other points of view. DCDuring TALK 02:10, 1 September 2016 (UTC)[reply]
Yes, please! I have asked before for someone to do this. Note that there may be a limit to how many arguments {{also}} can handle, and there is in any case a limit to how many we would want to display (let's discuss what that would be: more than 15 links?). For terms that would otherwise display more than that number of alsos, it is preferable to set up and link to an appendix, the way a links to Appendix:Variations of "a" rather than listing all 100+ variants directly in a. - -sche (discuss) 02:20, 1 September 2016 (UTC)[reply]
@-sche: For what it is worth, there are 77 congruence classes having more than 15 members and 129 classes having more than ten members. The largest groups are bo (19), y (19), s (20), sa (20), n (21), i (24), u (38), e (41), o (57) and a (61). My lists currently do not include differences in punctuation; the classes will be slightly larger and more numerous when this is included. The idea of creating an appendix for classes larger than ten or fifteen sounds reasonable to me, but if I create such appendices, they will provide less information than those which already exist. I would be uncomfortable also including, for example, the same sequence of letters in other scripts or Hanzi represented by the same letters in some transliteration scheme, as is currently the practice. I do believe this can be done without errors on a very large and somewhat easy to define subset of the relevant entries, mostly deferring work on scripts with which I may be uncomfortable. Isomorphyc (talk) 02:47, 1 September 2016 (UTC)[reply]
Since we're talking about the difference between no appendix and one that's not as complete as it possibly could be, I don't see the problem: this is a wiki, and others can expand on those later. Chuck Entz (talk) 03:32, 1 September 2016 (UTC)[reply]
Yes - I think that a bot could do this even better. SemperBlotto (talk) 05:39, 1 September 2016 (UTC)[reply]
It's well suited to a bot, but a bot would not be able to create the appendix pages when there are larger numbers. To do this, the appendix pages would need to follow a standard format and not have any additional information added. —CodeCat 12:25, 1 September 2016 (UTC)[reply]
But a bot could generate a list of appendix pages that need to be created and pages that need to be added to them. --WikiTiki89 12:54, 1 September 2016 (UTC)[reply]
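The grouping discussed in this thread could be sketched roughly as follows. This is only a minimal illustration in Python, not the script anyone in the discussion actually used: the function names `fold`, `congruence_classes` and `also_line` are hypothetical, the limit of 15 is the figure floated above rather than an agreed value, and stripping Unicode combining marks is an assumption about what "de-diacritize" means (letters such as ø or ł do not decompose and would need a separate mapping).

```python
import unicodedata
from collections import defaultdict


def fold(title):
    """Reduce a title to a comparison key: no combining marks, no case."""
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def congruence_classes(titles):
    """Group titles that differ only in diacritics or capitalisation."""
    classes = defaultdict(set)
    for title in titles:
        classes[fold(title)].add(title)
    # Keep only classes with at least two members; singletons need no {{also}}.
    return {key: sorted(members) for key, members in classes.items() if len(members) > 1}


def also_line(title, members, limit=15):
    """Build the {{also}} call for one entry, or None if the class is too large.

    The limit is hypothetical; classes above it would be candidates for an
    appendix page instead of a direct list.
    """
    others = [m for m in members if m != title]
    if len(others) > limit:
        return None
    return "{{also|" + "|".join(others) + "}}"


if __name__ == "__main__":
    classes = congruence_classes(["gort", "ğort", "Gort", "beer"])
    for key, members in classes.items():
        for member in members:
            print(member, "->", also_line(member, members))
```

A bot along these lines could also emit the list of oversized classes, which matches the suggestion that the appendix pages themselves be created by hand.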

Second LexiSession : paths, roads and ways

Dear all,

Apologies for writing in non-native English; please fix any mistakes you may encounter in these lines!

The Tremendous Wiktionary User Group, a nice and open gathering of Wiktionarians, is happy to introduce the second chapter of our collective experiment: LexiSession.

So, what is a LexiSession? The idea is to coordinate contributors from different languages to focus on a shared topic, to enhance all projects at the same time! It may remind you of the Commons monthly contests, but here everyone is a winner! First LexiSession was about cat and it was a beginning. For this second LexiSession, we offer a month - until the end of September - to pave the way! There is plenty of names for different kind of roads, streets, avenues, and ways, and wiktionaries can be very helpful to help people to pick the correct one to describe or to translate.

English Wiktionary already have a Wikisaurus:road and a Wikisaurus:way but there is still a lot of information to provide. Well, why is it almost in alphabetical order? How to distinguish between roadway and motorway (for instance)? Is it possible to help readers with pictures or something? These are not instructions, and everyone is welcome to imagine new solutions to provide information about semantic networks and variation. Also, you may be interested to know that French Wiktionary already has eight different thesaurus about streets in eight different languages, including English.

Please share your contributions here! You can also have a look at what other Wiktionarians are doing, on the LexiSession Meta page. We will discuss the processes and results in Meta, so feel free to have a look and suggest topics for the following LexiSession.

Thank you for your attention, and I hope you will be interested in this new way of contributing. I'll get back to you later this month for an update! Noé (talk) 10:53, 1 September 2016 (UTC)[reply]

Great topic (wasn't too keen on the cats). For this session I'm especially interested in local names which are only used in a specific city or region. Also interesting would be to describe (also visually?) hierarchies of paths. – Jberkel (talk) 12:23, 1 September 2016 (UTC)[reply]
@Noé I'll try to contribute more to this one, provided school doesn't get in the way! Let's hope participation is better than last time. :) (And as a tiny note about your English, don't forget that the third person singular of have is has....) Andrew Sheedy (talk) 01:38, 2 September 2016 (UTC)[reply]
@Noé another correction, also there are plenty of names.

Hey guys, it's the end of September: has anyone made something for this LexiSession? As we are in October now, we'll start a new session! I know a month is quite short, but that's the idea behind LexiSession. Please give a hint if you have made something thanks to this LexiSession. Don't be sad if you haven't participated this time, there'll be more to come! Noé (talk) 10:50, 2 October 2016 (UTC)

I made some small changes, but nothing substantial I'm afraid. School prevents me from being able to contribute to other projects on top of my usual edits (which are usually only to entries of words that I have had to look up). Hopefully the next time I have a holiday, I'll be able to really participate! Andrew Sheedy (talk) 18:30, 2 October 2016 (UTC)[reply]

borrowing → bor

Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor passed. Results: 14-5-3 (73.68%-26.32%) (not counted: +1 late oppose vote). Can someone please do the honors and edit the template in all entries?

FYI: See Thread:User talk:CodeCat/borrowing → bor. In the discussion, I asked CodeCat first, but she said: "I don't think it's right to do it given the strong opposition." Do we need to discuss this further before doing the change? I was hoping we could go on with it and have {{bor}} used in a way more consistent with {{inh}} and {{der}}.

As I mentioned in the conversation with CodeCat, I believe I found some important numbers concerning how the templates are used. Correct me if I make any mistake in the numbers or their interpretation. {{inh}} and {{inherited}} were created together, and it appears that almost all entries that display an etymological inheritance use the shorter form. {{borrowing}} was all we had available for 5 years -- that is, the shorter {{bor}} did not exist. Then {{bor}} was created 1 year ago and about 2/3 of entries of borrowed terms already use {{bor}} rather than {{borrowing}}. This is one reason why I see a trend towards shorter names, confirmed in the vote.

In the discussion, CodeCat suggested leaving shortcuts as shortcuts and long forms as long forms. Feel free to discuss this idea. I disagree with it: people who used the longer syntax {{borrowing|it|pizza|lang=en}} in entries from 2010 to 2015 did it because it was the only format available; once the shorter {{bor|en|it|pizza}} came to exist, people started to use it. --Daniel Carrero (talk) 16:01, 1 September 2016 (UTC)[reply]
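For whoever ends up doing the replacement, the parameter reordering described above (from `{{borrowing|it|pizza|lang=en}}` to `{{bor|en|it|pizza}}`) might look something like the following Python sketch. It is an illustration under assumptions, not a tested bot: the function name `convert_borrowing` is hypothetical, it only handles calls with no nested templates inside them, and it assumes that every other parameter can be carried over unchanged, which would need checking against the actual template documentation.

```python
import re

# Matches {{borrowing|...}} calls that contain no nested braces.
BORROWING = re.compile(r"\{\{borrowing\|([^{}]*)\}\}")


def convert_borrowing(wikitext):
    """Rewrite {{borrowing|...|lang=xx}} as {{bor|xx|...}}."""

    def repl(match):
        lang = None
        rest = []
        for part in match.group(1).split("|"):
            if part.strip().startswith("lang="):
                lang = part.strip()[len("lang="):]
            else:
                rest.append(part)
        if lang is None:
            # No lang= parameter found: leave the call for manual review.
            return match.group(0)
        return "{{bor|" + "|".join([lang] + rest) + "}}"

    return BORROWING.sub(repl, wikitext)


if __name__ == "__main__":
    print(convert_borrowing("From {{borrowing|it|pizza|lang=en}}."))
```

Calls that lack `lang=` or that contain nested templates are deliberately left untouched, so they can be listed and fixed by hand.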

Distinction between topical and context-based usage categories?

The general purpose of context labels is, as far as I can discern from what others have said, to specify the context in which a specific sense applies. Presumably, it is not understood in that sense in other contexts. However, there are a few systemic problems:

  • Context labels add categories that do not indicate this restricted context. Category:Physics, which is added when you put {{lb|xx|physics}} on a sense, has nothing to do with restricted usage. Instead, it's just a general category where all terms related to the topic of physics can go. As a consequence, some editors are led to think that context labels are just a fancy means of putting entries in topical categories.
  • Worse still, some context labels put entries into "set"-type categories, but display a topical context label. {{lb|xx|particle}} puts entries in Category:Subatomic particles while showing "physics". This is confusing when used on very widespread terms like electron, which are used far outside the "physics" context.

We already have "slang" categories, like those in Category:English slang, but we have none for jargon or restricted-context senses that are not slang. However, I think these are sorely needed. It is very valuable to distinguish senses used only in physics, from those related to physics. What can be done to remedy this situation? —CodeCat 20:25, 1 September 2016 (UTC)[reply]

I would favor using longer and more explanatory names for topical categories. I'll give a few examples. Feel free to suggest any changes.
"names of" (proper noun examples)
"names of" (place names -- subdivision(s) if they exist, country)
"names of" (common nouns) (are those acceptable?)
"relating to" (or "related to"?)
"jargon"
--Daniel Carrero (talk) 21:16, 1 September 2016 (UTC)[reply]
I've been "guilty" of using the context labels to categorize items, and don't agree with the current strict usage policy. The example given in WT:ELE is
{{lb|en|informal}} An [[informant]] or [[snitch]].
It says "Such labels indicate, for example, that the following definition occurs in a limited geographic region or temporal period, or is used only by specialists in a particular field and not by the general population". Informal language however is used by large parts of the general population.
Using category links to categorize is just very awkward, they're invisible and tend to be scattered around the wiki code, at the bottom of the page or somewhere else, and have maintenance problems (forgetting to remove the link when the definition is removed/changed). Conversely, labels are close to the definition, and if the label is removed then the category is removed as well. – Jberkel (talk) 11:12, 8 September 2016 (UTC)[reply]
A fundamental problem is that sometimes a topic is also a usage context and sometimes it isn't. For example, a military slang term for a civilian belongs in the usage context "military" but is not topically "military", and boat is topically "nautical" when applied to a ship but is not used in a "nautical" context. The category problem is bad enough, but we aren't helping users notice, let alone understand, the distinction to be made between usage context and topic. DCDuring TALK 19:21, 2 October 2016 (UTC)

For French Verbs: Displaying participles in the header

I'm copying what I wrote on the discussion page for {{fr-verb}}, as I forgot that Mglovesfun was no longer active:

Would it be easy enough to have {{fr-verb}} display in a way similar to {{pt-verb}} and {{es-verb}}? This would increase consistency between French and other languages on Wiktionary (including English, Spanish, Latin, and Portuguese), which would be a big plus. I would suggest including the present and past participles. I would do this myself, but I'm not very technologically inclined.... I would love to see it implemented, though! Andrew Sheedy (talk) 01:34, 2 September 2016 (UTC)[reply]

Mglovesfun is active. His username is Renard migrant.
I'm mildly in favor. It should be Luacized as we already have a module that generates most verb forms. And I can't do that, I'm afraid. Renard Migrant (talk) 17:26, 5 September 2016 (UTC)[reply]
I generally oppose copying inflection information from inflection tables. I prefer the format used by Dutch verbs (lopen) where principal parts are shown when the table is collapsed. —CodeCat 17:28, 5 September 2016 (UTC)[reply]
I suppose that's a workable option, but I would much prefer that all the Romance languages be consistent with each other, given their similar grammar, etc. Andrew Sheedy (talk) 17:59, 5 September 2016 (UTC)
What does any of this even mean? UtherPendrogn (talk) 18:31, 13 September 2016 (UTC)[reply]
Since I suspect you know what a participle is, I'd imagine your question is what would this actually look like:

faire (present participle faisant, past participle fait)

ok? Renard Migrant (talk) 18:34, 13 September 2016 (UTC)[reply]
What I don't get is why. French present participles don't get used nearly as much as they do in English, Spanish, and Portuguese. --WikiTiki89 18:37, 13 September 2016 (UTC)[reply]
@Wikitiki89 The main reason is that one can see the conjugation at a glance without having recourse to the conjugation table. For example, if we were to have the first person singular of the indicative and the present and past participles in the header, one could look at the header for a verb like mourir or craindre and see: mourir (first person singular meurs, present participle mourant, past participle mort) or craindre (first person singular crains, present participle craignant, past participle craint).
That would allow a reader familiar with French to conjugate all composite tenses, as well as most of the present and subjunctive, among others. Also, while the present participle may not be as common in French as in other languages that use the header, the past participle is far more common. I'm also a big fan of consistency, and see no reason why French shouldn't have such a header, when it could be helpful to users like me. Andrew Sheedy (talk) 20:48, 2 October 2016 (UTC)[reply]
@Wikitiki89 (Pinging again because my signature wasn't in the same paragraph as the ping.) Andrew Sheedy (talk) 20:49, 2 October 2016 (UTC)[reply]
@Andrew Sheedy: I got both your pings. The ping and signature can be in different paragraphs. The rule is that both have to be in a new paragraph (as recognized by the diff tool). As for the conjugation, there are more important things missing from the headword line for faire than the active participle, such as the 1st/2nd person plural present, the imperfect, future, etc. Those should have priority over the active participle, but there are so many things you can put there that it would create too much clutter, and that is why we provide a conjugation table. --WikiTiki89 17:47, 5 October 2016 (UTC)
@Wikitiki89 Very true, but then why have a header for Spanish, Portuguese, and Latin? I'm not going to fight to have the present participle included rather than something else, but I feel like the past participle, at the very least, should be in the header. Andrew Sheedy (talk) 22:36, 5 October 2016 (UTC)[reply]
@Andrew Sheedy: I can't speak for Spanish and Portuguese. For Latin we give the four traditional "principal parts". As for French, I have no problem with giving the past participle (for some reason I thought this discussion was only concerning the present participle). We should also choose a few other select forms. What's normally given in a typical French monolingual dictionary? --WikiTiki89 14:04, 6 October 2016 (UTC)
@Wikitiki89 French dictionaries tend not to give information on conjugation, or to give it separately. In Bescherelles (conjugation books), however, the participles are typically visually distinct, as well as the first person singular of all the non-compound tenses and the first person plural of the present tense of the indicative, subjunctive, and imperative. Obviously, that's too much to put in a header, but including (a) the first person singular of the present indicative, (b) the present participle, and (c) the past participle, would allow the reader to form nearly any tense.
My attachment to the present participle is due to that last fact. For example, verbs in the second group are defined as those verbs that end in -ir in the infinitive and -issant in the present participle. In other words, by displaying the present participle, a reader would be able to see that the verb was in fact regular, saving them a look at the conjugation table. For irregular verbs in which the root changes, it is very typical for the first person singular to have one stem while the present participle has the other (which forms of the verb use which stem is fairly readily predictable). For example: mourir: je meurs, present participle mourant, past participle: mort(e)(s) (other forms of the verb: meurt, meure, mourons, mourus, mouriez, etc.); écrire: j'écris, present participle écrivant, past participle: écrit(e)(s) (other forms of the verb: écrit, écrive, écrivons, écrivis, écriviez, etc.); plaire: je plais, present participle plaisant, past participle: plu (other forms of the verb: plaît, plaise, plaisons, plus, plaisez, etc.). Note that I used the same verb forms in the same order for each of the examples for the sake of comparison. Obviously they don't all match up perfectly, but the three forms of the verb I suggested for the header cover virtually all the permutations of the verb stems between them. It's not difficult to extrapolate from them to form the rest of the conjugation. Andrew Sheedy (talk) 01:51, 7 October 2016 (UTC)[reply]
But why specifically the present participle? There are other forms that can be used to show that particular stem. I also think it's important to show the future stem at least when it's not the same as the infinitive. The stem of the present participle I think is nearly always the same as the stem of the imperfect. So why not show the past participle and the first person singular of the present, imperfect, future, and perhaps subjunctive? Perhaps we can display these only when they are not obvious. And we can even give the present participle when it is completely irregular, such as for avoir (ayant). --WikiTiki89 15:43, 7 October 2016 (UTC)[reply]
I would be fine with doing that for irregular verbs, but I feel like it would be too cluttered if that many forms were included for every verb. I agree that it would be helpful to give the first person future, as there are often extra R's added and such in that tense. The present participle would be helpful for identifying second group verbs (but then so would the present subjunctive) and for forming other parts of speech, such as adjectives, but I don't think it has to be included. Andrew Sheedy (talk) 20:58, 8 October 2016 (UTC)
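To make the proposal concrete, here is a small Python sketch of how a headword line of the kind described above could be assembled from a handful of selected forms. It is purely illustrative: the function name `headword_line` is hypothetical, the choice of forms is the one suggested in this thread rather than anything implemented in {{fr-verb}}, and a real implementation would presumably live in a Lua module and pull the forms from the existing conjugation code.

```python
def headword_line(infinitive, forms):
    """Format a headword line from (label, form) pairs, in the order given."""
    if not forms:
        return infinitive
    listed = ", ".join(label + " " + form for label, form in forms)
    return infinitive + " (" + listed + ")"


if __name__ == "__main__":
    print(headword_line("mourir", [
        ("first person singular", "meurs"),
        ("present participle", "mourant"),
        ("past participle", "mort"),
    ]))
```

Passing only the forms that are not predictable from the infinitive would give the "display these only when they are not obvious" behaviour suggested above, at the cost of deciding somewhere which forms count as predictable.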

Proto-Celtic verb lemmas

@CodeCat, Victar, UtherPendrogn, Nayrb Rellimer, Florian Blaschke, and anyone else who cares: Right now we have only two Proto-Celtic verbs, *ber- (which uses the stem as the lemma) and *brusū (which uses the 1st person singular present as the lemma). Does anyone object to my settling on the 3rd person singular present as the lemma form for Proto-Celtic verbs? That's what we're already using for verb lemmas for Proto-Celtic's ancestor (Proto-Indo-European) as well as for its best attested early descendant (Old Irish). This would entail moving *ber- to *bereti and *brusū to *bruseti. Is that OK with everyone? —Aɴɢʀ (talk) 17:29, 2 September 2016 (UTC)[reply]

What is used as the lemma for modern Celtic languages? --WikiTiki89 17:34, 2 September 2016 (UTC)[reply]
The imperative for the modern Goidelic languages, the verbal noun for the modern Brythonic languages. —Aɴɢʀ (talk) 18:40, 2 September 2016 (UTC)[reply]
That seems a little strange, but then what do I know. In any case, I definitely support your proposition. --WikiTiki89 18:44, 2 September 2016 (UTC)[reply]
What about the old Brythonic languages? WT:Lemmas has nothing. —CodeCat 18:45, 2 September 2016 (UTC)[reply]
I know Welsh mostly descends from 3.sg. --Victar (talk) 18:57, 2 September 2016 (UTC)[reply]
I've been using the verbal noun for Middle Welsh, too, but I've been thinking it might be good to use the 1st person singular present (which is what the Geiriadur Prifysgol Cymru does for literary Welsh) and have the verbal noun be separate (as the verbal noun is separate for the Goidelic languages). —Aɴɢʀ (talk) 19:00, 2 September 2016 (UTC)[reply]
Support —CodeCat 17:37, 2 September 2016 (UTC)
Abstain On one hand, PCelt's descendants are mostly 3.sg, but on the other hand, it's nice to have it in line with Latin, whose descendants are also not in 1.sg. *shrug* --Victar (talk) 18:44, 2 September 2016 (UTC)
Is there any common practice in reference works (aside from the infinitive, which some dictionaries use for everything)? Chuck Entz (talk) 19:04, 2 September 2016 (UTC)[reply]
Sounds good. I've been working on some Proto-Brythonic verbs myself. My userpage has a huge amount of WIP translations. UtherPendrogn (talk) 19:18, 2 September 2016 (UTC)[reply]
I've created a rudimentary inflection table for thematic verbs, {{cel-conj-them}}. It's still lacking many forms, as I'm not super well versed on Celtic verbs. I'd like to know especially which principal parts there are and which PIE verb stems they come from. From w:Proto-Celtic language I gather that the present, future, preterite active and preterite passive stems are principal, but their PIE origin eludes me.
The template is implemented with a module, Module:cel-verbs, and new classes can be added there fairly easily. The main issue I'm faced with is the layout of the table. The table on w:Proto-Celtic language has a lot of wasted space, I'd prefer something more compact, but I'm not sure what would work best. —CodeCat 20:00, 2 September 2016 (UTC)[reply]
Support I don't think there's an established practice (Schumacher, for one, uses only stems), but considering Old Irish uses the 3sg too, it makes sense. I'm generally a fan of using the 3sg because it is usually the most frequent and best attested form, and in certain verbs (such as meteorological or impersonal verbs), other forms will be rare at best (though not necessarily nonexistent: for example, in the Old Lithuanian corpus a verb form like "I snow" may be attested in the context of a tale with anthropomorphised clouds). --Florian Blaschke (talk) 01:08, 3 September 2016 (UTC)[reply]
I'm a little late, but I Support moving them to the 3sg. For what it's worth, Matasovic also only gives stems. —JohnC5 14:50, 7 September 2016 (UTC)[reply]

I've been working on a new verb conjugation table. Please let me know what you all think. User:Victar/Template:cel-conj-table --Victar (talk) 02:40, 7 September 2016 (UTC)[reply]

I don't think it's an improvement over the existing one. —CodeCat 12:11, 7 September 2016 (UTC)[reply]
That certainly seems to be a tainted matter of personal opinion. --Victar (talk) 15:31, 7 September 2016 (UTC)
I definitely would not use MacBain's dictionary for anything. It's hopelessly out of date now, and wasn't all that up to date even when it was published. —Aɴɢʀ (talk) 13:54, 7 September 2016 (UTC)[reply]
Did he get everything right? Obviously not, but you cite the classic along with the modern. It's still a work in progress. --Victar (talk) 15:31, 7 September 2016 (UTC)[reply]

Please vote in "Poll: Description section"

Please vote in Wiktionary:Beer parlour/2016/August#Poll: Description section.

Current winners:

  • "Description" = 3 actual support votes
  • "Shape" = 2 actual support votes (my vote is calling it second best) + 1 vote in favour of this section "if we do have it" in the Oppose section.

If enough people prefer "Shape" instead of "Description", I can change the whole vote Wiktionary:Votes/2016-08/Description before it starts: it would become a vote for having a "Shape" section.

If more people prefer "Description" instead of "Shape", it would confirm that the vote can start as-is.

The current results are basically a tie with my "second best" comment weighing a bit in the direction of supporting "Description". If nobody else participates on the poll, I think I'll start the vote as-is. --Daniel Carrero (talk) 14:43, 5 September 2016 (UTC)[reply]

The following needs to be posted on WT:NFE:

* [[Module:IPA]] and {{temp|IPA}} now support an additional <code>qual''N''=</code> parameter, to place a qualifying note before a pronunciation.

CodeCat 20:28, 5 September 2016 (UTC)[reply]

Done --Daniel Carrero (talk) 20:31, 5 September 2016 (UTC)

Sysop

Can I have my sysopship back please? It's getting very frustrating not being able to properly patrol or edit protected pages. I also ask for Module:links, Module:th and Module:th-translit to be restored to the version that puts the transliteration code in Module:th-translit (where it ought to be) rather than Module:links, and ask that this be enforced by all editors. There are currently negotiations for a vote for Wyang's proposal, so it would be inappropriate for him to restore his version and continue the edit war before a vote on the matter has been held. —CodeCat 20:42, 5 September 2016 (UTC)[reply]

For the record, negotiations are happening at Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations and the vote talk page.
I support giving back the tools to CodeCat, and to Wyang too. I support restoring modules and templates to the previous version. Whatever the merits of having two separate romanizations (I might even vote support!), I believe the status quo should prevail and that the new proposal should be properly discussed before implementation, especially in case of a huge disagreement like the one that we have now. --Daniel Carrero (talk) 20:48, 5 September 2016 (UTC)[reply]
Agreed And this also may be a good reason to implement Template Editor privileges here. —Justin (koavf)TCM 21:55, 5 September 2016 (UTC)[reply]
Support Why was CodeCat ever desysopped? --Florian Blaschke (talk) 22:20, 5 September 2016 (UTC)[reply]
There are two things that have to happen before I restore sysop rights:
  1. There has to be support from the community for it. This has been trickling in, and probably won't be a barrier.
  2. I have to be convinced that both parties will refrain from any actions that might start the edit war again.
The negotiations at Wiktionary talk:Votes/2016-08/Enabling different kinds of romanization in different locations are a start, but they mostly consist of some variant of "what about this?", followed by some variant of "you're not getting my point". We need to get beyond talking past each other and start talking about serious proposals. We also need to avoid dwelling on past behavior and start discussing what the future is going to look like. Chuck Entz (talk) 22:30, 5 September 2016 (UTC)[reply]
FWIW I am OK with restoring sysop privileges, provided both Wyang and CodeCat agree not to resume edit warring. I also think that Module:links should be restored to the status quo ante, with an appropriate vote to resolve the matter. In fact I asked Dan to create this vote in order to try to resolve what I thought was the root of the conflict between CodeCat and Wyang. As it happens, Wyang has objected to the vote for various reasons, some of which concern whether the issue of the vote is the right one to be voting on and some of which object to having a vote at all. The amount of contention here indicates we clearly need a vote, but I'm open to rewording it. However, this issue is orthogonal to the issue of sysop privileges. Benwing2 (talk) 22:32, 5 September 2016 (UTC)
My only concern is the restoration of existing practice to the Thai transliteration module, and the elimination of custom code from Module:links. If that is accepted then there won't be any edit warring from me, though I do ask what course of action I should take if Wyang restores his version of the modules without a vote to support it. The reason the edit war happened in the first place was because Wyang kept reverting me and no steps were taken to stop him, and he ignored all attempts I made to convince him to stop and wait for consensus/vote. So if Wyang is sysopped again, there needs to be a contingency plan in case he does the same again; some kind of guarantee that others will also step in instead of just me. —CodeCat 22:42, 5 September 2016 (UTC)[reply]
Translation: You want us to take your side on the edit war and enforce it for you. I happen to prefer your version, but this kind of talk isn't very helpful. Chuck Entz (talk) 23:27, 5 September 2016 (UTC)[reply]
Pretty much, yes. The alternative would be endorsing Wyang's edits without a vote to show such endorsement by the wider community. That doesn't seem like a proper option given how contentious the issue is. Major changes that are contentious should be voted on, yes? —CodeCat 23:55, 5 September 2016 (UTC)[reply]
(edit conflict) One part of the problem is figuring out exactly what the status quo ante would be: this started when Wyang added his code to Module:links to implement a very useful change for Thai transliterations/romanizations. CodeCat later extensively reworked the module, in the process removing the code (I'm not sure whether she noticed the code or recognized what it was at the time). This broke a number of Thai entries and several Thai editors asked what was going on, so Wyang added the code back. It's possible that CodeCat, if she was unaware of the earlier code, thought this was something entirely new; she certainly acted as if it were. She reverted his edit, and didn't handle the dispute very well. Wyang got upset and the edit war started. Wikitiki89 came up with a compromise that moved the code out of Module:links, which CodeCat adopted, but Wyang didn't.
Do we revert it to:
  1. The state before Wyang's first edit? That would wipe out CodeCat's reworking of the module.
  2. The state before Wyang's second edit? (Dan Polanski's choice, if I understand correctly). That would break a number of Thai entries.
  3. The state after Wyang's second edit? (Wyang's choice)
  4. The state after Wikitiki89's edit? (CodeCat's choice)
The last two are the only ones that don't break anything, and either could be considered the status quo ante, depending on how you interpret Wyang's first edit. Chuck Entz (talk) 23:15, 5 September 2016 (UTC)[reply]
I don't see any point in restoring anyone's admin rights until the substance of the disagreement is resolved. As I see it, the destructive turn the conflict took is a serious matter, affecting important core software. If the talent involved in the matter cannot resolve it, perhaps someone else should. DCDuring TALK 23:44, 5 September 2016 (UTC)[reply]
There's already a vote that attempts to propose Wyang's changes so that a formal consensus can be made. But Wyang doesn't seem very cooperative in formulating the proposal, so it's mostly stuck. Since Wyang thus has no consensus for his proposed reinterpretation of transliteration modules, the status quo remains, which is that transliteration modules provide any kind of romanisation deemed desirable. This is what my and Wikitiki's edits attempted to do. If Wyang does not agree to a vote but forces his own interpretation through edit warring, what can be done? —CodeCat 23:59, 5 September 2016 (UTC)[reply]
@Chuck Entz: Hmm, when I wrote my comment I didn't check out the whole history carefully. Since the argument is about the presence or absence of a particular piece of Thai-specific code in Module:links, and if I'm not mistaken this didn't exist before the whole edit war started, then logically the status quo ante shouldn't include it. However, I don't completely understand the ramifications of this. Wyang obviously put the code there for a reason; but CodeCat and Wikitiki seem to believe that the same functionality can be achieved with this code in Module:th-translit. If this is true, then it should be taken out pending a vote to decide the underlying issues. Benwing2 (talk) 00:19, 6 September 2016 (UTC)[reply]
The reason the code was placed there by Wyang is because he believes that transliteration modules should only transliterate strictly: character by character. He therefore objects to the modification Wikitiki made, but at the same time, his reinterpretation of transliteration modules is not the agreed status quo. I argue that under the consensus interpretation, a vote is necessary for Wyang's proposal to restrict transliteration modules to just strict transliteration, and have an alternative module system/infrastructure for non-transliterative romanizations. I also believe that under this interpretation, the Thai transliteration code should be placed in Module:th-translit until a vote shows consensus to the contrary. And additionally, even if a vote passes to have separate infrastructures in our modules for transliteration and other types of romanization, the specific code for Thai does not belong in Module:links, but should be handled by said proposed infrastructure in a more general manner. —CodeCat 00:34, 6 September 2016 (UTC)[reply]
There was no consensus. What is being repetitively cited as "consensus" is how people perceive romanisations from the angle of languages not making such a distinction. Truth is, appropriate and purpose-oriented romanisation has been the norm in languages with a script-pronunciation discordance, and it has been the consensus for these languages. See for example the differential use of transcriptions and transliterations ({{ko-etym-native}}) in 미끄럽다 (mikkeureopda), by User:Visviva who created the bulk of our Korean entries. The core issue is “why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages”, and the conclusion from the previous discussion is: "the envisageable harm is minimal and benefits are extensive". There is a demonstrable need to maintain the systems separate - our language editors routinely apply different romanisations when editing these languages, and printed dictionaries of these languages show that authors regard the different modes of romanisation as suited to different purposes. The issue is not whether we should implement romanisation X in translations right now; the issue is whether the system should be maintained to take this need into consideration and not deliberately confuse the concepts "transliteration" and "transcription" (where they truly make a difference), so that future edits in these languages are not discouraged. Wyang (talk) 03:20, 6 September 2016 (UTC)[reply]
What happens now? —CodeCat 19:57, 9 September 2016 (UTC)[reply]
This is up to Chuck. I'm not sure where things stand currently. Benwing2 (talk) 16:17, 11 September 2016 (UTC)[reply]

Proposal: Redirect all halfwidth and fullwidth forms to their "normal" counterparts

When there are fewer active votes in the list, I'm thinking of creating a new vote for this proposal:

Redirect all halfwidth and fullwidth forms to their "normal" counterparts.

I feel this should be pretty uncontroversial, but let me know if someone has a reason to keep the halfwidth and fullwidth forms.

Previous discussions:

--Daniel Carrero (talk) 00:59, 8 September 2016 (UTC)[reply]

I have a minor objection: Why are single-character half-/full-width forms more important than words spelled with them? We obviously shouldn't duplicate all our entries in half-/full-width forms, so if we can get away without those, why can't we get away without the single-character ones? --WikiTiki89 14:01, 8 September 2016 (UTC)[reply]
Actually, ＣＤ was a redirect since 2013; I deleted it now. I agree with you about fullwidth words. I believe we don't want entries like ＣＤ, ＬＣＤ or ｂｙｅ ｂｙｅ, or even redirects like ＣＤ → CD, ＬＣＤ → LCD, ｂｙｅ ｂｙｅ → bye bye. But I feel that the possibility of readers searching for single fullwidth characters is higher than for words. If a person searches for "ＣＤ" and finds out that we don't have that entry, they might try searching for "Ｃ" afterwards.
According to the pageview tool (link) the fullwidth entry got 197 views in the last 6 months. Halfwidth got 12 views. It's not a terribly huge number, but I feel a redirect to the normal forms wouldn't hurt.
In general, for any redundant Unicode characters, I feel it's good to have redirects from the alt form to the "normal" form. Based on that sentiment, I created Wiktionary:Votes/2011-06/Redirecting combining characters and Wiktionary:Votes/2011-07/Redirecting single-character digraphs. Both passed, in 2011.
For better communication, I should probably create a vote with the whole idea that I have in mind. "Voting on: Allowing all single-characters full- and halfwidth forms as redirects. Forbidding full- and halfwidth words, they should not exist even as redirects." --Daniel Carrero (talk) 17:06, 8 September 2016 (UTC)[reply]
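As an illustration of the rule proposed above (single fullwidth or halfwidth characters redirect, words do not), here is a minimal Python sketch. It assumes the "normal" counterpart is the Unicode compatibility mapping of a character tagged `<wide>` or `<narrow>`; the function names are hypothetical and this is not code used by Wiktionary.

```python
import unicodedata


def is_width_variant(ch):
    """True if ch is one character whose Unicode decomposition is tagged <wide> or <narrow>."""
    if len(ch) != 1:
        return False
    return unicodedata.decomposition(ch).startswith(("<wide>", "<narrow>"))


def normal_form(title):
    """Replace every fullwidth/halfwidth character in title with its normal counterpart."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_width_variant(ch) else ch
        for ch in title
    )


def redirect_target(title):
    """Return the redirect target for a single-character width variant, else None.

    Multi-character titles (fullwidth words) get no redirect, matching the
    wording of the proposal quoted above.
    """
    if len(title) == 1 and is_width_variant(title):
        return normal_form(title)
    return None
```

Under these assumptions, `redirect_target("\uff21")` (fullwidth A) gives `"A"`, while a fullwidth word such as `"\uff23\uff24"` gives `None`.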
Actually, I think the problem with many of your proposals is that you create a vote too soon. We should have a long discussion first and only after the discussion has died down and some time has passed should you create a vote (if there had been enough support). --WikiTiki89 17:23, 8 September 2016 (UTC)[reply]
Good point. But you can't always have a long discussion: sometimes, nobody, or just a few people, respond to my topic on the BP. If nobody else decides to weigh in on this topic about fullwidth characters, I believe I should create the vote anyway (eventually).
Concerning minor proposals that don't affect a lot of entries (I consider "redirect fullwidth characters, disallow fullwidth words" one of these) and minor policy edits that don't change actual regulations, I think it's okay to start a vote earlier than most other votes. But if creating votes too soon is a problem, I guess I could create a vote after the discussion disappears from the main Beer parlour page. Other proposals were discussed a lot (sometimes in multiple places) before the vote started. If you want, we can talk about specific past votes that I created, to see if I could have done any of them differently.
Then again, there are some proposals that were discussed already but I didn't create a vote for them. I see nothing wrong with creating a vote immediately for some of these, and pointing to the previous discussions. I may even create a new BP discussion just to point out that a new vote was created, and to see if everyone agrees with the wording of the vote. This is not the same as creating a new vote without discussion. --Daniel Carrero (talk) 18:13, 8 September 2016 (UTC)[reply]
I'll give you two rules of thumb: If the discussion is still going, it's too early to create a vote (unless it's an urgent matter). If the discussion has not had much input, try to attract more attention to it, or perhaps it is not important enough to be voted on. --WikiTiki89 18:20, 8 September 2016 (UTC)[reply]
All right, I'll have this in mind: "If the discussion is still going, it's too early to create a vote (unless it's an urgent matter)."
I partially agree with this: "If the discussion has not had much input, try to attract more attention to it, or perhaps it is not important enough to be voted on." In my opinion, the proposal "redirect fullwidth characters, disallow fullwidth words" is important enough to be voted on and appear on the WT:CFI as actual criteria for inclusion/exclusion of entries, but among the things that need to be voted on, this is not very important, because it affects few entries. --Daniel Carrero (talk) 19:04, 8 September 2016 (UTC)[reply]

I created Wiktionary:Votes/2016-10/Redirect fullwidth and halfwidth characters. --Daniel Carrero (talk) 13:39, 21 October 2016 (UTC)[reply]

pirates!

FYI, September 19 is International Talk Like a Pirate Day. I would suggest doing the word-of-the-day as something pirate-related if possible. I think it would be great too if we can create an Appendix or Category of terms traditionally associated with pirate lore, such as "walk the plank." I know in my area (Maryland, USA) there are local businesses offering promotional discounts for customers who come in talking like pirates on September 19. I think a pirate vocabulary guide would be helpful not just for them, but for authors and storytellers as well. Nicole Sharp (talk) 05:24, 8 September 2016 (UTC)[reply]

Proto-Brythonic

@CodeCat, Victar, UtherPendrogn, Nayrb Rellimer, Florian Blaschke, Anglom, Angr, Chuck Entz, and anyone else who cares:

Several books on Brittonic and Neo-Brittonic suggest that the name Gwydion was "Uidgen" or "Widgen" at this point in time, not Gwidyen, as here https://en.wiktionary.org/wiki/Reconstruction:Proto-Brythonic/Gw%C9%A8d%C9%A3en . Indeed, the "gw" shift seems to have happened from NB to Old Welsh, where it became Guidgen, then in Middle Welsh Gwydyen/Gwydyon and modern Gwydion. UtherPendrogn (talk) 19:01, 9 September 2016 (UTC)[reply]

Attestations at *gwir show that the change happened in all languages and is thus of Proto-Brythonic date. —CodeCat 21:24, 9 September 2016 (UTC)[reply]
Good. As to the name https://en.wiktionary.org/wiki/Reconstruction:Proto-Brythonic/Kadwall%E1%BB%8Dn , have I reconstructed it correctly? Some of the descendants are messy, I'm sorting them out right now. UtherPendrogn (talk) 22:02, 9 September 2016 (UTC)[reply]
The Irish descendants don't match up. Where did they get their -m-? It looks much more likely that they descend straight from the Proto-Celtic form, which was *Kat(u)wellamnos or similar. Gaulish is not a Brythonic language, it got its form of the name straight from Proto-Celtic. —CodeCat 22:11, 9 September 2016 (UTC)[reply]
I can also find nothing whatsoever of the Gaulish or Irish names, Google gives zero results. @Angr can you check this? —CodeCat 22:15, 9 September 2016 (UTC)[reply]
I got some Google Hits for the Gaulish name and variants of it, e.g. this, but it seems to be a place name rather than a personal name. I can find no trace of an Irish form "Cathfollomon". Cadwallon reconstructs Proto-Brythonic *Katuwellaunos, which in our notation would be *Kaduwellọn, from Proto-Celtic *Katuwelnāmnos. The Brythonic Catuvellauni have the same name. —Aɴɢʀ (talk) 06:28, 10 September 2016 (UTC)[reply]
Using Au or the dotted O is a matter of notation, but surely apocope is not? And why did you mention the forms? They are the ones I put.

EDIT: Oh I see now, sorry, I put the Early Brythonic form rather than the Proto-Celtic. Will rectify that and add the PC form. UtherPendrogn (talk) 12:18, 10 September 2016 (UTC)[reply]

It does not matter what form the descendant takes, surely? And I reconstructed the Irish ones thanks to the Dictionnaire de la Langue Gauloise by Xavier Delamarre. UtherPendrogn (talk) 12:22, 10 September 2016 (UTC)[reply]
Sorry, why are there Goidelic descendants under *Kadwallọn? —JohnC5 17:11, 10 September 2016 (UTC)[reply]
That shouldn't be there. Probably a mistake from copy/pasting the Celtic form. Removed now. UtherPendrogn (talk) 19:30, 10 September 2016 (UTC)[reply]

If no one objects, I'll remove "and are listed in the left hand side of the entry" from WT:EL#Interwiki links. Some people complained about it in Wiktionary:Votes/pl-2016-02/Interwiki links, which passed in March 2016. I'd like to do this without a new vote.

Current text: "Interwiki links are used to point to the same word in foreign language Wiktionaries, and are listed in the left hand side of the entry. To point to the page palabra in the Spanish Wiktionary, use:"

Proposed text: "Interwiki links are used to point to the same word in foreign language Wiktionaries. To point to the page palabra in the Spanish Wiktionary, use:"

--Daniel Carrero (talk) 11:47, 10 September 2016 (UTC)[reply]

Done. Let me know if you wanted the mention of the "left hand side of the entry" back. In the vote, a few people were not very happy with that wording. --Daniel Carrero (talk) 02:14, 14 September 2016 (UTC)[reply]

Centralization of also-information

For some time, I thought it would be good to centralize the {{also}} lists in a canonical entry, which would be the diacritic-free lowercase entry if available. The canonical form entry would have the full list while each other form would only link to the canonical entry using {{also}}. For instance, kaca would have a full list while káča would only link to kaca. This would remove a maintenance overhead while bringing only a minor inconvenience to the reader; it would also make the tops of many pages less busy.

Does anyone like that idea? --Dan Polansky (talk) 09:03, 11 September 2016 (UTC)[reply]
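A rough Python sketch of how the canonical entry described above could be determined. It assumes "diacritic-free" means stripping Unicode combining marks after NFD decomposition, so letters without a decomposition (ø, ł, đ and similar) are left unchanged. The function names are hypothetical and this is not the code of any existing bot.

```python
import unicodedata
from collections import defaultdict


def fold(title):
    """Strip combining marks and lowercase, e.g. 'káča' -> 'kaca'."""
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def group_titles(titles):
    """Group entry titles that differ only in diacritics or capitalisation."""
    groups = defaultdict(list)
    for title in titles:
        groups[fold(title)].append(title)
    return dict(groups)


def canonical_entry(key, group):
    """Return the diacritic-free lowercase title if that entry exists, else None."""
    return key if key in group else None
```

With this sketch, `group_titles(["kaca", "káča"])` puts both titles under the key `"kaca"`, and `canonical_entry` reports `None` for a group such as `["káča"]` that has no plain entry to carry the full list.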

I wouldn't object; I rarely need to see pages with accented titles, however. The obvious alternatives are (i) to have a bot regularly update the alsos (or even a template generate them on the fly?) based on a list of entry titles, or (ii) to use the Variations of __ pages like Appendix:Variations of "be" (but that's an extra click, and a waste of a page when there are very few variations). Equinox 10:30, 11 September 2016 (UTC)[reply]
See above discussion of updating contents of uses of Template:also.
If we "centralize", I would prefer that only one (or more) page(s) whose headword(s) had diacritics bore the complete list of headwords in the equivalence class. DCDuring TALK 12:05, 11 September 2016 (UTC)[reply]
The problem is that the average reader isn't going to click on the undiacriticed form if they don't see their diacriticed form there. Of course, most people are going to search the undiacriticed form to start with, but their system may have easy ways to type accents, but not macrons, háčeks, etc., so you can't rule the possibility out. Chuck Entz (talk) 14:39, 11 September 2016 (UTC)[reply]
This is true. On my German keyboard, it's easy to type â ê î ô û but not ŵ ŷ, so if I'm searching for a Welsh word with a circumflexed vowel, I'll search for the diacriticked form of the first five but the undiacriticked form of the last two. All that said, however, I'd prefer to keep the full list on each page, because you just never know where you're going to end up. —Aɴɢʀ (talk) 15:43, 11 September 2016 (UTC)[reply]
I don't think this is a bad idea, but it seems like it would be necessary to have a bot keep things updated no matter if we keep things updated on all pages (checking for new entries that have been created and need to be added to all the {{also}}s) or on one page (still checking for new entries to add to the centralized list, and for any additions of also to peripheral pages, which the bot would presumably remove). Given that, I do think the idea of having a bot update all the {{also}}s is better. Someone just needs to design and run that bot...! - -sche (discuss) 19:01, 11 September 2016 (UTC)[reply]
I thought @Isomorphyc had previously volunteered in the discussion above. I don't know whether he has all the skills, but he does run Orphicbot. DCDuring TALK 19:35, 11 September 2016 (UTC)[reply]
I think this would need 2 separate templates:
  • caca would have: "See also: Caca, caça, caçà, cáca, căca and ćaća" ({{also}} as usual)
  • Caca would have: "For more entries, see caca" ({{also-more|caca}} or something)
  • caça would have: "For more entries, see caca"
  • caçà would have: "For more entries, see caca"
  • etc.
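The two-template idea in the list above could be sketched as follows. This is only an illustration in Python: `also_line` is a hypothetical helper, the `{{also-more}}` template does not exist, and the output strings simply copy the wording suggested in the list.

```python
def join_titles(titles):
    """Join titles as 'a, b and c'."""
    titles = list(titles)
    if len(titles) <= 1:
        return "".join(titles)
    return ", ".join(titles[:-1]) + " and " + titles[-1]


def also_line(page, canonical, variants):
    """Return the text shown at the top of `page`.

    canonical: the page that carries the full list.
    variants:  every other title in the same group.
    """
    if page == canonical:
        others = [t for t in variants if t != page]
        if not others:
            return ""  # nothing to link to
        return "See also: " + join_titles(others)
    return "For more entries, see " + canonical


variants = ["Caca", "caça", "caçà", "cáca", "căca", "ćaća"]
print(also_line("caca", "caca", variants))
print(also_line("caça", "caca", variants))
```

Only the canonical page needs the list of variants; every other page needs just the canonical title, which is the maintenance saving the proposal is after.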
@Dan Polansky, DCDuring, Metaknowledge: Thanks for pinging me. Actually, I have the code already to do most of this, including realtime updating. The only thing I haven't totally worked out is how I will handle the appendices. It turns out there are a variety of corner cases where users have entered more information into an {{also}} template than one would want, by default, to add, for example, transliterations into other scripts. My current policy has been to retain these where they have been entered, but not to propagate them to other entries. Because of this, centralising the lists will remove the potential for this type of user-generated information. To retain flexibility, my suggestion would be not to centralise the data. I would add that I believe every method for storing this data in modules has significant drawbacks.
For the issues about typing ease raised by User:Chuck Entz and User:Angr, I think users would learn to seek out the {{also}} templates if they were consistently available. I'll test this with the pageview data three months after I have updated the templates to see if an increase in newly linked words with diacritics is seen in aggregate. But I would point out this only partly solves the typing problem because if a word with diacritics has no corresponding entry in pure ASCII, there will be no also template in the easy-to-type location. I have looked at a few newer methods of improving ASCII searchability than those I have tried so far, but that is a different topic, and everything I have looked at has drawbacks. Isomorphyc (talk) 20:19, 11 September 2016 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── The horse may already have bolted from the stable, but is it really necessary for OrphicBot to add alternative forms of an entry in an {{also}} template when they are already listed under the "Alternative forms" section? I think that isn't very useful; {{also}} should perhaps be confined to accented forms (usually in other languages) or differently capitalized forms. — SMUconlaw (talk) 11:22, 12 September 2016 (UTC)[reply]

@Smuconlaw: Sorry for giving that impression. It did a little, but these are orthogonal changes because centralisation can be accomplished in the modules and templates without editing the pages. If the data are centralised, the important thing is only to have the actual template on each page; the arguments do not matter. I'll be glad to make the changes if necessary; they're not major. For your question: Wiktionary's normal principle is redundancy over normalisation, largely for reasons stemming out of the fact that we're not a database. Creating inter-template dependencies is not a good idea if it can be avoided. In this case, the exception you propose would also be confusing to users because each lemma in each language can have its own "Alternative forms" section, and the user would need to find the correct one out of potentially many. Moreover, the 'correct' one may not even be in the language the user is expecting, defeating the purpose of a purely orthographical index. That said, if this or other exceptions are generally preferred I will implement them. For example, User:YURi has suggested omitting {{also}} links from misspellings to correct spellings, since this is also redundant. Isomorphyc (talk) 14:57, 12 September 2016 (UTC)[reply]
Thanks for explaining. It's not really a big deal for me, but I was wondering whether it made sense in some cases to have both an "Alternative forms" section and the very same information in a "See also" statement at the top of the entry. — SMUconlaw (talk) 17:44, 12 September 2016 (UTC)[reply]

Matched-pairs — policy page

I created Wiktionary:Votes/pl-2016-09/Matched-pair entries — policy page, to implement what was discussed in Wiktionary:Beer parlour/2016/June#Redirects to matched pairs. Feel free to discuss and propose any changes. --Daniel Carrero (talk) 14:04, 11 September 2016 (UTC)[reply]

Allow for easier input from the laity.

I recently saw on TV an "educational" program that referred to an 'oyster knife' as a 'paring knife'. This inspired me to look up the term 'Shucking Knife' because this is what I have always called an 'oyster knife'. When I discovered that Wiki did not have a page or a link for 'shucking knife' I was confronted with the overly convoluted requirements that Wiki has in order to let you know that I am aware of a synonym for one of your terms. I had to 'think' much too hard.

Yes, you're right, it is a difficulty, and yet we need to have some sort of minimum standard as well. Shucking knife definitely exists but if you look at shucking and shuck, shuck says '[t]o remove the shuck from (walnuts, oysters, etc.).' which makes me think it's possibly just a knife for shucking, in the same way that a whittling knife is just a knife for whittling, and therefore does not need an entry. But in general use, being accessible for new editors while trying to maintain consistency throughout our format is a challenge, there's no two ways about it. Renard Migrant (talk) 22:42, 11 September 2016 (UTC)[reply]

Restoration of Sysop Privileges

Given the amount of time with no action on the disputed issue, I'm prepared to restore sysop privileges to @CodeCat and to @Wyang if they will commit to not editing Module:links except for changes both agree to beforehand, at least until both agree that the conflict is resolved.

Please state here whether you agree to this. Thanks! Chuck Entz (talk) 23:58, 11 September 2016 (UTC)[reply]

Can someone else make the changes, then? If neither of us is allowed to edit it, that implies that there is a consensus for Wyang's preferred version. The reason I continue to press this is because I fear that if I don't, nothing will be done about it yet again. —CodeCat 01:39, 12 September 2016 (UTC)[reply]
@CodeCat, maybe you could provide a link to the exact revision of the module which you would say is the correct status quo? --Daniel Carrero (talk) 01:45, 12 September 2016 (UTC)[reply]
[1], [2], [3]. These three revisions ensure that the Thai transliteration code is placed in the Thai transliteration module where it belongs (according to the current consensus on treatment of transliteration modules), rather than in Module:links where it does not belong. —CodeCat 01:49, 12 September 2016 (UTC)[reply]
Do other people agree with reverting the modules to these exact versions?
I'll repeat what I said in another discussion:
  • I support restoring sysop privileges to both CodeCat and Wyang.
  • I support reverting the modules to the status quo, and in the face of this huge disagreement, I urge @Wyang to help in the creation of the vote before implementing any new proposal.
Correct me if I'm wrong: I seem to remember that some entries were already edited based on Wyang's system and reverting the modules to the status quo would break the entries. Still, IMO the status quo should prevail and the entries should be fixed. --Daniel Carrero (talk) 02:03, 12 September 2016 (UTC)[reply]
I also support restoring sysop privileges to both CodeCat and Wyang. In addition, I support restoring the modules to the status quo. Unfortunately, as Chuck pointed out, it's not totally obvious what this is, but in my mind, since the edit war specifically concerned references to Module:th in Module:links (+ supporting code), and since the references to Module:th weren't present in the module beforehand, the status quo should not include them: Specifically, it shouldn't include Module:th, 'phonetic_extraction' or the code that references 'phonetic_extraction'. Benwing2 (talk) 02:50, 12 September 2016 (UTC)[reply]
Back then there wasn't even any automated romanisation for Thai; restoring the previous version would simply wipe out the romanisations in thousands of Thai entries. I'm really confused. There was no consensus for CodeCat's edit, despite her claiming there is. I was only adding in transcription support at Module:links (which was lacking transcription support) per the consensus of the Thai editors, in a manner that is most appropriate for further editing in Thai and other similar languages. If you do not agree, voice your arguments other than voicing “I don't like it”! I spent so much effort arguing for why storing transcription and transliteration modules separate is beneficial in the long run, and what I got was non-participation and the indifferent “so what happens now?” (1, 2). Decision-making should not be like this - having people voice their opinions without having a critical appraisal of the arguments for and against makes the decisions arrived at highly prone to unintelligence. It shouldn't be the case that you can say your preference and expect it to be enacted without giving a reason. Why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages, when our language editors routinely apply different romanisations when editing these languages, and printed dictionaries of these languages show that authors regard the different modes of romanisation as suited to different purposes? If it cannot be demonstrated that the harms do outweigh the benefits for these languages and there is no willingness to demonstrate, there is no justification for enacting this opinion or restoring the “previous version” which abolishes the functionality altogether. Wyang (talk) 03:54, 12 September 2016 (UTC)[reply]
(edit conflict) We're trying to achieve a compromise here. In my book, adopting a version more heavily weighed against one side than the other side even asked for isn't a compromise. What you're asking for basically breaks a large number of Thai entries that were modified in good faith by the Thai community after Wyang provided the capability for it with his first edit. Regardless of how things are going to end up eventually, that's too much collateral damage to make it a reasonable first step toward a compromise. Remember the story of how Solomon pretended he was going to cut a baby in half in order to see from the reaction of the two claimants which was the real mother? This is like cutting the baby in half first. Chuck Entz (talk) 12:41, 12 September 2016 (UTC)[reply]
So, over at the Grease pit, @Vahagn Petrosyan had mentioned that many languages require both transliteration and transcription. Do we think that the inclusion of both, if the transcription differs, could kill two birds with one stone? —JohnC5 17:05, 12 September 2016 (UTC)[reply]
That's what Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations is supposed to address. But it's not going anywhere. —CodeCat 17:13, 12 September 2016 (UTC)[reply]
@Wyang: I have a question for you, and I'm sorry if you already explained it somewhere. I'm going to ask anyway: Given the benefits about your proposal that you explained, don't you think that Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations has a good chance to pass? More importantly, is the linked vote satisfactory for you, or would you change something in the proposal? --Daniel Carrero (talk) 04:03, 12 September 2016 (UTC)[reply]
@Daniel Carrero: I believe the answer to your question is on the vote's talk page. —suzukaze (tc) 04:05, 12 September 2016 (UTC)[reply]
OK, but Wyang may still choose to help building the vote. If the vote explains the proposal correctly and passes, it will mean we are all on the same page and understand the implemented proposal.
In the previous discussion, Chuck Entz presented a few possible versions of the status quo to choose from. Is anyone interested in discussing what exactly is the right one? If no one objects, I'll just trust CodeCat and revert the three modules to the revisions that she mentioned. --Daniel Carrero (talk) 10:04, 12 September 2016 (UTC)[reply]
Why? I have explained the reasons of my objection well enough above, and in the previous discussions. Why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages, when there is ample evidence suggesting the contrary? Nobody was interested in engaging in discussion to argue for the version that you are trying to restore. Why is reverting to a version which cannot be justified even being considered? Wyang (talk) 11:35, 12 September 2016 (UTC)[reply]
Please understand: It's not about whether the proposal is good, it's about whether other people agree with it, and are on the same page. That's why some of us are interested in having a vote, which would explain and record the proposal, and let others judge its merits. To put it another way: if the proposal is really good, the vote is probably going to pass and we'll do exactly as you proposed. --Daniel Carrero (talk) 11:57, 12 September 2016 (UTC)[reply]
We haven't had votes on the architecture of the modules, so I don't see what makes the "status quo ante" Wyang so sacred. If Wyang took the initiative to overcome a language(s)-relevant limitation of the module architecture, it seems to me that it merits our respect. If our architecture doesn't provide the required flexibility without some kind of kludges, so much the worse for the existing architecture. In this and on many other matters I favor accommodating decentralized decision-making. DCDuring TALK 12:33, 12 September 2016 (UTC)[reply]
Wyang's changes don't do anything that could not be achieved within our existing module framework. The three edits Wikitiki made to the modules, and which I proposed they be restored to, show that. The only reason he did it is because he doesn't like the framework (specifically, that transliteration modules do other kinds of romanization too). Therefore, I proposed that if he doesn't like our current consensus on what transliteration modules do and how they are used in other modules/templates, he should make a vote to change it. So far he hasn't shown any interest. Most of what has happened since then is several editors trying to get Wyang to cooperate on formulating a vote, while Wyang himself is skirting around the issue and avoiding a vote. Is this appropriate behaviour when someone's changes have been challenged? And would it be appropriate to allow said changes to remain in place when they have been challenged so heavily and the user is not prepared to let the community decide per vote on the issue? —CodeCat 13:56, 12 September 2016 (UTC)[reply]
As I said above, the only point revolved around in the “no”-camp is “I don't like it”, without any explanation given. Why do the harms outweigh the benefits if we keep them separate in these languages, when there is ample evidence suggesting the contrary? You keep citing your version as consensus, but where is the vote showing that? Using purpose-suited romanisation is the consensus for languages with a transcription-transliteration distinction ({{ko-etym-native}}, etc.). If you do not like this practice, you should bring this up in a discussion and explain your reasoning, aside from saying “I don't like it”. There is no point blaming the implementer for implementing what was already a custom in languages you are not involved in, and barring the improvement in the module infrastructure for these languages. Wyang (talk) 22:55, 12 September 2016 (UTC)[reply]
As I said before, there is no "no"-camp, just people that you need to convince. The burden of proof is on you. Once that's done, the vote should be able to pass. We are repeating the same arguments over and over. This discussion is going nowhere. I reverted the three modules to the revisions chosen by CodeCat. Feel free to discuss if I should have done something different. --Daniel Carrero (talk) 23:07, 12 September 2016 (UTC)[reply]
I reverted the edits I could revert. Discussion is still ongoing; you cannot voice your opinion and expect it to be enacted without justifying it. Any unilateral measure taken constitutes disrespect to the participants of discussion. Wyang (talk) 23:14, 12 September 2016 (UTC)[reply]
"you cannot voice your opinion and expect it to be enacted without justifying it" ... ha! I see some irony there, and it's amusing. But it may be just me. Seriously, if I did something wrong please someone step up and say what to do. I restored the modules again. --Daniel Carrero (talk) 23:21, 12 September 2016 (UTC)[reply]
You are insane. You did not even know what the contention was, and yet you feel empowered to trample on whatever modules you can get your hands on simply because you can. Wyang (talk) 23:28, 12 September 2016 (UTC)[reply]
Good grief. The diff you linked to does not indicate that I'm completely clueless about the contention. It does indicate that I was politely asking you for your opinion on the best way to word a vote. --Daniel Carrero (talk) 23:34, 12 September 2016 (UTC)[reply]
Asking for my opinion on the best way to word a vote... when it should not be relayed to a vote at all, because there is no argument input from people arguing we should confuse transliteration and transcription. There are numerous arguments for keeping the modules separate being put forth in the discussion, such as (1) our editors in these languages already implement the practice of using purpose-suited romanisation; (2) printed dictionaries in these languages use differential romanisation and deem the different modes of romanisation as suited to different purposes; (3) it conforms to existing language-specific module infrastructure developed for these languages; (4) it is prospectively designed, and does not discourage further improvements in these languages. But the arguments against? One: "I don't like it". It is unfair to use a vote to end a discussion, when one side is only interested in expressing their opinion and not giving any rationales for it. It is facilitating mindless decision-making. Wyang (talk) 23:46, 12 September 2016 (UTC)[reply]

You have failed to provide an accurate view of the opinions of other people. "But the arguments against? One: 'I don't like it'." is a straw man.

Could you please change your mind and be willing to cooperate in the vote? We could add your 4 points in the rationale. --Daniel Carrero (talk) 23:53, 12 September 2016 (UTC)[reply]

If I have failed to provide an accurate view of the opinions of other people, then could you please list the arguments against? We are still at a stage in the discussion where we are struggling to list any arguments from one side. This is way too immature to call on votes. Votes are evil. It allows such disproportionate argumentation to be easily distorted to produce an unintelligent consensus for the reason of sheer numbers only. Wyang (talk) 00:57, 13 September 2016 (UTC)[reply]

Bullying


User:Daniel Carrero completely ignored the discussion and proceeded to revert the modules to a version he prefers and locked the modules. This is unacceptable bullying behaviour and shows no consideration for the rules of discussion.

(cur | prev) 23:16, 12 September 2016 Daniel Carrero (talk | contribs) . . (138 bytes) (-46)
(cur | prev) 23:15, 12 September 2016 Daniel Carrero (talk | contribs) m . . (184 bytes) (0) . . (Protected "Module:th-translit" ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite)))

I urge other admins to please look into this abuse of power and take actions. Wyang (talk) 23:19, 12 September 2016 (UTC)[reply]

I don't see bullying going on. I see you refusing to coöperate with him when he seeks to help you resolve this dispute, however. It's easier to decry alleged abuses of power, but the right thing to do is work on moving forward. —Μετάknowledgediscuss/deeds 23:31, 12 September 2016 (UTC)[reply]
You might deliver that message to Dan. It certainly seems high-handed to me. DCDuring TALK 23:33, 12 September 2016 (UTC)[reply]
It's not bullying, but he shouldn't have done it. Fortunately Anatoli reverted the edits so I didn't have to.
When emotions are high is exactly the wrong time to take such actions- it's just throwing gasoline on the fire. Besides, they were completely out of process and I just don't see the consensus to act now. Chuck Entz (talk) 01:41, 13 September 2016 (UTC)[reply]
All right. --Daniel Carrero (talk) 01:46, 13 September 2016 (UTC)[reply]

No Middle Danish?


It seems we do not have categories, language codes or anything for Middle Danish. Does Wiktionary subsume Middle Danish under Danish, and if so, why? Has this been discussed before?__Gamren (talk) 14:27, 12 September 2016 (UTC)[reply]

I bet it hasn't been discussed before. We can certainly create a language code for Middle Danish if no one objects; I'd suggest gmq-mda. —Aɴɢʀ (talk) 15:13, 12 September 2016 (UTC)[reply]
How big are the differences? —CodeCat 15:38, 12 September 2016 (UTC)[reply]
Oh, we certainly can; it's whether we should. Renard Migrant (talk) 16:07, 12 September 2016 (UTC)[reply]
You can see some samples at w:History of Danish#Medieval Danish. Maybe someone who knows Danish can tell us if that is as different from modern Danish as Chaucer is from modern English. —Aɴɢʀ (talk) 16:37, 12 September 2016 (UTC)[reply]
I can tell right away that the spelling is very distinct from what is used today, but I think a modern Danish speaker could figure that out, at least. However, what is described there is what I'd call Old Danish. The definitions on that page don't really sit well with me. What it calls Old Danish is what we'd just call Old Norse, and it was written in the same time as the Old Icelandic that many more are familiar with. w:Old Norse says: "The 12th-century Icelandic Gray Goose Laws state that Swedes, Norwegians, Icelanders and Danes spoke the same language, dǫnsk tunga ("Danish tongue"; speakers of Old East Norse would have said dansk tunga). Another term used, used especially commonly with reference to West Norse, was norrœnt mál ("Nordic speech")." So even the Icelanders said they spoke Danish, at the time. —CodeCat 16:45, 12 September 2016 (UTC)[reply]
Also consider the different definition given for w:Old Swedish. Those years are closer to what I would expect for "Old Danish" as well. —CodeCat 16:46, 12 September 2016 (UTC)[reply]
I guess the next question is, how late are the words we're already calling Old Danish attested? If our Old Danish words are words/spellings attested up through the 15th century, then the reason we don't have Middle Danish is that what we're calling Old Danish developed directly into (early) Modern Danish. —Aɴɢʀ (talk) 16:57, 12 September 2016 (UTC)[reply]
There is a Middle Norwegian stage conventionally dated from 1350–1550, thus contemporary with Late Old Swedish. I think Late Old Swedish is sometimes called Middle Swedish (and Early Old Swedish consequently plain Old Swedish), but rarely, and same for Danish. However, Middle Icelandic is used for the same period. (In Faroese, it's the Old Faroese period.) --Florian Blaschke (talk) 03:14, 13 September 2016 (UTC)[reply]
Regarding chronology: Nudansk Ordbog and Den Danske Ordbog agree that (in approximate years, obviously): Old Danish lasted from 800-1100, Middle Danish 1100-1525 and Modern Danish 1525-present (DDO says 1500-present, but that's probably just a matter of precision). Regarding intelligibility: As a non-linguist speaker of Modern Danish, I cannot easily read Middle Danish, even if I can recognize cognates once I know the translation. Compare: takær bondæ annær man mæth sin kunæ oc kumar swa at han dræpær anti mannen... with Tager en bonde en anden mand med sin kone, og sker det, at han ikke dræber manden... (see also Gammeldansk Ordbog, which places Middle Danish (gammeldansk) at 1100-1515, and furthermore separates it into older and younger periods, the division being at 1350). Regarding classification: I see that we have lots of references to Middle Danish, but they usually link to Danish entries (see e.g. Storm, gilding, nettle). There is also at least one Danish lemma, tryde, which I have no reason to believe exists in Modern Danish.__Gamren (talk) 16:58, 13 September 2016 (UTC)[reply]
800-1100 would conflict with the generally agreed definition of Old Norse, which was also spoken throughout that period. Essentially, if we adopt that definition, we'd have to say Proto-Norse split into Old Norse and Old Danish in the year 800, which is complete nonsense. —CodeCat 17:01, 13 September 2016 (UTC)[reply]
This is probably just a terminological question. Olddansk/runsvenska/Old East Norse was, as I understand it, one of two varieties of Old Norse, which we merge with Old West Norse (which is probably quite justified), and our Old Danish corresponds to gammeldansk, no? So the only question is whether Old Danish is the right word. The definitions I gave above correspond with our definitions (given by @Daniel Carrero, who may wish to say something) and the ones given in the WP article given above, but it is entirely possible this doesn't correspond to usage in Anglophone literature - I really wouldn't know! and I'm sorry if I made this a muddle.__Gamren (talk) 19:34, 13 September 2016 (UTC)[reply]

RevisionSlider


Birgit Müller (WMDE) 14:56, 12 September 2016 (UTC)[reply]

Transliteration nomenclature vote


I created this vote: Wiktionary:Votes/2016-09/Renaming transliteration. Please provide feedback on the talk page to help improve the vote as necessary. —CodeCat 16:08, 12 September 2016 (UTC)[reply]

If this vote passes, I assume we'll rename all pages in Category:Transliteration policies. I think this should be stated in the vote. --Daniel Carrero (talk) 16:16, 12 September 2016 (UTC)[reply]
Yes, it should. And the category itself will be renamed too of course. —CodeCat 16:18, 12 September 2016 (UTC)[reply]

WT:CFI should explicitly be for the main namespace


WT:CFI (under a heading 'scope' perhaps) should explicitly state that it refers only to the main namespace. In other words (as a specific example) *montania is not subject to the rules here. Renard Migrant (talk) 20:39, 12 September 2016 (UTC)[reply]

CFI currently states that some things go in appendices, and reconstructions go in the Reconstruction namespace. I think it's better this way. Logically, there are some criteria for inclusion in the Reconstruction namespace; if there were no criteria for inclusion, you could include anything there.
The policy says: "Terms in reconstructed languages such as Proto-Indo-European do not meet the criteria for inclusion. They may be entered in the Reconstruction namespace, and are referred to from etymology sections." I disagree with that wording. It's true that we often say "Proto-Indo-European doesn't meet CFI", but I think this is a problematic statement. Proto-Indo-European does meet CFI, and the correct course of action is to place it in the Reconstruction namespace.
Relatedly, Reconstruction pages and some appendices follow the entry format closely so, in my opinion, both WT:EL and WT:NORM should explicitly mention exactly to what extent they apply to these pages. Related discussion: Wiktionary talk:Normalization of entries#Proposal: encompassing reconstruction pages. --Daniel Carrero (talk) 21:21, 12 September 2016 (UTC)[reply]
Reconstructions should be subject to some criteria for inclusion, but not these ones. I think any reconstruction from a reliable source should be considered a valid entry title. 'Reliable source' of course can be subject to criteria that we can all discuss before implementing. Renard Migrant (talk) 21:27, 12 September 2016 (UTC)[reply]
Suppose we use your idea as an actual, formal rule: "any reconstruction from a reliable source should be considered a valid entry title." Where can we place the rule? WT:PROTO is a good candidate, but I don't like how it has a long encyclopedic explanation of what a reconstruction is, instead of a simple link to Wikipedia or to a help page. I prefer policy pages to contain only regulations when possible. If we can delete all this stuff, I would be glad to place (voted and approved) criteria for inclusion of reconstructions in WT:PROTO. I would also like WT:CFI to link to WT:PROTO if we do that. What do you think? --Daniel Carrero (talk) 21:41, 12 September 2016 (UTC)[reply]
I think WT:PROTO if anything isn't really a policy page at the moment. It feels more like a Wikipedia entry. It's well-written but we just don't need that much. It also doesn't really contain much actual policy. Renard Migrant (talk) 21:49, 12 September 2016 (UTC)[reply]
WT:PROTO said: "It must not be modified without a VOTE." But I did not find a vote that confirms this in the first place, so I demoted it to Think Tank. --Daniel Carrero (talk) 21:56, 12 September 2016 (UTC)[reply]
"Any reconstruction from a reliable source", without further caveats, sounds like a bad guideline for reconstruction inclusion. This would allow the inclusion of all sorts of transcription variants of the same reconstruction (which we currently generally standardize away, though allowing them as redirects). More controversially, this would also allow the inclusion of reconstruction variants: cases where all researchers agree that a proto-form is to be reconstructed as the source of data Y, but disagree on what its shape was. I would propose that such disagreements should be covered as discussion within a single entry. --Tropylium (talk) 22:13, 17 September 2016 (UTC)[reply]
If there weren't a hundred million votes already taking place, there are a couple I'd like to propose. Renard Migrant (talk) 21:00, 12 September 2016 (UTC)[reply]
What would you like to propose? --Daniel Carrero (talk) 21:21, 12 September 2016 (UTC)[reply]
On my talk page, Dan Polansky and I discussed having single words de jure meet CFI. Something like doglike doesn't actually meet CFI as it's written now. Of course nobody would actually delete it, but it would be nice to have the rules cover what actually happens. Renard Migrant (talk) 21:27, 12 September 2016 (UTC)[reply]
Good idea. I'd probably support that. (as discussed in: User talk:Renard Migrant#CFI and idiomaticity clarification) --Daniel Carrero (talk) 21:44, 12 September 2016 (UTC)[reply]
@Renard Migrant, Dan Polansky: How many active votes do you think we should have on {{votes}}, before you feel it's OK to create the new vote for single words meeting WT:CFI? --Daniel Carrero (talk) 00:13, 13 September 2016 (UTC)[reply]
The way I see it: CFI was designed to apply only to the main namespace. Thus, it should be clear that the rules currently at WT:CFI only apply to the main namespace. Of course we need inclusion criteria for other namespaces, and these criteria may also be added to the page WT:CFI, but in a separate section from the current rules that only apply to the main namespace, or may be on its own page. --WikiTiki89 21:49, 12 September 2016 (UTC)[reply]

Stress marks and syllable marks


I've been working on putting syllable marks in lately, and I've noticed that the stress marks are interpreted as syllable marks when categorizing words by the number of syllables. When there are stress marks, do we need to put a syllable mark in front of the stress mark, e.g. should university be /ju.nɪ.ˈvɝ.sə.ti/ or /ju.nɪˈvɝ.sə.ti/? — justin(r)leung (t...) | c=› } 23:04, 12 September 2016 (UTC)[reply]

I was wondering the same thing. I asked about it to Metaknowledge in User talk:Metaknowledge#Dot together with the stress marker, and he replied there. --Daniel Carrero (talk) 23:09, 12 September 2016 (UTC)[reply]
@Daniel Carrero Thanks! Perhaps there should be something in the modules to prevent stress marks and syllable marks from being together. On a related note, should we be following the Maximal Onset Principle? — justin(r)leung (t...) | c=› } 23:48, 12 September 2016 (UTC)[reply]
I created Category:IPA for English using .ˈ or .ˌ and started populating it with any entries that seem to violate the rule that Metaknowledge described. If we really don't want a dot followed by a stress marker, then I believe the correct course of action would be fixing all entries in the category.
Concerning your question about the Maximal Onset Principle, if you directed it to me, I prefer if someone else more knowledgeable than me answered that instead. --Daniel Carrero (talk) 03:36, 13 September 2016 (UTC)[reply]
For English, I would follow the Maximal Onset Principle for stressed syllables first, and also make sure any stressed syllable with a lax vowel has at least one coda consonant. Once the stressed syllables are maximized, the unstressed ones will take care of themselves. In other words, happy should be syllabified /ˈhæp.i/, not /ˈhæ.pi/. That said, however, I do want to reiterate something I've said many times before: syllabification in English is far from obvious, and syllable boundaries are very often perceived to be located within consonants. Evidence suggests that the /p/ of happy is not exclusively in either syllable; rather it's simultaneously the coda of the first syllable and the onset of the second. But there's no convenient way to show that in IPA. For this reason, I personally am often very reluctant to mark syllable boundaries except in cases of vowel hiatus, where it's a convenient way of showing that a sequence of two vowels isn't a diphthong (e.g. Joey vs. Joy). —Aɴɢʀ (talk) 09:33, 13 September 2016 (UTC)[reply]
I don't have very strong feelings about putting the syllable boundary marker and the stress marker next to each other. Putting them both isn't wrong, but it certainly isn't necessary. —Aɴɢʀ (talk) 10:19, 13 September 2016 (UTC)[reply]
IPA is simply wanting a way to mark ambisyllabic consonants as found in West Germanic. We could add one as a house rule. /hæ‿p‿ɪ/ or something less ugly. Korn [kʰũːɘ̃n] (talk) 11:09, 13 September 2016 (UTC)[reply]
Another possibility would be listing both: "/ˈhæp.i/ or /ˈhæ.pi/". --Daniel Carrero (talk) 11:18, 13 September 2016 (UTC)[reply]
Definitely not that. That implies there are two possible syllabifications, and worse yet, that there's a way of distinguishing them. As for how to mark it, I think if we must mark it, then /ˈhæp.i/ is the least bad option. If we do invent a house notation, I'd rather use something that takes up less space, like /ˈhæpˇi/; we could define ˇ as meaning "the previous consonant is ambisyllabic". But if I'm honest, I'd really rather just stick to /ˈhæpi/, which is unambiguous, easy to read, and makes no theoretical claims as to syllabification. —Aɴɢʀ (talk) 12:22, 13 September 2016 (UTC)[reply]
Personally, I think /ˈhæ.pi/ is better than /ˈhæp.i/, because the latter looks to me like there is meant to be an audible break between the /p/ and the /i/. I agree that because of these problems, it's better to just have /ˈhæpi/. As for putting . before a stress mark, I think it's entirely unnecessary and thus oppose it. --WikiTiki89 13:46, 13 September 2016 (UTC)[reply]
I agree with Wikitiki. I think it would be better to omit the syllable marks entirely for English. Benwing2 (talk) 14:27, 13 September 2016 (UTC)[reply]
Purely from a user perspective, I'd prefer if a dictionary would have a house notation like /ˈhæṗɪ/, rather than omit information because of minor issues. Korn [kʰũːɘ̃n] (talk) 14:42, 13 September 2016 (UTC)[reply]
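
The counting behaviour discussed in this thread can be illustrated with a small Python sketch. It is not the Lua code that was actually used in the pronunciation module; the function names are hypothetical, and it assumes the only boundary symbols are the dot and the two IPA stress marks. It treats a run of boundary symbols such as ".ˈ" as a single boundary, so both spellings of university count the same, and it can flag or drop the dot-before-stress sequences that the tracking category collects.

```python
import re

# Assumed boundary symbols: "." (syllable break), "ˈ" (primary stress),
# "ˌ" (secondary stress). A run such as ".ˈ" is one boundary, not two.
BOUNDARY = re.compile(r"[.ˈˌ]+")
DOT_BEFORE_STRESS = re.compile(r"\.(?=[ˈˌ])")


def count_syllables(ipa: str) -> int:
    """Count the chunks between boundary symbols in an IPA string.

    This is only a true syllable count if every boundary is marked;
    an unmarked boundary (as in /ˈhæpi/) is silently missed.
    """
    body = ipa.strip("/[]")
    return len([chunk for chunk in BOUNDARY.split(body) if chunk])


def has_dot_before_stress(ipa: str) -> bool:
    """Return True if a syllable dot is immediately followed by a stress mark."""
    return bool(DOT_BEFORE_STRESS.search(ipa))


def drop_redundant_dots(ipa: str) -> str:
    """Remove any dot that directly precedes a stress mark."""
    return DOT_BEFORE_STRESS.sub("", ipa)


print(count_syllables("/ju.nɪ.ˈvɝ.sə.ti/"))      # 5
print(count_syllables("/ju.nɪˈvɝ.sə.ti/"))       # 5
print(count_syllables("/ˈhæpi/"))                # 1, although the word has two syllables
print(drop_redundant_dots("/ju.nɪ.ˈvɝ.sə.ti/"))  # /ju.nɪˈvɝ.sə.ti/
```

The /ˈhæpi/ case shows why a count derived from the transcription is only as reliable as the marking: where editors prefer to leave boundaries unmarked, an automatic count will undercount.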

Not working for two-syllable words?


I noticed that using syllable markers in IPA transcriptions now adds words to categories indicating the number of syllables the words have, but only if the words have three or more syllables. Thus, /əˈfɹʌnt/ or even /ə.ˈfɹʌnt/ does not add affront to "Category:English 2-syllable words". Why? — SMUconlaw (talk) 12:20, 26 September 2016 (UTC)[reply]

Please read the description of Category:English 2-syllable words. --Daniel Carrero (talk) 19:26, 26 September 2016 (UTC)[reply]
For the record, I oppose having obviously broken and non-working code in the mainspace, and also the passive-aggressive "supposedly"-categories that attempt to pin the blame on the existing, working entries. Equinox 21:50, 26 September 2016 (UTC)[reply]
The category title is awful, though it does do something to remove the illusion that we are anything but a work in progress. Why would we want to have categories that were conspicuously mispopulated? Should the offending code be neutered until it is emended? DCDuring TALK 00:58, 27 September 2016 (UTC)[reply]
I have removed the code. DTLHS (talk) 01:10, 27 September 2016 (UTC)[reply]
Thanks. DCDuring TALK 02:39, 27 September 2016 (UTC)[reply]
In case anyone cares, the categories were hidden, so the situation wasn't that bad. --WikiTiki89 17:39, 27 September 2016 (UTC)[reply]
Thanks. I care, but I hadn't checked. DCDuring TALK 20:39, 27 September 2016 (UTC)[reply]
I think it has been firmly established in previous discussions that the only way to have a syllabification for English words is to have a human involved. I can understand the desire not to have the syllabification markers in the IPA codes for English words. Since I am interested in the categories for 1 or more syllables, I propose that a new template such as SYL be created and the categories for English 1-syllable terms, 2-syllable etc. be filled by the new template. In the process of creating the SYL "call" I can delete the dots (.) which I have placed in the IPA template. This will be harder than the previous method of just reviewing the already created list of words to see which are mis-classified, but I think this might be sufficient to handle your objections. If necessary, the code can create entries in a local user's category space. I volunteer mine. Since I don't know Lua well enough, I suggest the new SYL module use the code previously supplied by Daniel Carrero. Bcent1234 (talk) 21:03, 30 September 2016 (UTC)[reply]
Why do we need a template? Why not just add the categories manually? You don't have to delete dots unless they happen to be problematic. --WikiTiki89 21:10, 30 September 2016 (UTC)[reply]
I'd rather use a template and not add these categories manually. They are too long ("[[Category:English 2-syllable words]]") and would be a pain to write or copy/edit. We can use a shorter template like {{syl|4}} to place the entry in a 4-syllable category. --Daniel Carrero (talk) 00:09, 1 October 2016 (UTC)[reply]
Can a template be smart about the language, or should it include en as a parameter or have en in the name? Bcent1234 (talk) 21:17, 1 October 2016 (UTC)[reply]
I'd rather use a template, but since this seems to be something we can't just do (witness the removal of the previous Lua code allowing this to be a work-in-progress) I am just going to put the category call in the pronunciation section of words, and start from scratch. I value syllabification, but don't want to make waves in other folks' domains. As a group project, I support making wiktionary useful for all who can access it. 13:45, 3 October 2016 (UTC)
@Bcent1234: You can use the template {{cln|en|X-syllable words}}, which puts the page in [[Category:English X-syllable words]]. I don't think a short template like {{syl|en|X}} is justified for this purpose. --WikiTiki89 17:52, 5 October 2016 (UTC)[reply]

What Needs to Happen


The main obstacle to resolving this dispute is that neither CodeCat nor Wyang trusts the process, and for good reason. In past disputes, we've had an unfortunate tendency to put out the immediate fires and then sweep the issue under the rug. Faced with this possibility, both have tried to get things the way they want them so that they don't lose out when everyone gets tired of the issue and moves on. The one thing we don't want to do is to jump in and take unilateral action; that will just confirm the worst fears of the one who loses out.

We need to resolve this now, before it becomes out of sight, out of mind. The way to do this is to get down to discussing what the new configuration should look like, in concrete terms.

Notice I said "discussing". We simply haven't gotten to the point of drafting votes, because we're still all talking past each other; any vote will most likely not address the issues needed to resolve the dispute and will just complicate things. The correct sequence is to come to a consensus, and then draft a vote, if necessary.

I can't do any more at the moment because I'm still at work and it's really late. I'll spend some time on my way home trying to come up with a way to get the discussion started. Please don't blow things up in the meanwhile... Chuck Entz (talk) 02:25, 13 September 2016 (UTC)[reply]

I would support passing additional information (such as the name of the calling template and perhaps more) to the romanization module. This would make the Thai-specific code in Module:links that started this whole dispute unnecessary. I still think that there should only need to be one romanization module even if it provides both transliterations and transcriptions. --WikiTiki89 13:50, 13 September 2016 (UTC)[reply]
Another detail that hasn't been mentioned much is that Wyang wants to pass link target to the Thai module in order to find the transcription on the linked page. There are numerous reasons why this is a bad idea. Wyang has mentioned that the performance impact of reading the text of a page in a module is not as bad as people might assume at first, but that is not even the only issue. The romanization module must be able to romanize full unlinked sentences (such as in usage examples) and even redlinks. This cannot happen if the module depends on the existence of the link target. Not only that, but it would produce incorrect results for links with alt text, since it would transcribe the linked form and not the displayed form. --WikiTiki89 13:55, 13 September 2016 (UTC)[reply]
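
The three failure cases named in this comment (alt text, redlinks, unlinked text) can be made concrete with a small Python sketch. It is an illustration only, not the code in Module:links or the Thai modules: the page store, the function names and the placeholder strings are all hypothetical, and real page parsing is replaced by a dictionary lookup.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for entry pages that carry a stored transcription.
# A real implementation would read and parse the page text instead.
PAGES = {"lemma": "TR(lemma)"}


def transcribe_from_target(target: str) -> Optional[str]:
    """Look up the transcription stored on the linked page, or None if absent."""
    return PAGES.get(target)


def romanize_link(target: str, display: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (displayed text, romanization) for a link.

    The romanization is derived from the link target, never from the
    displayed text, which is the design being criticised.
    """
    shown = display if display is not None else target
    return shown, transcribe_from_target(target)


print(romanize_link("lemma"))                       # ('lemma', 'TR(lemma)')
print(romanize_link("lemma", display="inflected"))  # ('inflected', 'TR(lemma)'): romanizes the wrong form
print(romanize_link("uncreated"))                   # ('uncreated', None): redlink, nothing to read
```

An unlinked usage example behaves like the redlink case, since a full sentence has no page of its own to look up.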
Is the reason for passing additional information such as the name of the calling template so that the Thai module can show a transliteration in etymologies and a transcription in translation sections? I'm opposed to doing that; I think it would be extremely confusing. Better to show both types of romanization in all places, as I've mentioned before. Allowing this would be a major user-facing change and needs a vote (that's why I had Dan create the vote). If this vote passes, then I think we should still require that transcriptions are always shown, and transliterations are also shown in the places where it's desired (e.g. etymology sections). Benwing2 (talk) 14:23, 13 September 2016 (UTC)[reply]
According to Wyang, some entries already do this. It should probably be reversed if there is no consensus for it. Though with how Wyang is, he'll put up a fuss and start another edit war. —CodeCat 14:27, 13 September 2016 (UTC)[reply]
I think that transcription and transliteration need to be separated on some level. First of all, one is conceptually an attribute of the script, and the other of the language. Thus changes to a transcription of a script will have to be applied to all trans* modules separately, making human errors likely. Second, transcription should be open to manual overriding, while transliteration should always be automatically generated. Also, in historical languages using abjads, it should be noted that having both of these would be useful, as one is the factual shape of the word as found in the text and the other an educated guess, and both are necessary to explain some etymologies.
Regarding the question of whether both or one romanization should be displayed, I suggest that, no matter what is decided to be the default option, appropriate html tags be placed around the transliteration so that a custom .css file can hide these for users that understand the script in question (seeing anything written in Cyrillic repeated in Latin can be slightly annoying when you already are native in the script).
Yet I do not understand the details of our current implementation and why Wyang's changes are creating problems. If his way of doing this is indeed too harmful I support reverting it, but then please draft an alternative solution to this. Crom daba (talk) 17:36, 13 September 2016 (UTC)[reply]
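
The markup suggestion in this comment can be sketched as follows, in Python for illustration. The class names tr-transcription and tr-transliteration are hypothetical and are not claimed to match the classes Wiktionary actually emits; the point is only that each kind of romanization gets its own wrapper, so a reader's personal stylesheet can hide either one.

```python
from html import escape
from typing import Optional


def format_romanizations(transcription: str, transliteration: Optional[str] = None) -> str:
    """Wrap each romanization in its own span so CSS can target it separately."""
    parts = [f'<span class="tr-transcription">{escape(transcription)}</span>']
    if transliteration is not None:
        parts.append(f'<span class="tr-transliteration">{escape(transliteration)}</span>')
    return " ".join(parts)


# A rule a reader could put in a personal stylesheet (hypothetical class name).
USER_CSS = ".tr-transliteration { display: none; }"

print(format_romanizations("transcribed", "transliterated"))
```

Whichever romanization is shown by default, the choice then becomes a per-user display preference rather than something fixed in the module output.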
The alternative solution was Wikitiki's changes, which Wyang reverted over and over again and I reinstated over and over again. Contrary to what you might think, Wyang's changes actually did not establish separate transliteration and transcription. It merely bypassed the fact that the Thai transliteration module was called "translit" by putting the code that would have gone in there in Module:links instead. I argued that such code did not belong there, but it still remains there after months of bickering over it. —CodeCat 17:43, 13 September 2016 (UTC)[reply]
So what was the issue that Thai editors were complaining about? Crom daba (talk) 17:58, 13 September 2016 (UTC)[reply]
Wyang? He was complaining that transcription code should not go in a "transliteration" module, even though it's the normal practice on Wiktionary to do so. Because he didn't want to put the code where it belonged, he started messing with Module:links instead, and that's where I stepped in, and now we have this situation. —CodeCat 18:39, 13 September 2016 (UTC)[reply]
The whole point is: transcription and transliteration utilities should be separately maintained in the module system, whenever there is a foreseeable possibility that purpose-suited romanisation may be useful for the language. The argument is how to design a module structure, specifically a romanisation infrastructure, that best supports the features of these languages and therefore the wishes of the language-editing community. We are not proposing that language A should use X format of romanisation, or that Akkadian/Tibetan romanisations should be written as such, or that different modes of romanisation should be used in different locations (cf. link); these are all highly language-specific questions that need to be addressed separately and individually in discussions among knowledgeable editors. Our role here is to envisage the language-specific romanisation requirements that may be proposed, and partition our stored romanisation utilities in a way that is most regular and easiest to invoke, and in a way that does not deter editors in these languages from contributing in a way they consider most appropriate for the language.
The crux is “foreseeable possibility” of purpose-suited romanisation for a language. The reason purpose-suited romanisation is relevant is due to the different natures of the two modes of romanisation: transliteration is spelling-based, thus more etymology-oriented, and transcription pronunciation-based. The case of abjads is slightly different, but the benefit of storing utilities still applies. Why is purpose-suited romanisation and hence transliteration-transcription utility separation relevant on Wiktionary? Because:
  1. It is already being implemented in these languages ({{ko-etym-native}}). It is the consensus of the language community on how romanisations should be differentially applied. It is unreasonable to demand that the practice of using purpose-suited romanisation, which has been adopted universally in a language (you do not edit) for nearly ten years, be “reversed” without supplying any reason.
  2. Printed dictionaries do the same. The following are all the previewable Tibetan-English or English-Tibetan dictionaries on Google Books:
    Tibetan-English: 1, 2, 3
    English-Tibetan: 1, 2, 3.
All the Tibetan-English dictionaries use transliterations to romanise, and all the English-Tibetan ones use transcriptions to romanise. Why? Because different modes of romanisation are suited to different purposes – transliteration for etymology and transcription for translation from English.
  3. It conforms to the existing module infrastructure for these languages. In languages observing a transliterative-transcriptive contrast or languages where transliteration is intrinsically impossible, the transliteration-transcription distinction was strictly adhered to when the language-specific modules were designed. Where transliteration is impossible, the term “transliteration” is not ambiguated to mean “transcription”; we do not have Module:zh-translit and Module:ja-translit, instead we use Module:zh/Module:zh-pron and Module:ja/Module:ja-pron to handle transcriptions. Where the transliteration-transcription distinction makes a difference on a romanisation level, modules are named and maintained unambiguously; there are Module:bo-translit and Module:th-translit for transliteration, and Module:bo/Module:bo-pron and Module:th/Module:th-pron for transcription. It is the consensus of how romanisation utilities are maintained in these highly script-pronunciation discordant languages.
  4. It makes maintenance easier. Maintaining the transliteration and transcription modules separately makes whatever preference there is for the romanisation output less difficult to achieve. Seeing that abjads were raised before, if we decide to apply juxtaposed transliteration-transcription for all abjads or languages X, Y, Z, we can just add in some brief code in the links module to concatenate the outputs of transcription and transliteration modules of these languages (one can also be manually supplied), as these modules have already been recorded appropriately in language_data. If one day we would like to remove transcriptions in romanisations for languages X, Y, Z, we could simply remove the brief code added in earlier, without having to go through all the *-translit modules and delete the transcription passages, wondering whether they should be kept somewhere before they vanish.
  5. Using page parsing to achieve romanisation has no demonstrable harm. Transcription is inherently more difficult than transliteration; it is nearly perfectly automatable for certain languages (e.g. Korean) but most of the time it needs to be achieved using additional tricks, and page parsing is one of the tricks. I cited w:Wikipedia:Don't worry about performance before and I still think it is also very relevant for the technical structure on Wiktionary. The possibility of using page parsing has made us realise that it is perfectly possible to obtain both the transliteration and transcription for a word when they differ greatly, and this is very exciting. I think all the Thai editors would agree that the implementation of parsing since early this year has made their work much easier (Wiktionary:Statistics, sorted by change in #gloss definitions), and I doubt anyone would be in favour of removing this functionality and having to supply romanisations manually. Likewise for Chinese templates.
  6. Having an additional functionality module which does something useful is always beneficial, as long as it is maintained adequately. This could be said of transcription modules using parsing to obtain the romanisations. Even though it will not be able to grab a transcription from uncreated entries, or entries which have no pronunciation information, this is an indication that those entries need to be improved. In the case of Thai, having some automatic romanisation is better than having none and having to supply one manually. In the end, we aim to encompass all words in all languages and utilities have to be adapted to ensure we are at our highest efficiencies while progressing towards that goal. I'm sure the functionalities of this site won't be limited to what is present at the moment. If we want to build a Thai transliterator and a Thai transcriber to romanise a Thai passage (similar to what Google Translate is doing simultaneously to the translation), or if we want to develop a tool to romanise a Tibetan text in different ways, having an infrastructure in place which does not confuse the utilities will be essential.
Very few things are improved all of a sudden. While there is no transcription consideration in the central modules and the transcription modules are not recorded, it is most appropriate to name and maintain the romanisation utilities accurately. When the transcription modules can be recorded in language_data like the transliteration modules, the code should be migrated and rewritten. Above are my rationales for keeping the transcription and transliteration utilities separate for these languages where the different modes of romanisation are contrastive. Wyang (talk) 07:02, 14 September 2016 (UTC)[reply]
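The maintenance argument in the list above (keep the two utilities separate, and add or remove juxtaposition in one central place) can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Lua in the links module: the names LANGUAGE_DATA, JUXTAPOSED, romanise, toy_translit and toy_transcr are hypothetical, the language codes "xx" and "yy" are placeholders, and the two toy functions are stand-ins rather than real romanisation rules.

```python
def toy_translit(text):
    # Stand-in for a spelling-based module: one output symbol per input character.
    return "-".join(text)

def toy_transcr(text):
    # Stand-in for a pronunciation-based module: here it merely drops a "silent" h.
    return text.replace("h", "")

# Hypothetical language data: each utility is recorded separately.
LANGUAGE_DATA = {
    "xx": {"translit": toy_translit, "transcr": toy_transcr},
    "yy": {"translit": toy_translit},
}

# Languages whose links should show both outputs side by side.
JUXTAPOSED = {"xx"}

def romanise(lang, text, manual_transcr=None):
    """Return the romanisation to display, or None if nothing is available."""
    data = LANGUAGE_DATA.get(lang, {})
    translit = data["translit"](text) if "translit" in data else None
    if lang not in JUXTAPOSED:
        return translit
    # A manually supplied transcription takes precedence over the automatic one.
    transcr = manual_transcr
    if transcr is None and "transcr" in data:
        transcr = data["transcr"](text)
    parts = [p for p in (translit, transcr) if p]
    return " / ".join(parts) if parts else None

print(romanise("xx", "thab"))
print(romanise("yy", "thab"))
```

Under these assumptions, dropping juxtaposition for a language is a one-line change to JUXTAPOSED, and neither romanisation function needs to be touched.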

News from French Wiktionary

Hi all,

French Wiktionary is quite proud to publish every month a page with some fresh news about the project, Actualités. It is aimed not at contributors but at visitors and people interested in words. After 17 editions, we decided to translate our latest edition, that of August, into English, to make this publication available to you. It was quite a long job, so we would welcome your comments to know whether it is worth it, and whether we should continue to translate our next editions or our previous editions too. Feel free to comment on any aspect of this publication; we are very open to improving it and our translation, as English is not my mother tongue. Thanks a lot to Andrew Sheedy (talk • contribs) and Pamputt (talk • contribs) for this translation! Noé (talk) 09:26, 13 September 2016 (UTC)[reply]

@Noé: Merci, mis amis (je sui americain, et no parle franc,ais...) Mis petites contributions. —Justin (koavf)TCM 13:54, 13 September 2016 (UTC)[reply]
@Koavf: In case you care, some corrections: mes amis, je suis, ne parle pas. --WikiTiki89 13:59, 13 September 2016 (UTC)[reply]
Je (ne) parle pas. UtherPendrogn (talk) 17:31, 13 September 2016 (UTC)[reply]
@Wikitiki89:, @UtherPendrogn: Merci! —Justin (koavf)TCM 22:49, 13 September 2016 (UTC)[reply]

Wikidata for Wiktionary: let’s get ready for lexicographical data!

Hello all,

The Wikidata development team will start working on integrating lexicographical data in the knowledge base soon and we want to make sure we do this together with you.

Wikidata is a constantly evolving project, and after four years of existence we are starting to implement support for Wiktionary editors and content, by allowing you to store and improve lexicographical data, in addition to the concepts already maintained by thousands of editors on Wikidata.

We have been working on this idea for almost three years and improving it with a lot of input from community members to understand Wiktionary processes.

Starting this project, we hope that the editors will be able to collaborate across Wiktionaries more easily. We expect to increase the number of editors and visibility of languages, and we want to provide the groundwork for new tools for editors.

Our development plan contains several phases in order to build the structure to include lexicographical data:

  • creating automatic interwiki links on Wiktionary,
  • creating new entity types for lexemes, senses, and forms on Wikidata,
  • providing data access to Wikidata from Wiktionary,
  • improving the display of lexicographical information on Wikidata.

During the next months, we will do our best to provide you with the technical structure to store lexicographical data on Wikidata and use it on Wiktionary. Don’t hesitate to discuss this within your local community, and give us feedback about your needs and the particularities of your languages.

Information about supporting lexicographical entities on Wikidata is available on this page. You can find an overview of the project, the detail of the development plan, answers to frequently asked questions, and a list of people ready to help us. If you want to have general discussions and questions about the project, please use the general talk page, as we won’t be able to follow all the talk pages on Wiktionaries.

Best regards, Lea Lacroix (WMDE) (talk)

@Lea Lacroix (WMDE): Thanks to you and everyone at d: for working hard to try to integrate this project into Wikidata. —Justin (koavf)TCM 13:46, 13 September 2016 (UTC)[reply]

Open call for Project Grants

Greetings! The Project Grants program is accepting proposals from September 12 to October 11 to fund new tools, research, offline outreach (including editathon series, workshops, etc), online organizing (including contests), and other experiments that enhance the work of Wikimedia volunteers. Project Grants can support you and your team’s project development time in addition to project expenses such as materials, travel, and rental space.

Also accepting candidates to join the Project Grants Committee through October 1.

With thanks, I JethroBT (WMF) (talk) 14:49, 13 September 2016 (UTC)[reply]

Quotation questions (redux)

Last month, we had a discussion about quotations and what should be included and where. Several contradictory opinions were expressed. I'm willing to go make the changes to the quotations I added, but I don't think we quite reached consensus there on what to do. If this is not the right place to find consensus, please advise where I should take the questions at hand. Thanks! --Flex (talk) 17:04, 13 September 2016 (UTC)[reply]

I think yet another vote might be in order, alas. It's complicated for me at least because I don't necessarily dislike citations being shown together for different forms (perhaps per user setting), but I don't think they should be stored that way. See my comments in the discussion linked above. Equinox 20:14, 15 September 2016 (UTC)[reply]
Ok, since this month is about to expire, I'll put it on next month's cooler. --Flex (talk) 17:49, 28 September 2016 (UTC)[reply]

RFDO discussion for Template:character info

I created an RFDO discussion for a high-use template. See: WT:RFDO#Template:character info. --Daniel Carrero (talk) 05:17, 14 September 2016 (UTC)[reply]

Deceased long-term user

Eclecticology, one of Wiktionary's first editors, has died; see [4]. This was announced over at w:en:WP:AN, the en:wp administrators' noticeboard. As a very infrequent visitor here, I don't know your procedures for the accounts of deceased editors, but someone should remove his account's bureaucrat rights, since Wiktionary:Votes/2015-11/Eclecticology for de-admin and de-bureaucratting concluded in favor of removing both those user rights, but somehow only the administrator right was removed. Nyttend (talk) 12:02, 14 September 2016 (UTC)[reply]

Thanks for notifying us. It appears that the account does not have any user rights at the moment. —Μετάknowledgediscuss/deeds 19:53, 14 September 2016 (UTC)[reply]
Should the accounts of deceased users be permanently blocked in order to prevent hacking? —Aɴɢʀ (talk) 21:05, 14 September 2016 (UTC)[reply]
It's been done, but I see no reason to once rights are removed (in fact, it can be quite an annoyance if the block notification turns up on all their userpages when those userpages are still useful to other editors). —Μετάknowledgediscuss/deeds 21:11, 14 September 2016 (UTC)[reply]
If they get a cross-wiki block, as far as I know it doesn't show up on their userpages. --WikiTiki89 21:17, 14 September 2016 (UTC)[reply]
Some projects block accounts of deceased editors, and others don't, while some projects do other stuff (en:wp protects their userpages and adds a deceased-user template), so I figured I'd just announce it and let you regular editors follow your procedures. Nyttend (talk) 21:57, 14 September 2016 (UTC)[reply]
Eclecticology was involved in the establishment of Wiktionary, and was Wiktionary's first bureaucrat. He also created this very forum, the Beer parlour. RIP. --Yair rand (talk) 22:38, 14 September 2016 (UTC)[reply]
In case anyone wants to see this. Here's also the first ever BP discussion. --WikiTiki89 22:55, 14 September 2016 (UTC)[reply]
Sorry to hear it. But long may he live on in the edit histories! My opinion about blocking is that yes, we should do it where it is confirmed that somebody has died, just for the sake of security. A disused account might somehow be exploited or hacked; a blocked one generally can't be. Equinox 18:21, 15 September 2016 (UTC)[reply]

I just noticed that WT:ATTEST doesn't say anywhere that a word has to be attested in the language of the entry. Oversight? --WikiTiki89 22:27, 14 September 2016 (UTC)[reply]

We must have read each other's mind because I was thinking the exact same thing. I think that should be added in. It's an assumption that none of us really sought to codify before, but you know what they say about making things idiot proof. —CodeCat 22:29, 14 September 2016 (UTC)[reply]
Probably not an oversight. Plenty of words can be attested in words other than the language of the entry. Also, thanks for calling me an idiot. UtherPendrogn (talk) 22:33, 14 September 2016 (UTC)[reply]
If the shoe fits, UtherPendrogn. —CodeCat 22:36, 14 September 2016 (UTC)[reply]
Sometimes reports written in other languages are the only evidence about the existence and meaning of words in languages that were not reduced to writing until close to or after the time of their extinction or at least the loss of some of their vocabulary. This happens fairly often for names of organisms. Sometimes early explorers', missionaries', et al reports of the organism and a genus name or specific epithet are all that remains. I would think that some words in those languages could be reconstructed from multiple reports written in the language(s) of the explorers, et al. DCDuring TALK 23:10, 14 September 2016 (UTC)[reply]
Well here I guess we're talking about uses, not mentions or reconstructions. What you describe would pretty much be a reconstruction or maybe a mention. --WikiTiki89 23:14, 14 September 2016 (UTC)[reply]
Nothing is fool-proof, as the saying goes. And I worry that attempting to close a (debatably-existent) loophole that there's been no serious effort to game (a single user misunderstanding the rules does not strike me as a serious i.e. potentially-successful effort to re-interpret them) could cause more harm than good. What would be the effect on words in various extinct languages that are attested only embedded in works in other languages (e.g. an Ancient Greek text includes the only known few Paeonian words, a Spanish-language book gives the only known Ciguayo word)? I hope we can just rely on the majority to be as intelligent as we've been being, in discerning when a text is saying "and que is a word in French" versus when it's saying "and these are some words" and one user is just erroneously arguing "some" is French in that snippet. - -sche (discuss) 05:14, 15 September 2016 (UTC)[reply]
I'd appreciate not being called unintelligent if possible. UtherPendrogn (talk) 05:16, 15 September 2016 (UTC)[reply]
The alleged repercussions seem like a feature to me, not a bug. This might be a bigger can of worms, but I suspect that languages attested entirely by mentions perhaps shouldn't qualify for regular mainspace inclusion — not necessarily in terms of being moved to an appendix altogether, but they perhaps should be given substantially different treatment (e.g. in terms of entry layout) from better-attestable ones. --Tropylium (talk)
I'm not sure what those differences would be. Our current approach seems to handle them fairly well, actually. —Μετάknowledgediscuss/deeds 22:49, 17 September 2016 (UTC)[reply]
What I originally meant was that uses must be in the language of the entry. For mentions, I don't think it matters what language mentions them, as long as it can be deduced what language is being mentioned. --WikiTiki89 22:55, 17 September 2016 (UTC)[reply]
I think the passage on use-mention distinction covers this, and it's not a loophole. Something like "Venezia isn't a word in French" wouldn't count towards an attestation of Venezia in any language because it's not being used. Renard Migrant (talk) 23:29, 17 September 2016 (UTC)[reply]
Except that we allow mentions for some poorly attested dead languages as mentioned above. What I'm trying to say is that "I went to Venezia" cannot count as an attestation of the Italian word "Venezia", because the sentence is in English, even though this is a use not a mention (it can, however, count as an attestation of "Venezia" for English). --WikiTiki89 23:36, 17 September 2016 (UTC)[reply]

2nd Definitions vote

I created Wiktionary:Votes/2016-09/Definitions — non-lemma to edit the next piece of WT:EL#Definitions.

This is basically a minor edit that converts two simple vote links into a single line of text. For this reason, I'm just creating the vote without prior discussion.

Let me know if this should be discussed further. If needed, we may postpone the vote. (which I find unlikely, but who knows) Feel free to edit the vote and change the wording. --Daniel Carrero (talk) 12:35, 15 September 2016 (UTC)[reply]

Actually, I expanded the voted text with a few bullet points. I believe these are already established rules to be documented. Hopefully, they shouldn't be controversial. --Daniel Carrero (talk) 14:12, 15 September 2016 (UTC)[reply]

bor vs. loan

I'm thinking of creating a bot myself to implement the results of Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor. The vote passed with 14-5-3 (73.68%-26.32%) +1 late oppose.

But, as usual with template naming votes, even though apparently the tendency is the short name winning, (I voted support and I have my own arguments to back it up) there are people who voted oppose, defending the readability of the longer names. {{bor}} is a 3-letter name, like {{inh}} and {{der}} -- but "bor" does not really mean anything. Would people prefer using {{loan}} on all pages instead? --Daniel Carrero (talk) 17:55, 16 September 2016 (UTC)[reply]

We've already voted on this. It's a done deal. --WikiTiki89 17:58, 16 September 2016 (UTC)[reply]
14-4-4. Donnanz struck out his vote but it's still counted in the numbering. But whatever, a pass is a pass. Wikitiki89's right: let's not open up the issue again a minute after it's been voted on. Renard Migrant (talk) 23:49, 16 September 2016 (UTC)[reply]
Donnanz did not strike out their vote, just a statement which was part of the vote.
I'm happy with that response. I also prefer {{bor}}. I was just checking to make sure. --Daniel Carrero (talk) 00:08, 17 September 2016 (UTC)[reply]

Proposed addition to WT:NORM: the plain space (U+0020) and newline (U+000A) are the only allowed whitespace characters

Under this proposal, any other character that consists only of empty space, whether zero-width or with some width, is disallowed in the wikitext. This includes things like RTL and LTR markers, non-breaking spaces, halfwidth and fullwidth spaces, and of course the plain old tab. This change, once implemented by a bot, should reduce the number of unwanted surprises with invisible characters. We could, perhaps, also introduce an edit filter that blocks any edits containing these characters, though we'd need to make an inventory of them first. —CodeCat 23:35, 16 September 2016 (UTC)[reply]

Support -- there's already a rule forbidding the tab, so it should be edited to disallow the others. --Daniel Carrero (talk) 23:38, 16 September 2016 (UTC)[reply]
What about HTML character entities? DTLHS (talk) 23:39, 16 September 2016 (UTC)[reply]
I think we can allow those, since they're visible to the editor. —CodeCat 23:42, 16 September 2016 (UTC)[reply]
Also, FWIW, what about a newline, which is considered to be a whitespace character? — justin(r)leung (t...) | c=› } 23:44, 16 September 2016 (UTC)[reply]
A good point. That one is allowed of course. Though I'm not aware of any character other than a newline that looks the same as a newline. —CodeCat 23:46, 16 September 2016 (UTC)[reply]
I want to keep fullwidth spaces for Japanese. Although, I will note that MediaWiki disallows the fullwidth space in page titles and automatically changes it to U+0020. —suzukaze (tc) 00:02, 17 September 2016 (UTC)[reply]
Maybe we should allow different script-specific spaces in quotations and usage examples written in other scripts. Aside from the fullwidth space, are there other spaces like that? --Daniel Carrero (talk) 00:16, 17 September 2016 (UTC)[reply]
The whole point of the proposal was to eliminate invisible characters that people can't tell apart or reproduce. The average editor will expect that any empty space is a generic space. —CodeCat 00:29, 17 September 2016 (UTC)[reply]
We should still allow it, just only as an HTML entity or with a template. DTLHS (talk) 00:30, 17 September 2016 (UTC)[reply]
I would want to know more about the effect this would have on display of RTL scripts before supporting this. Lines with both RTL and LTR scripts can behave in very peculiar ways, and I don't want to make it worse. Chuck Entz (talk) 02:35, 17 September 2016 (UTC)[reply]
The LTR and RTL behaviour depends on control characters, mostly in Unicode category `Cf'. I believe this proposal mainly concerns whitespace, in category `Zs', plus two pseudo-linebreaks in `Zp' and `Zl', and perhaps the control characters '\t' and '\r'. Isomorphyc (talk) 22:23, 19 September 2016 (UTC)[reply]
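The category split described in this comment can be checked with Python's standard unicodedata module; the standard library reports space separators as Zs, line and paragraph separators as Zl and Zp, directional marks as Cf, and tab and carriage return as Cc. A minimal sketch follows; the SAMPLES table and the categories helper are names invented for this example.

```python
import unicodedata

# A few of the characters under discussion, keyed by a descriptive label.
SAMPLES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "IDEOGRAPHIC SPACE": "\u3000",
    "LINE SEPARATOR": "\u2028",
    "PARAGRAPH SEPARATOR": "\u2029",
    "LEFT-TO-RIGHT MARK": "\u200e",
    "RIGHT-TO-LEFT MARK": "\u200f",
    "ZERO WIDTH NON-JOINER": "\u200c",
    "TAB": "\t",
    "CARRIAGE RETURN": "\r",
}

def categories(samples):
    """Map each label to the Unicode general category of its character."""
    return {label: unicodedata.category(ch) for label, ch in samples.items()}

for label, cat in categories(SAMPLES).items():
    print(f"{cat}  {label}")
```

A bot or filter built on this would therefore need to look at more than one category: the visible-width spaces sit in Zs, while the invisible directional marks sit in Cf.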
I intended it to include all nonprintable characters, though maybe I didn't make it clear enough. "[...] consists only of empty space, whether zero-width [...]". Control characters fit that description and indeed I would like to get rid of those, too, as they are invisible to the editor. Though of course, if it's not clear already, HTML entities for these characters are allowed by this proposal, so it's not as if we're banning them altogether, we're just banning them in their raw Unicode form because of the editing difficulties they cause.
As a side note, we have actual entries for control characters too, but they are all but inaccessible because of, predictably, technical issues. We should delete these entries. A dictionary shouldn't concern itself with encoding artefacts; you can't tell a space from a non-breaking space in a printed work, and there's no such thing as a control character in print either. —CodeCat 22:33, 19 September 2016 (UTC)[reply]
A cursory inspection confirms that we have no or very few RTL control characters, but a few thousand LTR characters which were probably superfluously copied in from various outside sources, for example in Patch and LoD. From a little bit of experimentation with Hebrew entries, I get the impression the LTR/RTL behaviour is handled below the wikitext level. My concern with HTML entities is that we don't want to degrade anyone's native wikitext typing experience by requiring native characters to be rendered either with HTML entities or inappropriate substitutes, as with CJK spaces. Since this depends on the wikitext rendering stack, I would have to do a little bit more experimentation to convince myself of this for a few other characters. But I'm sure we would agree about the result if this is roughly the principle you have in mind? Isomorphyc (talk) 23:20, 19 September 2016 (UTC)[reply]
  • As I understand it, the proposal still allows the use of &nbsp; in the edit box, just not an actual nonbreaking space itself. —Aɴɢʀ (talk) 08:05, 17 September 2016 (UTC)[reply]
    • In templates, if you're having trouble with a space not being shown, you should use &#32; instead of &nbsp;. The former encodes an actual space (which is what you want), the latter encodes a non-breaking space. —CodeCat 12:36, 17 September 2016 (UTC)[reply]
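The difference between the two entities can be seen by decoding them with Python's standard html module. This only shows which character each entity stands for in HTML; how templates treat leading or trailing spaces is a separate MediaWiki matter not modelled here. The describe helper is a name made up for this sketch.

```python
import html
import unicodedata

def describe(entity):
    """Decode one HTML entity and report its code point and Unicode name."""
    ch = html.unescape(entity)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for entity in ("&#32;", "&nbsp;", "&zwnj;"):
    print(entity, "->", describe(entity))
```

So &#32; yields an ordinary space while &nbsp; yields a different character that merely looks the same, which is exactly the kind of invisible distinction the proposal is trying to make visible to editors.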


Sounds okay in theory, but (per Chuck) we should probably investigate existing entries containing the "forbidden" chars and see whether we are overlooking any legitimate use cases. Equinox 10:37, 17 September 2016 (UTC)[reply]
If we put in an edit filter I would worry about people trying to save a page and not being able to determine why the system says they can't (even after we bot replace existing uses, people will always try to copy and paste from other sources which will inevitably include control characters). DTLHS (talk) 17:17, 17 September 2016 (UTC)[reply]
Yes, any edit filter should only tag, not block. Google Books, for example, uses RTL/LTR marks around author names a lot, so anyone trying to helpfully add citations would be blocked. But if a bot is going to make periodic cleanup runs, even a tagging edit filter seems unnecessary. - -sche (discuss) 18:06, 17 September 2016 (UTC)[reply]
I wonder if there's any way to automatically remove or replace certain characters when the page is saved. DTLHS (talk) 18:14, 17 September 2016 (UTC)[reply]
Since the software automatically replaces e.g. "a" + "combining grave" with "à"; presumably the devs could update it to automatically replace nonstandard whitespace with a regular space (but I don't know if they would). - -sche (discuss) 03:33, 18 September 2016 (UTC)[reply]
They replace "a" + combining grave, with "à" because they are defined by Unicode as equivalent (i.e. they mean the same thing). Non-standard whitespace is not defined by Unicode as equivalent to spaces. So the devs probably would not implement this special case just for us. --WikiTiki89 09:40, 19 September 2016 (UTC)[reply]
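The equivalence point can be illustrated with Python's unicodedata.normalize, on the assumption that the composition the software performs on save is Unicode normalisation form NFC. Canonical normalisation composes "a" plus a combining grave, but it leaves a no-break space or an em space untouched; only the lossy compatibility form NFKC folds them into U+0020. The nfc and nfkc helpers are names introduced for the example.

```python
import unicodedata

def nfc(text):
    # Canonical composition: only canonically equivalent sequences are merged.
    return unicodedata.normalize("NFC", text)

def nfkc(text):
    # Compatibility composition: also folds "compatibility" variants such as
    # no-break and fixed-width spaces, which loses information.
    return unicodedata.normalize("NFKC", text)

print(nfc("a\u0300") == "\u00e0")    # a + combining grave composes to à
print(nfc("\u00a0") == "\u00a0")     # no-break space is unchanged by NFC
print(nfkc("\u00a0") == " ")         # NFKC would turn it into a plain space
```

Replacing non-standard whitespace on save would thus be a deliberate policy choice layered on top of normalisation, not something canonical equivalence already provides.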
@CodeCat: This is good, with caveats: I'm only counting about 3500 pages affected, with about 7500 characters total (given 14 `bad' whitespace characters plus \t -- you might have a wider list, as technically '\t' is a control character, not whitespace.) About 3500 of these characters are simply &nbsp literals and can be replaced. Probably `em space' (0x2003) and `thin space' (0x2009) should be done away with-- the next two largest categories. I believe the CJK `ideographic space' (0x3000) should stay. The others are not very common. If anyone is interested here is a list (I omitted user pages, talk pages, etc.) :
  • hex,char,name,count
  • 0xa0, ,NO-BREAK SPACE,3454
  • 0x2003, ,EM SPACE,1136
  • 0x2009, ,THIN SPACE,1100
  • 0x200a, ,HAIR SPACE,687
  • 0x3000, ,IDEOGRAPHIC SPACE,609
  • 0x2008, ,PUNCTUATION SPACE,282
  • 0x2002, ,EN SPACE,165
  • 0x1680, ,OGHAM SPACE MARK,45
  • 0x202f, ,NARROW NO-BREAK SPACE,40
  • 0x2028,(),LINE SEPARATOR,39
  • 0x2005, ,FOUR-PER-EM SPACE,17
  • 0x2004, ,THREE-PER-EM SPACE,4
  • 0x2007, ,FIGURE SPACE,1
Isomorphyc (talk) 21:38, 19 September 2016 (UTC)[reply]
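A tally in the same hex,name,count shape as the list above could be produced along these lines. This is only a sketch of one possible approach, not the script actually used for the figures quoted; it assumes the wikitext is already available as a Python string, and count_odd_whitespace and report are hypothetical names.

```python
from collections import Counter
import unicodedata

SEPARATOR_CATEGORIES = {"Zs", "Zl", "Zp"}

def count_odd_whitespace(text):
    """Count separator characters other than the plain space U+0020."""
    counts = Counter()
    for ch in text:
        if ch != " " and unicodedata.category(ch) in SEPARATOR_CATEGORIES:
            counts[ch] += 1
    return counts

def report(counts):
    """Format the tally as CSV-like rows, most frequent first."""
    rows = ["hex,name,count"]
    for ch, n in counts.most_common():
        rows.append(f"{ord(ch):#x},{unicodedata.name(ch)},{n}")
    return rows

sample = "a\u00a0b\u00a0c\u3000d e"
print("\n".join(report(count_odd_whitespace(sample))))
```

Tab and other control characters fall outside the three separator categories, so a fuller inventory would have to add category Cc (and Cf for directional marks) explicitly.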
We just need to decide which ones we want to convert to a HTML entity, and which to replace with something else like a regular space. I think the em space, thin space and hair space and most other different-width spaces can become regular spaces. The Ogham space should stay, that's actually a printable character, it represents the line on which Ogham letters are written. Also, what about zero-width characters like the LTR and RTL markers, or zero-width non-breaking spaces? —CodeCat 21:46, 19 September 2016 (UTC)[reply]
We would have to test to see that they are there accidentally. If somebody typed it with a keyboard, it should stay in most cases, I think; but if it was pasted in from somewhere, has zero width, and has no effect on the presentation, that is a good sign it should go. For what it is worth, to make this more concrete, here is an equivalent list of control characters with their counts in Wiktionary: User:Isomorphyc/Sandbox/Control Characters in Wiktionary. By inspection, it seems inappropriate to remove all of them and worthwhile to remove some. Isomorphyc (talk) 00:02, 20 September 2016 (UTC)[reply]
I think there needs to be an exception in cases where the character is part of the normal encoding of a script. The only example I can think of is Persian, where the zero-width non-joiner is used in compound words and before the plural morpheme ها () (for example: شب‌ها (šab-hâ)). It would be unfortunate to have to encode this as {{m|fa|شب&zwnj;ها|tr=šab-hâ}}. --WikiTiki89 14:01, 20 September 2016 (UTC)[reply]
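One way to picture the clean-up rules discussed in this thread is a small character-by-character filter: some spaces become regular spaces, a short list of characters stays raw, and every other invisible character is rewritten as a numeric HTML entity so that it becomes visible in the edit box. The sketch below is purely illustrative; the TO_SPACE and KEEP_RAW sets reflect suggestions made above rather than any agreed policy, keeping the zero-width non-joiner everywhere (instead of only in Persian text) is a simplification, and the clean function is a hypothetical name.

```python
import unicodedata

# Different-width spaces suggested above as replaceable by a regular space.
TO_SPACE = {"\u2002", "\u2003", "\u2004", "\u2005", "\u2008", "\u2009", "\u200a"}

# Characters left as they are: Ogham space mark (printable) and ZWNJ.
KEEP_RAW = {"\u1680", "\u200c"}

INVISIBLE_CATEGORIES = {"Zs", "Zl", "Zp", "Cf", "Cc"}

def clean(text):
    """Apply the sketched whitespace policy to a string of wikitext."""
    out = []
    for ch in text:
        if ch in (" ", "\n") or ch in KEEP_RAW:
            out.append(ch)                      # allowed as-is
        elif ch in TO_SPACE:
            out.append(" ")                     # collapse to a regular space
        elif unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            out.append(f"&#x{ord(ch):X};")      # make it visible as an entity
        else:
            out.append(ch)                      # ordinary printable character
    return "".join(out)

print(clean("a\u2003b\u00a0c\u200ed"))
```

Under this sketch the ideographic space would be turned into an entity; adding U+3000 to KEEP_RAW would implement the alternative preference expressed above for Japanese and other CJK text.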

Proto-Brythonic verb lemmas

Should we reconstruct verb lemmas as absolute or conjunct 3rd person singulars, as in *ėɣɨd or *aɣ < *ageti? The former is more similar to the Proto-Celtic lemmas in form, but the latter more or less become the standard 3rd person singular in the daughter languages. Anglom (talk) 02:34, 18 September 2016 (UTC)[reply]

Or maybe the first person singular, since that's the usual lemma for Middle Welsh in academic material (notwithstanding the fact that we at Wiktionary use the verbal noun instead, a situation which I've been meaning to rectify but haven't gotten around to yet). I don't know what form is usually given as the lemma for the various stages of Breton and Cornish. —Aɴɢʀ (talk) 14:35, 18 September 2016 (UTC)[reply]
I thought about that, but the 3rd singular is usually the most commonly attested form in the earlier languages, it feels a little more justified to list them that way. Anglom (talk) 15:30, 18 September 2016 (UTC)[reply]
I favour the 3rd singular as well, though I'm not decided on absolute or conjunct. I think I'd prefer the form that descends from the Proto-Celtic lemma directly, but since we already don't do so for Old Irish, the point for Brythonic is moot. Since Proto-Brythonic and Old Irish are similar in terms of development, using the same form for them makes them easier to compare. —CodeCat 12:36, 19 September 2016 (UTC)[reply]

Separating transcription from transliteration

We seem to be at an impasse on this issue, with discussion having died out again. Here are a few ideas to start discussion with:

  1. Why don't we have a separate pronunciation parameter? Not only could this be used for transcriptions, it would also be useful for disambiguating homographs like wind. The main drawback is that it could be overused/stuffed with information best left to pronunciation sections.
    The reason I bring this up is that our current romanization method routes everything through the |tr= parameter. For languages that have both transcription and transliteration, that leaves no way to tell which is being displayed. Having a separate parameter also makes it easier to set it up as a parallel to our current treatment of transliteration.
    1. |pr= seems the most logical name for such a parameter
    2. How would we distinguish between the two? I think we should leave transliteration as it is, and use a superscript in front for the transcription: (Transcr:fonɛtɪk spɛliŋ) (with the superscript linked to something informative)
  2. Either way, I don't think we should have language-specific special code in Module:links if we can avoid it: it's currently the seventh-most-transcluded page on Wiktionary, used by 4,889,303 pages. More importantly, it's often used dozens of times on a single page and in a few cases thousands of times. Just on general principles, the part of Module:links that's always executed should be only for things that are general in nature and can't be handled in more specialized routines. Even if the overhead is minimal, the clutter makes it harder to maintain. I can understand temporarily putting in a short-term kludge until a solution can be integrated into the regular module structure, but kludges have a way of growing as more special cases arise. They also are harder to understand/maintain: I don't think it would be obvious to most people that local phonetic_extraction = {["th"] = "Module:th"} has anything to do with transcription, and I'm not sure someone wanting to make changes related to transcription would look for the code where it is now.
    1. I think the best approach to integrating transcription would be to have a separate value for transcription modules in the Module:languages data submodules to parallel "translit_module"
      1. I propose naming it "transcr_module"
      2. I propose naming the entry-point function in these modules "pr()" to parallel the translit modules' "tr()"
      3. It would then be a simple matter of adding parallel code to what we have in module:links for transliteration

I obviously like my proposals, but feel free to tweak, rework or replace any or all of it. The only thing I ask is that we arrive at something concrete, and not more theoretical or who-did-what-and-why-I-don't-like-it talk. Thanks! Chuck Entz (talk) 02:18, 19 September 2016 (UTC)[reply]
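The data-driven dispatch proposed in point 2 can be sketched as follows: the generic link code looks up "translit_module" and the proposed "transcr_module" in the language data and calls tr() or pr() accordingly, with no language-specific branch of its own. This is a Python illustration only, not Lua for Module:links; the module names "Module:xx-translit" and "Module:xx-pron", the language codes, the annotate function, and the toy tr/pr bodies are all assumptions made for the example, and the superscript label is shown as literal markup.

```python
from types import SimpleNamespace

# Toy stand-ins for per-language modules exposing tr() or pr().
MODULES = {
    "Module:xx-translit": SimpleNamespace(tr=lambda text: "-".join(text)),
    "Module:xx-pron": SimpleNamespace(pr=lambda text: text.replace("h", "")),
}

# Language data with the existing key and the proposed parallel key.
LANGUAGES = {
    "xx": {"translit_module": "Module:xx-translit",
           "transcr_module": "Module:xx-pron"},
    "yy": {"translit_module": "Module:xx-translit"},
}

def annotate(lang, text, tr=None, pr=None):
    """Build the bracketed romanisation for a link; manual tr/pr values win."""
    data = LANGUAGES.get(lang, {})
    if tr is None and "translit_module" in data:
        tr = MODULES[data["translit_module"]].tr(text)
    if pr is None and "transcr_module" in data:
        pr = MODULES[data["transcr_module"]].pr(text)
    parts = []
    if tr:
        parts.append(tr)
    if pr:
        # The label distinguishes the transcription from the transliteration.
        parts.append("<sup>Transcr:</sup>" + pr)
    return "(" + ", ".join(parts) + ")" if parts else ""

print(annotate("xx", "thab"))
print(annotate("yy", "thab"))
```

The design point is that supporting transcription for a new language would mean adding one key to its data entry, leaving the always-executed generic code unchanged.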

If we lack cooperation between our Lua module editors, we'll have the situation where transliterations and transcriptions are handled by separate modules for Japanese, Chinese, Thai, Burmese, Tibetan, etc. and have no integration with other main modules. Wyang's templates (linked to appropriate modules) like {{th-l}}, {{ja-r}}, {{zh-usex}} exist almost in a separate world. I'd like to be able to transliterate Thai or Japanese by passing Thai phonetic respelling/hiragana with spacing, capitalisation, etc. but also use the features common to other templates. --Anatoli T. (обсудить/вклад) 02:37, 19 September 2016 (UTC)[reply]
As stated elsewhere, I am very much in favor of this, though for a different reason. Vahagn and I had discussed how many languages with abjads or other writing systems require both a transliteration and transcription (Hittite, Old Persian, Mycenaean Greek, etc.). This would greatly reduce the amount of |tr= overloading necessary to represent these languages. —JohnC5 02:46, 19 September 2016 (UTC)[reply]
|tr= may mean either transliteration or transcription or a mixture of both. For most languages, including abjad-based, the transcription-like transliteration has been the preferred one. That is also the case for Thai but displaying the character sequence (i.e. the "real" transliteration) can still be used for various purposes.--Anatoli T. (обсудить/вклад) 02:54, 19 September 2016 (UTC)[reply]
@JohnC5 Would you mind pointing me to the discussion, or perhaps an example of the overloading scenario you have in mind? Sorry to write in such an old conversation. Thanks, Isomorphyc (talk) 02:40, 20 November 2016 (UTC)[reply]
@Isomorphyc: No problem at all! The Mycenaean under *h₁éḱwos and *(s)kleh₂w-, the Mycenaean and Old Persian under *tetḱ-, and the Hittite under *ǵónu, to name a few. If these are not sufficient, tell me. —JohnC5 02:56, 20 November 2016 (UTC)[reply]
@JohnC5: Thank you, this is perfect. Isomorphyc (talk) 11:30, 20 November 2016 (UTC)[reply]
I support this. Wyang (talk) 06:03, 19 September 2016 (UTC)[reply]
Sounds good, only I'd prefer it if we didn't bind transcription to phonetics, because for some ancient languages it would be preferable to write for example: (Sogdian) {{l|sog|૛ૣી૒ીૡ૏ો૏ૐ|tr=pš'x'rycyk}} (pašaxārēčik) without going into details of what exactly were 'a', 'ā', 'ē' or 'č'. Crom daba (talk) 08:33, 19 September 2016 (UTC)[reply]
What do you mean by "preferable"? I want to know how to read/pronounce the word, so I want to see "pašaxārēčik", as would be the case for Persian and other abjads. The actual string of characters can also be useful for etymologies or for people interested in learning the script.--Anatoli T. (обсудить/вклад) 08:39, 19 September 2016 (UTC)[reply]
Perhaps I wasn't clear. It is preferable to write "pašaxārēčik" rather than "pəʃɨxaret͡ʃjək" (don't quote me on this "reconstruction"). Obviously we need both transcription and transliteration (for one, because there still aren't any free fonts for Manichaean Unicode as far as I know). Crom daba (talk) 09:05, 19 September 2016 (UTC)[reply]

AWB access

Hello. I would like to get permission to use AWB on the English Wiktionary. I will use it to update Romanian adjective templates to a new format, since it's too tedious to do manually. I've never used it before, but from what I can tell, it doesn't seem too complicated. Thank you! Redboywild (talk) 09:59, 19 September 2016 (UTC)[reply]

You look like a good candidate for AWB but unfortunately I have no idea how to give you access. Anyone know? Benwing2 (talk) 05:05, 20 September 2016 (UTC)[reply]
Never mind. All you do is edit the list on the AWB page. Done. Benwing2 (talk) 05:08, 20 September 2016 (UTC)[reply]
Thanks a lot! Redboywild (talk) 08:15, 20 September 2016 (UTC)[reply]

Statistics to guide improvements

I've been experimenting with extracting data from a Wiktionary export (enwiktionary-20160901-pages-articles.xml). Along the way, I keep generating stats to help me get a feel for how the data is organized. Many of them seem like they would be of interest to Wiktionary staff and editors who have an eye to making improvements. So I thought I'd ask if that's correct.

Here is an example stat I generated last night. From the English set, in articles that have an =English= header and a =Noun= or other PoS header, here are all the distinct headword template names I found and their counts:

  • en-PP: 1
  • en-Proper noun: 7
  • en-abbr: 510
  • en-acronym: 67
  • en-adj: 97944
  • en-adjective: 64
  • en-adv: 16002
  • en-adverb: 18
  • en-comparative of: 1
  • en-con: 164
  • en-conj: 24
  • en-conj-simple: 36
  • en-conjunction: 13
  • en-cont: 375
  • en-contraction: 27
  • en-decades: 86
  • en-det: 72
  • en-initialism: 879
  • en-interj: 1346
  • en-interjection: 45
  • en-intj: 101
  • en-letter: 53
  • en-note-upper case letter plural with apostrophe: 2
  • en-noun: 207413
  • en-number: 39
  • en-part: 16
  • en-particle: 19
  • en-phrase: 106
  • en-plural noun: 1304
  • en-plural-noun: 6
  • en-prefix: 1012
  • en-prep: 373
  • en-prep phrase: 3
  • en-preposition: 21
  • en-pron: 315
  • en-pronoun: 65
  • en-prop: 329
  • en-proper noun: 23854
  • en-proper-noun: 176
  • en-propn: 11
  • en-punctuation mark: 2
  • en-suffix: 614
  • en-symbol: 52
  • en-usage-equal: 1
  • en-verb: 26654

As you can see, quite a bit of redundancy. And more than a few slated for deletion, like en-abbr.

Some of the things I've found have motivated me to do some edits on articles with minor formatting errors. If there's interest, I'd be happy to supply more data like this. Thoughts? Jim Carnicelli (talk) 14:09, 19 September 2016 (UTC)[reply]

Are you sure about your regular expressions or other id. method? Just looking at one template, {{en-PP}}, which your listing says is used once, this special page reports that it is used on 137 pages, all but one of which are in the principal namespace.
I believe that the redundancy is principally attributable to redirects. eg, {{en-prop}}, {{en-proper-noun}}, {{en-propn}} are all redirects to {{en-proper noun}}.
How would we use these statistics for improvements? DCDuring TALK 14:51, 19 September 2016 (UTC)[reply]
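As a rough illustration of the redirect point, here is a sketch of how the counts could be merged by redirect target. The redirect table contains only the three redirects named in this thread; a real run would need the full redirect list from the dump. The regular expression and the function name are assumptions of this sketch, not a description of how the list above was produced.

```python
import re
from collections import Counter

# Only the redirects named in this thread; the real list is longer.
REDIRECTS = {
    "en-prop": "en-proper noun",
    "en-proper-noun": "en-proper noun",
    "en-propn": "en-proper noun",
}

# Captures a template name starting with "en-", up to the first "|" or "}}".
HEADWORD_RE = re.compile(r"\{\{\s*(en-[^|{}]+?)\s*(?:\||\}\})")

def count_headword_templates(pages):
    """Count en-* template names across page texts, folding redirects together."""
    counts = Counter()
    for wikitext in pages:
        for name in HEADWORD_RE.findall(wikitext):
            counts[REDIRECTS.get(name, name)] += 1
    return counts
```

With the redirects folded in, the four proper-noun rows of the list would collapse into a single row, which makes the genuinely distinct templates easier to spot.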
Please bear in mind I'm new to this. I'm treading lightly because I know I'm surely missing an awful lot of context.
In generating the above list, I had already prefiltered based on the "ns" (I assume that's short for "name-space"), =English= header, and =<PoS>= header, with a finite list of the following parts of speech and pseudo-PoS: Determiner, Conjunction, Noun, Proper noun, Pronoun, Verb, Adverb, Adjective, Preposition, Interjection, Contraction, Prefix, Suffix, Affix, Particle, Numeral, Symbol, Initialism, Abbreviation, Acronym, Phrase, Prepositional phrase. So expect it to be a subset. Also, I'm using an entirely proprietary system. I'm not familiar with all the tools available within Wiktionary, so I don't expect I'll generate the same results.
Given that I've already found and corrected what I believe are minor mistakes in articles, like a missing =English= header in one case and an empty "=====" header-like line, I'm assuming there are many more such formatting errors. My goal is quite simply to help call them out in the event that others might be interested in studying and possibly correcting them. Just another set of eyes.
My personal interest in this has to do with being able to extract structured data like brief definitions and synonyms. I'm impressed so far to see that most of the term definition articles appear to follow a rigorous structure that is parsable. I'm presently focused on English single-word terms with an eye to computational linguistics tasks like part of speech tagging. I plan to make code I write freely available for creating condensed JSON-structured data. Thus far I've been able to transform Wiktionary's 4.7GB articles export into a 100MB JSON file with over 300k terms from the 4M+ articles. I'm struggling now with how best to parse out the head-word templates (e.g., "{{en-noun|-|adoxographies}}") and definition lines.
The above list is one trivial example I thought to include as an illustration. I just want to know that it's worth generating further lists. I don't want to take the time or trouble anyone if there's no interest. Jim Carnicelli (talk) 17:43, 19 September 2016 (UTC)[reply]
You may want to look into mwparserfromhell. It can simplify a lot of the work for you. —CodeCat 17:46, 19 September 2016 (UTC)[reply]
The headword template you mention has its code at Template:en-noun; unlike certain templates, this one is quite well documented. Templates change a lot (too often, really) so be prepared to revise your parser code very frequently. Equinox 17:47, 19 September 2016 (UTC)[reply]
Ah, thank you. I'll look into mwparserfromhell. Also, I have studied the Template:en-noun template. I appreciate how thoroughly it's documented. Some of the other head-word templates are a little less well documented. I was intrigued by finding several (e.g., Template:en-abbr) which are slated for deletion. I assume this means articles that use them still need cleanup. Jim Carnicelli (talk) 17:58, 19 September 2016 (UTC)[reply]
It was decided that "abbreviation", "initialism", etc. are not parts of speech, so we should use the appropriate PoS instead (e.g. TLC is a noun). Equinox 18:00, 19 September 2016 (UTC)[reply]
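For the headword-parsing question raised in this thread, a minimal standard-library sketch of splitting a flat template call such as {{en-noun|-|adoxographies}} into its name and arguments. The function name is hypothetical, and the sketch deliberately refuses nested templates and links; a proper wikitext parser such as mwparserfromhell, mentioned above, is the more robust route. Trimming whitespace on every argument is a simplification of this sketch.

```python
def parse_template(invocation):
    """Split a flat template call into (name, positional args, named args)."""
    text = invocation.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template invocation")
    inner = text[2:-2]
    # Nested templates or links contain "|" that must not be split on.
    if "{{" in inner or "[[" in inner:
        raise ValueError("nested templates and links are not handled")
    parts = inner.split("|")
    name = parts[0].strip()
    positional = []
    named = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return name, positional, named
```

For example, parse_template("{{en-noun|-|adoxographies}}") gives the name "en-noun" with the positional arguments "-" and "adoxographies"; what those arguments mean is defined by the template's own documentation.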

Proposal for bot redirects for numbers up to a million.

Per the outcome of various recent deletion discussions relating to numbers, I propose to bot-create about four million redirects which will point otherwise non-idiomatic numbers between 101 and 1,000,000 to Appendix:English numerals#Naming rules (short scale). The reason that this will come to about four million redirects is that I propose to redirect from:

There are other possible variations:

Basically, I'd like to have a bot redirect all commonly used ways of making all possible non-idiomatic number combinations up to one million. However, in saying this out loud, it sounds pretty crazy. Is this a bad idea? I want people who look up numbers to be taken somewhere for their trouble. bd2412 T 15:22, 19 September 2016 (UTC)[reply]

Maybe we could edit {{didyoumean}} to cause any absent page, whose title is a number, to redirect to the appendix? Like Amazing (redlink) redirects to amazing. --Daniel Carrero (talk) 15:32, 19 September 2016 (UTC)[reply]
In general, I support your approach, but I would not overload the template with this. There are other really cryptic SoPs like (S01E01) to handle... --Giorgi Eufshi (talk) 06:55, 20 September 2016 (UTC)[reply]
Strong oppose. In fact many of the higher numbers will be unattestable. But more importantly, I don't see any reason why numbers are more special than other SOP combinations. --WikiTiki89 15:37, 19 September 2016 (UTC)[reply]
Re: attestability, try picking any number at random between 101 and 1,000,000 and do a Google Books search for it. You'll be amazed at how many random references you will find to "437,214 cubic yards of material" or 808,777 hogs having been infected with a disease, or an "increase in book value of ledger assets 279,361". I can virtually guarantee that every single number up to a million (and probably for a good way up from it) is attested in some ledger, census, valuation, report, or record. bd2412 T 15:55, 19 September 2016 (UTC)[reply]
I would argue that all numbers up from 10 are SOP. 4, for example, is defined as "The cardinal number four." but really it means "A digit used to form numbers, whose value is four × 10ⁿ, where n is the digit placement counted from the right. In 432, 4 means four hundred. (don't get me started on real numbers and non-decimal bases)" --Daniel Carrero (talk) 16:02, 19 September 2016 (UTC)[reply]
@BD2412: Regarding attestability, I was referring mainly to the spelled-out forms. --WikiTiki89 16:49, 19 September 2016 (UTC)[reply]
And just to give you an idea of how crazy this idea is, we currently have 440,889 English lemma entries, and you're proposing to create 4,000,000 redirects to one appendix page. --WikiTiki89 17:46, 19 September 2016 (UTC)[reply]
The practical drawbacks to this seem larger than the small-to-nonexistent benefits, to me. Who is going to fail to know what 347654 means, but (be able to input it, and) be helped by an appendix? Who is going to fail to know what "four hundred seventy-two thousand, five hundred fifteen" means, but think to look up that whole string rather than the parts? If all numbers are bluelinks, it will drown any effort to see if e.g. a certain number happens to have an entry (due to being idiomatic), a slight drawback, but compared to a slight-to-nonexistent benefit. Having every possible number in this range, including e.g. strings identical to phone numbers, be bluelinks (which, when edited by someone after the bot, won't show up in a noticeable place like Special:NewPages) also seems like an invitation to easy-to-miss vandalism. And as Wikitiki says, why are these more deserving/needing of entries than other SOP but (or and, or or) "regularly formable" strings? - -sche (discuss) 16:42, 19 September 2016 (UTC)[reply]
I also generally oppose doing this via redirects, however if we did something to affect search results which had the same effect I think it might be of use. - TheDaveRoss 16:51, 19 September 2016 (UTC)[reply]
If the search results can be tweaked to this effect, that would be a fine solution. bd2412 T 17:12, 19 September 2016 (UTC)[reply]
There's another practical problem here. 415 is four hundred and fifteen in the UK and four hundred fifteen in the US and it can't simultaneously redirect to both. But in general, the proposal has no merit because it proposes making redirects for things that aren't words in any language. I have plenty more specific objections, but I think that one alone is enough. Renard Migrant (talk) 17:59, 19 September 2016 (UTC)[reply]
AFAICT the proposal is to redirect all of "415", "four hundred and fifteen", and "four hundred fifteen" to the same appendix, which is technically doable, but I tend to agree it's not desirable. - -sche (discuss) 18:23, 19 September 2016 (UTC)[reply]
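To make the "four hundred fifteen" versus "four hundred and fifteen" point concrete, here is a sketch of generating both spellings for numbers in the proposed range. The function names, the comma after "thousand" and the exact placement of "and" are assumptions of this sketch; other conventions exist, which is part of why the number of possible redirect titles multiplies.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def below_thousand(n, use_and):
    """Spell 1..999; use_and inserts "and" after the hundreds (UK style)."""
    hundreds, rest = divmod(n, 100)
    words = []
    if hundreds:
        words.append(ONES[hundreds] + " hundred")
    if rest:
        if hundreds and use_and:
            words.append("and")
        if rest < 20:
            words.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            words.append(TENS[tens] + ("-" + ONES[ones] if ones else ""))
    return " ".join(words)

def spell(n, use_and=False):
    """Spell a number in the proposed range 101..1,000,000."""
    if not 101 <= n <= 1_000_000:
        raise ValueError("outside the proposed range")
    if n == 1_000_000:
        return "one million"
    thousands, rest = divmod(n, 1000)
    if not thousands:
        return below_thousand(rest, use_and)
    head = below_thousand(thousands, use_and) + " thousand"
    if not rest:
        return head
    if use_and and rest < 100:
        # e.g. "one thousand and five" in the UK-style variant
        return head + " and " + below_thousand(rest, use_and)
    return head + ", " + below_thousand(rest, use_and)
```

Here spell(415) and spell(415, use_and=True) produce the two forms discussed above; both would have to exist as separate redirect titles for every number that has a non-zero remainder after the hundreds.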
Not to mention that 415 is not only a number in English, but also in practically every other language, so it doesn't make much sense to redirect it to the appendix page on English numerals. --WikiTiki89 18:58, 19 September 2016 (UTC)[reply]
Strongly oppose creating the "wordy" ones like three hundred and sixty-seven. Frankly I think the numeric ones would be pretty dumb too but that is more arguable. Equinox 18:32, 19 September 2016 (UTC)[reply]
  • A significant benefit would be that we could dramatically reduce the number of times a new contributor tries to add full entries for the terms. And we might be able to reduce the number of discussions of some of the inane matters relating to numbers that appear in some of our discussion pages.
  • Could we accomplish the goal of directing users to appendices by some other means?
As I understand it, we could accomplish the entry-prevention goal by protecting the pages for which we think we don't want entries or, perhaps, by an edit filter. DCDuring TALK 19:06, 19 September 2016 (UTC)[reply]
Oppose; just feels totally wrong, and will add tons and tons of unnecessary entries. I think we should have numbers up through 100, plus 200, 300, ... 900, plus powers of ten above that; partly I want these entries for translation purposes, since many languages have non-SOP ways of expressing them. (Plus any non-SOP numbers of course -- 101, 411, etc.) Benwing2 (talk) 04:58, 20 September 2016 (UTC)[reply]
I agree with respect to the numbers we should have. The question is what to do about numbers we shouldn't have, but which readers may for whatever reason either look for anyway, or try to create anyway. bd2412 T 13:01, 22 September 2016 (UTC)[reply]
  • I think you mean hard redirects, the things using #REDIRECT syntax. I am not very excited even about hard redirects. Does anyone collect statistics about the number of page accesses of non-existent entries? This could give us an idea of whether to perform the redirects at least for 1 to 10,000, or the like. Since they would be #REDIRECT things, they would not show up in the number of entries, I figure. --Dan Polansky (talk) 17:14, 8 October 2016 (UTC)[reply]

Declension tables versus usage notes

I'm wondering how to treat a certain phænomenon. If certain grammatical forms replace other forms, or create new ones, should that be put into the declension table or the usage notes?
Examples: German subjunctive forms are now used as imperative forms, for phrases like "let's go". And most importantly for me: Low German optative forms replace, piece by piece, Low German preterite forms in the course of 400 years. So should I add the optative forms as alternative forms into the declension tables or make a note about this as usage notes? Korn [kʰũːɘ̃n] (talk) 12:16, 20 September 2016 (UTC)[reply]

If it's something that applies to all or most verbs across the board, then it shouldn't be in a usage note as the usage note would have to appear on every single verb entry. Maybe there could be a footnote within the inflection table itself saying something like "Increasingly used as the preterite" or whatever. —Aɴɢʀ (talk) 12:41, 20 September 2016 (UTC)[reply]
Sorry, yes, when I say usage note, I do mean one in the table. Cf. vri. Korn [kʰũːɘ̃n] (talk) 12:53, 20 September 2016 (UTC)[reply]
I think that's fine, especially for a historical language. For a modern language we might not want to list all obsolete forms in inflection tables. (Though TBH I do have a tendency to put obsolete inflected forms in Irish declension tables, so maybe I'm being hypocritical.) —Aɴɢʀ (talk) 15:12, 20 September 2016 (UTC)[reply]

Proto-Nostratic

@Angr, Chuck Entz, Anglom, JohnC5, CodeCat, Wikitiki89 Should we include some Proto-Nostratic words? If so, how would they be organised? We obviously can't put them as ancestors to PIE and Native American words without extensive proof they're linked, of which there is little...? Some words could definitely be linked though, like PIE heu and Native American iw, both originating from a common ancestor (PN?). UtherPendrogn (talk) 20:10, 20 September 2016 (UTC) https://en.wiktionary.org/wiki/User:UtherPendrogn/k%CA%BCo An example of a word. UtherPendrogn (talk) 20:18, 20 September 2016 (UTC)[reply]

Nostratic is silly, founded on extremely poor data and poorer assumptions, and flies in the face of what rigour historical linguistics may claim. If there is sufficient reason to compare a form of unclear etymology with one in another language with no sure relationship, that is acceptable, but by no means should Nostratic "terms" be linked to or given serious consideration. —Μετάknowledgediscuss/deeds 20:19, 20 September 2016 (UTC)[reply]
Is there a better accepted ancestor to PIE? UtherPendrogn (talk) 20:22, 20 September 2016 (UTC)[reply]
Not really. Pre-PIE features are postulated based on internal reconstruction, but there's no higher node phylogenetically that has acceptance in academic linguistics. —Μετάknowledgediscuss/deeds 21:25, 20 September 2016 (UTC)[reply]
Does this mean I should stop making Sino-Caucasian entries? Crom daba (talk) 23:46, 20 September 2016 (UTC)[reply]
In my opinion, yes. Even if that's phylogenetically valid (which I doubt), it can't really be reconstructed to the standards expected by most historical linguists. —Μετάknowledgediscuss/deeds 00:23, 21 September 2016 (UTC)[reply]
What do you mean by "Native American iw"? Are you referring to Amerindian? Having worked a little with Uto-Aztecan and Yuman, I'm more than a little skeptical about that. There are former American Indian phyla such as Hokan and Penutian that have been mostly abandoned for lack of evidence (though there's evidence for some of the subdivisions)- the trend seems to be going away from unification rather than toward it (except for Dene-Yeniseian). As for Nostratic itself, everyone who believes in it seems to have a different combination of constituent families. Chuck Entz (talk) 03:36, 21 September 2016 (UTC)[reply]
Yeah, I've often wondered about Dene-Yeniseian. I had to read Vajda's paper in college and found it very convincing. Also, I believe there was recently a paper showing genetic evidence that the two peoples spent a significant period in the Bering Strait before splitting East and West. I remember that from a discussion with some professors I met from Diné College who also recalled a time when a Ket speaker came to the Navaho nation and discussed apparent cognate words in the two languages. But then again, all of this still remains too circumstantial. —JohnC5
I'm friends with an Athabaskanist who told me all the Athabaskanists she knows are pretty much convinced by Dene-Yeniseian. But it's definitely the exception rather than the rule for new suggestions of high-level groupings to be accepted by the wider linguistics community. —Aɴɢʀ (talk) 12:13, 21 September 2016 (UTC)[reply]
Do we have any Athabaskanists working on here? If any trusted Athabaskanist wanted to begin adding PDY forms, I'd be prepared to make a code for it and point PY and PND at it. —JohnC5 14:21, 21 September 2016 (UTC)[reply]
As I recall, some earlier BP discussions settled on basically the following rules of thumb for forms in Nostratic etc. macrolanguages:
  • the comparisons themselves can be mentioned in etymology appendices for PIE etc., if properly cited;
  • they cannot be created as their own reconstruction entries, with the special exception of Proto-Altaic;
  • they cannot be mentioned in mainspace entries.
I would support a compact appendix (or set of appendices) that listed the members of alleged Nostratic etymological groups together with reconstructions used by different authors, though (as said, no two groups of Nostraticists substantially agree on anything, so e.g. Illich-Svitych's Nostratic ≠ Dolgopolsky's Nostratic ≠ Bomhard's Nostratic). For that matter, I would even support proto-entries as soon as you can provide two unconnected sources (not e.g. from one scholar + one of his students) who can both agree on what the term's descendants are and what its reconstruction should be ;)
Re OP though, nobody considers Amerind to be "Nostratic". "Amerind" itself is a hypothetical macrofamily of a similar size as Nostratic; what you'd use to link them is "Borean" or perhaps "Proto-World" (the likes of which should probably be banned entirely from Wiktionary, being another order of magnitude more speculative than the likes of Nostratic or Amerind or Sino-Caucasian). --Tropylium (talk) 20:42, 22 September 2016 (UTC)[reply]
  • I recall reading that the emerging evidence from archaeology is painting a picture of multiple waves of migration from the Old World to the New over the span of thousands (tens of thousands?) of years, which would seem to make any such "Amerindian" family quite moot. ‑‑ Eiríkr Útlendi │Tala við mig 21:28, 22 September 2016 (UTC)[reply]

I plan to clean house in WT:RFD.

There are a number of months-old RFDs that have received little or no discussion. I'm giving fair warning that I plan to close all of these as no consensus in the next few days, unless an actual consensus develops quickly. Cheers! bd2412 T 13:04, 22 September 2016 (UTC)[reply]

The default in RFD is no objection, since the proposer themselves is generally in favour of the deletion. With no response, that's 100% in favour, therefore delete. —CodeCat 13:32, 22 September 2016 (UTC)[reply]
It's not that straightforward in the first half-dozen discussions. They have at least one half-hearted objection to deletion. What then? bd2412 T 13:43, 22 September 2016 (UTC)[reply]
Then it's no consensus. —CodeCat 14:16, 22 September 2016 (UTC)[reply]
Which causes the entry to be kept, I believe. --Daniel Carrero (talk) 14:19, 22 September 2016 (UTC)[reply]
That means an erroneous entry might be kept by virtue of sufficiently great user apathy towards the topic. If we turn it around, correct entries might go for the same reason. We don't have a better alternative, huh? Jury duty or something. Korn [kʰũːɘ̃n] (talk) 15:20, 22 September 2016 (UTC)[reply]
I've done my jury duty. --WikiTiki89 15:35, 22 September 2016 (UTC)[reply]
Inspiring. (Not sarcasm.) Maybe we can have a (collapsed or optional or something) list of RFDs/RFVs without any replies (= With only one signature.) in the watchlists? Like we have with the votes. Korn [kʰũːɘ̃n] (talk) 16:33, 22 September 2016 (UTC)[reply]
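A sketch of how "requests with only one signature" might be detected from a page's wikitext. It assumes each request is a level-2 section and that every signature ends in the standard timestamp format seen on this page; the function name and both regular expressions are hypothetical, and unsigned comments would be missed.

```python
import re

# Level-2 headings only ("== title =="), not "=== subsection ===".
HEADING_RE = re.compile(r"^==(?!=)[ \t]*(.+?)[ \t]*(?<!=)==[ \t]*$", re.MULTILINE)
# Signature timestamps such as "13:04, 22 September 2016 (UTC)".
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")

def unanswered_sections(wikitext):
    """Return titles of level-2 sections containing exactly one timestamp."""
    headings = list(HEADING_RE.finditer(wikitext))
    titles = []
    for i, match in enumerate(headings):
        # A section runs from the end of its heading to the next heading.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[match.end():end]
        if len(TIMESTAMP_RE.findall(body)) == 1:
            titles.append(match.group(1))
    return titles
```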
Not all my votes were to delete. I hope I didn't accidentally vote twice. DCDuring TALK 18:06, 22 September 2016 (UTC)[reply]
I don't think there's a problem with erroneous entries being kept, as they can just be sent to RFV, where the default is to delete. And I think it's better to err on the side of keeping an SOP entry when in doubt, than to delete it just because not enough people care. Andrew Sheedy (talk) 21:41, 23 September 2016 (UTC)[reply]
RFD is for redundant entries, not for erroneous, for which we have RFV. So echoing Andrew Sheedy. --Dan Polansky (talk) 17:09, 8 October 2016 (UTC)[reply]
Not really: no objection and the only poster the nominator => no consensus since one person does not consensus make; that is my position. --Dan Polansky (talk) 16:56, 8 October 2016 (UTC)[reply]

User:Embryomystic form-of edits

Embryomystic has been fiddling around with form-of entries for a while now. Some of it is ok, but they've also replaced the perfectly-valid {{plural of}} with {{inflection of}} just for the sake of it. Now, they have their eyes set on Spanish and Portuguese, and seem to be replacing {{masculine plural of}} and similar generic templates with some language-specific templates that do the same thing. I objected to this but was ignored, so I'm bringing it to wider attention here. Generic templates should always be used if possible, and replacing them with custom templates for no reason is pointless. —CodeCat 19:44, 22 September 2016 (UTC)[reply]

You were not ignored, just disagreed with. I didn't create the Portuguese templates, but I find them useful, and I've been adding them to Portuguese adjective form entries that don't have them, and just recently I created parallel Spanish and Italian templates. I realise now that when I started doing something similar with Catalan that I was stepping on your toes, and I didn't object to you reverting Catalan entries, as you yourself had made similar templates for Catalan, but I don't really see why there's a problem with adjective forms being sorted into relevant subcategories as the Portuguese ones have been for some time now. embryomystic (talk) 19:50, 22 September 2016 (UTC)[reply]
Subcategorising non-lemma forms is mostly a pointless exercise that nobody benefits from, and therefore it's not worth the increased complication introduced by not using generic templates. —CodeCat 19:57, 22 September 2016 (UTC)[reply]
By this logic, should we delete Category:English adjective comparative forms? --Daniel Carrero (talk) 20:02, 22 September 2016 (UTC)[reply]
I wouldn't oppose it, unless someone can come up with a real use case. To me, this is no different from categorising Latin verb forms as "1st person forms", "singular forms", "indicative forms", "active forms" and so on. Categorising for the sake of it, not because anyone is ever going to have a use for it. Subcategorising lemmas is useful, but non-lemmas not really. —CodeCat 20:08, 22 September 2016 (UTC)[reply]
As someone whose browsing as a user trying to find words was more than once hindered by non-exhaustive categorisation, I'm leaning towards too much rather than too little. Korn [kʰũːɘ̃n] (talk) 20:36, 22 September 2016 (UTC)[reply]

Let's get rid of the "Quotations" header

Wiktionary:Quotations says "Longer lists of quotations may find a more appropriate place in a separate section, as they would hamper readability for people only interested in the definitions." In this case, I think that the quotation really belongs on a separate citations page. The point of citations pages is to avoid cluttering up the entry with information that is not directly relevant to the words and definitions, but may still be useful for some readers (and for WT:CFI). So I propose abolishing this practice/header altogether, and moving its contents to the citations page. —CodeCat 21:37, 22 September 2016 (UTC)[reply]

I think I would support that. Equinox 21:39, 22 September 2016 (UTC)[reply]
I support removing the "Quotations" header, and adding {{seemoreCites}} in individual senses. This past vote might be relevant: Wiktionary:Votes/2016-02/Removing "Quotations". --Daniel Carrero (talk) 21:47, 22 September 2016 (UTC)[reply]
I also support. I don't think it is used terribly often as it is. - TheDaveRoss 21:56, 22 September 2016 (UTC)[reply]
Support wholeheartedly. The quotations sections are little more than clutter. Andrew Sheedy (talk) 21:57, 22 September 2016 (UTC)[reply]
Support. I've always found it weird that this header was even there, and it's annoying to see it on random entries. PseudoSkull (talk) 22:03, 22 September 2016 (UTC)[reply]
Oppose using citations page to hold citations that could easily go under the definition line. DTLHS (talk) 00:29, 23 September 2016 (UTC)[reply]
I always thought that the quotations used with definitions were just a selection of all the citations found on the citation page. That is, that one is a subset of the other. —CodeCat 01:09, 23 September 2016 (UTC)[reply]
I don't know what other people think citations pages are for. In my mind it is for quotations of as yet to be defined terms and for senses that are being researched, and the contents should be moved to the main entry if it is possible. DTLHS (talk) 01:15, 23 September 2016 (UTC)[reply]
I figured citation pages were just for collecting all the citations, the more the better? —CodeCat 01:18, 23 September 2016 (UTC)[reply]
Like I said, this might just be me. I would be interested to know what other editors think citation pages should be used for. DTLHS (talk) 01:19, 23 September 2016 (UTC)[reply]
My opinion is this:
  • The Citations: page should be used to collect an indefinite number of citations, the more the better. Getting citations from the internet is okay too, if the sense is already attestable through durably archived media such as Google Books.
  • Each sense should have only a small number of citations in the main page, which should preferably be representative and unambiguous, concerning that particular sense.
  • It would be nice if the citations in the entry were always a subset of the citations in the Citations: page, if the Citations: page is a big one. It is normal to add a new citation in an entry without copying it in the Citations: page, and I'm okay with that if there are only one or a few quotations.
  • The "Quotations" section in entries seems to be useless. If it is used simply to point to the Citations: page, the link could be added below each sense, when applicable. If it contains one or more quotations where it is unclear to what sense they belong, they can't be "representative and unambiguous" as suggested above and should be in the Citations: page until we figure out what to do with them.
  • Usage examples and quotations complement each other, so I oppose if people remove usexes just because the entry/sense has quotations.
--Daniel Carrero (talk) 01:34, 23 September 2016 (UTC)[reply]
I agree that usexes and quotations are complementary. I think Wiktionnaire does an exceptional job of illustrating definitions with a balance of both (relatively speaking, we are rather lacking in this area). Andrew Sheedy (talk) 01:38, 23 September 2016 (UTC)[reply]
I think of quotations in entries as just a special kind of usex: a usex that's attested and cited from another work. They're meant to illustrate the use of the word in that meaning, using an example from "out in the world" instead of something we made up. I don't think one should be favoured over the other, we should simply pick what works best in the particular situation. If none of the cites illustrate the use particularly well, a made-up example would do better. —CodeCat 17:54, 23 September 2016 (UTC)[reply]
I've always understood citations pages to be used the way CodeCat describes, in addition to hosting citations for senses that have yet to be added (or where the intended sense is unclear). I think they should eventually hold as many citations as is practical, to demonstrate as wide a range of use as possible (including various time periods, regions, registers, and genres). Andrew Sheedy (talk) 01:36, 23 September 2016 (UTC)[reply]
For what it's worth, that's how I think of Citations as well: that namespace has a large chronology of uses from which we pick a handful of particularly illustrative ones to show in the definition in the main namespace. —Justin (koavf)TCM 02:58, 23 September 2016 (UTC)[reply]
Question. The title of the thread is "Let's get rid of the 'Quotations' header", but could anyone explain for me exactly what the "Quotations header" is and where it appears? It seems from some comments that it is not the "[quotations ▼]" link that is seen next to some definitions, but if not that then what? Mihia (talk) 17:45, 23 September 2016 (UTC)[reply]
@Mihia: The entry abyss has a "Quotations" section. It contains the text:
It is a section, like "English", "Noun", "Etymology", "Pronunciation", etc. --Daniel Carrero (talk) 17:51, 23 September 2016 (UTC)[reply]
I see, thanks. My comment then would be that if the "Quotations" section was not there to provide a link to the "Citations" page, then probably many people would not notice that the Citations page existed. However, I don't know on what basis you would put actual quotations in that section rather than using the inline "[quotations ▼]" method. Mihia (talk) 19:27, 23 September 2016 (UTC)[reply]
There are places where the section contains quotes as well, see halcyon. - TheDaveRoss 14:42, 28 September 2016 (UTC)[reply]
So on what basis is it decided whether to put quotations below the definition ("[quotations ▼]"), or in a "Quotations" section, or on a separate "Citations" tab? Mihia (talk) 20:05, 28 September 2016 (UTC)[reply]
Support. - -sche (discuss) 18:58, 23 September 2016 (UTC)[reply]
Support, and use the citations namespace for citations where meaning is not clear or is for a definition we don't have yet. Wherever possible, citations go under the sense they are supporting. Renard Migrant (talk) 16:52, 24 September 2016 (UTC)[reply]
Oppose using a Beer parlour discussion for something that did not make it in a fairly recent vote: Wiktionary:Votes/2016-02/Removing "Quotations". It looks like unintentional forum shopping. --Dan Polansky (talk) 16:48, 8 October 2016 (UTC)[reply]

Vote about not nesting headings inside stuff

Based on Wiktionary:Beer parlour/2016/August#Proposed addition to WT:NORM: headers cannot be nested inside things, I created Wiktionary:Votes/pl-2016-09/No headings nested inside templates or tags. --Daniel Carrero (talk) 04:01, 23 September 2016 (UTC)[reply]

IMO your vote is not well-phrased. What you want to disallow is something like this (i.e. where the header is surrounded by newlines):
{{foo|bar=
==English==
}}
Something like this: {{foo|==English==}} where there aren't any newlines isn't such a problem, and might conceivably actually occur. (In general, embedded newlines in templates cause lots of parsing problems, even using mwparserfromhell.) Benwing2 (talk) 05:22, 23 September 2016 (UTC)[reply]
@CodeCat, do you wish to comment here? This was her idea. In any event, I hope people don't start using {{foo|==English==}} without discussion, it seems weird and without precedent. Is there any possible use for this, even a hypothetical one? --Daniel Carrero (talk) 18:37, 23 September 2016 (UTC)[reply]
I recall we do it on talk pages. But this proposal is for entry space only (I don't say mainspace because entry space includes Reconstruction: too), so it doesn't matter. —CodeCat 18:54, 23 September 2016 (UTC)[reply]
Just in case, I added a note in the vote to remind people that the proposal only affects entries. --Daniel Carrero (talk) 18:57, 23 September 2016 (UTC)[reply]
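To make the distinction drawn above concrete, here is a minimal sketch of how a bot might flag a heading that sits on its own line inside an unclosed template. It is an illustration only, not the code of any existing bot: the function name `nested_headings` is hypothetical, and the brace counting is deliberately naive (it ignores comments and nowiki sections).

```python
import re

# A heading line: two or more "=" on each side, alone on its line.
HEADING = re.compile(r"^==+[^=\n].*==+\s*$")


def nested_headings(wikitext):
    """Return (line_number, line) for headings found inside open {{...}}.

    Hypothetical helper: template depth is tracked by counting "{{" and
    "}}" per line, which is only a rough approximation of real parsing.
    """
    found = []
    depth = 0  # number of templates still open at the start of the line
    for number, line in enumerate(wikitext.split("\n"), start=1):
        if depth > 0 and HEADING.match(line):
            found.append((number, line))
        depth += line.count("{{") - line.count("}}")
        depth = max(depth, 0)  # tolerate stray closing braces
    return found


if __name__ == "__main__":
    print(nested_headings("{{foo|bar=\n==English==\n}}"))
    print(nested_headings("{{foo|==English==}}"))
```

Under these assumptions the multi-line form is reported, while the single-line form `{{foo|==English==}}` is not, because the heading markup there does not start a line.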

Proposal for an extension to a few different page creation templates in the case of the search query containing more than one word

I propose a new addition to one search query template and one creation template. Now, honestly, I'm not really sure how someone would do this, but I'm sure that you Lua experts out there on this site probably could have some idea.

The rationale of both proposals is that if we do this, I believe the number of (especially new) users creating SOP entries in ignorance of WT:CFI may decrease. I know that a lot of you may be thinking "Oh, well we already linked to CFI so I'm assuming that the creators of every entry are going to sit there and read that entire page to find the part about SOP (and fully understand it)." Let's face it; people don't read terms of service, etc., pages all that often, especially not fully. People are eager to go ahead and start creating entries. So we should at least include a little more right in front of their faces. And almost the entire WT:RFD page is dedicated to finding out whether or not a multiple-worded entry is SOP, so perhaps we really should include something that mentions SOPs in these two templates.

I can't find the actual templates themselves on here after I've searched, so fill me in on their titles please.

The extra texts will not appear if the query does not have 2 words or more. PseudoSkull (talk) 16:23, 24 September 2016 (UTC)[reply]

Proposed text 1

For large airplane:

"Wiktionary does not yet have an entry for large airplane.

  • You may Create this entry or add a request for it.
  • You can also look for pages within Wiktionary linking to this entry. This may help if, for example, large airplane is an inflected form of another word; although Wiktionary does not have the entry for large airplane, the base form may be listed as linking to this entry.
  • If you think this may be a misspelling, try browsing through our indices (e.g., the index of English words) for the correct spelling.
  • Perhaps there is a page large airplane in our sister encyclopedia project, Wikipedia.

Try searching Wiktionary:

  • If you have created this page in the past few minutes and it has not yet appeared, it may not be visible due to a delay in updating the database. Try refreshing the page, otherwise please wait and check again later before attempting to recreate the page.
  • If you created a page under this title previously, it may have been deleted. Check for large airplane in the deletion log. Alternately, check here.
  • Please also check large and airplane separately, as the definitions of those terms may collectively give you the meaning of large airplane."

For big strong girl:

"[...]

  • Please also check big, strong, and girl separately, as the definitions of those terms may collectively give you the meaning of big strong girl."

For five-edged:

"[...]

Comments

Proposed text 2

For large airplane: "Wiktionary does not yet have an entry for large airplane.

  • To start the entry, type in the box below and click "Save page". Your changes will be visible immediately.
  • If you are not sure how to format a new entry from scratch, you can use the preload templates to help you get started.
  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of large airplane does not equal the sum of the definitions of large and airplane. "

For big strong girl: "[...]

  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of big strong girl does not equal the sum of the definitions of big, strong, and girl. "

For five-edged: "[...]

  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of five-edged does not equal the sum of the definitions of five and edged. " PseudoSkull (talk) 16:23, 24 September 2016 (UTC)[reply]

Comments

General comments

General comments about both proposals as a whole should go here. I wanted to bring it up here before possibly starting votes, especially since someone might have better wording for the new additions than I did and might want to reword. PseudoSkull (talk) 16:23, 24 September 2016 (UTC)[reply]

If it's possible to have the "no entry" page point users to the entries on the individual components of multi-word strings, then doing so is a good idea. But I don't know if it's possible without adding some javascript, which however we could probably do (I think some javascript is what adds a link to the search-results page when you search for a term and it's present as a translation of something). - -sche (discuss) 22:17, 26 September 2016 (UTC)[reply]
I am reluctant to implement this kind of change unless we have real evidence that it will make a difference. You have said "I believe [it] will decrease". How do you know? AFAIK, the issue is that a lot of anonymous users just don't read anything before they type. Changing the warning text won't change their behaviour. Equinox 22:43, 26 September 2016 (UTC)[reply]
I would also add that most people who create SoP entries don't do so out of what you call "ignorance", but because they disagree with policy and believe those entries should genuinely exist. Equinox 22:45, 26 September 2016 (UTC)[reply]
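For reference, the splitting behaviour described in the two proposed texts (no extra line for single words, "X and Y" for two components, a comma-separated list for three or more, hyphens treated as separators) could be sketched roughly as follows. This is only an illustration in Python; the real messages would presumably be generated by a template, Lua module, or JavaScript, and the function names here are hypothetical.

```python
import re


def component_words(query):
    """Split a search query on whitespace and hyphens (an assumed rule)."""
    return [word for word in re.split(r"[\s\-]+", query.strip()) if word]


def join_with_and(words):
    """Join two or more words as 'a and b' or 'a, b, and c'."""
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    return ", ".join(words[:-1]) + f", and {words[-1]}"


def sop_hint(query):
    """Return the extra bullet text, or None for single-word queries."""
    words = component_words(query)
    if len(words) < 2:
        return None
    return (
        f"Please also check {join_with_and(words)} separately, as the "
        "definitions of those terms may collectively give you the "
        f"meaning of {query}."
    )


if __name__ == "__main__":
    for query in ("airplane", "large airplane", "big strong girl", "five-edged"):
        print(query, "->", sop_hint(query))
```

Whether such a hint would actually change editor behaviour is the open question raised in the comments above; the sketch only shows that the text itself is easy to produce.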

Not seeing See also

Searching for "Gamergate" I typed gamergate into the Wiktionary search box. I clicked on the first entry and then scrolled down to the definition section. It did not refer to Gamergate, so I added a separate definition for that term. I was quickly reverted. Why? Because there is already a separate Gamergate page. I'm guessing my behavior is not uncommon for most casual users.

Has the project given any thought to either - (a) putting the "see also" information in the appropriate definition section (particularly helpful if there are multiple language definitions) rather than at the top of the page or (b) combining lower case and capitalized versions of words into one article? Butwhatdoiknow (talk) 13:17, 25 September 2016 (UTC)[reply]

Generally, I tend to assume that readers can see things that are at the very top of the page. That said, there is a solution to this, which I have implemented by adding a ===See also=== section pointing to the same link. By the way, I saw the definition you tried to add, and it was pretty clearly biased. That's not acceptable on Wiktionary regardless. —Μετάknowledgediscuss/deeds 16:57, 25 September 2016 (UTC)[reply]
Μετάknowledge - First, thank you kindly for making the change.
Second, I ask that you reconsider your assumption that many casual readers, focusing on the definition section, will not lose sight of everything else, including something at the beginning of the entry - particularly when they arrive at a page for the exact word they are looking up (or so they would assume, not noticing the capital/lower case difference). If you do so then I further request that you consider working to make it standard practice to do what you did for gamergate in all cases where there are separate capital/lower case pages.
Finally, I ask that you keep Hanlon's razor in mind when you consider whether a proposed definition is biased. In my case I tried in good faith to fit the opening paragraphs of the Wikipedia article into a single sentence. You evidently concluded that I failed in this attempt. But that is no reason to go immediately into chastisement mode. Butwhatdoiknow (talk) 00:35, 26 September 2016 (UTC)[reply]

In the past, I already proposed and then performed a bot run to do a replacement where {{etyl}} had "-" as the second parameter. I'd now like to do the same, but more generally with all instances of {{etyl}}, replacing with either {{cog}} or {{der}} depending on the second parameter. This doesn't add or remove any information, as "der" adds to the same categories as "etyl". However, it does make things a lot easier for future editors who want to replace "der" with "bor" or "inh" as appropriate, because then it's a matter of changing the three letters of the template name. —CodeCat 13:23, 25 September 2016 (UTC)[reply]

Support. --Daniel Carrero (talk) 13:49, 25 September 2016 (UTC)[reply]
  • What will you do about things like From {{etyl|de|hu}} thieves' argot {{m|de|Fühbar}}.? —Μετάknowledgediscuss/deeds 16:51, 25 September 2016 (UTC)[reply]
    • Nothing. —CodeCat 17:23, 25 September 2016 (UTC)[reply]
      • I dunno, I've been using CAT:etyl cleanup as a way of finding terms for which a decision needs to be made whether they're inheritances or not. I've been working on the assumption that if an entry uses {{der}} it means someone has deliberately made the decision not to use {{inh}}, but that if an entry uses {{etyl}} it probably means no one stopped to think about the difference. But if your bot empties the Etyl cleanup categories automatically, then I'll have no way of knowing which entries have already been thought about and which haven't. —Aɴɢʀ (talk) 19:18, 25 September 2016 (UTC)[reply]
        • You can use the derivation categories. Granted, they won't be emptied out, but if you go through them systematically in alphabetical order, you'll eventually cover them all. —CodeCat 19:49, 25 September 2016 (UTC)[reply]
          • But the derivation categories include everything using {{der}}, regardless of whether a human editor deliberately used {{der}} instead of {{inh}} or a bot automatically used {{der}} without considering {{inh}}. The derivation categories will be far too big for me (or anyone else, probably) to feel any motivation to work through them. As a result, inheritances will stay in the derivation categories indefinitely, thus rendering completely useless the distinction we only fairly recently decided to make between inherited and noninherited terms. —Aɴɢʀ (talk) 20:22, 25 September 2016 (UTC)[reply]
  • Oppose this automatic change -- I also agree with Angr here. Any instance of {{etyl}} + {{m}} is pretty clearly the old format, and can thus be easily identified as an entry that needs conversion. Meanwhile, any instance of {{der}} is impossible to distinguish from an intentional use of {{der}}, and thus cannot be easily identified for any further processing.
FWIW, I often come across JA entries where we used the {{etyl}} + {{m}} templating in the past, because that's what we had, and now we need to use {{bor}} as the term is clearly a borrowing (such as スプーン (supūn, spoon) which has already been converted, or タオル (taoru, towel) which hasn't yet). ‑‑ Eiríkr Útlendi │Tala við mig 21:43, 25 September 2016 (UTC)[reply]
Support, done in the right way of course. No rush, better to have a separate {{etyl}} and {{m}} than a broken entry. Renard Migrant (talk) 17:51, 26 September 2016 (UTC)[reply]
Even if the aim is to get rid of {{etyl}}, it seems we could indeed use something like Benwing2's {{ader}}: this would allow non-etymologically focused editors to add etymologies without having to research if they are "borrowings" or "derivatives" or what. --Tropylium (talk) 20:16, 28 September 2016 (UTC)[reply]
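As an illustration of the kind of replacement being discussed, the following Python sketch converts {{etyl}} only when it is immediately followed by {{m}} in the same source language, and leaves everything else alone (as with the "thieves' argot" example above). It is not the actual bot code. The parameter order assumed here is {{etyl|source|target}}, {{der|target|source|term}} and {{cog|source|term}}, and the function name is hypothetical.

```python
import re

# {{etyl|SOURCE|TARGET}} followed only by whitespace and {{m|LANG|...}}.
ETYL_M = re.compile(
    r"\{\{etyl\|([^|{}]+)\|([^|{}]+)\}\}\s+\{\{m\|([^|{}]+)\|([^{}]*)\}\}"
)


def convert_etyl(wikitext):
    """Rewrite simple etyl+m pairs; leave anything unusual untouched."""

    def replace(match):
        source, target, m_lang, rest = match.groups()
        if source != m_lang:
            return match.group(0)  # languages disagree: needs a human
        if target == "-":
            return "{{cog|%s|%s}}" % (source, rest)
        return "{{der|%s|%s|%s}}" % (target, source, rest)

    return ETYL_M.sub(replace, wikitext)


if __name__ == "__main__":
    print(convert_etyl("From {{etyl|la|en}} {{m|la|aqua}}."))
    print(convert_etyl("From {{etyl|de|hu}} thieves' argot {{m|de|Fühbar}}."))
```

Note that the output of such a run is indistinguishable from a deliberate use of {{der}}, which is exactly the objection raised above.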

"In other projects" in the sidebar

Requested feedback Entries such as Wikipedia have "in other projects" in the sidebar and link to Wikipedia articles on a topic. For some reason, this entry only links Danish, Dutch, English, and German articles. Why? There are definitely articles on Wikipedia in other language editions of the encyclopedia. For that matter, there is material on Wikipedia on (e.g.) Commons. Why are these languages displayed? If the thinking is that these are all Germanic languages, then why not Scots (which is mutually intelligible)? Can someone explain this to me or direct me to policy discussion about it? —Justin (koavf)TCM 01:34, 26 September 2016 (UTC)[reply]

The {{wikipedia|lang=xx}} template is what puts them there. --WikiTiki89 14:48, 26 September 2016 (UTC)[reply]
@Wikitiki89: Excellent. Are there any best practices about this? E.g. should we have one for c: as well? —Justin (koavf)TCM 19:26, 26 September 2016 (UTC)[reply]
I don't see why you would want to link to Commons- just include the image or audio file on the page. DTLHS (talk) 02:50, 27 September 2016 (UTC)[reply]
To play the devil's advocate, consider this argument in favour: the entry foot shows one photo of one foot (which seems like just the right number to me), but what if you wanted to see more photos of feet? A link to Commons helps with that. —Μετάknowledgediscuss/deeds 03:11, 27 September 2016 (UTC)[reply]
@DTLHS: To encourage users to edit cross-wiki and to spread more knowledge. Why would we link to an encyclopedia? —Justin (koavf)TCM 03:12, 27 September 2016 (UTC)[reply]
Are you talking about doing it automatically whenever something is transcluded from Commons? Because I think we would need a Mediawiki extension for that- not something we could do ourselves. Otherwise you can just use {{commons}} and {{PL:commons}}. DTLHS (talk) 03:16, 27 September 2016 (UTC)[reply]
@DTLHS: Well, that is actually a good question, since having inline templates seems to make the sidebar links redundant. Why do we have both? —Justin (koavf)TCM 05:24, 27 September 2016 (UTC)[reply]

{{bor}} and {{inh}} should also categorize into "Foo terms derived from Bar"

Using e.g. {{bor|fr|en|foo}} puts the term into CAT:French terms borrowed from English but not CAT:French terms derived from English. This seems transparently wrong. I will change this myself unless there is really strong objection. Benwing2 (talk) 02:48, 27 September 2016 (UTC)[reply]

But Category:French terms borrowed from English is a subcategory of Category:French terms derived from English. --Daniel Carrero (talk) 02:50, 27 September 2016 (UTC)[reply]
Hmm. This is true but hardly obvious. I'm an experienced Wiktionarian and didn't notice this. What I did notice is that Category:French terms borrowed from English and Category:French terms derived from English are in quite different parts of the tree. It still seems very wrong to me that changing from {{der}} to {{bor}} removes a term from Category:French terms derived from English, because the term is still derived from English. Benwing2 (talk) 02:56, 27 September 2016 (UTC)[reply]
Well, we do categorize English nouns in both Category:English nouns (which is more specific) and Category:English lemmas (which is more generic). I'm not sure if I agree with your proposal, because I was under the impression that it's clear enough that "borrowed from English" is a subset of "derived from English". Maybe if more people want that, it wouldn't be harmful... It would occupy more space on the list of categories of an entry, but it could also help navigation. --Daniel Carrero (talk) 03:05, 27 September 2016 (UTC)[reply]
First of all I do think it's a problem that dog isn't in those categories. But secondly, it's not a useful distinction the way you think it is because {{etyl}} categorizes into the "derived by" category which isn't necessarily a non-borrowing. Benwing2 (talk) 13:32, 27 September 2016 (UTC)[reply]
This is a question that has a risk of turning into quite the slippery slope. Suppose that a user does not know or care about the differences between the individual Slavic languages: should we do them the favor of additionally duplicating the contents of categories like Category:English terms borrowed from Polish and Category:English terms derived from Old Church Slavonic in it?
Or suppose that a user does not care for a language distinction that we make on Wiktionary. Should we provide for them a way to group together e.g. the contents of Category:Zenaga terms derived from Latin, Category:Tarifit terms derived from Latin etc. as Category:Berber terms derived from Latin?
Duplicating words in parent categories is however basically a manual work-around to what is a software problem. What seems to be really being sought here is the ability to view all terms contained ultimately inside one category, even after subcategorization. We already have the ability to preview sub-subcategories, so previewing subcategory terms should probably be possible too…? --Tropylium (talk) 20:05, 28 September 2016 (UTC)[reply]
Oppose as well. The subcategorisation already indicates that one is a subset of the other. —CodeCat 13:54, 27 September 2016 (UTC)[reply]
Who cares? Not like anyone reads these categories. It's an administrative issue only. Renard Migrant (talk) 14:31, 27 September 2016 (UTC)[reply]
That's not true. I've seen users at WT:FB and communicating through email say that they look through our categories for words. —Μετάknowledgediscuss/deeds 15:52, 27 September 2016 (UTC)[reply]
I have been under the impression that the "X derived from Y" categories are by now essentially obsolete, and that we should aim to clean them up into more specific etymological ones (with "X borrowed from Y" and "X inherited from Y" categories as the initial step): after all, we have been cleaning up instances of {{etyl}}. So I would oppose going right back. There may be room for double categorization for directly-borrowed vs. indirectly-borrowed terms, but at the very least, inherited vs. non-inherited terms should not be in the same category (where applicable).
— In terms of the category "tree" shown at the top, having to locate specific etymological categories by language family trawling such as French language » Terms by etymology » Terms derived from other languages » Indo-European languages » Germanic languages » West Germanic languages » English always seemed like a pain to me, and I am happy that {{bor}} does away with this. --Tropylium (talk) 19:51, 28 September 2016 (UTC)[reply]
Derived terms category is not obsolete, it's the category for words borrowed into older stages of the language and for calques. Korn [kʰũːɘ̃n] (talk) 08:23, 29 September 2016 (UTC)[reply]
  • I can't overstate my support. Very frankly, I always thought the absence of the supercategories was a bug (!) caused by oversight and I'm shocked that anyone would actually support this situation. I regularly browse categories as a user. And, to name a recent example from my life, if 'Romanian given names derived from Greek' are not listed under 'Romanian given names', this dictionary becomes a, pardon my French, fucking hassle to use, since the category 'Romanian given names' does not list - as I would certainly expect from the name - all Romanian given names on this site. Why would we (the editors) make me (the user) click through 15 categories and actively prevent me from having an overview, and why would we exclude data from showing a category which it already is part of? Korn [kʰũːɘ̃n] (talk) 08:23, 29 September 2016 (UTC)[reply]
Strongly support categorizing borrowed and inherited terms also into the derived-terms category, as well as into their more specific category. Alternatively, "derived terms" could be left as the exclusive category of "der" if another macro-category were set up ("English terms from Spanish"?). I recall this being discussed before or while the current threesome of templates was being set up. Often I only want a list of terms from language X present in language Y, and don't care how they got there (direct borrowing, chain of borrowing, etc). - -sche (discuss) 20:51, 29 September 2016 (UTC)[reply]

So is this enough consensus to start implementing the complete categorisation or should this go into a vote? Korn [kʰũːɘ̃n] (talk) 22:22, 7 October 2016 (UTC)[reply]

I think it needs a vote... In this discussion, the proposal clearly has consensus but there are some people opposed to the change. I created Wiktionary:Votes/2016-10/Populating "derived from" categories with borrowed and inherited terms. --Daniel Carrero (talk) 11:43, 8 October 2016 (UTC)[reply]
Scratch that, I implemented the proposed change. Feel free to revert, discuss, etc. --Daniel Carrero (talk) 14:35, 25 October 2016 (UTC)[reply]
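The two categorisation behaviours debated in this thread can be summarised in a small Python sketch. It is purely illustrative and not how the etymology modules are actually written: the language-name table and the `include_parent` flag are hypothetical, with `include_parent=True` standing for the proposal that {{bor}} and {{inh}} also add the "derived from" category.

```python
# Hypothetical lookup table; the real data lives in Wiktionary's modules.
LANGUAGE_NAMES = {"fr": "French", "en": "English"}

SPECIFIC = {"bor": "borrowed", "inh": "inherited"}


def etymology_categories(template, target, source, include_parent=False):
    """Return the categories a der/bor/inh call would add (a sketch)."""
    target_name = LANGUAGE_NAMES[target]
    source_name = LANGUAGE_NAMES[source]
    derived = f"{target_name} terms derived from {source_name}"
    if template == "der":
        return [derived]
    categories = [f"{target_name} terms {SPECIFIC[template]} from {source_name}"]
    if include_parent:
        categories.append(derived)  # the behaviour proposed in this thread
    return categories


if __name__ == "__main__":
    print(etymology_categories("bor", "fr", "en"))
    print(etymology_categories("bor", "fr", "en", include_parent=True))
```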

Vote about disallowing triple-braced template parameters in entries

Based on Wiktionary:Beer parlour/2016/August#Proposed addition to WT:NORM: no template parameter expansions, I created Wiktionary:Votes/pl-2016-09/No triple-braced template parameters in entries. --Daniel Carrero (talk) 05:02, 29 September 2016 (UTC)[reply]

Some may be false positives or inside of comments. DTLHS (talk) 05:50, 29 September 2016 (UTC)[reply]
Chinese was a false positive due to a couple of stray extra braces. Most of the others look like bad substs. I don't see the point in a vote to outlaw something that seems to be strictly unintentional. Chuck Entz (talk) 07:03, 29 September 2016 (UTC)[reply]
The purpose is to not make the bot responsible if it mangles badly-formatted entries, thereby making bots easier to write. —CodeCat 12:38, 29 September 2016 (UTC)[reply]
So this is about blame? If someone doesn't comply with the rule, then the fact of the matter is that the bot may make matters worse. I understand that good programming practice is to test input for conformity to what the program needs and to bypass what does not conform. DCDuring TALK 13:05, 29 September 2016 (UTC)[reply]
Wikitext is completely freeform. Anyone can write anything at all in it. If we don't put checks on it, bots become impossibly complicated to write. The simpler the format, the easier it is to understand for both humans and computers, which will attract more new editors. —CodeCat 13:09, 29 September 2016 (UTC)[reply]
If people aren't even aware that they're leaving this stuff in the entry, they're not going to be influenced by any rules, especially since there's usually no way to spot this without looking at the wikitext after a save. I think the best way to deal with this is a filter that warns and tags, but doesn't disallow (we don't want to make editing impossible for entries with existing hard-to-fix examples). There may be some intricacies regarding the timing of template expansion vs. the timing of the filter, so it may require some tinkering to keep it from producing false positives, but it should be doable. Of the editors mainly responsible for the list above, @Rajasekhar1961 and @Aryamanarora are relatively inexperienced, but @Equinox and @Renard Migrant are veterans and would know better than to do this on purpose. Chuck Entz (talk) 14:06, 29 September 2016 (UTC)[reply]
Could always make an abuse filter which warns/prevents saving an entry with such a formation. Making the wikitext more uniform with no downside is a no-brainer for me. - TheDaveRoss 13:40, 29 September 2016 (UTC)[reply]
Would people understand the message enough to realize that triple curly brackets are a problem? Impenetrable technical explanations followed by a rejection of one's edits doesn't sound terribly nice. —suzukaze (tc) 17:53, 29 September 2016 (UTC)[reply]
Perhaps something along the lines of "Your edit contains invalid markup; check for the use of triple braces ("{{{example}}}") which should be removed and resubmit. If you need further assistance check in at the Grease Pit." It could also just be a flag to prompt further scrutiny and keep track of when the syntax is used. - TheDaveRoss 18:00, 29 September 2016 (UTC)[reply]
Completely agree with DCD. Requiring a "useful assumption for parsers" suggests that people want to use crappy, non-watertight parsers that don't do proper validation of what they are parsing. In practice, having a rule won't guarantee the rule is actually followed (since entries are free text, as stated). And if a bot mangles 1000 entries then we still have 1000 mangled entries, even if we can conveniently say "oh well it was the fault of that Chinese IP that made one edit in 2007". Equinox 13:34, 29 September 2016 (UTC)[reply]
The one at stop is being used inside <math></math> tags with some TeX meaning and not as a template parameter. --WikiTiki89 14:46, 29 September 2016 (UTC)[reply]
The one at stop does not accurately reflect the source text. It should not have been "{{{1}}}". --Daniel Carrero (talk) 14:52, 29 September 2016 (UTC)[reply]
That's beside the point (but if you want to fix it, coalescing, dyonically, and sbottom also do that). --WikiTiki89 15:20, 29 September 2016 (UTC)[reply]
I have investigated and found that most of these come from improperly substed templates. Is there any way to create an edit filter that can detect the use of subst:? Also some of these come from the "New Entry" links that you get when searching for an entry that doesn't exist (specifically, all except for Basic, Noun, 3rd person, and Participle contain these template parameters, but they all probably need some cleanup). --WikiTiki89 15:20, 29 September 2016 (UTC)[reply]
Yes gab, merir, my bad, you used to be able to subst: {{wikipedia}} but it's been modified so you can't. Could an admin change that? It's ridiculously easy to do. Renard Migrant (talk) 17:59, 29 September 2016 (UTC)[reply]
I hope you mean {{w}}. --WikiTiki89 18:36, 29 September 2016 (UTC)[reply]
Yeah, that's the one! Renard Migrant (talk) 19:39, 29 September 2016 (UTC)[reply]
I've fixed it, by the way. --WikiTiki89 20:00, 29 September 2016 (UTC)[reply]

About that list of entries: Wikitiki89 fixed some, Renard Migrant fixed others, I fixed the rest. --Daniel Carrero (talk) 06:37, 10 October 2016 (UTC)[reply]
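For anyone wanting to reproduce such a list, a rough check for leftover triple-braced parameters might look like the Python sketch below. It is an assumption-laden illustration, not the query actually used here or an abuse-filter rule: it skips HTML comments, `<math>` and `<nowiki>` sections (the false-positive sources mentioned above) and then looks for `{{{...}}}` with no braces inside. Stray extra braces can still produce false positives.

```python
import re

# Regions where triple braces are not template parameters.
IGNORED = re.compile(
    r"<!--.*?-->|<math>.*?</math>|<nowiki>.*?</nowiki>",
    re.DOTALL | re.IGNORECASE,
)
TRIPLE = re.compile(r"\{\{\{[^{}]*\}\}\}")


def find_triple_braces(wikitext):
    """Return every {{{...}}} found outside comments, math and nowiki."""
    visible = IGNORED.sub("", wikitext)
    return TRIPLE.findall(visible)


if __name__ == "__main__":
    print(find_triple_braces("{{en-noun}}\n# {{{1}}}"))
    print(find_triple_braces("<math>x^{{{1}}}</math>"))
```

A warn-and-tag filter along the lines suggested above would apply the same idea at save time instead of after the fact.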

Coptic construct states

How should Coptic construct states be entered (examples at ⲣⲟ (ro) and ⲟⲩⲱⲙ (ouōm) - this affects verbs, nominals and prepositions)? Coptologists have the convention of adding a hyphen after nominal state forms and equal signs to pronominal state forms, but the equal signs aren't going to work. So should those forms be hyphenated or entered bare? Lingo Bingo Dingo (talk) 13:31, 29 September 2016 (UTC)[reply]

I suggest what we do for Hebrew, which is display the hyphen (or equals sign) in the link, but not in the target of the link. Thus:
ⲣⲟ (ro) m (nominal construct state ⲣⲉ- or ⲣⲁ-, pronominal construct state ⲣⲱ=, plural ⲣⲱⲟⲩ=)
We should probably create {{cop-noun}} to make this easier. --WikiTiki89 15:24, 29 September 2016 (UTC)[reply]
See {{=}} by the way, also something like 1== in a template should produce an equals sign. Renard Migrant (talk) 17:51, 29 September 2016 (UTC)[reply]
In fact I had to use that trick to get {{=}} to display! Renard Migrant (talk) 17:56, 29 September 2016 (UTC)[reply]
I didn't need to do that in my example above. --WikiTiki89 18:34, 29 September 2016 (UTC)[reply]
No and I wasn't claiming you did, just pointing out that that's what {{=}} is for. Renard Migrant (talk) 20:52, 2 October 2016 (UTC)[reply]
Good idea, though maybe there should be hyphens in the target since they're usually directly attached to other forms (dissimilar to construct states in Semitic languages) and many aren't stand-alone forms. Lingo Bingo Dingo (talk) 13:22, 1 October 2016 (UTC)[reply]
@Lingo Bingo Dingo: Usually or always? That's an important question. Pronominal construct states perhaps don't need their own entries, but rather a table such as at בן listing out the full pronominal forms. --WikiTiki89 17:55, 5 October 2016 (UTC)[reply]
@Wikitiki89 Coptic used continuous script, but modern publications separate words. Nominal states vary in the modern convention depending on context, the editor's preference and word class (monosyllabic prepositions are usually connected to their nominals, for nouns and verbs that's more rare), pronominal states are always prefixed to a pronominal suffix, whether it is a noun, verb or preposition. Lingo Bingo Dingo (talk) 12:17, 6 October 2016 (UTC)[reply]
@Lingo Bingo Dingo: Oh. It seems that the "modern conventions" are very similar to Hebrew, so we can follow the same approach. --WikiTiki89 14:06, 6 October 2016 (UTC)[reply]
@Wikitiki89 Fine with me. Tables don't have to be a priority though, as pronominal suffixes are relatively regular in Coptic. Lingo Bingo Dingo (talk) 11:42, 10 October 2016 (UTC)[reply]
@Lingo Bingo Dingo: If it's that regular, it should be pretty easy to make a table template. --WikiTiki89 15:44, 10 October 2016 (UTC)[reply]
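The approach agreed on above (show the hyphen or equals sign in the link text but not in the link target) can be sketched as follows. This is a plain-Python illustration using ordinary `[[target|display]]` wikilinks; the function name is hypothetical, and a real {{cop-noun}} template would presumably go through the usual linking templates instead.

```python
def construct_state_link(display):
    """Build a wikilink whose target drops a trailing "-" or "=" marker."""
    target = display.rstrip("-=")
    if target == display:
        return f"[[{display}]]"  # no marker: plain link
    return f"[[{target}|{display}]]"


if __name__ == "__main__":
    for form in ("ⲣⲉ-", "ⲣⲱ=", "ⲣⲟ"):
        print(construct_state_link(form))
```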

Grants to improve your project

Greetings! The Project Grants program is currently accepting proposals for funding. There is just over a week left to submit before the October 11 deadline. If you have ideas for software, offline outreach, research, online community organizing, or other projects that enhance the work of Wikimedia volunteers, start your proposal today! Please encourage others who have great ideas to apply as well. Support is available if you want help turning your idea into a grant request.

I JethroBT (WMF) (talk) 19:52, 30 September 2016 (UTC)[reply]