Wiktionary talk:Todo
Archive
Old discussions have been archived to Wiktionary talk:Todo/archive.
Common misspellings
A great many misspellings occur in our entries, even in headers. See Wiktionary:Todo/Misspellings, and add to it. Then, we can periodically search for and eliminate instances of the listed misspellings. - -sche (discuss) 05:40, 2 March 2014 (UTC)
Stray spaces
Stray spaces appear in a number of predictable/easily-findable circumstances, such as this. - -sche (discuss) 06:55, 13 July 2014 (UTC)
Instances of people starting with pl2= or other numbered parameters >1
According to TemplateTiger, this was the only example of someone using {{en-proper noun}} and starting with pl2= rather than with the unnamed first parameter. It might be fruitful to check whether other templates have been used in the same way, i.e. with a second plural form ("pl2=") declared without any first plural, particularly if (as here) no first plural is automatically displayed, and/or pl2= is set to whatever the automatically displayed plural would be. - -sche (discuss) 19:23, 1 August 2014 (UTC)
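A rough sketch of such a check, assuming the page wikitext is already available (e.g. from a dump) and that the template calls being inspected contain no nested templates. The function name `pl2_without_first` is hypothetical, and "first plural" is taken to mean an unnamed first parameter or an explicit 1=:

```python
import re

# Matches a template call {{name|args}} that contains no nested {{...}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*((?:\|[^{}]*)?)\}\}")


def pl2_without_first(wikitext, template="en-proper noun"):
    """Return calls to `template` that set pl2= but have no first plural."""
    hits = []
    for m in TEMPLATE_RE.finditer(wikitext):
        if m.group(1) != template:
            continue
        # Drop the empty string that precedes the first "|".
        args = m.group(2).split("|")[1:]
        positional = [a for a in args if "=" not in a]
        named = {a.split("=", 1)[0].strip() for a in args if "=" in a}
        if "pl2" in named and not positional and "1" not in named:
            hits.append(m.group(0))
    return hits
```

Calling it once per template name (`template="en-noun"` and so on) would cover the "other templates" question; calls containing nested templates are silently skipped.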
Tool for manually finding misspelt or unsupported parameters
TemplateTiger is a good tool for finding which entries use certain parameters of a given template, regardless of whether or not the template supports those parameters. Among other things, this can allow one to find misspelt or mistaken parameters, like the "compound=" or "current=" parameters formerly used in save-all and barrel roll, or the following misspellings of "head=": "head]", "haed", "hwad", "heead". - -sche (discuss) 20:35, 1 August 2014 (UTC)
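Where a TemplateTiger listing isn't handy, a dump-based approximation could tally named parameters that a template doesn't accept. This is a sketch under two assumptions: the set of valid parameter names is supplied by hand from the template's documentation (the `{"head"}` set in any example is hypothetical), and calls contain no nested templates. A slip like "head]" has no "=", so it is parsed as a positional argument and is not counted here:

```python
import re
from collections import Counter

# Matches a template call {{name|args}} that contains no nested {{...}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*((?:\|[^{}]*)?)\}\}")


def unknown_params(wikitext, template, allowed):
    """Count named parameters of `template` that are not in `allowed`."""
    counts = Counter()
    for m in TEMPLATE_RE.finditer(wikitext):
        if m.group(1) != template:
            continue
        for arg in m.group(2).split("|")[1:]:
            if "=" not in arg:
                continue  # positional argument; "head]"-type slips end up here
            name = arg.split("=", 1)[0].strip()
            if name not in allowed:
                counts[name] += 1
    return counts
```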
Middle dots as decimal points
Doremítzwr seems to have used middle dots as decimal points (??) in some entries, e.g. the depth measurement here. These should be located and cleaned up. - -sche (discuss) 03:59, 20 August 2014 (UTC)
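Locating them could start with a search for a middle dot (U+00B7) sitting between two digits. Since the middle dot is also used as a multiplication sign, the sketch below only lists matches for a human to review rather than rewriting them:

```python
import re

# A digit, a middle dot (U+00B7), then another digit, e.g. "3·5".
MIDDOT_BETWEEN_DIGITS = re.compile(r"\d\u00b7\d")


def middot_numbers(text):
    """Return digit-middot-digit sequences for manual review."""
    return [m.group(0) for m in MIDDOT_BETWEEN_DIGITS.finditer(text)]
```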
Look for brackets in the displayed text of pages
If someone could examine the displayed text of pages (as opposed to the wikitext) and look for instances of {{, [{, {[, [[, }}, ]}, }], or ]], that would probably be informative. I imagine most occurrences of such strings are the result of mismatched brackets or bot-errors breaking templates across lines. - -sche (discuss) 20:11, 27 August 2014 (UTC)
- I suppose one would have to have some sort of local wiki markup parser to do this. DTLHS (talk) 20:36, 27 August 2014 (UTC)
- Is mwparserfromhell of use? - -sche (discuss) 20:54, 27 August 2014 (UTC)
- I believe mwparserfromhell will only give you valid templates / links- it's not going to tell you if something is malformed. DTLHS (talk) 01:04, 28 August 2014 (UTC)
- Wiktionary:Todo/bad links. I just looked for lines where the number of occurrences of "[[" doesn't match that of "]]". Technically links can extend over multiple lines (but they probably shouldn't). Looking for malformed templates is much harder. DTLHS (talk) 03:15, 29 August 2014 (UTC)
- That looks like a very useful list; thank you! I've cleaned up a few entries already. One idea I may suggest in the GP is that we try to make an abuse filter that tags edits that leave a page with more [[s than ]]s or {{s than }}s, to alert us to new instances. I think abuse filters can do that; there's one that warns people against <ref> without <references/>. - -sche (discuss) 06:52, 29 August 2014 (UTC)
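The per-line count DTLHS describes, extended to {{ and }}, might look like the following sketch, together with a whole-page count of the kind the proposed abuse filter would test. Both are assumptions about how such checks could be written, not the script actually used. `str.count` counts non-overlapping occurrences, so triple braces ({{{ }}}) can skew the numbers, and a template legitimately spread over several lines is flagged per line even though the page as a whole balances:

```python
PAIRS = (("[[", "]]"), ("{{", "}}"))


def unbalanced_lines(wikitext):
    """List (line number, opener, line) wherever opener and closer counts differ."""
    found = []
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        for opener, closer in PAIRS:
            if line.count(opener) != line.count(closer):
                found.append((lineno, opener, line))
    return found


def page_imbalance(wikitext):
    """Openers minus closers over the whole page (0 means balanced)."""
    return {opener: wikitext.count(opener) - wikitext.count(closer) for opener, closer in PAIRS}
```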
A bot could check for and remove commas after {{circa}} (which itself adds a comma, making an additional comma superfluous), like so. - -sche (discuss) 19:43, 10 June 2015 (UTC)
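A minimal sketch of the substitution such a bot might make. It matches only the template name "circa" exactly (any redirects or aliases to it would need adding), assumes the call contains no nested templates, and in practice the resulting diffs would want reviewing before saving:

```python
import re

# {{circa}} or {{circa|...}} followed, possibly after spaces, by a comma.
CIRCA_THEN_COMMA = re.compile(r"(\{\{circa(?:\|[^{}]*)?\}\})\s*,")


def drop_comma_after_circa(wikitext):
    """Remove the redundant comma, keeping the template call itself."""
    return CIRCA_THEN_COMMA.sub(r"\1", wikitext)
```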
Random excessive whitespace
Like this. - -sche (discuss) 21:47, 20 June 2015 (UTC)
Latin infinitives glossed as first-person forms
I've noticed several entries like this one, where the infinitive (not the first-person form) of a Latin word is given, but it is glossed as a first-person form. This is obviously incorrect regardless of whether one prefers to lemmatize infinitives or first-person forms. - -sche (discuss) 02:50, 29 June 2015 (UTC)
Untemplatized links to dictionaries
Should be found and templatized like [1]. I will try to do this myself. - -sche (discuss) 02:19, 7 July 2015 (UTC)
English terms spelled with Æ/Œ not marked as archaic/obsolete
For example, [2]. Some are valid (Æsir) but most are not. - -sche (discuss) 05:37, 30 July 2015 (UTC)
- @-sche User:DTLHS/cleanup/english ae oe DTLHS (talk) 20:15, 20 August 2015 (UTC)
- Thank you! If it's not too difficult, would it be possible to remove inflected forms of lemmas which also have Æ/Œ (e.g. œcologies, plural of œcology) — in such cases, it's sufficient that the lemma be marked; the plurals are generally not any more obsolete than the lemmas. Plurals of lemmas that don't contain Æ/Œ (e.g. cassiæ, plural of cassia) should stay on the list, since in those cases the plurals usually are more obsolete than other possible plurals. If that's too much bother, don't worry about it — I'll go through the entries on the list with AWB and can easily ignore œcologies-type entries. - -sche (discuss) 22:24, 20 August 2015 (UTC)
- I don't really have an easy way to distinguish them, sorry. DTLHS (talk) 22:35, 20 August 2015 (UTC)
- Since I’m responsible for a sizeable chunk of these, I feel obligated to express my regret that I’m making you clean these up. I was pretty ignorant and immature back then, but I realise now that I was acting inappropriately. --Romanophile (talk) 08:47, 23 August 2015 (UTC)
Miscellaneous additional periodic tasks
- For reasons discussed in this old thread, some uses of the label "proscribed" on entries already labelled "colloquial", "informal", etc. are not sensible. Searches like insource:"lb en informal proscribed", insource:"lb en proscribed informal", etc. catch them. - -sche (discuss) 20:42, 4 February 2018 (UTC)
- Check for miscapitalized labels; see Wiktionary:Grease pit/2015/August#Miscapitalized_labels. - -sche (discuss) 21:07, 4 February 2018 (UTC)
- Watch out for quotations which are on the same line as their bibliographic particulars (an old list was at Wiktionary:Todo/Single-line quotes; entries of the sort seem to persist). - -sche (discuss) 14:30, 19 February 2018 (UTC)
- Check that no more WT:Todo/Linked language names in trans tables remain. - -sche (discuss) 14:30, 19 February 2018 (UTC)
- Watch out for instances of "from from", and (a bigger problem on WP) "an" + consonant sounds that aren't used with "an" in any standard dialect (e.g. I just fixed one each of "an special" and "an school"). - -sche (discuss) 14:30, 19 February 2018 (UTC)
- The label "idiomatic" is arguably useless (Wiktionary:Beer parlour/2014/January#(idiomatic)). The labels "dialect(al)" and "regional" should generally be replaced with more specific information (I brought this up in the BP or maybe GP but I forget precisely where). - -sche (discuss) 23:13, 3 March 2018 (UTC)
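For the "from from" and "an special" items in the list above, a first-pass search could be as simple as the sketch below. The consonant rule is a spelling heuristic standing in for pronunciation: h, x and y are left out because "an hour", "an x-ray" and "an ytterbium" are fine, a second lowercase letter is required so that forms like "an n-gram" are skipped, and capitalized words are skipped so initialisms like "an MP" are not flagged. It will still produce false positives (e.g. "an nth"), so hits are for manual review:

```python
import re

FROM_FROM = re.compile(r"\bfrom\s+from\b", re.IGNORECASE)
# "an" before a lowercase word starting with a consonant letter other than h, x, y.
AN_BEFORE_CONSONANT = re.compile(r"\b[Aa]n\s+([bcdfgjklmnpqrstvwz][a-z]+)\b")


def suspicious_phrases(text):
    """Return doubled "from" and "an" + consonant-letter hits for review."""
    hits = [m.group(0) for m in FROM_FROM.finditer(text)]
    hits += [m.group(0) for m in AN_BEFORE_CONSONANT.finditer(text)]
    return hits
```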
Misformatted/indented quotes
Maybe this is already covered here, e.g. Special:Diff/48477724. – Jberkel 21:39, 4 February 2018 (UTC)
- It's not easy to determine automatically that a line contains a quote and not some other type of content. DTLHS (talk) 21:45, 4 February 2018 (UTC)
- No, it isn't, but we could at least catch some specific types of malformation like the one in that diff, probably even with an edit filter (what do you think, @Chuck Entz?). #: '''[0-9][0-9][0-9][0-9]''' is another red flag (though may have some false positives?). - -sche (discuss) 21:59, 4 February 2018 (UTC)
- @Jberkel User:DTLHS/cleanup/quote template line starts. Disclaimers: not all of these are errors, this is only for English entries, and this won't catch anything that doesn't use a quote-X template. DTLHS (talk) 22:42, 4 February 2018 (UTC)
- @DTLHS: that's a good start, thanks, I'll work my way through the list. Jberkel 22:52, 4 February 2018 (UTC)
- @DTLHS done. Will this list get regenerated from the next dump? – Jberkel 10:24, 6 February 2018 (UTC)
- Sure, if you want me to. DTLHS (talk) 16:18, 6 February 2018 (UTC)
- @DTLHS could you run this again with the new dump? ta. – Jberkel 23:40, 23 February 2018 (UTC)
- @Jberkel Done. DTLHS (talk) 01:58, 24 February 2018 (UTC)
- @DTLHS again, please. – Jberkel 10:17, 24 March 2018 (UTC)
- @Jberkel Updated. DTLHS (talk) 23:01, 25 March 2018 (UTC)
- @DTLHS: thanks, I'm slowly getting there. What is the point of the quotations header? Isn't it redundant with Citations: and the sense-related quotes? – Jberkel 23:40, 25 March 2018 (UTC)
- Probably, it's just a lot of work to get rid of it. DTLHS (talk) 23:41, 25 March 2018 (UTC)
- @DTLHS: As a first step, couldn't we (semi-automatically?) shove them under the carpet (aka Citations:), then deprecate the L4 header? – Jberkel 10:52, 26 March 2018 (UTC)
- I don't like using citation pages to hold quotations that could be underneath sense lines. DTLHS (talk) 18:22, 26 March 2018 (UTC)
- Right, but that's 100% manual work. Best to leave it then. – Jberkel 19:32, 26 March 2018 (UTC)
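The red flag -sche suggested in this thread (a bolded four-digit year on a "#:" line) can be combined with a guess at a "quote template line starts" check: a line containing a quote-X template that does not begin with "#...*". This is not DTLHS's actual script, which isn't shown here, and it assumes the usual convention that "#*" lines hold quotations and "#:" lines usage examples. As with DTLHS's list, not every hit is an error:

```python
import re

# -sche's red flag: a bolded four-digit year at the start of a "#:" line.
BOLD_YEAR_ON_EXAMPLE_LINE = re.compile(r"^#:\s*'''[0-9]{4}'''")
# A quotation line starts with one or more "#" followed by "*".
QUOTATION_LINE = re.compile(r"^#+\*")


def flag_quote_lines(wikitext):
    """Return (line number, reason) for lines that look like misplaced quotes."""
    flagged = []
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        if BOLD_YEAR_ON_EXAMPLE_LINE.match(line):
            flagged.append((lineno, "bold year on #: line"))
        elif "{{quote-" in line and not QUOTATION_LINE.match(line):
            flagged.append((lineno, "quote-X template not on #* line"))
    return flagged
```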
English multiple etymologies, categorized
Possible task: find pages in Category:English terms with multiple etymologies that don't have multiple etymology sections (checking for pages that don't contain "=Etymology 2=" seems like one obvious way of doing that), which should probably be removed. In the other direction, look for pages that do have "=Etymology 2=" within an English section and aren't in this category yet. - -sche (discuss) 06:42, 30 May 2018 (UTC)
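Both directions could be sketched as follows, assuming a mapping from page titles to wikitext (e.g. built from a dump) and the set of current category members (e.g. fetched separately). Language sections are taken to be delimited by level-2 headings like "==English==", and the "=Etymology 2=" test is the one proposed above:

```python
import re

L2_HEADING = re.compile(r"^==\s*([^=]+?)\s*==\s*$")


def language_section(wikitext, language="English"):
    """Return the text under the given level-2 language heading."""
    lines, inside = [], False
    for line in wikitext.splitlines():
        m = L2_HEADING.match(line)
        if m:
            inside = m.group(1) == language
            continue
        if inside:
            lines.append(line)
    return "\n".join(lines)


def has_multiple_etymologies(section):
    return "=Etymology 2=" in section


def category_mismatches(pages, category_members):
    """Return (titles to remove from the category, titles to add to it)."""
    to_remove = sorted(
        t for t in category_members
        if t in pages and not has_multiple_etymologies(language_section(pages[t]))
    )
    to_add = sorted(
        t for t, text in pages.items()
        if t not in category_members and has_multiple_etymologies(language_section(text))
    )
    return to_remove, to_add
```

Restricting the check to the English section matters: a page whose only "Etymology 2" is under, say, French should still come out of the English category.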
Male and female given names separately on same line
...should be combined like this. I will try to search for instances of this myself later. - -sche (discuss) 21:56, 15 June 2018 (UTC)
Form-of templates used as etymologies
...like this, should be cleaned up. - -sche (discuss) 00:14, 27 January 2020 (UTC)
Labels with wrong language code
As here. (Should try to catch these systematically.) - -sche (discuss) 18:33, 29 September 2020 (UTC)
Words with the "religion" label specific to one religion
Many entries with the {{lb|en|religion}} label are specific to Christianity (or rarely to another religion such as Buddhism) and should use the more specific label instead, for example "use". (Several other entries should not use the label at all, like Jew or Calvinist.) - -sche (discuss) 10:55, 26 November 2020 (UTC)
The following discussion has been moved from Wiktionary:Requests for cleanup (permalink).
This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.
I just listed this as a WT:TODO task because I expect it'll keep being an issue even after we fix the existing cases, but: numerous entries in "Terms borrowed from Proto-Foo" categories (like Category:Terms borrowed from Proto-Slavic) were not actually "borrowed" by the L2 language in the way we use the word; see e.g. here. (Surprisingly, one English word apparently was borrowed from Proto-Indo-European, ghrelin.) - -sche (discuss) 08:37, 11 June 2018 (UTC)
Words that are obviously gendered but not defined as such
https://en.wiktionary.org/w/index.php?title=hatchet_man&diff=66526741&oldid=66526489 —Fish bowl (talk) 09:58, 28 April 2022 (UTC)
T:en-plural nouns that (maybe) aren't
There is a longstanding issue of how to handle group names like "the Abenaki", "the Venda", etc. Many are listed as plural-only using the template above, but there are in fact cites of the singulars ("a Venda") and of the plurals ("Vendas"), so my impression is that these are supposed to be recast as singulars which can have either regular plurals (Vendas) or invariant plurals (Venda). I spy quite a few of these at Special:WhatLinksHere/Template:en-plural_noun. - -sche (discuss) 21:37, 13 August 2022 (UTC)
""double quotes"" in T:qfliteral
I came across one entry, and did a database dump search and found three more entries, which 'manually' wrote quotation marks inside T:qfliteral, which itself adds quotation marks, resulting in ""double""; this might be worth checking for once a year or something. είναι κινέζικα για μένα, αυτά μου φαίνονται κινέζικα, εντελώς αβέβαιο, durante beneplacito. - -sche (discuss) 06:48, 3 January 2024 (UTC)
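A sketch of the yearly dump check. It assumes the literal text is the first unnamed parameter of {{qfliteral}} (worth confirming against the template's documentation) and that this parameter contains no nested templates; straight and curly double quotes at either end of it are flagged:

```python
import re

# Captures the first argument of {{qfliteral|...}} up to the next |, { or }.
QFLITERAL_FIRST_ARG = re.compile(r"\{\{qfliteral\|([^{}|]*)")
DOUBLE_QUOTES = "\"\u201c\u201d"  # straight ", left and right curly quotes


def manually_quoted(wikitext):
    """Return qfliteral arguments that begin or end with a double quote."""
    hits = []
    for m in QFLITERAL_FIRST_ARG.finditer(wikitext):
        arg = m.group(1).strip()
        # The `arg and` guard matters: "" counts as "in" any string.
        if arg and (arg[0] in DOUBLE_QUOTES or arg[-1] in DOUBLE_QUOTES):
            hits.append(arg)
    return hits
```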
find duplicate definitions
Spitballing: a recurring issue is that when an entry has multiple etymology sections, people only look at the first one and, if not seeing the sense they seek, add it there or add it as a new etymology section (without noticing it is already present in another etymology section). Examples that I can find offhand are the cases de-duplicated in diff of e and diff of linn. I wonder if we could make a list of entries (in a given language: say we start with English) that have multiple etymology sections, and then winnow it to only cases where the definitions in ety 1 and 2 have "important" words in common, for example by 1. retaining cases where definitions had any words in common other than words on a list of "unimportant" words like "the", "of", "and", "for", "from", etc (and also excluding where one of the definitions was a non-gloss like "past tense of foo"), and 2. looking at the results and expanding the list of "unimportant" words, thus progressively winnowing the list of entries that have "important" words in common until it's a manageable size to put the "sets of definitions with words in common" on a page and let a human look them over and spot duplicates. Not saying this is a priority, and not sure if it's feasible, but I'm mentioning the idea. - -sche (discuss) 17:53, 2 May 2024 (UTC)
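The two-step winnowing could be prototyped roughly as below. This is a sketch under assumptions: definitions have already been extracted as plain text, one list per etymology section; the stopword set is the one proposed above plus a few additions ("a", "an", "to", "or", "in"); and the non-gloss pattern is an illustrative, incomplete list. `most_common_shared` supports step 2 by showing which shared words produce the most matches, i.e. the likeliest candidates for the "unimportant" list:

```python
import re
from collections import Counter
from itertools import combinations

# Seed list of "unimportant" words; meant to be grown after each review pass.
UNIMPORTANT = {"the", "of", "and", "for", "from", "a", "an", "to", "or", "in"}
# Non-gloss definitions such as "past tense of foo" (illustrative, not complete).
NON_GLOSS = re.compile(
    r"^(plural|past tense|past participle|present participle|alternative form) of\b",
    re.IGNORECASE,
)


def important_words(definition):
    return set(re.findall(r"[a-z]+", definition.lower())) - UNIMPORTANT


def candidate_duplicates(etymologies):
    """etymologies: one list of plain-text definitions per Etymology N section.

    Returns (ety_i, def_i, ety_j, def_j, shared_words) for each pair of
    definitions from different sections that share an "important" word.
    """
    pairs = []
    for (i, defs_i), (j, defs_j) in combinations(enumerate(etymologies, start=1), 2):
        for d1 in defs_i:
            for d2 in defs_j:
                if NON_GLOSS.match(d1) or NON_GLOSS.match(d2):
                    continue
                shared = important_words(d1) & important_words(d2)
                if shared:
                    pairs.append((i, d1, j, d2, sorted(shared)))
    return pairs


def most_common_shared(pairs, n=20):
    """Shared words causing the most matches: candidates for UNIMPORTANT."""
    return Counter(w for pair in pairs for w in pair[-1]).most_common(n)
```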