Wiktionary:Todo
- This page is for cleanup jobs. Request jobs are at Wiktionary:Task lists.
This page lists cleanup requests affecting multiple entries. These may include updating templates, categories or generic entry structure, but not specific terms, which should be tagged with {{rfc}} and put on WT:RFC. Tasks that were previously scattered across discussion and user pages are grouped together here, in one place, where they are easier to find.
Frequently updated todo lists
- Todo Lists project
- Please see WT:Todo/Lists for a set of regularly updated cleanup lists.
- JeffDoozan's cleanup lists
- See User:JeffDoozan for lists of entries with formatting or layout errors.
Regular tasks
In this section, you will find relatively easy cleanup tasks.
Categories
Updated live.
- Category:Non-empty category redirects: 13 currently.
- Category:Pages with broken file links: 48 currently. Many of these are spelling errors or remains of file moves and easy to fix.
- Category:Pages with module errors: 6 currently.
- Category:Pages with ParserFunction errors: 2 currently.
- Category:Pages using deprecated templates: 0 currently.
- Category:Categories that are not defined in the category tree: 334 categories currently.
- Category:Translation table header lacks gloss: 14,255 pages currently.
- Category:Pages with incorrect ref formatting: 54 currently.
- Category:Orphaned documentation subpages: 29 currently.
- Category:Limit of template reached: 3 pages currently.
Special pages
Updated a couple of times a week.
Todo lists
Updated weekly.
- Wiktionary:Todo/Lists/Blank or extremely short pages .. 25 results .. last updated 17:25, 24 November 2024 (UTC)
Semi-regular tasks
Usually dump-analyzed:
- Unhelpful abbreviations — These should use the full term.
- Occasionally, soft hyphens or other invisible/zero-width characters sneak into the content of entries or even the pagenames; the soft hyphens should be removed; the other characters should be discussed.
- People sometimes type {[, }] etc. when they mean {{ / }}. It is useful to periodically scan dumps for instances of this. Here is some regex: ([^\[\{]\[\{[^\[\{]|[^\[\{]\{\[[^\[\{]|[^\]\}]\]\}[^\]\}]|[^\]\}]\}\][^\]\}]). Simply searching for ]} will not work, because there are many valid instances of it, e.g. {{m|en|a [[link]]}}.
- Every few months, check for instances of the common but nonstandard headers "Alternative form", "Alternative spelling" and "Alternative spellings" (which should be "Alternative forms") and "Usage note" (which should be "Usage notes"). Many other nonstandard headers exist, but none are as common as those. Also, no L1 headers should exist in the main namespace (language headers should always be L2, and all other headers should always be L3 or more). See User:Erutuon/mainspace headers for a full list of non-language headers and User:Erutuon/mainspace headers/possibly incorrect for a list of possibly incorrect headers.
- Check for entries using modifier letters or deprecated IPA characters.
- Search for (using the site search function) and fix "Etymology 2" -"Etymology 1" and other cases of higher-number etymologies without the full complement of lower-number etymologies.
- Check for misindented quotations (pages with a line containing {{quote- but not starting with #* or ##*).
- Check for entries that use Template:sense, Template:a, manual formatting, etc. instead of {{lb}}.
- People, and some other online dictionaries, write /e/ where the actual IPA symbol is /ɛ/, e.g. [1], [2], [3]
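As a rough illustration of the dump checks above, here is a minimal Python sketch that applies the mixed-bracket regex quoted in the list and flags a few invisible characters. The regex is copied from the list; the function names and the particular set of invisible characters are assumptions made for this example, not an agreed list.

```python
import re

# The regex quoted in the list above, unchanged.
MIXED_BRACKETS = re.compile(
    r"([^\[\{]\[\{[^\[\{]|[^\[\{]\{\[[^\[\{]|[^\]\}]\]\}[^\]\}]|[^\]\}]\}\][^\]\}])"
)

# Assumed set of invisible characters to flag; extend or trim as needed.
INVISIBLE = {
    "\u00ad": "SOFT HYPHEN",
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
}


def find_mixed_brackets(wikitext):
    """Return the matched snippets where [{, {[, ]} or }] stand alone."""
    return [m.group(0) for m in MIXED_BRACKETS.finditer(wikitext)]


def find_invisible(wikitext):
    """Return (offset, name) for each flagged invisible character."""
    return [(i, INVISIBLE[ch]) for i, ch in enumerate(wikitext) if ch in INVISIBLE]
```

Because the regex requires a neighbouring character on each side, a mistyped bracket pair at the very start or end of the text is not reported by this sketch.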
To be monitored manually:
- Check periodically for misspellings.
- Check periodically that things in Category:English countable proper nouns aren't mislabelled common nouns.
- Periodically fix entries in Category:Terms borrowed back into the same language that are not twice-borrowed, like this.
- Periodically fix entries in "Category:Terms borrowed from Proto-" categories (see [4]) like Category:Terms borrowed from Proto-Slavic that were not actually "borrowed" by the L2 language (e.g. here). (list)
- Check for incorrect characters (for instance, ي vs ی and ك vs ک in Arabic, Persian, Urdu, Azeri, and other languages) that look identical in certain positions and are often mixed up. Lists of instances in common linking templates are found at User:Erutuon/wrong script.
- Check for transliterations containing incorrect characters. These usually indicate errors of some kind. User:Erutuon/bad transliteration lists transliterations with non-Latin characters, except for those with CJK script and Latin script, such as {{t+|ja|言葉|tr=ことば, kotoba}}.
- Periodically fix entries in Category:English terms derived from Greek that are in fact from Ancient Greek (grc) rather than (Modern) Greek (el).
- Periodically check /Citations without citations
- Periodically check for translations that aren't templatized (but use bare links, like * French: [[foo]])
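The last check in the list (translations with bare links) could be sketched in Python roughly as follows. This assumes the input is the wikitext of a translation table only; the function name and the simple one-pass template stripping are choices made for this example, and nested templates are not handled.

```python
import re

# A translation line such as "* French: [[foo]]" (optionally "*:" for sub-items).
TRANSLATION_LINE = re.compile(r"^\*:?\s*([^:{}\[\]]+):\s*(.*)$")
# One pass over non-nested templates; good enough for {{t|..}}-style calls.
SIMPLE_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")


def bare_translation_lines(wikitext):
    """Return (language name, line) for lines with a link outside any template."""
    hits = []
    for line in wikitext.splitlines():
        m = TRANSLATION_LINE.match(line)
        if not m:
            continue
        # Remove templated translations, then see whether a raw link is left.
        remainder = SIMPLE_TEMPLATE.sub("", m.group(2))
        if "[[" in remainder:
            hits.append((m.group(1).strip(), line))
    return hits
```

Lines that mix a template with a bare link are reported too, since the bare link still needs converting.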
Also:
- Uses of the language code aaa (Ghotuo) in translation tables are often vandalism.
- pdc (Pennsylvania German) and pdt (Plautdietsch), aja (Aja/Adja of Sudan) and ajg (Adja/Aja of Benin) need to be kept separate.
Useful search queries
- insource:/\# \(\[\[/ -insource:/Chinese/
- Mostly, the matched texts need to use the {{label}} template. It's also possible to search by a specific label (# ([[botany]]).
- insource:/=.\{\{sense/
- Mostly, {{sense}} should be preceded by an asterisk.
- insource:/\=\=Etymology 2/ -insource:"Etymology 1"
- Entries with Etymology 2 but not Etymology 1 (there are a few false positives where "Etymology 2" is inside a comment)
- Instance of "the the" in the entry - these errors keep on occurring, it's pretty crazy. GreyishWorm (talk) 01:53, 12 November 2022 (UTC)
- Has "trans-top" but not in English or Translingual lemmas/non-lemma forms
- Special:Search/insource:"Wikipedia.org", Special:Search/insource:"Wiktionary.org"
- Unwanted language code in {{w}}: insource:/\{w\|en\|/. The language code (for other than English Wikipedia) is in the |lang= parameter, so {{w|en|...}} links to the page "En" on English Wikipedia. The "en" can be replaced in the search box to search for other language codes, in which case they would be fixed by either deleting the language code in the first parameter or adding "lang=" in front of it.
- Pages with templates missing parameter |1=: Special:WhatLinksHere/Unsupported_titles/`lcub``lcub``lcub`1`rcub``rcub``rcub` (you can use namespace filtering to weed out userspace, and change the "1" to view other positional parameters)
- Fodder for {{etymid}} - hard-coded links to a numbered etymology section: Special:Search/insource:/\#Etymology+[0-9]/
If the search gives a warning (and even if it doesn't!), see Help:CirrusSearch for ways of making the search much less demanding on the servers and much more likely to provide a complete list of problem entries.
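The "Etymology 2 but not Etymology 1" query above can also be approximated offline against a dump. The following Python sketch is one possible way to do it; the function name is made up for this example. It strips HTML comments first, which avoids the false positives mentioned above, but it looks at the page as a whole rather than at each language section, so a gap in one language can be hidden by headers in another.

```python
import re

ETYM_HEADER = re.compile(r"^=+\s*Etymology (\d+)\s*=+\s*$", re.MULTILINE)
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def missing_etymology_numbers(wikitext):
    """Return the numbers absent below the highest 'Etymology N' header."""
    # Headers inside comments are not real headers, so drop comments first.
    text = HTML_COMMENT.sub("", wikitext)
    found = {int(n) for n in ETYM_HEADER.findall(text)}
    if not found:
        return []
    return [n for n in range(1, max(found) + 1) if n not in found]
```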
All subpages
Subpages of Wiktionary:Todo :
2013
> This is the list of entries, as of the last database dump, that contain Slovene translations with the gender m ("masculine"). They should most likely be changed to use either m-an (+ "animate") or m-in (+ "inanimate"), since that distinction has grammatical consequences in Slovene. (?)
—RuakhTALK 14:34, 11 September 2013 (UTC)
2015
In many cases, these are unnecessary and cause problems. - -sche (discuss) 18:16, 21 January 2015 (UTC)
- What are LTR marks and how should one improve the entry? --A230rjfowe (talk) 21:00, 15 July 2015 (UTC)
- What are RTL marks and how should one improve the entry? --A230rjfowe (talk) 21:00, 15 July 2015 (UTC)
- They are invisible characters that otherwise behave like strongly left-to-right characters (such as Latin letters) or strongly right-to-left characters (such as Arabic letters), in that they influence the direction of surrounding characters that do not have a defined text direction. So they are sometimes used to change the direction of characters in text. For instance, on Wiktionary, where text direction is generally left-to-right, punctuation characters can be forced to render right-to-left by sandwiching them between Arabic letters and a right-to-left mark.
- But CSS should be used to change text direction instead, whenever possible. On Wiktionary, we do this by adding classes that have the correct CSS properties: for instance, enclosing Arabic text in class="Arab", which has the CSS direction: rtl; unicode-bidi: embed; applied to it in MediaWiki:Common.css. This is done automatically by most linking templates.
- You can read more in w:Left-to-right mark and w:Right-to-left mark and w:Bidirectional text. — Eru·tuon 16:50, 16 October 2019 (UTC)
- Regenerated. - -sche (discuss) 14:48, 19 February 2018 (UTC)
A partial list of pages where at least one language section simply states, in plain text, without using {{etyl}}, that it derives from German, French, Latin, Greek, Ancient Greek, Chinese or Spanish. - -sche (discuss) 17:43, 25 January 2015 (UTC)
- Regenerated (1469 entries). - -sche (discuss) 14:44, 19 February 2018 (UTC)
A list of entries which are labelled as being Canadian, or American, but not both. It is likely that many should in fact have both labels. See Wiktionary:Beer_parlour/2015/March#North_American_English_vs_Canadian_and_American_English for a bit of background. - -sche (discuss) 05:00, 7 March 2015 (UTC)
Erroneous Greek characters
Any place that the character ϕ is used in place of φ or ϑ in place of θ in a string that is marked as being grc or el should be listed so that an editor can look them over and fix mistakes. I just found one lying around in a {{term}}, which made me think that these shouldn't be overly hard to find. —Μετάknowledgediscuss/deeds 21:01, 12 May 2015 (UTC)
- @Metaknowledge: Never knew this page existed. Ironically I came across this while searching for incorrect uses of ϕ. For future reference, here is the search for ϕ and here is the search for ϑ (other incorrect characters are ϖ ϛ ϰ ϱ ϐ ϵ ϲ ϗ ȣ; there may be more). --WikiTiki89 13:20, 21 April 2017 (UTC)
- If nothing has been done about this, I can make Module:script utilities search for these characters when it tags text, and add a tracking template or a category. — Eru·tuon 23:50, 20 May 2017 (UTC)
- @Metaknowledge, Wikitiki89: Done. — Eru·tuon 00:02, 21 May 2017 (UTC)
- @Erutuon: It's never done, people will keep adding them. --WikiTiki89 15:03, 22 May 2017 (UTC)
- Oh sorry, you were referring to having Module:script utilities search for them. It's not that nothing has been done, I went through and removed over a hundred of these. But again, people will keep adding them. --WikiTiki89 15:05, 22 May 2017 (UTC)
- Right. I just found one in polypharmacy... 🙄 — Eru·tuon 18:14, 22 May 2017 (UTC)
- User:Erutuon, should we add an actual cleanup category to entries using these? I just cleaned up hypophora, which had no indication on the page itself (that I noticed) of the problem (though someone who knew where the tracking template was could find the page). I'm going to ask in the WT:GP if we could catch these with an edit filter. - -sche (discuss) 11:13, 12 November 2022 (UTC)
- @-sche: Unfortunately that isn't a good idea because adding a category will cause changes in parsing in certain cases. We don't do that at all when language-tagging text at the moment and so language-tagged text can be used in cases where a link wouldn't be allowed. For instance, if language-tagged text is inside the text of a page link ([[some page|{{lang|grc|ϑ}}]]), adding a category link (the equivalent of [[some page|{{lang|grc|ϑ}}[[Category:Bad Ancient Greek text]]]]) next to it would break the page link. — Eru·tuon 21:28, 13 November 2022 (UTC)
- As of a while ago, I implemented (with an IP's help) one filter which warns against a few of the most-wrong of the characters above, which has already helped some users to replace them before saving their edit, and another filter which silently tracks all of the characters. - -sche (discuss) 02:38, 2 August 2023 (UTC)
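For anyone checking a dump by hand, the character test discussed in this thread could look roughly like the Python sketch below. It is not the code of Module:script utilities or of the edit filters; the function name is made up for this example. Only the two replacements stated above (ϕ to φ, ϑ to θ) are suggested, and the other characters from the thread are flagged without a suggestion.

```python
import unicodedata

# Replacements stated in the thread: phi symbol -> phi, theta symbol -> theta.
REPLACEMENTS = {"\u03d5": "\u03c6", "\u03d1": "\u03b8"}
# The other characters listed in the thread; no replacement is assumed here.
OTHER_SUSPECT = "\u03d6\u03db\u03f0\u03f1\u03d0\u03f5\u03f2\u03d7\u0223"
GREEK_CODES = {"grc", "el"}


def suspect_greek_chars(lang, text):
    """Return (offset, Unicode name, suggested replacement or None) per hit."""
    if lang not in GREEK_CODES:
        return []
    hits = []
    for i, ch in enumerate(text):
        if ch in REPLACEMENTS or ch in OTHER_SUSPECT:
            hits.append((i, unicodedata.name(ch), REPLACEMENTS.get(ch)))
    return hits
```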
Not click characters
All over the dictionary, e.g. in the name and content of !nawas and in this translation, ! turns up for ǃ, and I wouldn't be surprised to find other substitutions for click consonants. The best way I can think of to find such uses is: create a list of all languages that use clicks (or, as a presumably easier-to-make approximation of that, a list of all Khoisan languages), then search a database dump for all translations, language sections, and {{m}}/{{l}}s of those languages that contain !. I've just cleaned up the few pages which misused ! in their pagenames (only 31 pages on Wiktionary used ! in their pagenames at all). - -sche (discuss) 18:42, 25 August 2015 (UTC)
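The dump search proposed above might be sketched in Python as follows. The set of click-language codes has to be supplied by the caller, since no such list exists on this page; the function name and the choice of templates (m, l, t, t+) are assumptions for this example, and language sections are not covered.

```python
import re

# {{m|code|term...}}, {{l|...}}, {{t|...}} and {{t+|...}}; other templates are ignored.
LINK_TEMPLATE = re.compile(r"\{\{(m|l|t|t\+)\|([^|{}]+)\|([^|{}]*)")
CLICK = "\u01c3"  # LATIN LETTER RETROFLEX CLICK, the character that ! is mistyped for


def ascii_bang_in_click_languages(wikitext, click_langs):
    """Return (code, term as written, term with ! replaced) for suspect links."""
    hits = []
    for m in LINK_TEMPLATE.finditer(wikitext):
        lang, term = m.group(2), m.group(3)
        if lang in click_langs and "!" in term:
            hits.append((lang, term, term.replace("!", CLICK)))
    return hits
```

For example, calling it with click_langs={"naq"} (a code used here purely as an illustration) on "{{t|naq|!nawas}}" reports the term together with its corrected spelling. Each hit still needs a human check, because ! can be legitimate punctuation.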
2017
Check IDs
As discussed at Wiktionary:Grease pit/2017/May § Adding ids to enable linking to headwords, we need to check for sense ids in {{senseid}} and the |id= parameter of headword templates that are on the same page and have the same language and have the same id string: that is, those that would create exactly the same link when input into an entry linking template. Each sense id for a given language on a given page should be unique. — Eru·tuon 16:57, 19 May 2017 (UTC)
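One way to run this check on a single language section is sketched below in Python. It assumes the caller passes the wikitext of one language section and a set of headword template names to inspect (the set is hypothetical; the page does not list them). Nested templates are not parsed, and |id= on linking templates is ignored because those only point at an id.

```python
import re
from collections import Counter

# Non-nested template call: name, then the raw parameter string (if any).
TEMPLATE = re.compile(r"\{\{([^|{}]+)((?:\|[^{}]*)?)\}\}")


def defined_ids(section, headword_templates):
    """Collect ids defined by {{senseid|lang|id}} and by |id= on headword templates."""
    ids = []
    for m in TEMPLATE.finditer(section):
        name = m.group(1).strip()
        params = m.group(2).split("|")[1:]
        if name == "senseid" and len(params) >= 2:
            ids.append(params[1].strip())
        elif name in headword_templates:
            ids.extend(p[3:].strip() for p in params if p.startswith("id="))
    return ids


def duplicate_ids(section, headword_templates):
    """Return the ids that are defined more than once in this section."""
    counts = Counter(defined_ids(section, headword_templates))
    return sorted(i for i, n in counts.items() if n > 1)
```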
Usage note template naming
User:-sche/Usage note templates lists some usage-note templates which could be moved to fit our usual naming scheme, as described on the page and [5]. - -sche (discuss) 22:01, 26 May 2017 (UTC)
Possibly mislabeled affixes
Wiktionary:Todo/interfixes: These look like interfixes, but are labelled "prefixes" or "suffixes". - -sche (discuss) 19:57, 8 June 2017 (UTC)
- Regenerated (per request on my talk page). Note that some, e.g. for Navajo, may be fine as they are. - -sche (discuss) 03:34, 15 February 2020 (UTC)
Pronunciation audio files
User:DerbethBot/Add manually: DerbethBot adds pronunciation files to entries, but some audio files need to be added manually. (See also User:DerbethBot for more info.) -- Curious (talk) 12:00, 11 June 2017 (UTC)
2018
Entries where label language does not match entry language. – Jberkel 00:01, 28 February 2018 (UTC)
Terms not restricted to legal jargon
Quite a few entries with usage notes like this are labelled {{lb|en|law}}, but are in fact in general use and not at all restricted to legal jargon (so the label should be removed). - -sche (discuss) 00:10, 23 December 2018 (UTC)
2022
Broken interwiki links
You can help repair the broken links to Wikipedia, Wikispecies, Wikimedia Commons and Wikisource at the subpages of User:This, that and the other/broken interwiki links. For each page listed, one of the following three things should be done: (1) correct the spelling, pluralisation, lowercase/uppercase of the link, add a |lang= parameter etc., (2) remove the link template altogether if not appropriate, or (3) create a redirect on the other wiki (many redirects on other projects were valid but have since been deleted). This, that and the other (talk) 03:14, 2 February 2022 (UTC)
See above. 70.172.194.25 00:59, 1 April 2022 (UTC)
See the description on the subpage itself. 70.172.194.25 19:55, 10 April 2022 (UTC)
Invocations of templates where the first parameter is a language code, but it does not match the language header. Similar to the above, but captures a wider range of templates. This, that and the other (talk) 10:20, 7 May 2022 (UTC)
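A check of this kind could be sketched in Python as below. This is not the script that generated the list; the function name, the code-to-name mapping and the set of templates to check are all supplied by the caller and are hypothetical here. Templates such as {{m}} legitimately carry other languages' codes, which is why the sketch only looks at templates the caller names.

```python
import re

L2_HEADER = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)
# Template name, then a first parameter that looks like a language code.
TEMPLATE_LANG = re.compile(r"\{\{([^|{}]+)\|([a-z]{2,3}(?:-[a-z]+)*)[|}]")


def mismatched_templates(wikitext, code_to_name, checked_templates):
    """Return (header language, template, code) where the code names another language."""
    hits = []
    headers = list(L2_HEADER.finditer(wikitext))
    for idx, header in enumerate(headers):
        # A language section runs from its L2 header to the next one.
        end = headers[idx + 1].start() if idx + 1 < len(headers) else len(wikitext)
        language = header.group(1).strip()
        for m in TEMPLATE_LANG.finditer(wikitext[header.end():end]):
            name, code = m.group(1).strip(), m.group(2)
            expected = code_to_name.get(code)  # unknown codes are skipped
            if name in checked_templates and expected not in (None, language):
                hits.append((language, name, code))
    return hits
```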
To find compound terms not linked Dunderdool (talk) 21:39, 24 July 2022 (UTC)
Terms from Webster's 1913 dictionary
Thousands of them are at Category:Webster 1913 (and have been around since almost the beginning of Wiktionary!). Often only one or two terms in Webster's dictionary have not been assimilated and modernized into Wiktionary, sometimes more. GreyishWorm (talk) 17:51, 22 October 2022 (UTC)
- At one time there were 29,000 entries, according to archive.org P. Sovjunk (talk) 14:10, 27 April 2024 (UTC)
- >21,000 as of today. GreyishWorm (talk) 15:11, 12 November 2022 (UTC)
- <17,000 Ñobody Elz (talk) 08:25, 5 June 2023 (UTC)
- <16,000 Creeps like you (talk) 12:29, 2 July 2023 (UTC)
- <15,000 Worm spail (talk) 17:09, 28 August 2023 (UTC)
- <14,000 Denazz (talk) 08:07, 20 December 2023 (UTC)
- <13,000 Denazz (talk) 21:11, 23 February 2024 (UTC)
- <12,000 P. Sovjunk (talk) 07:11, 29 April 2024 (UTC)
- <11,000 Denazz (talk) 13:32, 21 July 2024 (UTC)
- <10,000 Denazz (talk) 13:05, 10 September 2024 (UTC)
- <9,500 P. Sovjunk (talk) 21:51, 30 September 2024 (UTC)
- <9000 P. Sovjunk (talk) 13:21, 15 October 2024 (UTC)
- User:This, that and the other/Websterpedia contains a subset of the category which have corresponding Wikipedia pages.
2023
Shorten them, or convert to quotations. This, that and the other (talk) 01:24, 19 June 2023 (UTC)
Many undated quotes. Chioshio (talk) 03:10, 19 June 2023 (UTC)
"Raw" inflection tables in entries
Numerous entries contain hard-coded, non-templated inflection tables. Languages especially affected include Hunsrik, Pennsylvania German, Albanian, Old Marathi, and Sanskrit. Some of them have probably been subst'ed by accident, but in other cases, no inflection template exists. The development of a new one will be necessary.
See the search, which currently returns 198 pages. This, that and the other (talk) 05:45, 14 August 2023 (UTC)
- In the English Wikipedia I instituted a system whereby certain accidentally substituted templates (cleanup tags) were easily de-substituted. I think someone else improved it so that they de-substituted themselves, though this required some magic somewhere. Rich Farmbrough, 15:43, 13 December 2023 (UTC).
Non-standard superscript Wikipedia links
[6] This, that and the other (talk) 11:23, 3 October 2023 (UTC)