Template talk:taxlink

Latest comment: 8 months ago by JeffDoozan in topic Convert to Lua

Future development


This template or a derived version may be used to put entries with multipart taxonomic names into distinct categories for each of their missing components. DCDuring TALK 02:54, 9 September 2012 (UTC)Reply

Use with Template:term


{{term||{{taxlink|Pinus|genus}}|lang=mul}}, ie, as SECOND, NOT FIRST parameter, allows {{taxlink}}'s functionality to be combined with almost all of that of {{term}}. Use as second parameter prevents error messages from Luacization of {{term}} and provides the user with a link to the Wikispecies entry, if any, or to a Wiktionary entry once created. DCDuring TALK 08:46, 16 June 2013 (UTC)Reply

RFC discussion: August 2013–April 2014


The following discussion has been moved from Wiktionary:Requests for cleanup.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


This is creating masses of red-linked categories where I don't understand what the categories even mean, for example Category:Entries missing taxonomic name Cornus sericea. Two things: first, why would we want to divide these up by specific taxonomic names (in this example Cornus sericea) instead of putting everything in Category:Entries missing taxonomic name? Secondly, the categorization shouldn't work in all namespaces. Definitely not the user namespace. Appendix probably. But not all namespaces. Mglovesfun (talk) 21:00, 25 August 2013 (UTC)Reply

This approach finds the missing taxonomic names that are commonly used, based on occurrences of {{taxlink}}, to focus new-entry creation on those. Usually, in addition to those so categorized, there are also some number of other entries that use the missing taxonomic name unlinked. One good way to clean up the missing (red) categories is to add the taxonomic names that are missing and then add the missing ordinary links to those taxonomic name entries, also cleaning up the superfluous occurrences of {{taxlink}}. I'd be glad to provide the entry starter that I use for species names to anyone interested. If you take a look at Special:WantedCategories you will see that a large portion of the missing categories at the bottom of the first page and thereafter are for missing taxonomic names. I have started at the top and have added entries for all of those with five or more items in the category.
It works in User namespace because I have some lists of species in my user space that are not worth making an appendix for. I suppose I could just work from those lists as they include the species of most interest to me for one reason or another.
You may also note that there are several thousands of entries with red categories that have nothing to do with taxonomic names. DCDuring TALK 00:19, 26 August 2013 (UTC)Reply
Now that I have finally had adequate success with bot runs that cover this, I have eliminated this categorization from {{taxlink}}. This should be shown in the next run of Special:WantedCategories, c. 12/16-17. DCDuring TALK 15:38, 15 December 2013 (UTC)Reply



I seem to be using this template in the wrong places. Where is it appropriate to use this template? — This unsigned comment was added by BoBoMisiu (talk • contribs) at 19:30, April 5, 2015.

I just noticed this. Do you still have the problem? DCDuring TALK 13:55, 3 October 2016 (UTC)Reply
@DCDuring: Thank you for the reply. I have not been contributing much in Wiktionary over several months and do not remember what this was about. If I remember, I will ping you. —BoBoMisiu (talk) 00:13, 4 October 2016 (UTC)Reply

To be implemented


Please use these where warranted in anticipation of the implementation.

The optional parameter "nospe=1" should be used if there is no current link at Wikispecies. It deactivates linking. If this is used, there should be another link to Wikispecies, as under an External links header, to a higher taxon that includes the taxon of the headword.

The optional parameter "obs=1" should be used to indicate that there is positive indication that the taxon is obsolete. This typically arises in the etymology of a higher taxon which is derived by suffixation to such an obsolete genus name. This will also automatically deactivate linking to Wikispecies.

Experiment (inactive)


Experimentally, this template categorizes each templated item itself in a category which includes the name of the missing item. This permits the use of Special:WantedCategories to count all the entries that have templated use of the name to help speed the creation of entries for the most missed names. For now it is restricted to genus names. In the near future, it will be switched to family names, then species names, cycling through those until only single-page wants remain. With the creation of these additional entries it will probably be useful to cycle through more than once, before adding other levels of taxonomic names, such as orders, tribes, subfamilies etc.

noshow?


Could someone explain in the documentation what |noshow= does? I see it used in entries, but am not sure whether to use it or not because I don't know what it's for. — Eru·tuon 17:41, 12 November 2016 (UTC)Reply

The parameter populates a maintenance category Category:Entries using the taxlink template. I use the category to keep track of taxonomic entries that are changed, without having to further clutter my watchlist. I also use it to check on vernacular-name entries that use {{taxlink}}, mostly to see whether the name is spelled correctly and is the one currently applicable. As a result, there is no harm and some benefit from not including it. Among other things I learn how little interest there is in editing taxonomic name entries. DCDuring TALK 14:24, 13 November 2016 (UTC)Reply

Non-italicized elements of nomenclatures


Would it be possible to tweak the template so that some elements of a nomenclature can be specified not to be italicized? For example, at subspecies, the name Pinus nigra subsp. salzmannii should appear with only "Pinus nigra" and "salzmannii" italicized, leaving "subsp." upright. However, {{taxlink|Pinus nigra subsp. salzmannii|subspecies|Pinus nigra ''subsp.'' salzmannii}} does not achieve that effect. — SGconlaw (talk) 18:12, 29 July 2017 (UTC)Reply

@Sgconlaw: Module:italics can generate the correct italicization automatically, so maybe it should be invoked to generate the displayed form if the rank is species, subspecies, variety, form, and whatever else is italicized. — Eru·tuon 18:27, 29 July 2017 (UTC)Reply
How would that be done within templates {{taxlink}} and {{taxoninfl}}? DCDuring (talk) 19:52, 29 July 2017 (UTC)Reply
OK, I have no idea how to invoke a module. See my pathetic attempts in the history of the template. — SGconlaw (talk) 19:58, 29 July 2017 (UTC)Reply
I think I fixed it. See my testcases. — Eru·tuon 19:59, 29 July 2017 (UTC)Reply
Thanks! — SGconlaw (talk) 20:00, 29 July 2017 (UTC)Reply
As to invoking modules: the word directly after export. is the function name, which you place in the first parameter of {{#invoke:modulename}}: {{#invoke:modulename|function name}}. So you don't include the export. part. — Eru·tuon 20:03, 29 July 2017 (UTC)Reply
@DCDuring: You can see how I inserted it. If the rank is specified as species, subspecies, variety, use the module with {{#invoke:italics|i|taxonomic name}}. (I modified Module:italics to make it possible.) — Eru·tuon 20:03, 29 July 2017 (UTC)Reply
It should also be specified as italics for genus, subgenus, form, section, subsection (the latter four respectively abbreviated subg., f., sect., subsect.). There may be others, but I'll try to follow the pattern as I discover more. There are many, many thousands of improper non-italicized instances of {{taxlink|XXX|genus}}.
And thanks for doing the work. I hadn't noticed (remembered?) when you first created the module. DCDuring (talk) 20:11, 29 July 2017 (UTC)Reply
I saw where to add such not-to-be-italicized text (etc.) in module:italics. DCDuring (talk) 20:19, 29 July 2017 (UTC)Reply
Excellent. I added those ranks to the "switch" statements in this template. — Eru·tuon 20:37, 29 July 2017 (UTC)Reply
You were a participant in the discussion during which I created Module:italics, back in October. But I don't think it ever got implemented anywhere till now. — Eru·tuon 21:13, 29 July 2017 (UTC)Reply
Glad to have helped you guys find a use for it! — SGconlaw (talk) 21:16, 29 July 2017 (UTC)Reply
See Category:Entries using missing taxonomic names, the subcategories of which show the various taxonomic ranks and rank-like words used in {{taxlink}}. There are some which are sufficiently rare that I don't remember or never determined the proper display. DCDuring (talk) 22:12, 29 July 2017 (UTC)Reply
Another complication is that ALL taxa of viruses are italicized, except for Virus itself. It is governed by a separate body: the International Committee on Taxonomy of Viruses.
@Erutuon: And lastly - probably also most annoyingly - {{taxlink}} defaulted to italics, so that all uses of {{taxlink}} above genus rank are enclosed in wikitext ''s. As a result, at present all taxa above the rank of genus appear italicized if used in {{taxlink}}. See Cyprinodontidae#Hyponyms for some examples. I think we should allow for the legacy behavior until we make all the changes using AWB or, better, a bot. We could code {{taxlink}} for proper functioning with the legacy bad practice. Does it pay to deprecate the old approach and have another template to cleanly implement the new, more desirable approach? DCDuring (talk) 22:33, 29 July 2017 (UTC)Reply
Regarding virus names, perhaps a parameter |virus=1 would be a good idea. Regarding the switch to new behavior, how about creating {{taxlink-new}} with the auto-italicization and reverting {{taxlink}} back to its old behavior? — Eru·tuon 22:37, 29 July 2017 (UTC)Reply
One other aspect of the taxonomic codes is that they would like taxonomic names to be italicized when in normal text and in a contrasting text style like simple non-italic when in italicized text. I've never gotten the implications for embedding {{taxlink}} with no other text in format templates that would have other text normally appear in italics.
I was hoping there was some way of having my cake and eating it too and also eating it tomorrow. That is, I would like to be able to use {{taxlink}} as the name for the currently coded template AND not have to change template names or behavior for the current uses of {{taxlink}} (also no new parameters). As previously implemented, {{taxlink}} was relying on using wikitext italic formatting as operating compatibly both inside and outside {{taxlink}}. All of the usage of {{taxlink}} that I consider to have been proper works this way. All super-generic taxa have depended on it. I don't see how this could be done and strongly suspect that it is logically impossible. But, sometimes I am awestricken by ingenious software solutions. DCDuring (talk) 23:05, 29 July 2017 (UTC)Reply
Wait, wait. Could instances of template {{taxlink}} for supergeneric taxa retain the legacy approach, with the new behavior limited to genera and below? I see that viruses would require a new parameter. {{taxlinknew}} could be the new proper coding. I add most of the new instances of {{taxlink}} and monitor the rest, so I could effectively replace {{taxlink}} with {{taxlinknew}} for new entries. DCDuring (talk) 23:20, 29 July 2017 (UTC)Reply
Yeah, you can use {{#switch:}} to determine which ranks use which output format. That's what I've already done to determine which ones should use Module:italics. So I gather the old behavior was that every rank was italicized in the template, and un-italicized by surrounding the template with italics ('' '')? I suppose then I could just add italicization as the default (for those not using the module). — Eru·tuon 23:39, 29 July 2017 (UTC)Reply
YES.
I sort of knew that my crude template methodology would eventually bite me. In fact, it had already bitten me in that de-italicizing doesn't work inside most templates that italicize by default. I've often wondered why such templates should override formatting embedded in their text or, at least, my template-implemented formatting. DCDuring (talk) 00:54, 30 July 2017 (UTC)Reply
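
The rank-dependent formatting discussed in this thread can be sketched roughly as follows. This is an illustration in Python only, not the code of Module:italics or of {{taxlink}}: the function names, the virus flag (modeled on the suggested |virus=1), and the exact list of upright connecting terms are assumptions made for the example.

```python
# Ranks named in this thread as italicized (genus level and below).
ITALIC_RANKS = {
    "genus", "subgenus", "section", "subsection",
    "species", "subspecies", "variety", "form",
}

# Connecting terms left upright inside an italicized name (assumed list).
UPRIGHT_TERMS = {"subsp.", "ssp.", "var.", "f.", "subg.", "sect.", "subsect."}


def italicize_name(name: str) -> str:
    """Wrap runs of words in wikitext italics, leaving rank abbreviations upright."""
    pieces = []
    run = []
    for word in name.split():
        if word in UPRIGHT_TERMS:
            if run:
                pieces.append("''" + " ".join(run) + "''")
                run = []
            pieces.append(word)
        else:
            run.append(word)
    if run:
        pieces.append("''" + " ".join(run) + "''")
    return " ".join(pieces)


def format_taxon(name: str, rank: str, virus: bool = False) -> str:
    """Choose the display form for a taxonomic name from its rank."""
    if virus:
        # Virus taxa of every rank are italicized as a whole.
        return "''" + name + "''"
    if rank in ITALIC_RANKS:
        return italicize_name(name)
    # Taxa above genus are left upright.
    return name
```

Under these assumptions, format_taxon("Pinus nigra subsp. salzmannii", "subspecies") italicizes "Pinus nigra" and "salzmannii" separately and leaves "subsp." upright, while a family name is returned unchanged unless the virus flag is set.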

Certain ways of showing taxonomic names of hybrids


See [[Orchidinae]] for problematic display of nothogenera. The display contains a spurious " ' '" and is bold (linked problems). I haven't looked at nothospecies etc. yet, so don't undertake too much. DCDuring (talk) 16:30, 31 December 2017 (UTC)Reply

Cultivar, variety


I believe you did use the template correctly, @Victar, but this template is insufficient. Apart from it not recognizing cv., var., conv., which can be added to the template code, it seems that cultivar names like Vitis vinifera ssp. vinifera Sultana syn. Sultanina should not be italicized, while variety and convariety names should be italicized, eg. Prunus domestica ssp. insititia var. pomariorum alias Prunus domestica ssp. insititia convar. pomariorum. In the end this template will have to be modulized. Fay Freak (talk) 09:49, 17 November 2020 (UTC)Reply

Fortunately, these are few and are likely to remain few for quite some time. Estimates are that there are 8.7 million species of plants and animals (Fungi? Chromists? Protozoa? Bacteria? Archaebacteria? Viruses?). 1.2 million of the plant and animal species have been described. We have about 5,300 species from all 7 of these "kingdoms". I am not expecting that we will have more than 100 cultivar names in the next decade in the normal course of things (ie, no contests or dares). DCDuring (talk) 21:20, 24 February 2024 (UTC)Reply

Convert to Lua


@DCDuring, since the template is already invoking Lua to handle the italics, I don't think there's any performance or memory penalty for converting the whole function to Lua. Additionally, using Lua, it's not "expensive" to check the existence of the page, so it would no longer be necessary to manually remove {{taxlink}} when the page exists. See User:JeffDoozan/taxlink for a potential Lua-based replacement with lots of tests to verify that the old and new templates function identically. I've tried to add real usages of all possible parameter combinations, but please add any additional tests if you think it could be missing something. Using Lua opens the possibility of adding some more advanced features such as checking that Translingual exists on the target page, in case anything like that would be worthwhile. What do you think? JeffDoozan (talk) 17:26, 24 February 2024 (UTC)Reply

Most importantly, I would like to keep the ability to compile lists of "wanted" taxonomic and English vernacular name pages ordered by the number of incoming links. Now, I run Perl scripts against the XML dumps from time to time and add the most "wanted", counting the number of taxlinks or verns for each name. The automatic formatting is fine to have, but I would still need a way to indicate which taxlinks were no longer "wanted" because the "wants" had been fulfilled. Yes, there are Latin and German capitalized noun pages that are homonyms for taxonomic names. I don't know about other languages. I should do something similar with {{epinew}} and {{epilang}} to make sure that the most common missing ones are added.
If I were to think big, I would also like to run taxlink items against accepted taxonomic names and synonyms from the Catalogue of Life to catch some of the spelling errors and changes of acceptance, both from 'accepted' to 'synonym' and vice versa. DCDuring (talk) 21:10, 24 February 2024 (UTC)Reply
@DCDuring: Are you saying that right now you use the existence of {{taxlink}} in the XML dumps as a signal that it's a "wanted" entry so if we start keeping {{taxlink}} around after the target entry exists, you won't be able to use that tool anymore? Did you write that script yourself? If so, could you adapt it to build a hash of all the page names that include ==Translingual== while it's scanning the XML and use that to ignore "completed" taxlinks? If not, I've already written a bunch of scripts that generate reports from every XML dump, I could easily write another one to generate a report of wanted taxlinks.
If we switch to the Lua template, it can verify that the target page exists and includes ==Translingual== or ==English==, which seems easier than doing something manually like {{epinew}} or {{epilang}} (I'm not familiar with those templates or the process of using them).
Validating taxlinks against the Catalog of Life dataset would be possible using offline scripts, since they offer downloads of the database. Thinking even bigger, it might be possible to use their database to detect italicized species names and apply {{taxlink}} so that all of our species links are nicely labeled. JeffDoozan (talk) 22:20, 24 February 2024 (UTC)Reply
I count the instances of {{taxlink|Taxonomic name|rank}} to get "wanted" pages, ordered by number of wants. Similarly for {{vern}}. Once upon a time I could use Special:Wantedpages for a few, but now those pages are clogged with so many items that will remain red for this century. I have outlasted most of those who provided technical help, so I like something that is simple enough so that I can run the procedure myself with my very limited technical chops. DCDuring (talk) 22:51, 24 February 2024 (UTC)Reply
That's reasonable. Do you have the technical chops to adjust your script to account for pages that exist? If not, would you be open to help adjusting your existing script so you can continue to run it yourself? I understand that this template is a central part of a very long-established process that you've successfully run for many years. It's amazing that there are 46,000+ {{taxlink}}s that you've almost single-handedly added. From the perspective of a relative newbie, it seems like it would be a win for the project if we can retain the information that you put into every {{taxlink}}, with the added bonus of saving you from having to manually remove it. JeffDoozan (talk) 23:22, 24 February 2024 (UTC)Reply
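
The counting procedure and the proposed "hash of page names" adjustment can be sketched roughly as follows. This is an illustration in Python, not DCDuring's Perl script: it assumes the dump has already been read into a dictionary mapping page titles to wikitext, and the function name and the regular expressions are made up for the example.

```python
import re
from collections import Counter

# Matches {{taxlink|name|rank...}} and captures the name and the rank.
TAXLINK_RE = re.compile(r"\{\{taxlink\|([^|{}]+)\|([^|{}]+)")

# Matches a level-2 Translingual header on a line of its own.
TRANSLINGUAL_RE = re.compile(r"^==\s*Translingual\s*==\s*$", re.MULTILINE)


def wanted_taxlinks(pages: dict[str, str]) -> list[tuple[str, int]]:
    """Return (name, count) pairs for taxlinked names that lack a Translingual entry.

    `pages` maps page title to wikitext (assumed to be extracted from an XML dump).
    """
    # Titles that already have a Translingual section count as "completed".
    have_entry = {
        title for title, text in pages.items() if TRANSLINGUAL_RE.search(text)
    }
    counts = Counter()
    for text in pages.values():
        for match in TAXLINK_RE.finditer(text):
            name = match.group(1).strip()
            if name not in have_entry:
                counts[name] += 1
    # Most wanted first.
    return counts.most_common()
```

With this kind of filter, instances of {{taxlink}} could stay in place after the target entry is created without inflating the "wanted" list.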
As for validating against Catalog of Life: it has its uses, but every taxonomic source has its own version of what's accepted. There are taxonomic codes that spell out in great detail how to determine if a name is validly published, spelled correctly, etc., and how to decide which name has priority. Beyond that, it takes expertise in the taxonomy of the particular field to determine whether a particular name is describing the same taxon as another name, and advances in the science (especially the advent of molecular biology) have tended to overturn those judgments at an astonishing rate in recent decades. Comparing the taxonomy of Wikispecies and Wikipedia shows quite a bit of disagreement, and most of the reference works that give taxonomic names for specific terms are completely out of date in comparison to those two. I have just enough background and just enough references bookmarked to figure out what the outdated names are referring to most of the time, so I help out where I can. DCD doesn't have any background, but he still manages to come up with something useful and worthwhile. Chuck Entz (talk) 04:54, 25 February 2024 (UTC)Reply
I only intend for CoL to be used for spell-checking and not even to be the definitive source for that. I have learned which taxonomic databases tend to be more authoritative and/or current: PoWO, WoRMS, LPSN, ICTV, MycoBank, ~MSW, ~NCBI. WP tends to be better than Wikispecies, but Wikispecies is more convenient for copypasta. DCDuring (talk) 16:02, 27 February 2024 (UTC)Reply
Thank you for the clarification and explanation, Chuck. I should have known that with something as huge as taxon names that's been ongoing for so many years that it wouldn't be just DC doing all the work (just, as you say much of the hard work!). JeffDoozan (talk) 16:39, 25 February 2024 (UTC)Reply
I would probably just rely on links being blue. I believe that 46K is the number of pages with {{taxlink}}. Many pages have more than one instance. There are about 96,000 distinct taxonomic names in instances of {{taxlink}}. I'm lucky if I get 50 entries added in a month. When others help, they invariably add stub entries and mostly for items with few incoming links. I try to improve our existing entries to make them less stubby, which is much more demanding than removing 500 instances of {{taxlink}} in a month. I try to add images, gender, etymology, hyponyms, hypernyms, derived terms (mostly for genera), links to external databases (not just sister projects), etc.
As 100K taxonomic names (of all ranks, all kingdoms) is a small percentage of the 1.2MM described species of plant and animal (no fungi, chromists, protozoa, archaebacteria, bacteria, and viruses), you can see that some prioritization is essential.
What might be useful would be to enclose all instances of taxonomic names that we have entries for (ie, that use {{taxon}}) in {{taxlink}}. It should be easy to extract the required info from the entries, even stubby ones. The instances of conflicting rank (Is a taxon a class, an order, a suborder, an infraorder, a clade, etc.?) would need to be manually reconciled. DCDuring (talk) 15:57, 25 February 2024 (UTC)Reply
"I would probably just rely on links being blue" I'm not sure what you're referring to, can you clarify this?
By which I mean that, when it put the wikiformatted list on one of my user subpages, those items that already had entries would appear as blue links, so I would direct my attention to the red ones. DCDuring (talk) 20:03, 25 February 2024 (UTC)Reply
"What might be useful would be to enclose all instances of taxonomic names that we have entries for (ie, that use {{taxon}}) in {{taxlink}}. It should be easy to extract the required info from the entries, even stubby ones." This is interesting, can you give me an example of what this would look like? JeffDoozan (talk) 16:39, 25 February 2024 (UTC)Reply
For each taxonomic name for which we have an entry, {{taxon}} and the entry it appears on have the required information to (re)generate {{taxlink}} (or a simpler format-only version). For each such entry {{PAGENAME}} has parameter 1, the taxonomic name, and parameter 1 of {{taxon}} has what is needed for parameter 2 of {{taxlink}}. Once the properly parameterized instance of {{taxlink}} is created, it can replace all instances of the taxonomic name, whatever their formatting or linking.
For the vast majority of pages with {{taxon}} there is only one instance of {{taxon}}. For most of those that have more than one, there is probably a real ambiguity that requires manual attention. The ambiguities are usually one of three kinds:
  1. The taxon has been used with different ranks, eg, class, subclass, order, but consisting of at least approximately the same organisms, requiring a decision as to what its default presentation should be.
  2. The taxon has been applied to different sets of organisms, eg, plants, animals, fungi, protists, chromists, bacteria, archaebacteria, viruses.
  3. Two or more instances of {{taxon}} referring to the same set of organisms differ only in where they are placed (eg, different families or a family vs. order), but have the same 'rank'. This case may be addressed in code where the instances of {{taxon}} appear in different subsenses, but it is almost certainly true that this would not be worth the extra coding.
Another complication is that all taxa, of any rank, in Archaebacteria, Bacteria, and Virus are italicized. At present, I am not sure that we have systematically presented content in all of these. In principle, there should be a named parameter in {{taxon}} that is set as "i=1" for all of these. I could make an effort to add "i=1" where it is needed using wording in the definition or in hypernyms or hyponyms. DCDuring (talk) 20:03, 25 February 2024 (UTC)Reply
So, just to take a random example, the page Capra aegagrus hircus contains a single {{taxon}}: {{taxon|subspecies|family|Bovidae|domestic goat, formerly and sometimes Capra hircus}}. Using that data we can generate {{taxlink}} (or equivalent) as {{taxlink|Capra aegagrus hircus|subspecies}} and then use that to replace the text "Capra aegagrus hircus" (with or without italics or links) anywhere it appears in the mainspace? This sounds almost too good to be true! With over 20,000 pages containing {{taxon}}, even after discarding any pages with multiple {{taxon}}s, this should make it relatively painless to re-apply many of the {{taxlink}}s. What exactly should be done with in the case of Archaebacteria, Bacteria, and Virus? Can they use {{taxlink}} as-is, or would it need to be expanded with something like a |v=1 flag? JeffDoozan (talk) 21:37, 25 February 2024 (UTC)Reply
That is how it could work for the most common cases. Archaea, Bacteria, and Viruses need i=1, but the current version of {{taxlink}} doesn't do anything with that parameter. Though in principle we could derive the proper italics from data that would appear in a full taxonomic entry, many of the entries lack the key data, so my manually adding "i=1" to {{taxon}} for those kingdoms is quicker than completely filling out the entries. I have been working on some of the virus and archaea entries to make sure that they have i=1 in {{taxon}} and {{taxoninfl}} (the headword template). I have begun working on bacteria with the same objective, but came across some major name changes that have slowed the process. There are only a couple of hundred more to work on on my current list, but there may be an equal number that are not on that list and would take longer to identify. Not having the italics would be sad for the higher taxa in these three kingdoms, but would not take long to correct manually. DCDuring (talk) 02:24, 26 February 2024 (UTC)Reply
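
The regeneration step described above (page name as parameter 1, parameter 1 of {{taxon}} as parameter 2) can be sketched roughly as follows. This is an illustration in Python under simple assumptions, not the bot's code: the function name and the regular expression are hypothetical, and pages with zero or several {{taxon}}s are just skipped for manual attention.

```python
import re
from typing import Optional

# Captures the first positional parameter (the rank) of each {{taxon|...}}.
# "{{taxoninfl|" does not match because "|" must directly follow "taxon".
TAXON_RE = re.compile(r"\{\{taxon\|([^|{}]+)")


def taxlink_for_entry(page_title: str, wikitext: str) -> Optional[str]:
    """Build {{taxlink|page title|rank}} from an entry's single {{taxon}}.

    Returns None when the entry has no {{taxon}} or more than one,
    since those cases are treated here as needing manual review.
    """
    ranks = [m.group(1).strip() for m in TAXON_RE.finditer(wikitext)]
    if len(ranks) != 1:
        return None
    return "{{taxlink|" + page_title + "|" + ranks[0] + "}}"
```

For the Capra aegagrus hircus example quoted above, this yields {{taxlink|Capra aegagrus hircus|subspecies}}.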

As of the 2/20/2024 XML dump, there are 20,836 {{taxon}}s on 20,160 different pages. 626 pages have two {{taxon}}s, and no pages have more than two. The only {{taxon}} that appears outside of English or Translingual is ꠍꠁꠔꠦ ꠝꠞꠣ. Here's a list of all the pages with multiple taxons in case that's helpful for you. And here's a long list of all taxons, with the regenerated taxlink and a large sample of the ~38,000+ proposed string replacements where the bot can replace an existing string with a {{taxlink}}. This is still a work in progress as it currently matches taxons inside filenames and other places where we don't want to make replacements.

Thanks a lot. I will try to take a look after I get back from a dental visit today. DCDuring (talk) 15:45, 27 February 2024 (UTC)Reply

For the proposed replacements, I limited it to taxons that contain a space to avoid matching terms that are probably not taxons like This, Paris, Arizona and Satan. I'm not sure if there's a good way to determine whether or not to convert a single word to a taxlink - maybe only if it's already linked and in italics and is on a line containing specific keywords? I haven't reviewed the proposed replacements to see if there are any bad two word combinations that shouldn't be replaced with taxlink.

What might work in many circumstances is to detect which capitalized single-word candidate terms have only a Translingual L2. DCDuring (talk) 15:45, 27 February 2024 (UTC)Reply
@User:JeffDoozan. I finally looked at your proposed replacement. You apparently selected parameter 2 instead of parameter 1. That is:
{{taxoninfl|Rosa canina|i=1}}
  1. {{taxon|species|family|Rosaceae}}
should yield {{taxlink|Rosa canina|species}}, not {{taxlink|Rosa canina|family}}. DCDuring (talk) 23:43, 27 February 2024 (UTC)Reply
Good catch. All of the previous links are now updated with the correct data. JeffDoozan (talk) 01:28, 28 February 2024 (UTC)Reply
This is a good idea. I included single-word taxons and filtered out taxons with multiple L2s so there are now ~45,000 proposed replacements. I updated the sample with the new results. JeffDoozan (talk) 21:31, 27 February 2024 (UTC)Reply
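
One reading of the candidate filter discussed here (multi-word names are accepted, single-word names only when the entry has a Translingual L2 and nothing else) can be sketched roughly as follows. This is an illustration in Python, not the bot's actual logic; the function names and the header regular expression are assumptions.

```python
import re

# Level-2 headers only: "==Name==" but not "===Name===".
L2_RE = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def l2_headers(wikitext: str) -> list[str]:
    """List the level-2 (language) headers of an entry, in page order."""
    return [header.strip() for header in L2_RE.findall(wikitext)]


def is_replacement_candidate(taxon_name: str, wikitext: str) -> bool:
    """Decide whether plain-text occurrences of a name may be taxlinked.

    Names containing a space are accepted outright. Single-word names
    are accepted only if the entry's sole L2 is Translingual, which
    screens out pages such as Paris or Satan.
    """
    if " " in taxon_name:
        return True
    return l2_headers(wikitext) == ["Translingual"]
```

Single-word names rejected by this rule are not necessarily wrong; they are simply left for the context-based checks discussed later in this thread.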

What should {{taxlink}} do if |i=1? Skip the call to italics.i and just italicize the entire string?

AFAIK at this time there are no non-italicized elements in accepted taxonomic names above the rank of genus, except for those in Virus, Bacteria, and Archaebacteria. DCDuring (talk) 16:07, 27 February 2024 (UTC)Reply

You mentioned earlier that you add a lot of hyper/hyponyms. Would some sort of {{taxhyper}} and {{taxhypo}} templates be helpful? If so, how should they work? JeffDoozan (talk) 22:38, 26 February 2024 (UTC)Reply

Possibly. If entry-existence testing is really cheap, some repetitive typing could be eliminated. Even if it is expensive, 'subst'ing such a template would usually save keystrokes and only be expensive once, not every time the entry page is loaded. DCDuring (talk)
I believe checking entry-existence is cheap enough that we can use it without worrying about memory or speed. I tested a page with > 5,000 unique Lua {{taxlink}}s referencing pages that exist (which I think is a slightly more expensive check than pages that don't exist, since it returns some data about the page) and it worked fine. If it turns out to be too expensive on certain pages, it's easy to adjust. It may even be possible to check existence or do other aggressive validation only on the page preview and not on the live page, which could be helpful. JeffDoozan (talk) 21:31, 27 February 2024 (UTC)Reply
I have gone through (all/most/many?) of the entries in the "kingdoms" Archaea, Bacteria, and Virus and included "i=1" in {{taxoninfl}} in all of those that ought to be italicized, but would not ordinarily be done by {{taxlink}}, ie, those above the rank of genus. It was something that needed to be done just to fix the entries. There are obsolete/archaic/data "taxa" that are arguably for archaea, bacteria, and viruses (note lower case). As they apparently predate the official codes for these kingdoms, italics should not apply. If I am wrong, it wouldn't be too hard to correct the dozen or so entries of this type.
This means that {{taxoninfl|i=1}} is a rather reliable indicator of suprageneric taxa that should be italicized. DCDuring (talk) 23:30, 27 February 2024 (UTC)Reply
  • @User:JeffDoozan
    1. "Unsafe": Any link from a Translingual-only entry to one of these is very, very, very likely to be safe. There is a possibility that links originating in an Etymology section of a taxonomic entry could be to a proper noun (proper noun of a personal name (any language), placename (any language), or name of classical Latin individual, family, deity) or to a German noun.
    2. Multiple {{taxon}}s: Since what we most need to do is get italics right, any of these where all of the {{taxon}}s in a given entry show only generic or subgeneric ranks can be assigned the highest rank, usually 'genus'. This is particularly true where either the {{taxon}}s appear as subsenses and have the same rank (parameter 1). Similar rules can apply where the link originates from a Translingual L2 that is only Archaea, Bacteria, or Virus (where i=1 is appropriate) or where there is no Archaea, Bacteria, or Virus (where i=1 would be wrong). These latter things may be too complicated at this time. DCDuring (talk) 15:54, 28 February 2024 (UTC)Reply
@User:DCDuring
  1. Good idea, matching unsafe taxlinks in all non-etymology sections of Translingual entries successfully adds a taxlink for Paris on lutetianus plus 90 other fixes that would otherwise have been missed.
  2. Can you give me a list of what strings are generic or subgeneric ranks, ordered from highest to lowest? For reference, all of the existing ranks in our taxons (as of 2/20 data export) are "clade; class; cohort; division; epifamily; family; form; form classification; form genus; genus; grandorder; group; ichnogenus; informal group; infraclass; infradivision; infrakingdom; infraorder; infraphylum; kingdom; magnorder; morph; morphological group; nothogenus; nothospecies; oogenus; order; parvorder; phylum; section; series; serovar; species; strain; subclass; subdivision; subfamily; subgenus; subgroup; subkingdom; suborder; subphylum; subsection; subspecies; subtribe; superclass; superfamily; supergroup; superorder; superphylum; supertribe; taxon; tribe; variety"
The ordering is: genus, subgenus, section, subsection, species, subspecies, variety. What matters most are genus, subgenus, section, and subsection because some of our entries have them as one-part short forms where they can appear on the same page as a genus name. DCDuring (talk) 02:44, 29 February 2024 (UTC)Reply
  1. I think we're close to being able to run this. There will be a new data export this weekend that will include all of your recent |i=1 work, so the generated taxlinks will have all of that information. Can we use {{taxlink}}, or is it better for your process to use a new template? If you'd prefer a new template, what would you like it to be named? JeffDoozan (talk) 22:19, 28 February 2024 (UTC)Reply
It would be simpler for me if we used a new template that differed in only a few characters from {{taxlink}}, eg, {{taxfmt}}. You see, the idea that I wouldn't be visiting entries that have {{taxlink}} once the taxonomic name enclosed in the template is made an entry is not realistic. I don't want to have to run all instances of {{taxlink}} against all instances of {{taxon}} and/or {{taxoninfl}} whenever I am generating a frequency-weighted list of "wanted" taxonomic names. The creating of new taxonomic entries for items that are "taxlinked" is a kind of special-purpose watchlist that bypasses the capacity problems of regular watchlists. DCDuring (talk) 02:44, 29 February 2024 (UTC)Reply
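
A minimal sketch of the "assign the highest rank" rule discussed above, using the rank ordering DCDuring gave. The function name pick_rank and the choice of which ranks count as "generic or subgeneric" (genus, subgenus, section, subsection) are assumptions made for illustration, not the bot's actual code.

```python
# Ordering given in the discussion, highest rank first.
RANK_ORDER = ["genus", "subgenus", "section", "subsection",
              "species", "subspecies", "variety"]

# Assumption: these four are the "generic or subgeneric" ranks.
GENERIC_RANKS = {"genus", "subgenus", "section", "subsection"}


def pick_rank(ranks):
    """Return the highest rank if every rank is generic or subgeneric, else None.

    `ranks` is the list of rank strings (parameter 1) taken from all the
    {{taxon}} templates found in one entry.
    """
    if not ranks or any(rank not in GENERIC_RANKS for rank in ranks):
        return None  # ambiguous case: leave it for manual review
    # The smallest index in RANK_ORDER is the highest rank.
    return min(ranks, key=RANK_ORDER.index)
```

For example, an entry whose {{taxon}}s are a genus and a subgenus would resolve to 'genus', while an entry mixing a genus and a species would be left alone.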

@DCDuring, I switched {{taxlink}} to Lua and, as a stress-test, previewed Ixora with the HTML comments removed from the big list of {{taxlink}}s. Even with 532 {{taxlink}}s, it renders at the same speed and, more importantly, works fine even with > 500 taxlinks on the same page. I also created {{taxfmt}} and applied it to tinami, petrello, plum, 癩菌, and duck. Please review those diffs and let me know if you see anything that should be adjusted before I apply {{taxfmt}} to more pages. JeffDoozan (talk) 16:48, 2 March 2024 (UTC)Reply

At tinami#Finnish you did not apply {{taxfmt}} to 7 taxa that were genera, all seemingly unambiguous cases.
At duck#English you missed Dendrocygna.
At 癩菌 you applied a redundant "i=1".
The other two seemed fine as far as your work is concerned, but, as usual, there are plenty of other things that need changing, like missing parentheses (by my lights, not policy) and missing {{vern}} templates for the numerous redlink vernacular names at tinami. No automatable remedy for that short of mining WP, Wikispecies, Wikidata. And polysemy is common in 'vernacular' names. DCDuring (talk) 01:48, 3 March 2024 (UTC)Reply
@User:JeffDoozan Have I neglected to answer any questions that you've asked? Do you have new ones? I like your tests and I can't think of other good ones, but I'll not be too surprised if we uncover a good number (but a small percentage) of problems. There do seem already to be things not formatted as I think they should be, like the peculiar fragment in one of the last names in tinami. There was also a missing space before a following sp. or spp. — and that's just in the five entries so far.
BTW, I really like that we have picked up a lot of unlinked taxa. I also really look forward to, in effect, spell-checking our taxa against Catalogue of Life and applying {{taxlink}} to any instances of those that are not linked. I suspect that my big list of taxlinked taxa has a pretty good number of no-longer-accepted synonyms as well as misspellings and misformattings, especially among the unique ones. I regularly find such in reviewing entries, often derived from old dictionaries. DCDuring (talk) 03:57, 3 March 2024 (UTC)Reply
@DCDuring: Thank you for the very careful review of the edits. I didn't expect you to also check for places it didn't make changes or I would have picked smaller pages. Since this is going to make a lot (56,000+) of replacements, I made the matching very, very conservative and will progressively loosen it on multiple runs. The missing fixes on duck and tinami were part of an overabundance of caution on my part where it would discard any potential 'short' matches if a 'longer' match existed on the page: ie, it would never replace Dendrocygna if Dendrocygna guttata also exists on the page. With careful matching of balanced opening/closing italics and brackets, this is no longer needed.
  • On the first run, it won't make any replacements inside templates, [[File: ]] links, html comments or nowiki tags. It will also not make a replacement if the "matched" text contains an unequal number of opening and closing brackets, or if the number of "'" at the start of the match does not equal the number of "'" at the end of the match. Rejected matches are logged here.
  • After the first pass fixes the "easy" matches and nobody complains that it broke their pages, I'll run it again to make replacements inside a manually curated list of templates only when it is replacing an italicized link.
  • If that goes well, I'll let it make replacements inside the allowed templates when it's replacing text that is either linked or italicized. This should still be pretty conservative, but has the potential to cause unexpected problems, so I'll pay closer attention to these matches.
  • Finally, when it's down to a number of matches that can be manually reviewed, I'll let it make replacements inside any template when what it's replacing is an italicized link.
  • After that, I'll rely on you to let me know if there are other places where it should make automatic matches.
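
A rough sketch of the first-run rejection rule for unbalanced matches described in the list above. The function name is_balanced_match is hypothetical, and treating "brackets" as square, curly, and round brackets is an assumption; the bot's real matching code is not shown in this discussion.

```python
def is_balanced_match(text):
    """Return True if the matched wikitext looks safe to replace.

    Rejects text with unequal counts of opening and closing brackets,
    or with a different number of apostrophes at the start and the end.
    """
    # Assumption: "brackets" covers [], {} and ().
    for open_ch, close_ch in (("[", "]"), ("{", "}"), ("(", ")")):
        if text.count(open_ch) != text.count(close_ch):
            return False
    # Count the run of apostrophes at each end of the match.
    leading = len(text) - len(text.lstrip("'"))
    trailing = len(text) - len(text.rstrip("'"))
    return leading == trailing
```

Under this sketch, ''[[Pinus]]'' would pass, while ''[[Pinus]] (italics never closed) and [[Pinus] (bracket never closed) would be rejected and logged.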
The current list of allowed templates is ["col-auto", "col2", "col3", "col4", "col4", "der2", "der3", "der4", "der5", "l", "quote-journal", "quote-book", "trans-top", "ja-r/multi", "ja-r/args", "gl", "gloss", "coi", "syn", "m", "ngd", "cog", "q", "syn of", "synonym of", "qual", "qualifier", "trans-see", "obs form", "obsolete form of"]. Matches inside these templates are not included in the error report as rejected matches. If you look through the error report and notice other templates where it should always be safe to make replacements, please let me know. Likewise, if you think any templates on my list might not be safe for unsupervised replacements, please let me know.
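
One way to read the staged plan and the allowed-template list above, sketched as a single decision function. The stage numbers, the name may_replace, and the shortened ALLOWED_TEMPLATES set are illustrative assumptions; the handling of [[File: ]] links, HTML comments, and nowiki tags is left out.

```python
# A few names from the allowed list above; the full list is longer.
ALLOWED_TEMPLATES = {"l", "m", "syn", "q", "gloss", "der3", "col3"}


def may_replace(stage, template, linked, italic):
    """Decide whether a match may be replaced at a given stage.

    stage    -- 1 to 4, following the order of the runs described above
    template -- name of the enclosing template, or None if outside templates
    linked   -- True if the matched text is a wikilink
    italic   -- True if the matched text is italicized
    """
    if template is None:
        return True  # plain text outside any template
    if stage <= 1:
        return False  # first run: nothing inside templates
    allowed = template in ALLOWED_TEMPLATES
    if stage == 2:
        return allowed and linked and italic
    if stage == 3:
        return allowed and (linked or italic)
    # Stage 4 (assumed reading): keep the stage 3 rule for allowed
    # templates, and accept italicized links inside any template.
    return (allowed and (linked or italic)) or (linked and italic)
```

Each stage only widens what the previous one accepted, which matches the intent of loosening the matching progressively over multiple runs.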
Questions:
  • Did the bot introduce formatting mistakes in the sample edits? I don't see anything related to the peculiar fragment in one of the last names in tinami. There was also a missing space before a following sp. or spp in the bot's edits.
    I misremembered: it was at petrello. DCDuring (talk) 15:37, 3 March 2024 (UTC)Reply
    @DCDuring: I still don't see any errors in the bot's edit on petrello. I can see that you made some edits afterwards to add {{taxlink}} where the bot didn't match because "sp." or "cf." was inside the italicized text, and if there are easy rules for handling that, I can add them to the bot. Is there anything wrong about the changes the bot made that I need to fix before running it on more pages? If not, I'll run it on a larger sample of 50-100 pages and we can look through that for any other concerns. JeffDoozan (talk) 17:03, 3 March 2024 (UTC)Reply
    They were just instances where the bot didn't do everything I would do when editing. Locating and correcting various bits of non-conformity would be useful to one trying to bring these to a uniform high standard. DCDuring (talk) 21:27, 3 March 2024 (UTC)Reply
  • On 癩菌 it applied |i=1 because the {{taxon}} on Mycobacterium leprae contains |i=1. Is there additional logic it needs to use to decide whether or not to use |i=1?
    There is no harm from leaving it for species or genus, except that it provides a bad model for other entries. If parm2 is genus or lower (genus, subgenus, section, subsection, species, subspecies, form, variety), then "i=1" is unnecessary. If parm2 is subgenus, section, subsection, species, subspecies, form, variety, then it is potentially harmful as "i=1" is supposed to override the formatting logic that is rank-dependent. DCDuring (talk) 15:37, 3 March 2024 (UTC)Reply
    I've fixed this, it will no longer add "i=1" if parm2 is subgenus, section, subsection, species, subspecies, form, or variety. JeffDoozan (talk) 17:03, 3 March 2024 (UTC)Reply
    The same thing applies to genus, though there is no risk of outright harm. I'm sure that I will be occasionally searching for instances of presence or absence of "i=1" and don't need unnecessary distractions or exclusions. DCDuring (talk) 21:27, 3 March 2024 (UTC)Reply
    Good catch, I added "genus" to the list of exclusions. JeffDoozan (talk) 22:02, 3 March 2024 (UTC)Reply
  • Is it correct that when |i=1 is passed to {{taxlink}} and {{taxfmt}}, the templates should just blindly add '' to the start and end of the text provided, without needing to call Module:italics and without doing any validation or making any other changes to the provided string?
    Yes, which explains previous answer. AFAICR, "i=1" is necessary only for suprageneric ranks in Archaea, Bacteria, and Viruses. DCDuring (talk) 15:37, 3 March 2024 (UTC)Reply
  • Are there any cases where {{l|mul|{{taxfmt|name|rank}}}} should exist, or can I have it delete the "wrapping" {{l}} template when completely replacing the contents of {{l}}? Does the same answer apply to {{m}}? Are there any other templates that can be removed if they're just wrapping {{taxfmt}}?
    I can't think of a case where it should appear within {{l}}. If {{m}} were only used to convey 'mention' vs. 'use', which was often the norm, then it would be necessary to allow {{m}} to wrap around {{taxfmt}} and {{taxlink}}. The current discussion (BP?, TR?) shows that folks don't count on it as anything but a formatting tool. The logic behind italics for taxonomic names is that they are supposed to contrast with the surrounding matrix of text. (As suprageneric names were often beyond the reach of the taxonomic codes, they were and are not italicized. The newer codes, for Viruses and Prokaryotes, do apply to higher ranks.) Mentions are also supposed to supply contrast with the matrix text. So in cases where our normal formatting of the matrix puts it in italics, eg, {{ux}}, {{syn}}, and a to-be-contrasted taxonomic name is embedded within, then it should, in principle, not be italicized. I don't really see how we can follow that rule, especially since many such instances would have relatively little (or no) matrix. IOW, no {{m}} wrapper either. I get a headache trying to think through all the other templates, where my own practice has been inconsistent. DCDuring (talk) 15:37, 3 March 2024 (UTC)Reply
  • JeffDoozan (talk) 14:46, 3 March 2024 (UTC)Reply
    @User:JeffDoozan I appreciate how careful and incremental your approach is. I think we are ready for your proposed next implementation of 50-100. DCDuring (talk) 21:27, 3 March 2024 (UTC)Reply
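
The |i=1 rules agreed in the answers above can be sketched roughly as follows. The function names and the example family name are illustrative assumptions; this is not the bot's code or the Lua behind {{taxlink}} and {{taxfmt}}.

```python
# Ranks at genus level or below, where "i=1" is unnecessary or harmful.
NO_I_RANKS = {"genus", "subgenus", "section", "subsection",
              "species", "subspecies", "form", "variety"}


def should_add_i(rank, taxon_has_i):
    """Add i=1 only if the target {{taxon}} has it and the rank is suprageneric."""
    return taxon_has_i and rank not in NO_I_RANKS


def italicize_blindly(name):
    """With i=1, wrap the name in '' with no validation or other changes."""
    return "''" + name + "''"


def build_taxfmt(name, rank, taxon_has_i):
    """Build the {{taxfmt}} call the bot would insert (hypothetical helper)."""
    parts = ["taxfmt", name, rank]
    if should_add_i(rank, taxon_has_i):
        parts.append("i=1")
    return "{{" + "|".join(parts) + "}}"
```

With these rules, Mycobacterium leprae (rank species) gets no i=1 even though its {{taxon}} carries one, while a bacterial family name whose {{taxon}} has i=1 keeps it.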

50 random pages
