Module talk:etymon

Manual transliteration


@Ioaxxere Is there a parameter for manual transliteration when the automated transliteration is incorrect and needs to be corrected? If examples of this are needed, some of them may be found in:

CAT:Terms with non-redundant manual transliterations by language

For example, if an etymology tree were to be displayed at Gujarati ઈદ (īda), the automated transliteration īda would have to be corrected to īd. Such a parameter may also be useful for terms without automatic transliteration such as at برابر. Kutchkutch (talk) 17:43, 31 May 2024 (UTC)

@Kutchkutch There is no parameter for that currently. It would be better to not duplicate information between different pages or templates. Ideally, lang:transliterate would be smart enough to handle these sorts of difficult cases (Chinese transliteration in particular seems to work very well). Alternatively, the module could scrape the page and look for a |tr= parameter. @Theknightwho what do you think? Ioaxxere (talk) 18:31, 31 May 2024 (UTC)

@Ioaxxere We can't automate transliteration en masse like that, or it would've already been done. I'd say the solution is to include a tr= parameter, but each {{etymon}} template would only need to include the transliteration for the next level up, if that makes sense. Pinging @Benwing2, who may have thoughts. Theknightwho (talk) 19:36, 31 May 2024 (UTC)
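
The override behaviour suggested above could look roughly like the following. This is only a sketch in Python, not the module's actual code: `auto_transliterate` is a hypothetical stand-in for the automatic transliteration step, and its lookup table exists only to reproduce the Gujarati example from this thread.

```python
from typing import Optional

def auto_transliterate(term: str) -> Optional[str]:
    """Hypothetical stand-in for automatic transliteration.

    Returns None when no automatic transliteration is available.
    """
    table = {"ઈદ": "īda"}  # illustrative entry taken from the example above
    return table.get(term)

def display_transliteration(term: str, manual_tr: Optional[str] = None) -> Optional[str]:
    """Prefer a manual transliteration (a |tr=-style value) when one is given."""
    if manual_tr:
        return manual_tr
    return auto_transliterate(term)

print(display_transliteration("ઈદ"))        # automatic result
print(display_transliteration("ઈદ", "īd"))  # manual correction wins
```

With this shape, a term without automatic transliteration simply yields nothing unless a manual value is supplied.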

Ideas for tracking categories


For some of these, it might be better to throw an error rather than silently accept invalid input.

|1= is an etymology-only language code (should never happen, I think).
Bad IDs in the etymon parameter, e.g. en>word>fakeid is set as an etymon but word#English:_fakeid doesn't exist.
Improper use of keywords, e.g. inh is set on an etymon whose language isn't an ancestor of the current language.
Special characters in |id=. In particular, > (the etymon separator) should probably not be allowed. Also, excessively long or short IDs shouldn't be allowed.
Trees with a height of less than 3 (there's no reason to create a tree of height 1 or 2).
Redundant |title= parameter.
Redundant keywords, e.g. in |inh|bor|etymonparam the inh is completely ignored.

Ioaxxere (talk) 05:31, 3 June 2024 (UTC)
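
A few of the checks listed above are easy to sketch. The Python below is illustrative only and assumes several things not stated in the list: the ID length bounds, the keyword set (only inh and bor are mentioned here), and the shape of the tree data are all hypothetical.

```python
from typing import Dict, List

MIN_ID_LEN = 2   # assumed lower bound, not an agreed value
MAX_ID_LEN = 30  # assumed upper bound, not an agreed value
KEYWORDS = {"inh", "bor"}  # only the keywords named in this thread

def id_problems(etymon_id: str) -> List[str]:
    """Return a list of problems found in an |id= value."""
    problems = []
    if ">" in etymon_id:
        problems.append("contains the etymon separator '>'")
    if len(etymon_id) < MIN_ID_LEN:
        problems.append("too short")
    elif len(etymon_id) > MAX_ID_LEN:
        problems.append("too long")
    return problems

def redundant_keywords(params: List[str]) -> List[str]:
    """Return keywords that are immediately followed by another keyword."""
    return [a for a, b in zip(params, params[1:])
            if a in KEYWORDS and b in KEYWORDS]

def tree_height(node: Dict) -> int:
    """Height of a tree given as {'term': ..., 'parents': [subtrees]}."""
    return 1 + max((tree_height(p) for p in node.get("parents", [])), default=0)

print(id_problems("fake>id"))
print(redundant_keywords(["inh", "bor", "en>word>id"]))
```

Whether each problem becomes a tracking category or a hard error is a separate decision; the functions only detect the condition.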

Categorization by roots


@Ioaxxere: Is it possible for this template to allow entering roots on an entry, such that when {{etymon}} is used on a descendant entry it automatically gets the same root categorization as the ancestor entry? Svartava (talk) 10:21, 6 September 2024 (UTC)

@Svartava: I'm thinking something like:

{{etymon|ine-pro|id=something|pos=root}}

which results in every descendant automatically categorizing itself in "terms derived from the Proto-Indo-European root X" (currently there is no such categorization; see here and here). Other values of |pos= could be: prefix, suffix, interfix, infix. I think this would be a much more robust solution as opposed to my initial implementation which makes guesses based off of hyphens in the term itself. This would also mean that a ===Prefix=== and ===Root=== section would need two separate {{etymon}} calls, even if they were in the same Etymology section, although I don't think this situation comes up very often (if at all).

Putting aside the editorial debate of *when* categories should be added (which can easily be tweaked as necessary), would this be a good implementation from a technical standpoint? (@Theknightwho) Ioaxxere (talk) 04:05, 7 September 2024 (UTC)

@Ioaxxere: I assume this will categorize derived terms of the root within PIE as well and also the descendants of the derived terms by the root, right? Svartava (talk) 05:01, 7 September 2024 (UTC)

@Svartava: Yes, all the descendants can be categorized no matter how deep they are. Ioaxxere (talk) 05:07, 7 September 2024 (UTC)

@Ioaxxere: That would be very nice, but for compound words, categorization by root is not preferred (at least among languages I work on), so can it be avoided somehow? Svartava (talk) 05:29, 7 September 2024 (UTC)

@Svartava: Yes, but I think any system like this should be implemented across languages, rather than a single language opting out of categorization. Also, what is a compound exactly? Should PREFIX+ROOT+SUFFIX+SUFFIX still be categorized under the root? Ioaxxere (talk) 05:41, 7 September 2024 (UTC)
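
The "no matter how deep" behaviour amounts to walking up the etymon tree and collecting every ancestor marked as a root. A minimal Python sketch follows; the two dictionaries are hypothetical data structures standing in for whatever the module reads from each page's {{etymon}} call.

```python
from typing import Dict, List, Optional, Set

def root_ancestors(term: str,
                   parents: Dict[str, List[str]],
                   pos: Dict[str, str],
                   _seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect every ancestor of `term` whose pos is 'root', at any depth.

    parents maps a term to its etymons; pos maps a term to its |pos= value.
    Both are assumed inputs for illustration.
    """
    seen = set() if _seen is None else _seen
    roots: Set[str] = set()
    for parent in parents.get(term, []):
        if parent in seen:  # guard against accidental cycles in the data
            continue
        seen.add(parent)
        if pos.get(parent) == "root":
            roots.add(parent)
        roots |= root_ancestors(parent, parents, pos, seen)
    return roots

# Placeholder names only: C comes from B, B from A, A from ROOT.
parents = {"C": ["B"], "B": ["A"], "A": ["ROOT"]}
pos = {"ROOT": "root"}
print(root_ancestors("C", parents, pos))
```

Each collected root could then be turned into a category of the form "terms derived from the Proto-Indo-European root X"; any opt-out for compounds would be a filter applied on top of this walk.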

@Ioaxxere: I originally meant that we should be able to have something like |rootcat=ine-pro:ROOT1,sa:ROOT2 (formatting doesn't matter) that can be input on an ancestor page's {{etymon}}, and the direct descendants of that ancestor would get categorized by the roots ROOT1 and ROOT2 by {{etymon}}; I don't think this convention is language specific. Otherwise, I can't generalize anything for PREFIX+ROOT+SUFFIX or any combination, I have seen both types of cases, some where I want categorization by root and some where I don't. Svartava (talk) 08:27, 7 September 2024 (UTC)
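
For reference, a value shaped like the |rootcat= example above is simple to parse. This Python sketch assumes exactly the comma-and-colon format shown in the comment, which was explicitly offered as a placeholder; `parse_rootcat` is a hypothetical helper name.

```python
from typing import List, Tuple

def parse_rootcat(value: str) -> List[Tuple[str, str]]:
    """Split 'LANG:ROOT,LANG:ROOT' into (language code, root) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        lang, sep, root = item.partition(":")
        if not sep or not lang or not root:
            raise ValueError("expected LANG:ROOT, got %r" % item)
        pairs.append((lang, root))
    return pairs

print(parse_rootcat("ine-pro:ROOT1,sa:ROOT2"))
```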

Questions


@Ioaxxere:

1) Is it possible for the inheritance chains to go back to the last non-proto language? This is frequently done on entries, especially in etymology sections of modern languages.
2) Should |id= always be needed even when there is only one etymology section in the entry?

--Svartava (talk) 12:49, 17 September 2024 (UTC)

@Svartava: 1) Yes, I could add that if people specifically want to use that with etymon, although the text feature is a bit experimental at the moment. 2) Yes, the ID system isn't really needed in the case of single-etymology entries, but it makes the template more robust in cases when we need to add an etymology section or maybe move an entry to another page. I don't think it's that hard to come up with IDs (just write any word vaguely related to whatever you just added), but maybe there could be a gadget which helps you come up with IDs and also helps you quickly add the template to multiple pages. Ioaxxere (talk) 03:10, 18 September 2024 (UTC)

@Ioaxxere: Regarding IDs, it is hard to type it out on every page and always keep in mind which ID was assigned, especially for multiple senses under the same etymology, so that's why I think it would be nicer if it can be avoided for obvious cases where there is almost zero probability that there would be another etymology section in future. And in case there is, we could probably just type it out later when the other etymology section is being added and update all associated pages, like we would have to do anyway under the present system. If IDs were not mandatory parameters then adding {{etymon}} to a number of pages (e.g. an inheritance chain where all words mean the same thing) would be quite convenient. Svartava (talk) 06:05, 18 September 2024 (UTC)

Also, can you enable parameters like |t1=, |t2=; |tr1=, |tr2=; |g1=, |g2=; |lit1=, |lit2=; |alt1=, |alt2=, etc. which are present in all etymology templates? That would be very nice. Svartava (talk) 07:10, 18 September 2024 (UTC)

@Svartava: There was a discussion on changing other parts of the syntax a few months ago here but it seems like nothing came of it. I have some ideas myself, but it would need some coordination with bot operators to change all the existing uses. I think the parameters you mentioned are added to many templates via some kind of standardized system; I'll have to check with @Theknightwho. Only |trN= and maybe |altN= are really necessary though. Ioaxxere (talk) 20:24, 18 September 2024 (UTC)

@Ioaxxere: Not sure how the proposed changes would require bot edits, since tr, alt, t, etc. would be optional parameters used only for display (in the tree/text), so I don't think any changes would be required on other pages. Inline parameters can work as well, like: LANG>TERM>ID<alt:DISPLAY><t:MEANING> as in {{desc}}, {{altform}}, etc., if that is clearer/easier to implement. Svartava (talk) 21:05, 18 September 2024 (UTC)
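
To make the inline-parameter idea concrete, here is a small Python sketch of how an argument shaped like LANG>TERM>ID<alt:DISPLAY><t:MEANING> could be split. It is not how {{desc}} or the module actually parses its input; the function name and the returned dictionary are hypothetical, and the sketch assumes the term itself never contains "<".

```python
import re
from typing import Dict

# One inline modifier of the form <key:value>.
MODIFIER = re.compile(r"<([a-z]+):([^<>]*)>")

def parse_etymon(arg: str) -> Dict:
    """Split 'LANG>TERM>ID<key:value>...' into its parts."""
    first = arg.find("<")
    core = arg if first == -1 else arg[:first]
    parts = core.split(">")
    parts += [None] * (3 - len(parts))  # pad when TERM or ID is omitted
    lang, term, etymon_id = parts[:3]
    modifiers = dict(MODIFIER.findall(arg[len(core):]))
    return {"lang": lang, "term": term, "id": etymon_id, "modifiers": modifiers}

print(parse_etymon("en>word>id<alt:DISPLAY><t:MEANING>"))
```

Because the modifiers live inside the existing argument, pages that do not use them would parse exactly as before, which matches the point that no edits elsewhere should be needed.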

Language families


@Ioaxxere: Will it be possible to enable {{etymon}} to allow entering a language family code, like at Sanskrit चिबुक (cibuka)? If possible, it should be entered as FOO>->- (because there is no term that can be entered for a family, for example, {{der|en|ine|term}} ⇒ Indo-European - the term doesn't show up) where FOO is the family code. Svartava (talk) 18:56, 20 September 2024 (UTC)

@Svartava: Maybe it could be made possible to specify any language or family code without giving the term? That seems like a good idea (a lot of work though...). Ioaxxere (talk) 00:02, 21 September 2024 (UTC)

Specifying how far an etymology text goes back


Is it possible to make the |text= parameter specify how far back the etymology text goes on a page? Very often, we need to go beyond the first step but not all the way to the last step. An example (|text=1 is already in use, so this exact syntax may not be suitable, but here is how I would expect this system to work): assuming A is inherited from B, which is from C, which is from P + Q, and we are writing the etymology for "A":

|text=1 - From B.
|text=2 - From B, from C.
|text=3 - From B, from C, from P + Q.

For |text=4 and beyond, I don't expect the template to go further by showing the origins of P and Q; usually, if we need to go that far, we can just type it out. However, cases like |text=2 would be very convenient and frequently used. – Svārtava (t ɕ) 14:37, 27 October 2024 (UTC)
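
The A/B/C/P + Q example above can be reproduced with a few lines of Python. This is only an illustration of the requested output: `depth` is a hypothetical parameter name (the proposal itself notes that |text=1 is already taken), and the chain is supplied by hand rather than read from any page.

```python
from typing import List

def etymology_text(chain: List[List[str]], depth: int) -> str:
    """Render the first `depth` steps of a chain as 'From X, from Y.'

    Each step is a list of terms; a multi-term step is joined with ' + '.
    """
    steps = chain[:max(depth, 0)]
    if not steps:
        return ""
    parts = [" + ".join(step) for step in steps]
    return "From " + ", from ".join(parts) + "."

# A is inherited from B, which is from C, which is from P + Q.
chain_for_a = [["B"], ["C"], ["P", "Q"]]
for depth in (1, 2, 3, 4):
    print(depth, etymology_text(chain_for_a, depth))
```

A depth larger than the chain simply returns the full chain, so the sketch never expands the origins of P and Q.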

@Svartava: Thanks for reminding me about this, but I don't see the point at all. Some editors like to write out one step, but who needs to show exactly 2 steps? Or 3? The template is intended to be opinionated (to a reasonable point) rather than allowing unlimited customizability. But if there's a really good reason to implement your proposed suggestion beyond just personal preference then we could definitely add it in. Ioaxxere (talk) 03:35, 12 November 2024 (UTC)

@Ioaxxere: In any case, is this technically possible? If it is, I could take this to BP where it could be known whether editors need it.

An example of this being used is how in modern Indo-Aryan etymologies, most of them go back to Sanskrit and not to PIE (the last step) or just one step back, so I'm sure this has more usage in other communities as well. – Svārtava (t ɕ) 04:16, 12 November 2024 (UTC)

@Svartava:

"is this technically possible?"

Yes, certainly. But I'm not sure whether that's what you really want. In the case of Indo-Aryan, maybe what you're intending is "go back as far as possible, but stop before reaching a proto-language"? That one would be very reasonable to add in. But by all means take it to the BP if you'd like. Ioaxxere (talk) 05:24, 12 November 2024 (UTC)

Yeah, this would be nice for the very long etymology chains, where we go through steps like (middle New Indo-Aryan) > (early New Indo-Aryan) > (late Middle Indo-Aryan) > (middle Middle Indo-Aryan) > (early Middle Indo-Aryan) > Sanskrit lol --AryamanA (मुझसे बात करें • योगदान) 05:50, 12 November 2024 (UTC)
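
The "go back as far as possible, but stop before reaching a proto-language" rule discussed in this thread could be sketched as follows. The Python below assumes that proto-language codes can be recognised by a "-pro" suffix, as with ine-pro elsewhere on this page; a real implementation would presumably ask the language data instead. The chain contents are placeholders.

```python
from typing import List, Tuple

def is_proto(lang_code: str) -> bool:
    """Assumption: proto-language codes end in '-pro' (e.g. ine-pro)."""
    return lang_code.endswith("-pro")

def truncate_before_proto(chain: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep ancestors in order until the first proto-language is reached."""
    kept = []
    for lang, term in chain:
        if is_proto(lang):
            break
        kept.append((lang, term))
    return kept

# Ancestors listed from nearest to most distant; terms are placeholders.
ancestors = [("sa", "TERM1"), ("iir-pro", "TERM2"), ("ine-pro", "TERM3")]
print(truncate_before_proto(ancestors))
```

For a long Indo-Aryan chain this keeps every attested stage down to Sanskrit and drops the reconstructed stages beyond it.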