Wiktionary talk:Etymology/Archive 1
This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
untitled
Hi, I have been putting in etymologies as follows:
- For portmanteau words: Brunch: A portmanteau word from breakfast and lunch.
- For foreign language derivations: the root foreign word in italics, then a semicolon followed by its meaning in the root language; then a comma followed by the intermediate language form in italics, a semicolon, and its meaning; and then on to English. See for example apostil.
- For disputed etymologies: a note saying that it is disputed; see for example Russia.
- For unknown etymologies: a note saying "unknown" and a possible derivation if one makes sense. See for example moegoe.
- For new words: the first source I can find, with a note saying "earliest known usage..."; see for example weekend warrior. Regards, Andrew massyn
simple question
What is the etymology of cashpoint? Of hex head wrench? Davilla 16:16, 17 June 2006 (UTC)
- Cashpoint is from cash + point. Phrases do not usually need etymologies, unless the collocation of their constituent words calls for some explanatory comments. In this case it doesn't. Widsith 16:22, 17 June 2006 (UTC)
Formatting
Which parts of an etymology should be italicized? Each form of a word? Only forms which are not linked? Only hypothetical words? – Quoth 04:47, 30 August 2006 (UTC)
- Formatting has been updated on Wiktionary:Etymology and should be self-explanatory now.--Williamsayers79 15:06, 27 February 2007 (UTC)
Policy on mentioning discredited or hoax etymology. How to describe that one meaning is derived from another?
Policy on etymology states: not too verbose.
Is there a policy on mentioning incorrect etymologies?
In the case of words of unknown origin, how should one mention that meaning 2 derives from meaning 1?
Example: Wikipedia has an article on spud that deals mostly with the etymology, refuting suggestions that the word arose as an acronym for "Society for the Prevention of an Unwholesome Diet", which had apparently taken in a number of people. It also mentions that the meaning "potato" derives from the older meaning (given in Wiktionary under spud): "A tool, similar to a spade, used for digging out weeds etc.".
I would like to suggest moving that information to Wiktionary, where I think it belongs, but I was not sure of the policy, or how best to word it without being too verbose for Wiktionary.
In any case, I thought policy should address such questions.
Boson 22:26, 28 October 2006 (UTC)
- I think Wiktionarians would tend to reject that kind of discussion as "encyclopedic". Kappa 08:03, 24 November 2006 (UTC)
- Pity. I suspect Wikipedians would reject it as dictionaryish. (Abcedarian?) So it goes. S. Mackie 17:28, 15 November 2007 (UTC)
- I've put this recommendation into the policy draft, together with the suggestion that folk etymologies be discussed on the talk page. I agree that they are bulky and unnecessary, though they are useful to have somewhere.
- Nbarth (email) (talk) 23:31, 7 April 2008 (UTC)
- The Talk page sounds like a great idea for now. Eventually they might need a more structured home, but that will probably be in the second decade of Wiktionary. DCDuring TALK 00:17, 8 April 2008 (UTC)
- I wouldn't. Comprehensive coverage of words -- in all their glorious many-faceted confusingness -- is what we're here for. There's no need to waste space on etymologies that have never been seriously presented, but if an etymology has been presented and discredited, it is reasonable for us to mention that in an Etymology section. If comprehensive etymologies get in the way, that's an argument for changing the layout, not for removing potentially useful information. -- Visviva 01:17, 8 April 2008 (UTC)
Bold vs. Italics
I would like to ask what the reasoning is behind the preference for etymons in italics instead of bold. The reason I like to use bold is that it makes the various etymons stand out more clearly from the rest of the text. In my opinion italics does not accomplish this sufficiently, especially when many of the surrounding words, such as the language names, are linked. This emphasis becomes more important with Ancient Greek entries because, in addition to a translation, Ancient Greek requires a Romanization. The end result is a lot of additional crap surrounding the words, and I want the user to be able to see the etymons at a glance. Thoughts on this? Also, a few quick clarifications. We are considering the two-thirds of the English language that comes from Old French to be borrowings, and not descents, correct? Is there a consensus on "from" vs. "<"? Most of the examples on the page use double quotation marks (") instead of single quotes (I can't find the proper key to produce them, nor can I quickly find a page which has them; I'm hoping everyone knows what I'm talking about). Is this a consensus? Finally, I've been putting Romanizations in parentheses and translations in quotes; does this sound kosher? I'm sure I have more questions, but I can't think of them right now. Thanks Williamsayers79 for cluing me in on this page. Atelaes 22:58, 8 March 2007 (UTC)
- This comes from the traditional way of setting off foreign text (but in the Roman alphabet) in italics in printed books, which is to use italics in a serif typeface. However, this old-fashioned notion does not take into account the fact that some scripts are not meant to be italicized, or that some scripts (such as Cyrillic) have very different and confusing forms in italics, or that italics on a computer monitor, especially in a sans-serif typeface, do not serve to set text off well if at all. On a computer monitor, italics should never be used except in the Roman alphabet when using a serif face such as Times Roman or Garamond. It must be remembered, however, that some scripts also should not be bolded (e.g., Chinese). As for "from" versus <, the < mark is the traditional mark used in etymologies and it makes etymologies readable and attractive. The use of parentheses further makes etymologies much easier to understand. Etymologies that are written with a lot of italics and from’s, but no parentheses, are confusing, unattractive, and often unreadable. —Stephen 23:30, 8 March 2007 (UTC)
Atelaes' bold argument is starting to sway me that way. Of course, I think we'll need to decide in a vote to update Wiktionary:Etymology and make it formal rather than a draft policy. As for < vs. from, I'm not so sure, but I do agree it can make the etymologies a bit snappier.--Williamsayers79 08:46, 9 March 2007 (UTC)
- I also agree with Atelaes on using bolds instead of italics, it will help etyma to stand out. Unorthografair 20:15, 8 May 2007 (UTC)
- I don't particularly mind whether we use bold or italics, both look pretty good to me. Yes, English words from Old French are considered borrowings; the only words which are not are those from Old English (or a few from Old Norse). For once I disagree with Stephen, in that I prefer ‘from’ to the algebraic < which I think confuses some people. And I also prefer curly quote marks (‘’) to the double ones (""), but I think I'm in the minority there. Widsith 08:09, 17 April 2007 (UTC)
- Bold looks good to me, it jumps out from the page, making the chain of terms easy to spot. I prefer < instead of from, as long as the meaning is interchangeable (I don't know if it is); again, it's easy to spot them instead of looking for one word in a string of them. ArielGlenn 08:34, 23 April 2007 (UTC)
- Very diverging opinions here! Let’s add mine:
- Traditionally, I’d like italics, but the arguments for bold are quite sound, so yes, bold seems ok.
- I hate <. It is cryptic, and I never see a good reason to prefer < over >.
- Romanizations should go between parentheses, like everywhere else, and translations between curly quotes. Using double quotes predates Unicode. (On an international US qwerty keyboard, you find ‘ as right Alt+9, and ’ as right Alt+0. You can also get them in the Misc menu under the edit box, recently. I prefer single over double curled quotes.) H. (talk) 11:55, 26 April 2007 (UTC)
- < is like a right-to-left arrow. < French means from French; > French (if it were ever used in etymologies) would mean to French. It seems to me that single quotes are usually preferred by the British, and double quotes by Americans. —Stephen 23:24, 26 April 2007 (UTC)
- Yes, obviously that is what it is meant to mean, but it is not clear to the reader. It means "less than" to my eye, as it will to many others. "From" is perfectly good, so let's use it.
- Only Latin script needs to be italicised, so Atelaes's argument for emboldening is specious, IMO. — Paul G 12:00, 20 June 2007 (UTC)
- Latin script has certainly not been the only script italicized previously on Wiktionary. Additionally, I think it would look rather... odd... to have Latin scripts italicized and non-Latin scripts bolded. It would be much cleaner to figure out a plan and then stick with it. Atelaes 18:58, 20 June 2007 (UTC)
So far as I know, we only italicize etymons of Latin script because English Wiktionary is written in Latin script and so we need to distinguish between use and mention of words written in Latin script. Most words written in a non-Latin script are immediately recognized as non-Latin and hence are obviously only mentioned in the Etymology section. (There are exceptional non-Latin-script words that may appear to be Latin, like кот and 이, but I think we can safely ignore their potential for use-mention confusion.) For many non-Latin scripts, italics makes the words more difficult to read. So, in the interest of clarity, I prefer non-Latin-script words mentioned in English Wiktionary to be written without italics. Rod (A. Smith) 19:13, 20 June 2007 (UTC)
- Assuming we can adopt {{term}}, readers will be able to override the default format for mentioned Latin-script terms in running text, so I withdraw my preference for making italics the default. In my own settings, mentions are displayed in italics because that's the format I'm used to, but my preferences should not interfere with our having a consistent style for mentioned terms (i.e. the bold format chosen in this vote). Rod (A. Smith) 19:05, 26 August 2007 (UTC)
"Etymology" vs. "Derivation"
I'd like to suggest somehow linking these two terms to ease searching and editing Wiktionary entries and their etymology.
Whilst looking up the word "royalty" (defn. #4), I was expecting a section that would provide an etymological derivation for the word, just as any respectable print dictionary would, but found no such item on the word's main page. I proceeded to search through Wiktionary's meta-pages (e.g. "Wiktionary:derivation") attempting to understand the Wiktionary format for displaying etymological data, but searches within Wiktionary's "Community portal" and "Discussion rooms" pages did not yield results relating to word derivations. I then used the search term ":etymology" rather than ":derivation", and was led to this page, made subsequent edits, etc.
Is there a way to redirect "Wiktionary:derivation" searches to Wiktionary:Etymology?
Caen 01:50, 22 April 2007 (UTC)
- Seems reasonable enough. Done. Atelaes 08:30, 22 April 2007 (UTC)
FYI, I introduced {{term}} in this edit. I assume that doing so would not cause conflict, as we have not yet officially selected a format for mentioned terms in etymologies and I am illustrating a proposed use of {{term}} through this draft policy page. If {{term}} is not accepted for widespread use, I will convert invocations of it to whatever format contributors prefer. Rod (A. Smith) 05:25, 27 August 2007 (UTC)
- This seems fine to me.--Williamsayers79 20:55, 27 August 2007 (UTC)
New template
I have created {{etyl}}, which is meant to be a replacement for the etymon language templates such as {{L.}}, {{AGr.}}, etc. This is per the discussion here. The advantage of the template is that there would no longer be a need to memorize a separate set of language codes from the ISO codes. The disadvantage is that no one will be manually checking that [[w:xxx language]] actually leads to anything, and as of yet, there is no way to specify any other link. However, it seems that Wikipedia is fairly consistent in their naming conventions and will generally have at least a redirect at the location that the template specifies. The template has been put in use at synonymum. Would everyone please take a look and give their feedback, criticisms, etc. Ultimately, the new template and the old templates should generally look and function identically, so there is no huge rush to make a decision on this. Thanks. Atelaes 04:34, 28 January 2008 (UTC)
- The big disadvantage I see is that you have to remember which order the two languages are supposed to come in. With templates like {{†L.}} you know which language will display, and that the optional argument is used to name the entry language. With this new template, I would never be sure. Of course, the obvious advantage is that you don't have to have hundreds of individual templates. --EncycloPetey 04:56, 28 January 2008 (UTC)
- The order of parameters is the same as in the original scheme (the first is "from", the optional second one is "to", just like in the usual CLI pipeline "|" :), it's just that now it has been generalized. Where you used to use {{L.}} now you can use {{etyl|la}}, where you used to use {{L.|xx}} now you can use {{etyl|la|xx}}.
- Of course, nothing forbids you from explicitly using the specific templates (they're used in tens of thousands of entries, and not going away anytime soon :) --Ivan Štambuk 10:07, 28 January 2008 (UTC)
- Please put something in its talk page aimed at newbies and tiros. DCDuring TALK 11:58, 28 January 2008 (UTC)
- Thanks for the talk page for etyl. DCDuring TALK 23:38, 28 January 2008 (UTC)
- I am unclear as to how to use this in chains of derivations, "en < enm < fro < Late Latin < la < grc" being a case. How is the second parameter optimally used?
- Late Latin (and, possibly, Medieval Latin) are in many of our MW1913-based articles, but I don't think they are in the ISO 639-1/2/3 codes. Will they be in later ISO versions? If so, how do we use etyl to set ourselves up for that? DCDuring TALK 23:38, 28 January 2008 (UTC)
- Etyl does not currently support any languages without ISO codes. For things such as Late Latin, you will simply have to use the old {{LL.}}. Using etyl in chains works the same as if there were no chains. The second parameter needs to be the same in each instance of etyl within a single etymology. So, if you're doing the etymology on a Latin word, every instance of etyl should have "la" as the second parameter. Also, I thought I'd note that I've added the ability to not categorize a word. Simply enter a dash "-" as the second parameter. This is useful for translingual words, which are not yet categorized. Atelaes 00:43, 29 January 2008 (UTC)
- Just to make this crystal clear for my benefit: the second parameter of {{etyl}} for the etymology of an English word should always be "en"? If so, why would one ever use the second (optional) parameter? Taking one of the cases where an etymology shows a direct derivation of an English word from Latin and also the Ancient Greek that was the source of the Latin, both uses of {{etyl}} should contain "en" as the second parameter, even though the Greek was not the direct source of the English. I would have thought you would have wanted "la" in the second parameter of the second use. DCDuring TALK 18:58, 4 February 2008 (UTC)
- You can omit the second parameter for English words (RTFM! :). Latin is the "destination" language in that case, but you're discussing the etymology of an English lexeme in its Etymology section, not Latin. It would be great if the destination language could be inferred from something like {CURRENTLANGUAGE}, but unfortunately it can't :( -Ivan Štambuk 20:47, 4 February 2008 (UTC)
- So, in short, as long as you're working only on English entries, you don't have to worry about the second parameter. However, a lot of us are working on non-English entries, and we need the second parameter. The second parameter is always the iso code of the L2 which the etyl happens to be in. Atelaes 21:00, 4 February 2008 (UTC)
- The biggest reason I know of for using the {{F.}} style was compatibility when importing Webster 1913 entries. That is, the 1913 syntax is partially used, so that it can render in our format. The manual W1913 conversion efforts have all but ground to a halt in the last year (for no reason whatsoever). I guess until someone fully automates the W1913 import, the old forms should be retained. --Connel MacKenzie 22:15, 2 February 2008 (UTC)
- To Connel’s comment, see: Wiktionary:Etymology/language_templates#Webster_1913.
- Nils von Barth (nbarth) (talk) 02:59, 28 September 2008 (UTC)
As per discussion at Beer parlour (August 2008), consensus appears to be to use {{etyl}} whenever possible. I’ve updated this page to reflect that.
Nils von Barth (nbarth) (talk) 02:59, 28 September 2008 (UTC)
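As a rough illustration of the {{etyl}} parameter rules described in this thread (first parameter: ISO code of the source language; optional second parameter: ISO code of the entry's own language section, English when omitted, or "-" to suppress categorization), here is a hypothetical Python sketch. It is not the template's actual wikitext: the function name, the small language table, and the category naming used for non-English entries are assumptions.

```python
# Hypothetical sketch of the {{etyl|source|dest}} parameter logic discussed above.
# The language table and the category naming for non-English entries are
# illustrative assumptions, not the template's real source.
LANGUAGE_NAMES = {
    "la": "Latin",
    "grc": "Ancient Greek",
    "fro": "Old French",
    "enm": "Middle English",
    "non": "Old Norse",
}


def etyl(source, dest="en"):
    """Return (display_text, category) for a call like {{etyl|source|dest}}.

    source: ISO code of the language the term comes from (first parameter).
    dest:   ISO code of the entry's language section (second parameter);
            omitted means English, "-" means "do not categorize".
    """
    # Languages without an ISO code (e.g. Late Latin) are simply not in the table.
    name = LANGUAGE_NAMES[source]
    display = f"[[w:{name} language|{name}]]"
    if dest == "-":
        return display, None
    if dest == "en":
        return display, f"Category:{name} derivations"
    # Assumed naming scheme for entries in languages other than English.
    return display, f"Category:{dest}:{name} derivations"
```

On an English entry every call keeps the default, for example etyl("grc") for the Greek step of a Latin-derived word; on a Latin entry every call would pass "la". A code with no entry in the table, such as one for Late Latin, raises KeyError, which loosely mirrors the point that such languages still need the old {{LL.}} template.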
Derivations, agglutinations, compounds, and phrases/idioms?
What is best practice for etymologies of these?
AFAICT, the etymology for derivations, agglutinations, and compounds is a + b, as in absently: absent + -ly.
Questions:
- should these components themselves have etymologies?
- I would guess: no for free morphemes and very common ones (-ly, -ness), but yes for bound morphemes (such as biology = bio + -logy; include Greek etymology).
- should there be some note like "Derivation" or "Compound", say by writing "Compound of cash + point" for cashpoint?
- These would seem desirable.
For phrases and idioms, individual words clearly don't need an etymology, but citing the origin/earliest known attestation of the phrase seems worthwhile.
Do these seem reasonable?
Nbarth (email) (talk) 22:34, 6 April 2008 (UTC)
- Likewise for loan words or calques, presumably translating the terms is sufficient etymology.
- Nbarth (email) (talk) 22:37, 6 April 2008 (UTC)
- WT:ELE#Etymology does specify that regular formations should not repeat the full etymology; I've included that here, with example, but that doesn't address the other categories.
- Nbarth (email) (talk) 00:34, 7 April 2008 (UTC)
Other word formation?
Should we have a standard form (and template?) for various w:Word formation types?
Notably:
- There is already one for blending;
- back-formation;
- clipping;
- onomatopoeia;
- ideophones.
Various abbreviations (Category:Abbreviations, acronyms and initialisms) don't tend to have etymology sections, and instead sometimes bold the relevant initials in the definition (and sometimes not). Is this sufficient, or should the etymology read "Acronym of A... B... C..."?
Nbarth (email) (talk) 22:43, 6 April 2008 (UTC)
- Yes. I think one or two of those have templates already, but not in widespread use. -- Visviva 00:08, 7 April 2008 (UTC)
- I agree that we should make these clarifications when relevant, but I wonder how much use templates would be in this endeavour. -Atelaes λάλει ἐμοί 00:11, 7 April 2008 (UTC)
- I'm thinking of something like the etymology language templates ({{OE.}} etc.), as these yield consistent text and a category. {{backformation}} would save typing, if nothing else (and allow us to easily include a link, for instance). - Nbarth (email) (talk) 00:33, 7 April 2008 (UTC)
- {{suffix}} and {{prefix}} probably deserve a mention. Conrad.Irwin 00:35, 7 April 2008 (UTC)
- They sure do! Thanks Conrad; I've added them to the "regular formation" section.
- Nbarth (email) (talk) 00:50, 7 April 2008 (UTC)
- There is a basic logic/structure/layout problem with etymologies for abbreviations in the numerous cases where there are multiple senses for the abbreviation. Usually the senses have completely different etymologies. See ARA. (For a contrasting case, see sent.) But I have seen a very few cases where an etymology seems appropriate (no example comes to mind). I don't especially like the appearance of the bold letters in acronyms and initialisms. Though this isn't the right forum for that, I'd be interested in any opinions, as they are a near substitute for etymology. I expect that you would just want to say that most abbreviations do not need an etymology, with explicit exceptions like the borrowings from other languages (op. cit., et al.) that use inflected forms we may not have as entries in their own right for many, er, months. DCDuring TALK 01:54, 7 April 2008 (UTC)
Complete history
It seems (from looking around) that one includes the complete history, not just the immediate predecessor; I've written this into the draft policy at Wiktionary:Etymology#Inherited words.
Is this in fact consensus?
It makes sense to me: etymologies rarely go back far (4 or 5 steps is the usual limit), so this reduces having to follow countless links, and makes history visible at a glance, at relatively minimal cost.
For compound words I could imagine a different policy (imagine a compound of 3 words, each with 5 ancestors).
Nbarth (email) (talk) 00:39, 7 April 2008 (UTC)
- My practice has been to give special merit to more recent ancestors (i.e. earlier English variants, Middle English, etc.) when I can find them (which is admittedly not often). Past that a single etymon per language is generally a good rule of thumb. However, one important criterion I use is to try and get back to words that explain the meaning behind the word. So, for example, the etymology of philosophy must, in my opinion, get back to φίλος (phílos, “beloved”) and σοφία (sophía, “wisdom”), whatever else it does with everything in between. For compounds, I think it often sufficient to simply note the components, unless going further back is specifically more meaningful. So, for example, if anthropology came from anthropo- and -logy (it doesn't, but let's assume it did just for this example), it would be nice to get back to ἄνθρωπος (ánthrōpos, “human”) and λόγος (lógos, “account”). -Atelaes λάλει ἐμοί 00:51, 7 April 2008 (UTC)
(edit conflict) I'm not sure what the point would be of dragging a word like syncopate all the way to its conjectural origins in PIE. It is tortuous enough getting it back to the Greek stem words. It may be only 3 steps, but it takes 3 languages and 5 terms. I would think that the core words could take it from there. We don't need to make our etymology section waste as much precious first-screen space as the Pronunciation section already does. Etymology is rarely of interest to the great bulk of our users, who already often complain about entry complexity. I have yet to see a complaint in Feedback from our anons about a missing etymology, let alone missing PIE stems or Sanskrit cognates. OTOH, if multi-line etymologies were concealed from view by the use of {{rel-top}} etc., my feelings would be different. The issue would then be what goes in the visible gloss. (This seems consistent with Atelaes' comment, but I sometimes have two terms per language where a specific form (often the past participle) matches the spelling better.) DCDuring TALK 01:01, 7 April 2008 (UTC)
In general I like as few steps as possible. That is the advantage of being able to wikilink the etymons, in my opinion: you can trace it back further if you want to. Widsith 16:19, 7 April 2008 (UTC)
Borrowing step
Words may be borrowed at some ancient step, such as they (borrowed from Old Norse). It seems useful to flag "borrowing" steps via "borrowed from", as distinct from "natural development".
Is this generally accepted, or is there some other policy or de facto format?
Nbarth (email) (talk) 00:45, 7 April 2008 (UTC)
- I think that, in general, this has fairly wide acceptance, if not practice. The sticky issue is whether such things should go under, say Category:Old Norse borrowings instead of Category:Old Norse derivations, or whether they should be listed as a descendant of the Old Norse etymon. These are tricky, and I have mixed thoughts on them. But noting the specifics in the etymology of they is definitely a good idea, regardless of what we do with those other issues. -Atelaes λάλει ἐμοί 00:56, 7 April 2008 (UTC)
- We went with the Category:Old Norse derivations format a while back due to confusion over borrowings, substratum, inherited, loaned etc. I would not like to see a proliferation of categories once again - this would require a whole new load of templates and the categories are more than enough to maintain at the moment.--Williamsayers79 16:10, 7 April 2008 (UTC)
In English the situation is simple, because any word not from Old English is a borrowing. However, it's a big issue with other languages, especially Romance languages, which have many words both inherited from Latin and borrowed from Latin. We are not yet very consistent at marking the difference. That said, I have been literally putting "borrowed from..." in the Etymology sections in those cases. Widsith 16:23, 7 April 2008 (UTC)
Write "borrowed from..." in prose, but use the usual templates, which put them in the '<langname> derivations' category.
It has been discussed several times whether inherited words should be separated from borrowings, i.e. whether the only content of Category:Etymology should be Category:Middle English derivations, Category:Old English derivations, Category:Proto-Germanic derivations and Category:Proto-Indo-European derivations (similarly for other languages), with the rest relocated to some other category hierarchy related to non-inherited words.
Had something like {{etyl}} been enforced from the very beginning of Wiktionary, this separation of non-inherited words could be done much more transparently: {etyl} would have to be expanded with almost trivial conditionals that would check whether the 'source' language is in a genetic relationship with the 'destination' language, and would choose the appropriate category based on that fact. --Ivan Štambuk 16:53, 7 April 2008 (UTC)
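Ivan Štambuk's suggested conditional can be sketched in Python for English entries. This is a hypothetical illustration only: the ancestor set follows the four inherited-word categories named above, but the codes used for the proto-languages and the "borrowings" category name for non-inherited words are assumptions, since the name of a non-inherited hierarchy was never settled in this discussion.

```python
# Hypothetical sketch of the proposed {{etyl}} conditional: inherited words keep
# the "<language> derivations" category, everything else goes to a separate,
# assumed "<language> borrowings" hierarchy.
LANGUAGE_NAMES = {
    "enm": "Middle English",
    "ang": "Old English",
    "gem-pro": "Proto-Germanic",  # assumed code for a reconstructed language
    "ine-pro": "Proto-Indo-European",  # assumed code for a reconstructed language
    "non": "Old Norse",
    "fro": "Old French",
    "la": "Latin",
}

# Ancestors of English: words from these languages are inherited, not borrowed.
ENGLISH_ANCESTORS = {"enm", "ang", "gem-pro", "ine-pro"}


def category_for_english(source):
    """Return the category an English entry would get for a given source language."""
    name = LANGUAGE_NAMES[source]
    if source in ENGLISH_ANCESTORS:
        return f"Category:{name} derivations"
    return f"Category:{name} borrowings"
```

Under this rule the Old Norse source of they would be filed in the borrowings hierarchy, in line with the point that in English any word not from Old English (or its own ancestors) is a borrowing. Other languages would need their own ancestor sets.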
- A related question is whether to pursue etymologies further back than borrowings, and similarly regarding categorization (e.g., should English words borrowed from Nahuatl via Spanish be in (English) Nahuatl derivations?). There seems to be differing opinions on this, so I’ve flagged the issue at Whether to pursue borrowings? and hope to discuss it to see if there is some consensus.
- —Nils von Barth (nbarth) (talk) 23:18, 9 December 2009 (UTC)
- For English, I'd say the Etymology section could go either way, and that the etymology categories should match whatever ends up in the etymology section. For languages other than English, I would say that pursuing an etymology beyond a borrowing isn't needed, and shouldn't be done. Any additional etymology for non-English terms can be included with the entry for the borrowed term. We can choose to go further in English, but only because this is the English Wiktionary. --EncycloPetey 02:02, 10 December 2009 (UTC)
Century
The Concise Oxford Dictionary of English Etymology gives centuries for the origins of English words. How should I phrase the century in a Wiktionary etymology?
- Example:
From {{OF.}} < {{L.}} {{m|la|fortitūdō||bravery, strength}} < {{m|la|fortis||brave, strong}}.
- From Old French < from Latin fortitūdō (“bravery, strength”) < fortis (“brave, strong”).
Fortitude is from the 15th century. How would I go about putting that into the etymology above? Start the ety with "15th century: from ..."? Harris Morgan 18:04, 27 April 2008 (UTC).
- We have no proper way of doing this yet. Some entries have done it the way you suggest. However, what we really need is to incorporate the earliest attestation year/century into the definition lines, because different senses will have been first used at different times. E.g. bead is first attested from 885 meaning "prayer", but is not used to mean "small ball" until 500 years later. Widsith 20:19, 27 April 2008 (UTC)
- This could be done on the citations page, by giving the citation itself (with date), grouping citations by sense. --EncycloPetey 13:21, 30 April 2008 (UTC)
- Yes, I agree. Always assuming we have the earliest citation, that is... Widsith 13:46, 30 April 2008 (UTC)
Lemma
I’ve added the admonition
- “Include the full etymology on the headword / lemma entry, even if historically the lemma derives from another form, such as by back-formation.”
…as this seems common practice and sane – d’y’all agree?
Nbarth (email) (talk) 00:01, 18 June 2008 (UTC)
- Sure. It's also good to trace the etymology back to a lemma, particularly if the entry derives from a specific inflected form. --EncycloPetey 00:05, 18 June 2008 (UTC)
- I also agree, except for the minor quibble that headword refers to any word for which we have an entry. So, on the page "words", the headword is words, while the lemma in each of the English sections on that page is word. So, I'd remove "headword" to avoid confusion. Rod (A. Smith) 00:08, 18 June 2008 (UTC)
- Rod
- Good point – done!
- EncycloPetey
- Good point; to restate:
- “In an etymology, if a word derives from an inflected form, one should say
‘From Xer, form of X.’”
- where “Xer” is the inflected form, and “X” is the lemma – i.e., have inflection be a step in etymologies.
- A question/concern though: consider the etymology for sesquipedalian, which is a compound word whose middle term is derived/inflected from a lemma. The current entry (I recently edited it, just reformatting/re-arranging.) breaks down the middle term in detail, which is correct and clarifies it, but confuses the flow. I’ve added ; to break up the etymology of specific parts – do y’all think this works, or could such entries be made clearer?
- Nbarth (email) (talk) 15:03, 22 June 2008 (UTC)
- While I generally try to keep the etymologies of compound terms as simple as possible (otherwise you end up with an unreadable mess), you admittedly sometimes have to go back a bit to make for a useful etymology. I think your changes make the etymology a great deal clearer. -Atelaes λάλει ἐμοί 01:18, 23 June 2008 (UTC)
- I think this should be reconsidered in light of WT:BP#Re-thinking etymology format. In a monolingual dictionary, a full etymology is needed, but when we have entries for etymons, this locks knowledge away from the other etymon's derived terms. It also fosters duplication and error. I'm not sure at what stage we should shift etymological information back up to etymon entries, but this should be rehashed. --Bequw → τ 18:30, 1 July 2010 (UTC)
- I've removed the word "full" from the main page so as to leave this optional. --Bequw → τ 19:56, 19 July 2010 (UTC)
Reconstructed terms
I’ve incorporated policy from:
into the section:
…hopefully with success (as the edit history shows, there are many wrinkles that I ran into) – does it look ok?
Nils von Barth (nbarth) (talk) 01:01, 5 October 2008 (UTC)
- Looks fine, except for the part "reconstructed forms in attested languages are treated as normal, with a * in front of the entry". If the term is not attested, it must not appear in the main namespace, as it cannot pass CFI. Vulgar Latin is no exception to that. If a Vulgar Latin form is reconstructed on the basis of comparative evidence from Romance languages, it might as well be put in the appendix as "Proto-Romance" or something. --Ivan Štambuk 13:20, 6 October 2008 (UTC)
Sources
Sources of etymologies that are in the public domain include The Century Dictionary, published in 1911:
- The Century Dictionary […], New York, N.Y.: The Century Co., 1911.
Please post more public domain sources, or sources with a license that allows inclusion in Wiktionary. --Dan Polansky 13:22, 8 November 2008 (UTC)
- Archive.org has most of the 1st edition OED, albeit in the form of ginormous PDFs.[1] Not even the best OCR software can handle the OED's trademark multilingual small-print jumble, though, so any imports have to be done by the tedious ol' manual method. Still, it's often worth a look. Note that those volumes published after 1923 (vols. 9 and 10, if I recall) *may* not be in the public domain; I haven't found anything to indicate conclusively one way or the other.-- Visviva 04:45, 4 February 2009 (UTC)
Calques / Loan translations / Compounds
We need to discuss calques generally and the difference between them and compounds specifically. It is easy to fall into the trap of assuming the most obvious breakdown of a word is its etymology. For instance I just reworked the etym of superego which formerly stated that it's just a compound of super- + ego whereas it is in fact a calque of German Über-Ich. This is a pretty common pattern which we need to shed light on. — hippietrail 03:29, 4 February 2009 (UTC)
- I think your approach in superego is the right way to go. The etymology should optimally include both the surface morphology and the actual calque derivation. This applies to descendants as well as calques IMO; the etymology for a Latinate compound that is actually descended from French and/or Latin should include both the historical derivation and the surface morphology. (An advantage of this is it allows us to supersede our benighted lexicographical brethren from the Age of Print. There was one word that came up for discussion recently -- maybe leukemia -- a compound formed from Greek-based Neo-Latin roots which was first coined in German and then calqued into English as if it had been formed directly from the said roots. I looked at 5 or 6 different dictionaries that each provided a different derivation -- each was accurate, as far as it went, but none was complete.)
- I want to stress -- and I think this page should stress -- that there's nothing wrong with an incomplete etymology, as long as it is basically accurate. It is much better to offer a partial etymology -- a seed from which a full etymology can grow -- than no etymology at all. -- Visviva 04:02, 4 February 2009 (UTC)
- On calques: compounding and calquing are two orthogonal processes. You can compound a word with or without resorting to calquing (or "loan-translating"). Knowledge of whether a particular word is a calque or not (whether compounded or non-compounded) is specialized, and for a great many editors terra incognita. There are lots of Latinate/Ancient Greek calques present in many European languages dating from the Renaissance and Humanism, or even much earlier, all of which are centuries old and not clearly recognisable, having been diffused into the literary language by many writers at once rather than coined by a single person at a single moment in history.
- Example: Russian благодарить (blagodarít’, “to thank”) can be obviously morphologically decomposed into благо (blago) + дарить (darit’), but in fact the whole verb is a borrowing from Old Church Slavonic blagodariti, which itself is a calque of Ancient Greek verb εὐχαριστέω (eukharistéō). Russian lexeme also has parallels in all the other South and East Slavic languages where it was independently borrowed from the Church Slavonic lexis into literary languages as the word of a "higher style"..
- Also, there should probably be Category:English calques, as knowledge of the calquing process behind a lexeme can very much enhance the learning experience. I created Category:Croatian calques a while back, which should prob. be merged into some "calques by language" master category, as is the usual procedure. --Ivan Štambuk 04:14, 4 February 2009 (UTC)
In some languages compounding is a separate but interrelated concept with etymology. This is especially true of languages which use or formerly used Chinese characters, such as Chinese, Japanese, and Vietnamese, but also of freely compounding languages such as German, Hungarian, and Turkish, and of languages with a highly monosyllabic flavour such as Thai.
For instance, all Chinese words can be broken down into characters and are thus compounds, but these breakdowns are often not the origin of the word, since Chinese words, like the words of all languages, are primarily spoken but also need to be written. Thus written forms are often invented or evolve after the fact and are retrofitted onto a spoken word. The obvious case is transliterations into Chinese of words borrowed from other languages. Languages heavily influenced by Chinese or the Chinese script often borrow Chinese terms wholesale, but they can also form words by Chinese-like rules as well as by their own native rules. Thus while modern Japanese borrowings will be written in katakana, there are many older forms written in kanji. Some of these are ateji. More discussion by experts is probably needed on this topic. — hippietrail 03:45, 4 February 2009 (UTC)
- I've created Category:English calques if anyone's interested. --Ivan Štambuk 07:19, 26 February 2009 (UTC)
I have created template:transliteration and template:calque. I think they could provide some harmonization and should make things easier to maintain; however, I am quite new to Wiktionary, so I don't know if this is the right way to do it. I think "transliteration" works quite well, but calques are a bit more complex and it would probably be better if the template took additional parameters. I don't want to make the template too heavy to use, but please improve it if you think it is worth it.--Zolo 12:06, 20 December 2010 (UTC)
Via
Is there a preference for using the formula "From Latin from Ancient Greek" versus "From Ancient Greek via Latin"? Or is there a way to decide which to use under various circumstances? — hippietrail 03:32, 4 February 2009 (UTC)
- I'd prefer the first one any time as less confusing. "From" in etymologies is used as a substitute for <, which denotes the diachronic route of a word's borrowing/inheritance, and this "via intermediary" screws up the mental process. Plus, "via" can only be used in such a way with 2 levels of borrowing. So standardising on "From x, from y, from z..." or "x < y < z.." would be the best IMHO. --Ivan Štambuk 03:48, 4 February 2009 (UTC)
- That's my feeling too but I wanted to ask everybody who puts in a lot of work on etymologies. — hippietrail 03:52, 4 February 2009 (UTC)
- I prefer the second, but only because the first is hideous to look upon. ;-) And it only gets worse if there are more stages. This "from A from B from C from D" stuff gives me a headache, and it's neither clear (since it doesn't follow the normal rules of English grammar) nor pleasant to read; I find "<" more acceptable but other editors have raised reasonable objections to that. All in all, IMO it's better to break it up into human-readable chunks, even if this is at the cost of linearity ("From Middle English fooue, derived from Old French foouile, derived in turn from Latin foovilius possibly via unattested Proto-Romance *foovili. The Latin term may originate in a borrowing from the Ancient Greek φύβις, though this is disputed ...") As above, though, whatever we agree on should be understood as an optimal end state, not as a standard to be aggressively enforced. There's nothing particularly wrong with "from from from", or "< < <", or anything else that gets the message across. The preferred style, whatever it is, should be applied only in the course of general improvement and expansion. -- Visviva 04:17, 4 February 2009 (UTC)
Including each step of borrowing
There is a common pattern on Wiktionary where the etymology of an English word that still feels like a foreign borrowing skips a step. For instance, I recently edited the English entry for huitlacoche, which formerly stated that the term is from Nahuatl. It is from Nahuatl, but it is extremely unusual, though not impossible, for English to borrow a word directly from Nahuatl. Generally such words are borrowed from Spanish, which in turn borrowed them from Nahuatl. Etymology editors need to be careful to describe which line of descent applies in each case, and we need to remind them to do this. — hippietrail 03:36, 4 February 2009 (UTC)
- Well, as you say, it is from Nahuatl, even if not directly, so I don't really see this as problematic. But since I created the entry, that probably doesn't come as a surprise. :-P To repeat myself just a bit, IMO an initial etymology -- much like an initial entry -- provides a seed from which a more complete etymology can grow. We have lots of words with missing stages in their etymology, just as we have lots of words with missing senses; this is because each editor faces unique constraints -- of time, energy and expertise -- which govern the amount of information the editor is able to add. If we demand that entries (or sections) be formed in perfection, we deprive ourselves of the benefits of the collaborative wiki process.
- That said, something on this page reminding editors to make the chain of derivations as complete as possible would not be out of place. -- Visviva 04:34, 4 February 2009 (UTC)
Cognates
[2] - I've added my proposal for listing cognates in the etymology sections. IMHO, it's a mere codification of what has been "common practice" so far. Space is left open to override the guidelines in the individual language policy pages (e.g. one might want always list (Old) Armenian cognate in the Ancient Greek etymology section [and vice versa!] due to Graeco-Armenian theory etc.). Please comment. --Ivan Štambuk 10:42, 24 February 2009 (UTC)
Also note that point 4 (always list OE) rules out the possibility to list Gothic language cognate, which IMHO is pretty important as Gothic is the most archaic and earliest attested Germanic language. So I would rather always list it, if Gothic cognate is present, regardless if OE cognates is present or not.. --Ivan Štambuk 10:58, 24 February 2009 (UTC)
- A promising beginning. However, I do not see any reason for a whole paragraph dealing with OE, probably you could include it in the already enumerated sanctioned ones here: Ancient Greek, Latin, Old Church Slavonic, Sanskrit/Avestan/Old Persian, Lithuanian/Old Prussian, Gothic, Old Irish, Tocharian, Old Anatolian (Hittite, Luwian, Palaic) and Old Armenian. It would be nice, if you write a line or two for Old Norse or Old High German, e. g. in entries of non-Germanic ancient languages ON and(or?) OHG are to be listed provided that there is no OE and Gothic cognate. Bogorm 11:30, 24 February 2009 (UTC)
- point 4 (always list OE) rules out the possibility to list Gothic language cognate ?? How does it rule it out? It simply encourages addition of the OE entry. As for the præsence of Gothic, I support you entirely. You already sanctioned Gothic by adding it to the following enumeration: Ancient Greek, Latin, Old Church Slavonic, Sanskrit/Avestan/Old Persian, Lithuanian/Old Prussian, Gothic, Old Irish, Tocharian, Old Anatolian (Hittite, Luwian, Palaic) and Old Armenian - the languages from this set should under no circumstances be removed from etymology sections, should they? I am looking forward to a consensus on Old Norse with regard to the proposed line just above. Bogorm 11:35, 24 February 2009 (UTC)
- Yes, OE could be listed in there, but it's better to deal with it unambiguously in a separate point given its special status (list always if present), and given the inapplicability of the number-limitation-per-branch rule.
- Yes it does rule it out by taking into account rule #1 (one example per branch), and that's what I've objected above: I would like to see Gothic as a regular example of Germanic branch, regardless of the presence of OE cognate. But as it turns out, some would disagree, so perhaps we better discuss this before putting it there ;)
- Yes, the languages from that list should not be removed, unless there is some other policy taking precedence (e.g. rule #5, or language-specific policy).
- As for the Old Norse: non-Germanic ancient IE languages could list it, per guidelines, iff there is no Gothic and OE cognate present. Ancient Germanic languages (OE, OHG, OS etc.) should OTOH always list it per rule #1 (one example from a major branch of the immediate ancestor, i.e. of Proto-Germanic in this case). --Ivan Štambuk 15:46, 24 February 2009 (UTC)
- I also want to see it on a regular basis. Ivan, I am dumbstruck that you are the main contributor for Gothic entries and yet are faltering. But if you included it in that set of languages, I concluded that it is to be present regularly and I am convinced that I am not the only one who would understand it that way and list every Gothic cognate he finds on every entry of an ancient language, which is how it ought to be, because it is the most archaic and ancient Germanic language. Please, allow both OE and Gothic to be present, they are not so many after all, especially if we trim OHG, ON, Old Saxon. Bogorm 16:45, 24 February 2009 (UTC)
- I must admit that, while I can sympathize with the desire to include Gothic, I feel that two Germanic cognates is generally redundant (except in Germanic etymologies of course), and so I think that the current wording, which rather limits Gothic inclusions, should stay as is. One would hope that Gothic cognates should always be listed on the OE, and thus just one click away. -Atelaes λάλει ἐμοί 21:57, 3 March 2009 (UTC)
- I cannot conceal mine incomprehension of how the Gothic language which is centuries older than OE and by far more archaic can possibly be treated as inferior. I am unable to accede thereto and shall await the conclusion of the current debate and hopefully the participation of additional contributors. If it disallows Gothic cognates in non-Germanic etymology sections, I shall simply cease adding Gothic cognates wheresoever on en Wikt (a personal resolution). Otherwise, I would be glad to contribute in expanding the etymology sections. The uſer hight Bogorm converſation 22:18, 3 March 2009 (UTC)
- One thought that I had was that in situations where the Old English has no (or no good) descendant in English, perhaps it would lose its preference. So, for example, in διά (diá), the OE cognate to-/te- has no English descendants that I can see, and the Gothic cognate is clearly more conservative, and so I've made the switch. As Bogorm rightly states, this is the English Wiktionary, not the Old English Wiktionary, and Old English merely functions as a bridge from the ancient world to English. Thoughts? -Atelaes λάλει ἐμοί 22:28, 4 March 2009 (UTC)
- Great stuff, I fully support these ideas. Ƿidsiþ 16:48, 24 February 2009 (UTC)
- Agreed. This is all very good. However, I wonder if there could be a note about especially similar cognates overriding other factors? I'd like to keep the cognates listed at ἱδρώς (hidrṓs), even though they don't follow all the rules we've been discussing so far, as the word is from a specific word in IE, which does not have reflexes in most languages. -Atelaes λάλει ἐμοί 08:24, 26 February 2009 (UTC)
- If reflexes are present only in a limited number of branches (<= 4), all of them should be listed and guidelines can't apply. *swidrōs and *sweyd- are treated as two different lexemes in PIE. OTOH languages that have a reflex only of one of related PIE forms, e.g. *sweyd- but not *swidrōs, might mention both grc ἰδίω (idíō) and ἱδρώς (hidrṓs). And also vice versa. --Ivan Štambuk 08:46, 26 February 2009 (UTC)
Ivan, with no expressed objections against the aptitude of Gothic entries being treated on a par with Old English ones and being præsent in all ancient cognates, I reckon the præsence of this rule in the current policy for exigent, so that everyone erasing them know that he infringes thereby the official policy. Accordingly, I am looking forward to the reversion of Atelaes' edit in θρῆνος(vide supra) The uſer hight Bogorm converſation 21:43, 3 March 2009 (UTC)
Balto-Slavic-Germanic kinship
Additionally, I would propose allowing the listing of all ancient Germanic cognates on Baltic and ancient Slavic entries, because the Balto-Slavic hypothesis often involves the Germanic languages (there are many common words in Slavic and Germanic languages), and there was a Balto-Germanic hypothesis as well. Bogorm 16:45, 24 February 2009 (UTC)
- Balto-Germanic and Balto-Slavic-Germanic hypotheses are obsolete. BSl and Germ share some exclusive trivial phonological developments (prothetic vowels before PIE syllabic */l/ and */r/), some morphology and some lexis, but not enough to justify a genetic clade. Rather, it was a dialect continuum in post-PIE times, one that didn't last very long. There are lots of Gothic borrowings into Common Slavic (and into individual Slavic branches in post-Common-Slavic times, e.g. from Balkan Gothic), and this is the only place where these should be mentioned. Users should make the 1-2 mouse clicks it takes to reach, in the appendix namespace, the Germanic cognates of the Slavic word they're interested in. --Ivan Štambuk 17:01, 24 February 2009 (UTC)
- Now you are again beginning with the claims of obsoletion like in Talk:Macedonia. Shall I again quote the foremost authors of Indo-Germanic linguistics in order to corroborate my appeal? Besides, what do you think of tausend/тысяча Leute/ljudi and dozens of others? All Gothic borrowings? Bogorm 17:08, 24 February 2009 (UTC)
- I repeat: all affinity between BSl and Germanic is a result of areal development in post-PIE dialect continuum, not a result of strict genetic relationship between the two. That is the general consensus among all modern Balto-Slavists and Germanicists. There is much more evidence to connect e.g. Greek and Indo-Iranian than those two. --Ivan Štambuk 17:36, 24 February 2009 (UTC)
Albanian
I have my strong misgivings about Albanian: it is not certain whether it is IE, and it is attested very late, which is the main difference between it and Armenian. Additionally, I have never heard of any hypothesis involving it that is similar to the Graeco-Armenian one. I suggest removing the line encouraging its insertion in ancient etymologies. Bogorm 11:37, 24 February 2009 (UTC)
- Nonsense, Albanian is definitely an IE language. The Graeco-Armenian hypothesis is not mentioned in the guideline; I simply mentioned it here (on the talk page) as an example of an individual language policy overriding the general guidelines. --Ivan Štambuk 15:46, 24 February 2009 (UTC)
- Well, proving that it is IE is no great achievement, since this is the mainstream conception. How are you going to convince me that it is archaic enough to be compared to Lithuanian? Bogorm 16:28, 24 February 2009 (UTC)
- It's not about archaicity here, it's about lateness of attestation. Albanian is a self-contained group of IE (like Armenian and Greek) and by all means deserves equal treatment in its own entries' etymologies. However, Albanian cognates themselves are not likely to be appropriate for ancient languages' etymologies, unless they're one of the rare ones surviving. --Ivan Štambuk 16:33, 24 February 2009 (UTC)
- To put it plainly: Ancient Greek, Sanskrit, Gothic etc.. should not list Albanian as cognate if there are "better" cognate candidates available. Albanian entries OTOH may mention those ancient languages as cognates, unless there is some "Old Albanian" etymon available. --Ivan Štambuk 16:36, 24 February 2009 (UTC)
- Now it sounds convincing: no Albanian in ancient entries, but ancient entries in the etymology of Albanian ones. I agree Bogorm 16:51, 24 February 2009 (UTC)
- No, unless it is allowed by rules such as those that permit it when cognates are lacking in branches other than Albanian, which happens from time to time. --Ivan Štambuk 17:37, 24 February 2009 (UTC)
Old Norse
I suggest accepting in the policy the following rule: if the etymology section lacks OE and Gothic cognates, the Old Norse may be added, as Ivan and Atelaes already stated. Furthermore, the majority of Old Norse words coincide in spelling (especially verbs) with those in Icelandic and in these cases it is not especially motivating to create the second entry, because the meaning is identical. Therefore, in the cases where only Icelandic is præsent, I would like to add ancient Germanic cognates (Gothic, OHG) and accordingly propose that to be præsent in the policy. Any objections? Bogorm 17:00, 24 February 2009 (UTC)
- Yes, I've expressed a number of objections already. Among them is the fact that the earliest written sagas date from only the 10th century AD or later. The earliest texts already show considerable Latin influence and borrowings. Old Norse has little utility in presenting possible cognates for Latin (whose Classical period predates the earliest Old Norse documents by some eight centuries). It has even less utility for comparison with earlier Classical languages. --EncycloPetey 03:17, 25 February 2009 (UTC)
- And the earliest runes are from the 7th/8th centuries which reduces the hiatus between ON and Latin. But we are not going to argue about history. The question is: why are you not willing to allow the Old Norse cognate in those cases, where there was no OE and Gothic cognate? Bogorm 06:56, 25 February 2009 (UTC)
- I have no problem with adding Old Norse cognates to ancient entries, provided that Gothic and OE do not exist. However Latin entries should generally have Ancient Greek and Sanskrit cognates when available, and perhaps Celtic, considering their similarities, so I wonder how often there will be room for Germanic. I am a bit confused about the bit with Icelandic. Could that be clarified? -Atelaes λάλει ἐμοί 08:20, 26 February 2009 (UTC)
- W:AL could be extended with a section saying that Celtic (namely Old Irish) cognates need to be mentioned due to the Italo-Celtic theory, and that other Italic cognates (Oscan, Umbrian etc.), if attested, also need to be mentioned in a separate line. Italo-Celtic is today generally abandoned as a theory of genetic relationship between Italic and Celtic, though it is doubtless that these two branches share several exclusive common developments and were areally very closely related in post-PIE times. Some modern linguists like Dybo moreover openly advocate the IC theory with not-so-trivial arguments. So, when sa, grc, sga and ang/en cognates are listed, there wouldn't be much room for Gothic or anything else, so I agree that it would be reasonable to drop it in cases such as this. --Ivan Štambuk 08:39, 26 February 2009 (UTC)
Armenian
I didn't exactly get what part of the policy the Graeco-Armenian theory overrides. And, all in all, what's the consensus about (Old) Armenian cognates now? Should I add them only to Ancient Greek entries or to others as well? I am not pushing any agenda; it's just that I am not a linguist and would like to know where professional linguists find (Old) Armenian cognates interesting. Finally, I would like to mention that if the proto-appendices for PIE roots were created, the energy of cognate-adding and well-meaning users IMHO could be channelled into adding cognates there, instead of overburdening etymology sections in common entries. Ivan, maybe you could concentrate on that? I understand it is a titanic toil, but you could create empty appendices like this in an accelerated manner. What do you say? --Vahagn Petrosyan 00:18, 25 February 2009 (UTC)
- Language-specific policy (e.g. Wiktionary:About Ancient Greek, Wiktionary:About Armenian) could say "because of the Graeco-Armenian hypothesis endorsed by some linguists, Greek/Armenian cognates are particularly interesting and should always be mentioned". Only then would the rest of the guidelines apply (at most 1 cognate per branch, always list OE if present etc.)
- OK, I'll look to create more of these like *oḱtṓw. It's not such an easy task because reconstructions vary, and I need to check various authors to find out why they think that e.g. *oḱtṓ(w) is more probable than e.g. *h₃eḱtṓ(w), whether there are variations in the reconstructions, whether there was an ablaut, and which daughters generalised a particular (direct/oblique) stem etc. --Ivan Štambuk 05:03, 25 February 2009 (UTC)
Ossetian
I am eager to see the Ossetian language granted a status similar to Lithuanian and Armenian, so that it can appear on all ancient cognates. The argument therefor is as follows: it is known that the Alanian language is the sole Eastern Iranian language spoken west of the Caspian Sea, which has caused significant differences in comparison to other Eastern Iranian languages such as Pashto (which in its turn is Southeastern, not Northeastern). How is the situation in the other Indo-Iranian languages? The Western ones are well repræsented with Old Persian, the Southeastern ones with Pashto, Indo-Aryan with Sanskrit. The only one of these four groups which is not repræsented is the group of the Northeastern Iranian languages (given that the classification of Avestan is uncertain). Unlike Pashto, the præcursor of Ossetian had its runic inscriptions as early as the 7th century AD, while Pashto literature flourished in the 15th-16th centuries. (based on this and this articles) The uſer hight Bogorm converſation 19:08, 4 March 2009 (UTC)
Deprecate {{unk.}}?
Can anyone explain why we have {{unk.}} when it essentially duplicates what {{etyl|und}} does? Or is there some subtle distinction between unknown and undetermined that I'm missing? Carolina wren 20:30, 3 March 2009 (UTC)
- Seems to me that the latter would be preferable. My guess is that the creators and users of {{unk.}} were unaware of {{und}} (I was). -Atelaes λάλει ἐμοί 19:22, 4 March 2009 (UTC)
- See also the discussion at: WT:BP: 2009/July: template:und
- There, several people agree that {etyl|und} is the way to go, but others (Ruakh) suggest (as I understand it) that it’s only appropriate if the language is undetermined, whereas {unk.} is better if the etymology is unknown for other reasons.
- Does this seem a good distinction, and should we retain separate templates, or standardize on one or the other?
- —Nils von Barth (nbarth) (talk) 02:37, 19 December 2009 (UTC)
I don't think that that distinction is particularly relevant or of value (when browsing categories), and would prefer that we have the generic category for words with "unknown" etymologies where the source language (or language family) is completely unknown. On the other hand, words which have disputable etymologies (e.g. multiple possible sources) should not use it IMHO and should use multiple invocations of {{etyl}} instead. --Ivan Štambuk 03:20, 19 December 2009 (UTC)
Looks like I'm the main user of {{unk.}} and I don't mind switching to {{etyl|und}} as long as it doesn't link to the Wikipedia article for "Undetermined language", is not capitalized and takes an optional title= parameter for cases when I want to word the ety differently, as in ձախ. --Vahagn Petrosyan 07:10, 19 December 2009 (UTC)
User DCDuring flagged my edit again with etystub this time. Is it personal? I provided the etymology - the literal translation of 桂林 Guìlín:
From Chinese 桂 (guì) and 林 (lín) - "laurel tree".
It is what it means, no more, no less. The etymology project page is big but I don't see any contradiction. Anatoli 21:38, 30 March 2009 (UTC)
- The only things I can think of are thus: First, it's missing the standard templates, which I'll add shortly. Second, it seems like it's missing the Chinese name (i.e. a link to 桂林 in the etymology). Finally, we don't treat Chinese as a language, but rather as a language family. Do you know if it's from Mandarin, Cantonese, etc.? If you don't, then you don't, but if you do, it'd be good to have. I'll give it some formatting real quick. -Atelaes λάλει ἐμοί 22:23, 30 March 2009 (UTC)
- OK, thanks. The Chinese entry doesn't exist yet but it's OK to have it in red until it's created. The character spelling and the meaning of 桂林 are identical across the dialects and can be described as Chinese; where the dialects differ, it is in the pronunciation, not the spelling in Hanzi. The pinyin pronunciation I provided is from standard Mandarin, the default for Chinese entries and translations. Besides, the written form of Chinese proper names can be safely described as "Chinese". The dialectal forms can be added as *: Cantonese, etc. The Cantonese pronunciation is "gwai3 lam4". In my opinion, the etymology of Chinese names doesn't need to specify which dialect. 北京 is "Northern capital" in Chinese, no matter what dialect it is; the pronunciation "Běijīng" is Mandarin, of course. Anatoli 01:09, 31 March 2009 (UTC)
- Yes, I realize that (or, well, most of it anyway :-)). But, the real question is, from what Chinese language came the English word? I imagine that the phonological differences between the languages might be the key to the answer. Now, if you don't know it, then that's fine, but it is a valid question. -Atelaes λάλει ἐμοί 01:15, 31 March 2009 (UTC)
- The current English spelling is based on the Mandarin pronunciation and pinyin romanisation - Guìlín. The other spellings are Wade-Giles: Kuei-lin, Postal System Pinyin: Kweilin; Zhuang: Gveilinz. etc. Anatoli 01:23, 31 March 2009 (UTC)
- Sweet. The entry has been updated accordingly. I've removed the etystub tag, as the ety seems fairly complete to me. -Atelaes λάλει ἐμοί 01:33, 31 March 2009 (UTC)
Reconstructed derivational morphemes
If a word is a derivation at some proto-stage, eg. nälkä, what is the proper formatting for the reconstructed (non-productive) derivational morpheme? I'd like the outcome not to repeat "Proto-Whatever *root + Proto-Whatever *-sfx", but simply "Proto-Whatever *root + *-sfx". --Tropylium 11:21, 27 July 2009 (UTC)
- You can remove the listing of the language with head= (on second and subsequent components). Good question, thanks for asking – I’ve added this to the page!
- —Nils von Barth (nbarth) (talk) 18:48, 3 September 2010 (UTC)
Useless etymologies
Many etymologies of compound words written with a space/hyphen are just links to the base words. For example, computer language#Etymology is currently just "computer + language". Are these really appropriate? They don't seem that useful and appear more morphological than etymological. Would it be appropriate to remove that etymology section and move the base word links to the inflection line, that used to look like this:
- computer language (plural computer languages)
so that it becomes this:
- computer language (plural computer languages)
Here's an example edit. --Bequw → ¢ • τ 17:08, 6 December 2009 (UTC)
- I totally agree (although with the obvious proviso that a few multi-word terms do require further explanation). Ƿidsiþ 17:38, 6 December 2009 (UTC)
- Yes, please do the kind of edits you showed. --Vahagn Petrosyan 19:59, 6 December 2009 (UTC)
- I already do that for multi word terms, but not compounds like banknote where it would all appear to be a single blue link anyway. Mglovesfun (talk) 20:02, 6 December 2009 (UTC)
- It would be appropriate to link the word parts, but not appropriate to remove etymology sections. The section can (and often does) include information about the first use of the word and the {{compound}} template (often used in this situation) categorizes the entry as well. Removing an Etymology section thus eliminates the categorization, and slows progress towards the accumulation of additional etymological information. --EncycloPetey 01:59, 10 December 2009 (UTC)
- I’ve elaborated at Wiktionary:Etymology#Phrases, Compounds, Acronyms, and Abbreviations, which I think addresses all issues above (link everything except for simple phrases and hyphenated). EncycloPetey, would you suggest using {compound} for hyphenated terms or phrases as well, or is a link in the head as Bequw suggests as good or better?
- —Nils von Barth (nbarth) (talk) 01:03, 11 December 2009 (UTC)
- Yes, {{compound}} can work for combinations that have (1) a space, (2) a hyphen, or (3) no space because of conjoining. Other people may feel differently, but I don't see any difference as far as an etymology is concerned. I think a link in the head is also good, but I know there are community members who have disagreed on that point in the past. Some editors are opposed to that kind of linking, but there are times where the etymology and head linking may not match. I can't recall offhand any of the examples I know exist. The best I can come up with is an example of head linking that's necessary because it rightly isn't in the etymology, to wit fortis Fortuna adiuvat. In that entry, the etymology explains the origin of the phrase as a proverb, but doesn't bother with the meanings of the individual words or their grammar (which would just make the etymology section overly tedious and difficult to read). --EncycloPetey 02:37, 11 December 2009 (UTC)
- Agree with all of that, but when the etymology is computer + language it is indeed absolutely useless. Consider make + a + meal + of as an even more frivolous example. Mglovesfun (talk) 23:11, 6 April 2010 (UTC)
- But computer + language makes the positive statement that this term was formed as an English compound, and not, say, calqued like silver wedding < Silberhochzeit. It is positive data, not self-evident: it could save a hundred future editors from searching for better sources which do state the etymology, and it informs the reader. This would only be redundant in a dictionary that's finished and gone to print, where every absent etymology would positively indicate the (apparently) obvious case.
Inline etymologies
I took a peek at where {{etyl}} is used outside of Etymology sections. Most of the cases are in given name entries (cf. Osvald). Should these "inline" etymologies be converted to independent ety sections? --Bequw → ¢ • τ 02:56, 16 January 2010 (UTC)
- To be more precise, it should be {{given name|male|from=Old Norse}}, which goes into Category:English male given names from Old Norse. But the current practice is better than no link at all. Mglovesfun (talk) 23:08, 6 April 2010 (UTC)
- Oh, I was unaware that {{given name}} could include etymological information. Doesn't that seem wrong? Shouldn't the etymology be separate and not on the sense line? A fuller etymology could trace the name back several steps. We don't do {{es-noun-m|from=Latin}}. --Bequw → τ 03:04, 7 April 2010 (UTC)
- As {{given name}} and {{surname}} don't actually display the from= parameter, an etymology section is still needed to enlighten the reader to the knowledge already present in the info. And if there's a known etymon, all the more reason.
- I don't see this as a huge problem, but would agree that it would be preferable to have the info in a separate ety section, for consistency's sake. It should be noted that there are other uses for inline etymologies, such as etymological info unique to a particular sense. I believe Widsith has been doing some experimentation with this, largely confined to first attestation date. However, I don't think our current format holds such info very well, though I am exploring some new formatting options which might alleviate this. -Atelaes λάλει ἐμοί 21:23, 11 April 2010 (UTC)
Common -> Proto
Are there cases where we should use "Common ..." (eg "Common Slavic") instead of "Proto ..."? If not, for consistency, and concise {{proto}} usage, I'd like to edit entries to use "Proto". --Bequw → τ 19:15, 5 March 2010 (UTC)
- Converted all the {{proto}} usages. --Bequw → τ 22:42, 6 April 2010 (UTC)
- Common refers specifically to the last stage of a proto-language, while Proto- can refer to any of the intermediate reconstructible stages. See usage notes at Common Slavic. --Ivan Štambuk 22:50, 6 April 2010 (UTC)
- Hmmm, is this the case for any other (or all other) languages? My understanding was that Common Germanic was synonymous with Proto-Germanic. I'll revert my changes for those that differ. --Bequw → τ 01:51, 7 April 2010 (UTC)
- When a distinction is made, the usual convention is that "Common" indicates a feature that postdates the dialectal splitting of a language or a family into dialects (and that has spread across the varieties by diffusion), while "Proto" indicates a feature that has been inherited from the assumed single common ancestor of the varieties. E.g. e-mail, pasta or veal are "Common English", yet not "Proto-English" — having been absent from Old English.
- On the other hand, it is often, if not impossible, then at least extremely difficult to determine the precise age of any word that is not strictly inherited from an older stage yet, which makes this a largely useless distinction for the discussion of lexemes. --Tropylium (talk) 10:35, 19 November 2014 (UTC)
What, in real terms, is the difference between Vulgar Latin, Late Latin and this? Proto-Romance derivations has 3 members right now, could we move them? Mglovesfun (talk) 23:45, 29 June 2010 (UTC)
- Ultimately, there is no real difference between Vulgar Latin and Proto-Romance, except that the latter is perhaps a bit more vague. I would argue that the three instances should be changed. -Atelaes λάλει ἐμοί 23:12, 1 July 2010 (UTC)
Folk etymology logistics
As WT:Etymology#Folk etymologies states that folk etymologies should be on the talk page, I was wondering if there was a standard way of linking to them. Should we use something like {{seeTalk}}? Should we have a standard header name on the talk page (eg "<language name> folk etymologies") so that we can link right to it? --Bequw → τ 18:52, 1 July 2010 (UTC)
- I don't know if "folk etymologies on the talk page" is really the best suggestion. Simply put, we just don't use the talk pages....for anything. What we really want is an expandable etymology format, so that we can put less pertinent info in the etymology without bogging it down for normal users. -Atelaes λάλει ἐμοί 23:13, 1 July 2010 (UTC)
- I think if notable enough (whatever that means) they should be in the etymology itself. For example I saw Fornication Under Consent of King on Sexcetera. Mglovesfun (talk) 23:16, 1 July 2010 (UTC)
- Maybe a {{rel-top}}-like box then. Anything else that we'd want to put in an expandable section? --Bequw → τ 00:26, 2 July 2010 (UTC)
- I wonder if we might need something a bit more robust than {{rel-top}}. Etymologies aren't simple lists, and we want to have at least some of the content displayed initially. -Atelaes λάλει ἐμοί 00:55, 2 July 2010 (UTC)
What do you mean given the correct etymology? How do you know what the correct etymology is? Ok, a lot of folk etymologies are rubbish, but surely some folk etymologies may be right? 89.243.163.176 22:02, 21 September 2010 (UTC)
- I think it's circular. If a folk etymology is shown (or rather supported) to be true, it's no longer folk. SFAICT a lot of etymology is based on comparative linguistics. There are a core of cases where the word is attested from (for example) Latin, to Old French, to Middle English to English. In many cases though, etymologies are just 'our best guess'. Mglovesfun (talk) 22:07, 21 September 2010 (UTC)
Long quotations in etymologies
I think many quotations in etymology section are too long (e.g. 絕纓 and serendipity). If the quote is large I think the policy should be to hide it by default and just show the citation. Alternatively, it could go in a <ref>-tag and be shown in the references sections. --Bequw → τ 19:03, 19 July 2010 (UTC)
Glosses or numbers when referring to etymologies
Many entries reference etymologies by number (either in prose or links). Are the etymology numbers stable enough that we should condone this, or should we push for textual qualifiers (like we use glosses for word senses)? --Bequw → ¢ • τ 19:50, 19 July 2010 (UTC)
Ancestor and current language the same
We don't have anything saying that we don't allow Category:fr:French derivations. It would be possible, though of debatable worth, for {{etyl}} to call attention when the first and second parameters are the same (in this case, {{etyl|fr|fr}}). Mglovesfun (talk) 11:45, 10 January 2011 (UTC)
- There's precedent in the deletion of Category:es:Spanish derivations and problems are (haltingly) picked up in Wiktionary:Todo/etyl problems. Would it be helpful to pick up the mistakes via categories? --Bequw → τ 13:58, 11 January 2011 (UTC)
- It would be nice to have something saying we don't accept them. Could save arguments in future. Mglovesfun (talk) 14:02, 11 January 2011 (UTC)
- Ah, hopefully my recent edit clarifies that. Wasn't sure exactly where to place that note, though. --Bequw → τ 17:17, 12 January 2011 (UTC)
- Oh I just added something too. I'll check I haven't duplicated your edit. Mglovesfun (talk) 15:39, 22 January 2011 (UTC)
Examples
From the page:
====Examples====
In the entry {{m|en|antibody}}:
===Etymology===
{{prefix|anti|body}}, a calque of {{cog|de|-}} {{m|de|Antikörper}}.
Wouldn't you want to categorize in Category:German derivations in this case? Why not? It's derived from German. Mglovesfun (talk) 14:07, 3 February 2011 (UTC)
- Would, too.—msh210℠ (talk) 17:48, 3 February 2011 (UTC)
Old English Cognates
From the page
# In case of ancient languages, Old English (Anglo-Saxon) is an exception and should always be listed if it is a cognate, including the modern-English reflex in parentheses.
Sorry what? I think it means Old English should always be listed as a cognate for any cognate in any language. Including English, too. So, why, and who added it, and why? Mglovesfun (talk) 14:09, 3 February 2011 (UTC)
- Not sure who added it, but I think the idea, shared by several etymology editors (not necessarily including me, I'm ambivalent really), is that because this is the English Wiktionary, then the English or Old English cognate of a foreign word should always be shown if it exists. Ƿidsiþ 14:46, 3 February 2011 (UTC)
- Do we still feel that way, then? I don't really see the point. Oh and it doesn't mention Modern English, or even Old English for that matter. Mglovesfun (talk) 10:12, 4 February 2011 (UTC)
- Probably not. I see no point in including an O.E. cognate in a Dhivehi entry's etymology section. -- Prince Kassad 10:15, 4 February 2011 (UTC)
Where to put etymologies
I frequently stumble over etymologies like Russian магазин, where the complete history of borrowing is explained in one article. This is nice for readers interested in the Russian word, but it leads to the absurd situation that the inner-Arabic relationship (mahazin vs. mahzan vs. hazana) is explained in the Russian article while the intermediate step, Italian magazzino, has no etymology section at all.
In a similar way, English loan words may have their complete history of borrowing (French, Latin, Greek) described at the English word, even if the French, Latin, Greek article exist. This leads to the situation that the structure of an Ancient Greek compound may be described in the English article and omitted in the Ancient Greek article.
One possible solution is copying around the nearly complete borrowing path from one article to the other one, for example from Russian магазин to Italian magazzino. But this will lead to much duplicate information and to many duplicate errors.
The alternative could be: to describe only one step of borrowing, namely the immediate source of borrowing. Example: in the English article refer to the French word, in the French article (if that exists) refer to the Latin word, in the Latin article (if that exists) refer to the Greek one etc.
The current version of the guideline proposal seems not to say what to do with such cases (at least I couldn't find it). It recommends, however, the complete history of inherited words.
Therefore, I suggest that borrowed words should describe only the path to the immediate source (e.g. English < French). If the immediate source has no article, one should describe two steps (e.g. English < Old Cornish < Latin) and so on.
What do you think? --MaEr 10:53, 11 September 2011 (UTC)
- As a user, and ignoring technical considerations, I would prefer each etymology to show only the immediate predecessor, but with a "More" link that would expand the etymology as far as possible, going back through as many languages as we know about. Of course we can't do this now without tiresome duplication; presumably it would require a complicated database of linked etymology records. That's my preference though in an ideal world. Equinox ◑ 11:03, 11 September 2011 (UTC)
- (We have discussed this before, somewhere.) I think English words' etymologies should definitely include all the steps we know. In our capacity as an English dictionary, we want to provide etymologies without making people click a dozen links to get to the PIE (or whatever). I don't see the harm in having the same for other entries, except the occupation of screen space, which can be solved by having etymologies beneath the definitions. Since we currently put etymologies above the definitions, I'll say foreign words should also include a full chain of words except in the case that an English word is an ancestor (or loan/calque source), in which case stop there and let people read the English entry (or perhaps say "... from English foo, eventually from Ancient Greek φυ", without the intervening steps).—msh210℠ (talk) 16:19, 11 September 2011 (UTC)
- Thank you for your comments. I understand that there is no easy solution and probably no consensus.
- Equinox, I have a similar dream for my ideal world, knowing very well that this solution will never come :-/
- Yes, Msh210, there has been a similar discussion before: Wiktionary:Beer_parlour_archive/2007/May#Duplication_in_etymologies, without a clear result. I had hoped that things would be clearer now than four years ago.
- --MaEr 12:53, 17 September 2011 (UTC)
- My preference: Let each etymology chain be as complete as possible. If not that, then let each etymology chain be rather long, so that as few clicks as possible lead to the complete etymology chain. I don't mind duplication in etymologies; one only has to be clear for each link of an etymology chain about which entry is the master record and which is a copy. --Dan Polansky 07:15, 22 September 2011 (UTC)
Conflicting etymologies
(Possibly not the best place for this discussion, but...) What is one supposed to do when multiple entries give conflicting etymological information? We have quite a number of situations like these. (An example: Appendix:Proto-Germanic/bōks gives English book as a descendant, as well as Old Norse bók > Danish bog, and the entry book's etymology says it's from enm book < ang bōc < Proto-Germanic bōks < PIE *bheh₁g̑ós, and the Danish entry bog says it's from Old Norse bók < PIE *bʰeh₂go-. Thus, either the Proto-Germanic entry is mistaken in its list of descendants, or the English entry is mistaken in its spelling of the PIE etymon, or the Danish entry is mistaken in its etymology or just misspelled the PIE word and left out the intermediate Germanic step.) Perhaps we should have a template for adding to conflicting etymologies, that could list the page(s) that appear to conflict, and add the pages to a category? --Yair rand 07:45, 12 January 2012 (UTC)
- I don't understand why the PGmc entry might be mistaken in its list of descendants, but it is true that the English entry is mistaken in its spelling of PIE (it has to be h₂, not h₁). The Danish entry is correct but leaves out the PGmc step. There are definitely cases of conflicting etymologies, but here it's just a matter of a mistake (*bheh₁g̑ós instead of *bheh₂g̑ós) combined with differing typographical conventions (there's no meaningful difference between "bh" and "bʰ" in PIE etyma, it's just a matter of taste) and differing editorial decisions about how many intermediate levels to show. AFAICT, *bheh₂g̑ós has direct reflexes only in centum languages (Greek, Latin, and Germanic) so we have no way of knowing whether it's "g̑" or plain "g" in the middle. —Angr 10:19, 12 January 2012 (UTC)
- (Based on just the info given on the pages, there was no way to know whether perhaps just the PGmc entry was mistaken in listing the Danish word as a descendant, and both the Danish and English entries were accurate.) We do have many entries giving conflicting spellings, and we really need to keep them consistent, since they need to link to the same entry at some point. --Yair rand 00:19, 13 January 2012 (UTC)
Indicating Borrowings
This is mainly regarding English, since this is English Wiktionary. Should every English word of Latin origin (often through Old French or Anglo-Norman) be indicated as "borrowed" within the etymology? For now I see that the vast majority (and there are many) are not. The question is if this is necessary though. I'd imagine most people would know that English is not descended from Latin and that most words were borrowed from it, either through French or as a scientific or academic loan word, but would it possibly confuse some people into thinking it is a naturally inherited term if it doesn't have "borrowed" in the etymology? And would an origin in Old French or Anglo-Norman really be considered a borrowing just because it's not of the same source as Old English/Anglo-Saxon? The same goes for, say French words of Frankish or Germanic origin: should they say "borrowed from" or not?
Another related issue is whether or not to indicate borrowing in all the Romance languages. In reality, very large portions of even the Romance languages were borrowed from Latin in the Middle Ages as learned terms. Probably less than half of the overall lexicon in many is actually inherited or truly native, though it does of course account for almost all the core, day-to-day vocabulary. But is it useful to indicate on all the thousands of borrowed words that they are borrowed? I'd imagine this would take quite a while. I understand it's useful when there's a doublet of an inherited word vs. a borrowed one from the same source, thus showing a comparison.
Also, in the descendants section of Latin words, the vast majority do not differentiate between borrowings and inherited forms. While I could see this as useful, it would also make those sections look really cluttered, especially if there's (borrowed) next to every non-Romance language descendant, like German, Russian, etc. I'm just wondering what others' thoughts are on this. Word dewd544 18:01, 13 February 2012 (UTC)
- I used to make indented descendants sections to indicate the difference between inherited and borrowed words [3], but I was told it's better to have them alphabetically listed. Maybe the {{italbrac}} template could be used, for example:
Phrases, Compounds, Acronyms, and Abbreviations section
This currently says not to use {{compound}} if the term is written with parts separated by spaces or hyphens. Is that a good idea? If we don't use {{compound}} on such entries, they won't end up in Category:English compound words or the equivalent. So I think it would be desirable to include it regardless. —CodeCat 21:32, 29 November 2012 (UTC)
- I disagree. That's what the argument head= in {{head}} is for. The user can be assumed to understand that the links go to the constituents. —Μετάknowledgediscuss/deeds 21:12, 30 November 2012 (UTC)
- So what about the category? —CodeCat 22:23, 30 November 2012 (UTC)
- It can be added manually. Templates don't have to do all the work around here. —Angr 18:52, 6 December 2012 (UTC)
Separate categories for borrowed and inherited words
So far these are all covered as "derivations", giving rise to
- nonsensical categories such as Category:Japanese terms derived from Proto-Indo-European
- inability to separate truly inherited words from borrowings in languages where the donor and recipient languages belong to the same family. E.g. Latin borrowings in Romance languages that have an inherited reflex as well, or e.g. words in any IE language borrowed from Ancient Greek or Latin (of which there are plenty!) that etymologize all the way to the PIE root, which then cause Category:Xxx terms derived from Proto-Indo-European or Category:Xxx terms derived from Latin to contain both inherited words and borrowings.
These need to be clearly separated, either by a special switch to {{etyl}} (why is that template not moved to {{etym}}?!), or by some "intelligent" behavior that would recognize that e.g. Japanese can't possibly have terms inherited from PIE. --Ivan Štambuk (talk) 08:25, 21 August 2013 (UTC)
- Or we could just yell at people who take the etymology of e.g. Japanese アーク all the way back to PIE. The chain should stop at English. --Vahag (talk) 08:35, 21 August 2013 (UTC)
- If we decided to stop at the last language before the proto-term, that would also solve #2 in cases when chains would involve proto-terms (as well as deprive us of some "deep" etymologies/cognates), but would not resolve the borrowed/inherited ambiguity for Romance languages borrowing from Latin in post-VL times, newer Sanskritisms in Indo-Aryan languages, and similar. --Ivan Štambuk (talk) 10:34, 21 August 2013 (UTC)
- I would support limiting etymologies to displaying only inherited terms. Borrowing would "break" the chain of inheritance so it would normally be the last term displayed; any further etymology should be looked up at that term, to avoid too much duplication. However, it is sometimes useful to elaborate on the morphology of a borrowed term. For example, aardvark is borrowed from Afrikaans, but it's useful to show that the word consists of two parts meaning "Earth pig" in Afrikaans. And what about despot, which is borrowed from Greek which in turn inherited it directly from a fossilised PIE phrase? Should the etymology of despot show this PIE meaning or not? —CodeCat 17:14, 21 August 2013 (UTC)
- I have nothing against explaining etymologies in terms of etymon's morphology, or extending derivational chains to arbitrary limits; it's just categorization that bothers me which creates nonsensical (#1 above) or useless categories (#2, you cannot fetch a list of inherited terms from a category because the category is polluted with borrowings which ultimately happen to inherit from the same proto-language). --Ivan Štambuk (talk) 17:27, 21 August 2013 (UTC)
- Then let's propose rules... The derivation categories are to be used for either
- A word that originated from that language without borrowing, or
- A word that was borrowed directly from that language, but not from any of its descendants.
- That would exclude borrowed terms from appearing in the PIE categories at all. Borrowing and inheritance from the same language (Romance from Latin) is not distinguished by this, I don't know if we should distinguish them. If we do, that would mean modifying the category structure which is a bit more complicated than just removing entries from them. —CodeCat 17:32, 21 August 2013 (UTC)
- A problem arises if the borrowing is still a red link; AFAIK our Lower Sorbian entry šołta is the only place where the etymology of a pan-West Germanic word is provided. If the etymology simply stopped at the Low German borrowed word, the rest of the etymology wouldn't be anywhere. —Angr 17:34, 21 August 2013 (UTC)
- Also, Category:Japanese terms derived from Proto-Indo-European isn't nonsensical, because "derived from" doesn't mean "inherited from". アーク is a Japanese word and it can be traced back to PIE, therefore it's a Japanese term derived from PIE. Distinguishing words that Romance languages have inherited from Latin from those borrowed from Latin would be good, but it isn't always possible to tell. Some words would have the same form either way, so all we can say is they "come from" the Latin word without specifying whether that's by borrowing or inheritance. —Angr 12:40, 22 August 2013 (UTC)
- Our derived means both "inherited" and "borrowed", and that's why it's useless. For languages that have little inherited vocabulary left (English, Armenian..) that might make sense, but for the rest, not quite. I find it both pointless to see Japanese terms deriving from PIE, Albanian from Proto-Slavic etc. as well ancestor (proto-)language etymological categories containing both directly inherited words, and those that were mediated through some borrowing. --Ivan Štambuk (talk) 20:44, 23 August 2013 (UTC)
Transliteration
Is it actually Wiktionary policy not to mention transliteration in etymologies?
Specifically, in foreign terms borrowed from Chinese, should one not mention whether the spelling used derives from pinyin, Wade, Postal Map, or whatever else? I really cannot see why it improves the dictionary to actively remove such information, but cf. baijiu where editors have repeatedly tried to remove the offending information, despite the presence of Template:transliteration, its mention on other pages, the obvious use of the knowledge, etc.
The editors involved seem very sure of themselves, despite there being no actual policy here to support them. If they're right, you should go ahead and add that information here, maybe in a "Chinese" or "East Asian languages" section. Lemme know. LlywelynII (talk) 05:37, 7 November 2013 (UTC)
- The practice has always been to use native scripts, not romanisation: {{etyl|cmn|en}} {{term|白酒|lang=cmn|tr=báijiǔ}} (unlinked romanisation displayed as (báijiǔ)). See further User_talk:Atitarev#Baijiu, Talk:baijiu. --Anatoli (обсудить/вклад) 05:54, 7 November 2013 (UTC)
- Well, the native script should be supplemented by a transliteration for the benefit of those who can't read the native script, so it should say "From Mandarin 白酒 (báijiǔ)", but the current wording "From the pinyin transliteration báijiǔ of Mandarin 白酒 (lit. "white liquor")" is absurd. Languages borrow words from languages, not from writing systems, and it's really superfluous to mention it when the English word uses the pinyin spelling (stripped of diacritics) anyway. If the English spelling were, say, pai-chiu, then it might be worth mentioning that the English spelling was based on the Wade-Giles transliteration, but as it is, "From Mandarin 白酒 (báijiǔ)" is correct and sufficient. —Aɴɢʀ (talk) 09:49, 7 November 2013 (UTC)
Language section split based on etymology
Analyzing the "split" into distinct etymologies of hull and husk made me wonder whether there is a single word in any language for which we can have more than one meaning in one etymology section. Splitting the meanings by (sometimes unknown) etymology signals to any reader that when a meaning sits inside an etymology section, the word in that sense is definitely derived from that etymology. Hence a sense marked "figuratively" implies that the word was derived not from the meanings listed above it but from the etymology the section itself gives (and adding something like "a bottle full of the number x meaning" moves part of the etymology into the definition, possibly making the definition less clear).
Words having meanings that derive from another meaning of the same word by extension or figuratively should then have their own etymology section. That section should state that they are derived from a specific meaning (perhaps explaining how the "figurative" or "extended" meaning arose, which in most cases is obvious, but not always). There are also meanings used only in some English-speaking countries (say the UK, the USA, Australia, or even a single US state) which certainly have a different etymological reason for naming that particular "thing" in that country (or region) as opposed to the other countries (or regions).
But this would break the whole idea of looking up the explanation of a word in a dictionary. (Incidentally, someone who has come across a word in a short phrase may not even be sure whether it is a verb, a noun, a preposition or a proverb.)
- Searching a dictionary for a meaning (as a simple reader, such as a primary-school student, not a professor), I want to find all meanings in one place.
- Searching (as a simple reader) in an etymological dictionary for etymologies is another thing.
- Searching (as a simple reader) in a thesaurus to find antonyms or synonyms or phrases is yet another thing. --Xoristzatziki (talk) 05:11, 21 July 2014 (UTC)
(I wrote this to point out an inconsistency in the etymology split, not to provoke more etymology splits.) --Xoristzatziki (talk) 05:15, 21 July 2014 (UTC)
How to not duplicate work
I feel adding a section to note that etymological sources should primarily be applied to the language the source is talking about would be handy. E.g. if sourcing the claim that a Turkic word is an old loan from Indo-Iranian, this will be better explained and sourced in detail at a Proto-Turkic appendix page, not individually on our Turkish, Turkmen, etc. entries; though as long as said Proto-Turkic entry has not even been created, leaving the source on an individual language's page is probably OK. In the case of attested languages this is definitely clear, i.e. etymologies for Romance languages should simply refer to Latin words, not spend space discussing Latin derivative processes or Proto-Italic developments. Any opposition / comments? --Tropylium (talk) 13:29, 21 February 2016 (UTC)
- I agree wholeheartedly. I already move the sources to the ancestor language and say "see there for more" for Armenian and Iranian languages. For other languages people may get upset.
- E.g. the etymology chain in Japanese アーク is completely inappropriate. --Vahag (talk) 15:57, 21 February 2016 (UTC)
- I'd probably keep the link to Latin in that case, but as there are no sources there that would be making particularly noteworthy claims, it's a slightly different issue from what I was thinking of anyway. --Tropylium (talk) 20:58, 21 February 2016 (UTC)
- OK, a better example. The sources and comments at börü should be at the Proto-Turkic page only. --Vahag (talk) 21:13, 21 February 2016 (UTC)
- Yes, agreed: that's a nice selection of sources, but on an inappropriate page.
- We have had users and editors who are less interested in etymology complain before about overlong etymology sections, and splitting discussions like these to etymology appendices would probably help with that. --Tropylium (talk) 20:02, 22 February 2016 (UTC)
- Yeah, it's depressing when someone edits a modern word like hydroplane to explain the Greek roots. It's from hydro-. No Greek person invented the word. Equinox ◑ 22:09, 21 February 2016 (UTC)
RFC discussion: June 2006–October 2007
The following discussion has been moved from Wiktionary:Requests for cleanup (permalink).
This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.
Can someone who knows a thing or two about etymology do some work on this, please? At the moment it looks like it was written by a primary-school pupil as an English project (apologies to whoever did write it, but it is somewhat lacking in content and quality) and doesn't help at all explain how etymologies should be researched and entered into Wiktionary.
Particularly bad is "Coined expressions: Some expressions have been coined by somebody." Aside from the fact that all terms were coined by someone (they don't spring fully formed out of thin air), saying "This term was coined by X." is not an etymology. It is part of the word history. Etymologies indicate how a word was constructed, not who thought it up.
At the moment this page is almost completely useless. Please can we work on making it useful.
Rant over. — Paul G 10:17, 14 June 2006 (UTC)
- Huh, interesting. I've always confused them as well. Where should we include word histories, e.g. the origin of Tarzan? ∂ανίΠα 19:04, 14 June 2006 (UTC)
I have rewritten this page more or less from scratch. It's still not perfect, but hopefully somewhat better. Dav - word histories can go in Etymology, Paul's point (I think) is that where possible we should show the apparent reasoning behind such coinages. Not possible with Tarzan, but see for example chortle or orgone. Widsith 22:23, 14 June 2006 (UTC)
Also updated recently to show the correct format and to make it fairly self-explanatory when words are italicised. Added section Etymology language templates and usage.--Williamsayers79 15:25, 27 February 2007 (UTC)
- This page has been tidied up somewhat since it was nominated. Removed rfc. --Williamsayers79 13:58, 16 October 2007 (UTC)
typo in Etymology
"undertermined" should be "undetermined". 189.228.99.251 03:09, 6 July 2017 (UTC)
- Fixed, thanks. —Mahāgaja · talk 05:01, 27 May 2020 (UTC)
"Surface analysis"
is nonsense: the term links here at the top of Google results and barely shows up on Google Scholar, if at all. The etymologies should be reformatted back to their standard and sensible "equivalent to", regardless of what term we agree to call the process... which doubtless has a better name than this. — LlywelynII 00:24, 27 May 2020 (UTC)
- Personally, I always write "synchronically analyzable as". —Mahāgaja · talk 05:01, 27 May 2020 (UTC)