Talk:toleratie

From Wiktionary, the free dictionary

The following information passed a request for deletion.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


toleratie

Wrong spelling of tolerantie. Should be deleted. 15:50, 10 July 2013 (UTC)

  • Keep as an obsolete spelling or a common misspelling. This Dutch spelling is plentifully attested in Google books. The search in Dutch books of toleratie finds 90 hits, while the search in Dutch books for tolerantie finds 54,100 hits. The ratio of the two numbers is 601, which suggests a common misspelling to me; compare to the ratio of "conceive" vs. "concieve". --Dan Polansky (talk) 18:13, 10 July 2013 (UTC)
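
The ratio quoted above can be reproduced from the two reported hit counts. This is a minimal sketch, assuming the counts of 90 and 54,100 as stated in the comment; the function name `frequency_ratio` is purely illustrative and not part of any tool mentioned in the discussion.

```python
def frequency_ratio(variant_hits: int, standard_hits: int) -> float:
    """Return how many times more often the standard spelling occurs."""
    if variant_hits <= 0:
        raise ValueError("variant_hits must be positive")
    return standard_hits / variant_hits


# Hit counts as reported in the comment above (Google Books, Dutch).
ratio = frequency_ratio(90, 54_100)
print(round(ratio))  # 601

# The same relationship expressed as the variant's share of all
# occurrences of either spelling.
share = 90 / (90 + 54_100)
print(f"{share:.4%}")
```

Expressing the figure as a share makes it easier to compare with the percentage-based thresholds that come up later in the thread.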

Anyway, here is a relevant snippet:

Term 1          Term 2          Ngram   Frequency ratio in year 2000
beleive         believe         Ngram   3349
beleiver        believer        Ngram   22913
aquitted        acquitted       Ngram   433
aquire          acquire         Ngram   1075
arithmatically  arithmetically  Ngram   441
concieve        conceive        Ngram   1494
recieve         receive         Ngram   1874
bibiliography   bibliography    Ngram   2920
assidious       assiduous       Ngram   1084
bizzare         bizarre         Ngram   396
athiest         atheist         Ngram   561
condensor       condenser       Ngram   99
concensus       consensus       Ngram   341
accross         across          Ngram   5097

--Dan Polansky (talk) 19:09, 10 July 2013 (UTC)

To judge from your data both here and on your talk page, I'd say a misspelling needs a frequency ratio < 100 to be considered common. —Angr 19:28, 10 July 2013 (UTC)
Thus, of the examples listed in the above table, only "condensor" is a common misspelling per your assessment. My assessment differs: "recieve" is a prototypical common misspelling by my lights, and it has a frequency ratio of 1874. In Google web search, "recieve" has 32,700,000 hits. Furthermore, I see no reason to believe that copyedited Google books material should contain common misspellings at ratios below 100. Be that as it may, you still have not listed your prototypical common misspellings with their frequency ratios. --Dan Polansky (talk) 19:46, 10 July 2013 (UTC)
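
The effect of the proposed cutoff on the table above can be checked mechanically. This is a rough sketch, assuming the ratios exactly as posted in the table; the names `RATIOS` and `below_threshold` are illustrative only, and the second cutoff of 2000 is an arbitrary value chosen just to show how sensitive the result is to the threshold.

```python
# Frequency ratios (standard spelling : misspelling) copied from the
# table posted earlier in this discussion.
RATIOS = {
    "beleive": 3349,
    "beleiver": 22913,
    "aquitted": 433,
    "aquire": 1075,
    "arithmatically": 441,
    "concieve": 1494,
    "recieve": 1874,
    "bibiliography": 2920,
    "assidious": 1084,
    "bizzare": 396,
    "athiest": 561,
    "condensor": 99,
    "concensus": 341,
    "accross": 5097,
}


def below_threshold(ratios: dict[str, int], max_ratio: int = 100) -> list[str]:
    """Return the misspellings whose ratio is strictly below max_ratio."""
    return sorted(word for word, ratio in ratios.items() if ratio < max_ratio)


# Cutoff of 100, as proposed in the comment being answered.
print(below_threshold(RATIOS))  # ['condensor']

# An arbitrary, much looser cutoff for comparison (assumption).
print(below_threshold(RATIOS, 2000))
```

With the cutoff at 100, only one of the fourteen listed misspellings qualifies, which is the point made in the comment above.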
Yes, and of the ones listed on your talk page, "referencable", "experiencable", "influencable", "sequencable", "idiosyncracy", and "supercede" are. As for what I consider to be common misspellings, that's a bit hard to judge, but two words I often misspell myself are separate and existent. I'm not sure how to read Google's Ngrams, but if I've read them correctly, then this says the ratio of seperate to separate is about 1:1030, while this says the ratio of existant to existent is about 1:52. So does that mean I have to increase my maximum frequency ratio for misspellings to be considered common? Not at all; it means seperate isn't as common a misspelling as I thought, so if someone were to nominate it for deletion, I'd vote delete. I'm surprised that existant is so much more common, though, and I wonder whether the French and Latin words are perhaps showing up in the results despite the "from the corpus: English" setting. Maybe the French and Latin words are showing up in quotes inside otherwise English-language texts. At any rate, the results are making me very skeptical of Google Ngram Viewer as a reliable linguistic corpus analysis tool. There are so many corpora of written English out there, surely someone has analyzed some of them following proper statistical procedure to estimate the frequency of various misspellings. —Angr 21:15, 11 July 2013 (UTC)
What is the basis of your choice of the frequency ratio in copyedited Google books of 100 as a threshold? What possible factual observation could shake your choice of that ratio? What about the results is making you very skeptical, as per your statement above? Given your doubt, have you considered looking at other corpora, such as COCA, BNC or even the World Wide Web? --Dan Polansky (talk) 15:41, 12 July 2013 (UTC)
"What possible factual observation could shake your choice of that ratio?" Probably nothing; any spelling that occurs less than 1% of the time the word is used is simply too rare to be called "common". I would much prefer we use a real corpus like COCA or BNC, but there too I would want the frequency threshold to be at around 1%. —Angr 15:29, 13 July 2013 (UTC)
Do you agree with the following statement? 'Any spelling that occurs less than 1% of the time the word is used in a copyedited corpus is simply too rare to be called a "common misspelling".' Is there any further reasoning or evidence that you could provide in support of that statement? Let me emphasize that we are talking about common misspellings, not common spellings. --Dan Polansky (talk) 18:25, 13 July 2013 (UTC)
Yes, I agree with that statement. My reasoning is based on sense 3 of common: "Found in large numbers or in a large quantity". If we're going to call something an alternative but correct spelling, it had better occur far more often than 1% of the time. —Angr 18:40, 13 July 2013 (UTC)
Yes, correct spelling. But we are discussing the threshold for a common misspelling, not for a common correct spelling. The threshold is not for an alternative spelling tag to be used but rather for a common misspelling to be included. --Dan Polansky (talk) 19:23, 13 July 2013 (UTC)
But we're still calling them common misspellings. I don't think a misspelling is common unless it slips past a copyeditor at least 1% of the time. I really don't think that's an unreasonably high threshold. —Angr 13:09, 14 July 2013 (UTC)
Well yes, Google books for lang=en for existant finds the term in many French and Latin books or snippets. Whether the results are similarly skewed for other spelling pairs can be discovered for each pair by having a glance at what the links present on the Ngram page show in Google books search. Actually, the suspiciously low ratio of around 50 suggested such a glance was worthwhile. I find it very likely that, for most investigated spelling pairs, there is no such skewing. --Dan Polansky (talk) 16:08, 12 July 2013 (UTC)
Wouldn't toleratie be equivalent to toleration not tolerance? Could it be a separate word, not a misspelling? Mglovesfun (talk) 20:28, 10 July 2013 (UTC)
  • And any chance it might be dated? The bulk of hits popping up at google books:"toleratie" "geen", for instance, are not terribly recent. Limiting that search to the 21st century produced only two hits. FWIW. ‑‑ Eiríkr Útlendi │ Tala við mig 21:39, 10 July 2013 (UTC)
    • As I said, "Keep as an obsolete spelling or a common misspelling." If it is not obsolete, then maybe dated. If it is a misspelling, then a common one. I don't feel qualified to decide whether this is an obsolete form, a dated form or a common misspelling. --Dan Polansky (talk) 17:40, 11 July 2013 (UTC)
      I've never seen this word anywhere, but Mglovesfun is right, this would be "toleration" rather than "tolerance". So I don't think this can be considered a misspelling for sure, it's quite possibly another formation, which happens to be very rare. Of course we can't tell the difference in this case because they're still only one letter apart. —CodeCat 15:53, 12 July 2013 (UTC)
@DP: Obviously the frequency ratio of alt/misspelling to unmarked (prevailing) spelling alone is not sufficient evidence to choose among the classifications and presentations we use for current spellings: unmarked, "alternative spelling" and "common misspelling". Some weighting by its frequency in the corpus as a whole or by the absolute number of occurrences of the spelling in the corpus is needed. The natural log or square root of such frequencies or absolute numbers would give the right shape to a criterion curve, though it would need to be calibrated to reflect our judgement, preferences, or whims.
What Dutch corpora are there that adequately reflect misspellings? Is there a way to use Google searches of the web to be reasonably sure that one is counting mostly occurrences in Dutch text? Can we create a template for each language that would provide a way of having consistent searches for this purpose? DCDuring TALK 16:35, 12 July 2013 (UTC)
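
The weighting idea above is not specified anywhere in this discussion, so the following is only a hypothetical sketch of one shape such a criterion could take: it combines the natural log of the variant's absolute hit count with the natural log of the frequency ratio. The function name `commonness_score`, the form of the score, and the `ratio_weight` parameter are all assumptions made for illustration; nobody in the thread proposed or calibrated this formula.

```python
import math


def commonness_score(variant_hits: int, standard_hits: int,
                     ratio_weight: float = 1.0) -> float:
    """Hypothetical score: higher means a stronger case for 'common'.

    Rewards a large absolute number of occurrences of the variant and
    penalizes a large standard-to-variant frequency ratio.
    ratio_weight is an uncalibrated assumption.
    """
    if variant_hits <= 0 or standard_hits <= 0:
        raise ValueError("hit counts must be positive")
    ratio = standard_hits / variant_hits
    return math.log(variant_hits) - ratio_weight * math.log(ratio)


# Hit counts reported earlier in the thread for toleratie/tolerantie.
print(commonness_score(90, 54_100))

# Same ratio, but a hundred times more absolute occurrences
# (made-up numbers, used only to show the effect of the first term).
print(commonness_score(9_000, 5_410_000))
```

Under this sketch, two spelling pairs with the same ratio are no longer treated alike: the one with more absolute occurrences scores higher. Any cutoff on the score would still have to be chosen by judgement, as the comment above notes.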
If you can create a compelling multi-factor method, that's great. In the absence of a presented specific alternative method complete with factor weights, the presented single-factor method using a reasonably reliable and already copyedited corpus is compelling to me. You can constrain a Google books search by language, which I have done for Dutch; the results seem reasonably reliable overall, with some skewings and glitches. --Dan Polansky (talk) 16:50, 12 July 2013 (UTC)
Accordingly, Delete as too rare a misspelling, if misspelling it is. DCDuring TALK 17:09, 12 July 2013 (UTC)
After your musings about multi-factor logarithmic analysis, what is the basis for your claim of "too rare a misspelling"? What method and threshold have you used? --Dan Polansky (talk) 17:22, 12 July 2013 (UTC)
I cry foul: an editor requests a complicated method and, when he does not get one, votes upon a whim while providing no method whatsoever. --Dan Polansky (talk) 17:38, 12 July 2013 (UTC)
To DCDuring: well, as I noted above, there is no way to tell if it's a misspelling or an independently formed (but rare) word. So there are no grounds for considering it a misspelling that I can see. —CodeCat 17:41, 12 July 2013 (UTC)
Keep as a non-misspelling per Mglovesfun and CodeCat. — Ungoliant (Falai) 17:48, 12 July 2013 (UTC)
@DP: All of our decisions on misspellings are whimsical as we have no express criteria of any kind, quantitative or otherwise. In particular, we have no accepted criteria for determining what makes a misspelling common. Accordingly, I whimsically determine that this is too rare at less than 1%. But 1% is not in particular my criterion, nor of any implicit consensus, AFAICT. I don't think that we have ever accepted a challenged misspelling with such a low frequency, but facts could prove that wrong. I was willing to contemplate other criteria using this as a test case, but it does not have the makings of a good test case, IMO. DCDuring TALK 18:12, 12 July 2013 (UTC)
Thank you for providing the specific tentative threshold of 1% AKA 100 frequency ratio for Google books or similar copyedited corpus. Based on the table I have posted above, I think the threshold is eminently unreasonable. On another note, you might want to consider the analysis provided by a Dutch native speaker above (CodeCat) as an input to your vote. As for consensus, there is an implied consensus in Category:English misspellings, from which most of the items in the table have been taken. It is merely implied, but much better than anything else we have on consensus as far as the inclusion of common misspellings in Wiktionary.
If you want to be musing about previously challenged common misspellings, you'd better find some. Otherwise, yours is an idle speculation produced in the absence of actual knowledge. --Dan Polansky (talk) 19:28, 12 July 2013 (UTC)
I look forward to your offering your findings for discussion. DCDuring TALK 21:19, 12 July 2013 (UTC)
I was not making any claims about frequencies of previously challenged misspellings; you were: "I don't think that we have ever accepted a challenged misspelling with such a low frequency". My suspicion rests: yours is an idle speculation produced in the absence of actual knowledge. --Dan Polansky (talk) 06:24, 13 July 2013 (UTC)
  • Outcome: RFD kept: no consensus for deletion. The boldfaced keeps were mine and Ungoliant's; pro-keeping arguments were made by Mglovesfun and CodeCat; the boldfaced delete is by DCDuring and, by implication, the unsigned nominator; pro-deletion arguments were made by Angr. --Dan Polansky (talk) 09:55, 7 December 2013 (UTC)