Wiktionary talk:Votes/2011-07/Redirecting single-character digraphs

From Wiktionary, the free dictionary

Affected characters

I wrote a Perl script to go through UnicodeData.txt and NamesList.txt (from http://www.unicode.org/Public/UNIDATA/) and find each character that has a direct compatibility mapping to a sequence of multiple non-modifier letters, without any compatibility formatting tag. That turns out to include these 56 characters:

Of course, we may not want this vote to include all of the above; <№>, for example, is not exactly a "digraph". And conversely, we may want it to include some things that aren't listed above; the above-mentioned search criteria are just a first pass, and I welcome other thoughts. But, it's hopefully a starting-point for discussion.

(I realize some of the above is probably gobbledegook to anyone who's not familiar with the guts of Unicode . . . if you have any questions, ask. Though I have to admit that I'm not terribly familiar with the guts of Unicode, either!)
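
For anyone who wants to try the search criteria described above, here is a minimal sketch in Python rather than Perl. It is not the original script: it reads Python's bundled `unicodedata` tables instead of UnicodeData.txt and NamesList.txt, and it assumes that "without any compatibility formatting tag" means the generic `<compat>` tag and that "non-modifier letter" means general category L* other than Lm. The function names `compat_letter_sequence` and `matching_characters` are made up for illustration.

```python
import sys
import unicodedata


def compat_letter_sequence(ch):
    """Return the letters ch maps to if it meets the assumed criteria, else None."""
    decomp = unicodedata.decomposition(ch)
    # Assumption: "no compatibility formatting tag" means the generic <compat>
    # tag, as opposed to <font>, <super>, <fraction>, <square>, and so on.
    if not decomp.startswith("<compat> "):
        return None
    target = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
    if len(target) < 2:
        return None
    # Non-modifier letters: general category L* except Lm (modifier letter).
    for c in target:
        cat = unicodedata.category(c)
        if not cat.startswith("L") or cat == "Lm":
            return None
    return target


def matching_characters():
    """Yield (character, target) pairs over the whole code space."""
    for cp in range(sys.maxunicode + 1):
        target = compat_letter_sequence(chr(cp))
        if target is not None:
            yield chr(cp), target


if __name__ == "__main__":
    for ch, target in matching_characters():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')} -> {target}")
```

The number of matches depends on the Unicode version that ships with the Python interpreter, so it will not necessarily agree with the 56 found in 2011.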

RuakhTALK 00:04, 5 July 2011 (UTC)

Thanks, but, how complete is that list? You mentioned some ligatures, but not æ... --Daniel 16:44, 5 July 2011 (UTC)
It is a perfectly complete list . . . of characters meeting the above-mentioned criteria. Unicode does not give a nontrivial compatibility decomposition for <æ>, so it didn't qualify. But as I mentioned, we may want to consider different criteria. Incidentally, Unicode names <æ> "LATIN SMALL LETTER AE", not "LATIN SMALL LIGATURE AE", though the latter is indicated to be an alias for it. Here is its full entry in NamesList.txt:
00E6	LATIN SMALL LETTER AE
	= latin small ligature ae (1.0)
	= ash (from Old English æsc)
	* Danish, Norwegian, Icelandic, Faroese, Old English, French, IPA
	x (latin small ligature oe - 0153)
	x (cyrillic small ligature a ie - 04D5)
RuakhTALK 18:07, 6 July 2011 (UTC)
IMO some of these, at least, are of interest in their own right and should not redirect. U+FB4F ﭏ HEBREW LIGATURE ALEF LAMED, for example, can have an interesting etymology: when it was first used, why and when it's used, etc. This is independent of the page אל. The same can be said for the (now former) Rupee sign, the ffi and st families of ligatures, and perhaps more. —msh210 (talk) 17:22, 6 July 2011 (UTC)
I agree about alef-lamed and the rupee sign, but I think the ffi and st ligatures are exactly what this vote should be about. They're exactly the kind of "character" that is no longer being added to Unicode. —RuakhTALK 18:07, 6 July 2011 (UTC)
I suggest restricting this vote to entries written in Latin script only, regardless of whether characters in Hebrew, Lao, Armenian and Arabic would follow suit. --Daniel 18:24, 6 July 2011 (UTC)
Redirecting to ffi is a bad idea, because the latter does not exist. We can create it, but I don't see how it would be justifiable. --Daniel 18:24, 6 July 2011 (UTC)
Oh, right. Good point. —RuakhTALK 20:52, 10 July 2011 (UTC)

Redirecting trigraphs

This vote should probably extend to trigraphs, to cover and as well. --Daniel 15:58, 5 July 2011 (UTC)

Redirecting Roman numerals

Apparently, should this vote pass, will redirect to ⅹⅰ; and, not to xi, because we would still keep the distinction between "generic" Latin letters and Roman numerals. --Daniel 16:56, 5 July 2011 (UTC)

Specific digraphs

I restricted the list to 14 specific redirects. --Daniel 02:33, 10 July 2011 (UTC)