User talk:Stephen G. Brown/Russian stress mark

From Wiktionary, the free dictionary
Latest comment: 19 years ago by Stephen G. Brown in topic GSub

Russian stress marks

Hi again Stephen. I really don't understand why you're removing all of the stress marks from the Russian entries. The Russian dictionaries I've used use them extensively, both in headwords and in translations in the English->Russian section. I see you're keeping them in the transliterations, but those are secondary. People familiar with Cyrillic don't want to have to look at both scripts to see where the stress belongs. Sure, there may be some rendering problems on some older computers/browsers, but generally we've opted to use advanced Unicode on Wiktionary and let the technology catch up with us if necessary. I for one miss the acute accents in the Russian Cyrillic. — Hippietrail 13:02, 8 May 2005 (UTC)

Partly because they look so horrible, floating way up in the air and far off center. And partly because many people will think they are part of the spelling. In most cases, people who would have any use for the Russian accents already know them on most words, and for unfamiliar words you can instantly see the accent in the transliteration, without even actually having to read the transliteration. I'm even finding these pseudo-accented Russian words as titles of articles, and if you search for them without including the accent, you can't find them. There is a way to make them look good, but it’s something that I don’t think most people will be willing to do ... I’m talking about putting the IPA template around the word, like this: волна́. Otherwise you just get волна´, which looks awful and is likely to wind up as the title of its own page. All this on top of the fact that the pseudo-accent is much harder to type than a real accent. I think putting the fake accent on Russian words offers very little benefit, but it carries lots of problems. —Stephen 13:28, 8 May 2005 (UTC)
Well you see they don't look horrible for everybody. It depends on OS, browser, fonts installed, versions of certain parts of the OS (such as w:Uniscribe on Windows). For me it matters that this type of thing works and looks as good as possible so I update these things and customize them. My default font is Arial Unicode MS which does a good job of the stress marks in Latin and Cyrillic.
I am using the most recent version of Uniscribe, along with Arial Unicode MS. To my eye, the effect of the universal accent is extremely bad. —Stephen 15:57, 8 May 2005 (UTC)

Yat’

File:CyrillicStressMarks.gif Here is a picture of how it looks on this computer where I work. I don't have Internet Explorer's Cyrillic font set to Arial Unicode MS after all. I remember now that I customized my user stylesheet. In fact I did this early on both here and on Wikipedia to solve this and other international language rendering problems. — Hippietrail 10:29, 9 May 2005 (UTC)

Well, that would explain why we’re seeing two different things. I believe what I see is what most people see. At any rate, your GIF looks reasonable, except for the accented yat, which collides with the accent. —Stephen 10:08, 11 May 2005 (UTC)
Yes, Arial Unicode MS seems to have better accent positioning than Code2000, but both fonts have overlooked the possibility of accenting obsolete vowels. — Hippietrail 01:27, 17 May 2005 (UTC)

I don't know about "most cases". I do know my case which is that I know the Cyrillic alphabet and basic Russian pronunciation. I know that Russian words have a stressed syllable. I know how to use a Russian dictionary to find where the stress goes. But I have a very tiny vocabulary.

Dictionaries are best if they are complete. If they were based on assumptions like "most people don't need X" then we would have dictionaries listing only rare or difficult words.
Yes, but few Russian-English dictionaries offer a transliteration of the Russian. When there is a transliteration, pronunciation guides are not repeated. If we're going to use the pseudo-accent on the Cyrillic word, then we should drop the transliteration. If we are going to have a transliteration, then all the pronunciation guides should be applied to the transliteration, and there only. I am happy to drop the transliterations and just put the accent in the Cyrillic, but then, because of the way it looks with my newest possible Uniscribe and font, all the entries should have the IPA template applied. This, however, creates another problem: if you include the accented word, plus the word spelt correctly (as a link), plus the IPA template, and try to put it in a bulleted list, it confuses the program and the word gets knocked to the next line, and not tabbed.

Transliteration

The reason we offer both native script and transliteration is to be useful to everybody. One of the basic tenets of Wikipedia and Wiktionary is that they are "not paper". This means we don't have to leave things out that print dictionaries do to save space. Personally, I can read Cyrillic, but many people who just want to look up the occasional word can't. Because of this I think the transliteration is less important than the native script, but sometimes I add transliteration and I would never delete a transliteration that somebody else added. There are some other issues with Russian transliteration, such as competing systems, how to treat the "hard sign", how to treat obsolete letters, the fact that "y" in transliteration doesn't mean what inexperienced people expect... — Hippietrail 10:29, 9 May 2005 (UTC)

And that’s exactly why I’ve been including transliterations all along, not only for Cyrillic, but Chinese, Japanese, Korean, Georgian, Cherokee, Hindi, Arabic, Greek and all the rest. But I think that not deleting a transliteration is the same as not deleting anything else that is wrong, unneeded, poorly done or otherwise undesirable. I change transliterations very frequently, sometimes because they are just wrong, but more often in an effort to standardize. For instance, I would change a Russian pyat’ to pjat’, because that seems to be the standard here. —Stephen 10:08, 11 May 2005 (UTC)

Unicode

A real accent is when a font designer actually creates individual accented letters. For example, the acute accent on this á is not a separate glyph...the á is a complete character with its own place in the font. Furthermore, different typefaces require different accent designs. You can’t use a Helvetica acute accent on a Times-Roman letter. So-called "universal" accents are like making a pot of mashed potatoes with a handful of dirt in it. All the accents, acute, grave, diaeresis and the rest, have to be properly designed for each individual typeface, and must be specially positioned for each character. "Universal" solutions are like putting a tuxedo on a pig. —Stephen 15:57, 8 May 2005 (UTC)

Aha, I see what you're getting at, but it doesn't work quite this way. At least not since Unicode. It's not the font designers who have left out accented Cyrillic characters (not exactly anyway). It's the people behind Unicode. Unicode has no provision for Cyrillic letters with acute accent. Their reasoning is that none of the legacy encodings had them and the w:combining characters (the full Unicode name seems to be "Combining Diacritical Marks") can be used with any character. I have read Unicode's official stance on the matter a couple of times, but it's very difficult to find with Google. I'll post a link when I find it.
Yes, I understand that. In fact, Cyrillic fonts with accented vowels have existed for a long time. I have several of them. I did a lot of work with the Unicode Consortium back in the early and mid ’90s in connection with Khmer. At the time I did not get into Cyrillic with them because Cyrillic is such a simple matter. But the universal diacritics (or "combining," as they now say) have also been around for a long time, centuries. Prior to about 1990, you could always recognize an American hand in typeset foreign text by the mismatched and poorly located universal accents. They’re a sure sign of a monoglot isolationist. Perhaps that’s why I find them so offensive, while you don’t see anything wrong with them. —Stephen 10:08, 11 May 2005 (UTC)
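The point about Unicode having precomposed accented Latin letters but not accented Cyrillic ones can be checked with a short sketch. This is only an illustration using Python's standard `unicodedata` module; the variable names are made up for the example. NFC normalization folds Latin a plus the combining acute into a single code point, while the Cyrillic pair stays as two code points because no precomposed character exists to fold into.

```python
import unicodedata

# Latin "a" followed by U+0301 COMBINING ACUTE ACCENT
latin = "a\u0301"
# Cyrillic "а" (U+0430) followed by the same combining accent
cyrillic = "\u0430\u0301"

# NFC composes a base letter and a mark whenever a precomposed form exists
latin_nfc = unicodedata.normalize("NFC", latin)
cyrillic_nfc = unicodedata.normalize("NFC", cyrillic)

for label, text in (("Latin", latin_nfc), ("Cyrillic", cyrillic_nfc)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, len(text), names)
```

The Latin string ends up as one character (LATIN SMALL LETTER A WITH ACUTE), whereas the Cyrillic string remains a base letter plus a separate combining mark, so its appearance depends on how the font and renderer position that mark.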

Combining accents

I have found some posts on the Unicode mailing list explaining that the combining characters are necessary for accented Cyrillic. I'd prefer something more official-looking, but here's what I have so far: [1], [2], [3] — Hippietrail 01:27, 17 May 2005 (UTC)

Yes, there is no question but that combining characters are required for accented Cyrillic in Unicode. But when Cyrillic fonts are designed, the "combining" accents are created with zero width, so that they work like a deadkey. This can work fairly well with fixed-space fonts such as Courier... but with proportional fonts, the combining accents fall too far to the left or to the right, depending on the base letter, and frequently either too high or too low. The combining accents cannot be made to work in an acceptable manner unless the developer takes the extra step of using a program like VOLT to add positioning tables that dictate exactly how each accent will be positioned for each base. Even then, it only works for the accent/letter pairs that the developer actually addresses. With most fonts available today, no one has bothered to add positioning tables for the acute with each of the ten Russian vowels, and Uniscribe doesn't perform this function unless the tables have been included in the font.
I think a good example of what I’m talking about is the difference in positioning between the font invoked by the IPA template and the font invoked by the Unicode template: а́э́ы́о́у́ я́е́и́ёю́ (IPA template) vs. [Unicode-template sample not recoverable]. I don’t know what the fonts are, but you can see what a tremendous difference there is. Another big problem is the fact that the big Unicode fonts (such as Arial Unicode MS) don’t have bold versions, so when you apply bolding, Windows creates a pseudo-bold font on the fly, and it really looks terrible (blurry, blotchy and out-of-focus). Let’s see if there is a difference in the font templates in this regard: aeiou & а́э́ы́о́у́ я́е́и́ёю́ (IPA template) vs. aeiou & [Unicode-template sample not recoverable]. —Stephen 10:03, 17 May 2005 (UTC)

What a font designer needs to do these days for accented Cyrillic is to design all the Cyrillic letters he/she wants plus the combining acute accent, U+0301. These glyphs are designed to overstrike a previous glyph. The problem is that some fonts don't include them, and some fonts include them but fail to make them "non-spacing".
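As a rough illustration of what "combining" and "non-spacing" mean at the character level, the sketch below looks up U+0301 in Python's standard `unicodedata` module and attaches it to a vowel. The helper `add_stress` and its index argument are hypothetical names invented for this example; how the result actually looks still depends entirely on the font and renderer.

```python
import unicodedata

ACUTE = "\u0301"

# Properties Unicode assigns to the mark itself
print(unicodedata.name(ACUTE))       # COMBINING ACUTE ACCENT
print(unicodedata.category(ACUTE))   # Mn (Mark, nonspacing)
print(unicodedata.combining(ACUTE))  # 230 (canonical combining class "above")


def add_stress(word: str, vowel_index: int) -> str:
    """Return word with U+0301 inserted right after the character at vowel_index."""
    return word[: vowel_index + 1] + ACUTE + word[vowel_index + 1 :]


# Stress the final vowel of "волна" (index 4)
stressed = add_stress("волна", 4)
print(stressed, len(stressed))
```

The stressed word is one code point longer than the plain word, which is why it is treated as a different string by search and page titles.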

Yes, I know exactly how they work. I have a lot of experience with them. They are included in fonts only as a bandaid. —Stephen 10:08, 11 May 2005 (UTC)

The issue of putting an accent from one font with a letter of another font wouldn't usually come up unless Uniscribe (or the Mac or Linux counterpart) has logic to specifically implement such a workaround.

No, most fontmakers who include the loose accents just pick them up from a character database, and don’t expend any time or effort refining them to make them match the font. Just because they are in the font does not mean that they match the font. If they do match the font, then Uniscribe could conceivably position them properly for each letter, but this rarely if ever happens. As far as I can tell, the loose accents are given a single unit of width (one unit is required by the Windows system), and the accent then fits differently with different characters according to their width. I have seen some that were positioned better by kerning, but kerning values are almost always missing from these letters. I know you think that I don’t know anything about this, but I’ve designed many fonts myself (using Fontographer) and I have over three decades of experience with fonts and typography. —Stephen 10:08, 11 May 2005 (UTC)
Let me apologize if I've been giving the wrong impression, but there's no contributor here whom I respect more than yourself. I'm just being over-descriptive because I'm not sure how far your knowledge extends, to test my own knowledge by describing to you how I believe things work, and also for the benefit of any 3rd parties reading this conversation.
It's a shame if font designers are not taking due care to shape and position the combining acute accent for Cyrillic. I would expect Cyrillic fonts to do a better job than general purpose fonts, but since the old 8-bit encodings, the keyboards, and the majority of existing text don't use the acute, they seem to have overlooked it for the time being.

GSub

TrueType/OpenType includes two useful features: GSUB for glyph substitution, which allows the designer to provide a distinct glyph for a character given its context; and GPOS for glyph positioning, which allows a font designer to specify how to position an accent relative to its base character.
Maybe Fontographer doesn't have good support for GPOS and GSUB or only introduced it in a certain version. These features are exotic enough that the majority of fonts for "simple" scripts such as Latin, Cyrillic, Greek and CJK can usually do without them. — Hippietrail 01:27, 17 May 2005 (UTC)
Yes, I know about the GSub and GPos tables. I’ve used VOLT now for several years to insert these tables into Arabic fonts. But all of my experience has been in regard to high-end typography programs such as QuarkXPress and Adobe Illustrator, and I have not gotten into Internet fonts. I used to work with Macromedia on their Fontographer forum, helping people with their font issues, but then Macromedia abandoned the business of printer fonts and typography, and started focusing strictly on the Internet. That’s when I stopped moderating there. I know that I should have followed fonts as they turned to Internet usage, but I found that aspect too boring, at least for the time being. That’s why I don’t know much about fonts as used on the Internet. —Stephen 10:03, 17 May 2005 (UTC)
It is true as you say that "All the accents, acute, grave, diaeresis and the rest, have to be properly designed for each individual typeface, and must be specially positioned for each character". This is what w:TrueType and w:OpenType are all about. One of their features is to allow font designers to supply specific positioning information for combining accents.
The "universal" solution you mention just doesn't exist. — Hippietrail 04:30, 11 May 2005 (UTC)
I don’t understand what you mean. What "universal" solution doesn’t exist? —Stephen 10:08, 11 May 2005 (UTC)
The "codepoint" for "combining acute" is Universal. My interpretation of what you were saying was that a rendering system might draw the text using one font, and the accents using another. The latter is the non-solution to which I was referring. — Hippietrail 01:27, 17 May 2005 (UTC)
I don’t remember what I was saying, but it definitely wasn’t that. I know that the base letters and the accents are always part of the same font. I was probably talking about the fact that just because a given font includes the combining acute accent, that doesn’t mean that the font developer went to the trouble of using VOLT to create positioning tables for the acute accent and the particular Cyrillic letters that we are trying to apply it to. Arabic ranges always have to have GSub and GPos tables, but Roman and Cyrillic letters rarely have them. —Stephen 10:03, 17 May 2005 (UTC)

If there are words which exist only under a title with a stress mark, this is an error. Wiktionary is full of errors and one of the things we editors do is browse through and fix (or mark to be fixed) errors we find. In this case we would move the article to the spelling without the stress mark. If there is already a word there, merging might be required. It's not a bad idea to keep the redirect from the stress-marked word to the non-marked word in case somebody searches for it. But it's not very important.

Yes, but there is more than one pseudo-acute accent that someone could type, so several pages would be needed for each word. Besides, I don't think anyone would ever type an accent when searching for Russian...there would only be a need for such confusion if we start putting pseudo-accents on Russian words, which would inexorably lead to accented words as article heads. (And anyway, I checked and there were no links pointing to the words I marked for deletion.) —Stephen 15:57, 8 May 2005 (UTC)
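One way to picture the title problem raised here: several different code points look like an acute accent, so a tool that derives the unaccented page title would have to strip all of them. The sketch below is only an illustration; the `ACUTE_LIKE` set is an assumed, non-exhaustive list chosen for this example, and `strip_stress` is a hypothetical helper, not anything Wiktionary actually runs. It removes only the listed code points, so letters such as ё and й, which are real parts of the spelling, are left alone.

```python
# Assumed, non-exhaustive set of characters that can pass for a stress mark
ACUTE_LIKE = {
    "\u0301",  # COMBINING ACUTE ACCENT
    "\u0341",  # COMBINING ACUTE TONE MARK
    "\u00b4",  # ACUTE ACCENT (spacing character)
    "\u02ca",  # MODIFIER LETTER ACUTE ACCENT
}


def strip_stress(text: str) -> str:
    """Drop acute-like marks; every other character is kept unchanged."""
    return "".join(ch for ch in text if ch not in ACUTE_LIKE)


# Different ways of typing an "accented" волна all map to the same title
variants = ["волна\u0301", "волна\u00b4", "волна\u02ca"]
titles = {strip_stress(v) for v in variants}
print(titles)

# Precomposed ё (U+0451) and й (U+0439) survive untouched
print(strip_stress("ёж"), strip_stress("мой"))
```

All three variants collapse to the single plain spelling, which is the form under which the entry would live.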
I don't think the redirect pages are "needed". They are nice extras which few people would ever use. Much like vowelled Hebrew or Arabic redirects. It's probably true that few people will type accented Russian but I have already cut and pasted accented Russian back when I was going through the painful process of learning all this esoteric Unicode stuff.