Wiktionary talk:Hindi transliteration

From Wiktionary, the free dictionary
Latest comment: 6 months ago by RonnieSingh in topic ISO 15919

ड़ (and ढ़) vs. ऋ


ड़ and ऋ are both represented by ṛ in w:Devanagari, but in order to disambiguate them, I'd like to use ŕ for one of them. Currently, I have ŕ listed as describing ड़, but as this is a retroflex sound, and the modern ऋ appears to be pronounced [r] in Hindi, I'd like to switch this so that the retroflex consonants can all be represented by dot-below characters. If no one objects, I'll implement this soon. — [ R·I·C ] opiaterein 03:39, 1 May 2009 (UTC)

visarga


I couldn't find anything mentioning visarga, the symbol (ḥ) as in दूःख. --Anatoli 12:49, 19 January 2011 (UTC)Reply

That's the only place I've ever seen visarga in Hindi, though I guess it could be an uncommon or high-register word... Anyway I see no problem using ḥ. — [ R·I·C ] opiaterein 01:01, 25 January 2011 (UTC)
Yeah, it must be uncommon; another one I know is a variant spelling of six in Hindi. I will add the symbol and transliteration for consistency. --Anatoli 02:35, 25 January 2011 (UTC)

double tilde


I forgot, did we ever come to a conclusion on how to deal with nasalized diphthongs, since the Unicode double tilde is so buggy? It's a pretty bad issue for words like मैं (ma͠i). -- Liliana 05:24, 15 February 2012 (UTC)Reply

Transliteration module


This module, Module:hi-translit, may need some tweaking, but it's already quite useful. E.g., to transliterate हिन्दी, use {{#invoke:hi-translit|tr|हिन्दी}} and receive "hindī" (today's result is "hindī"). --Anatoli (обсудить/вклад) 08:07, 27 April 2013 (UTC)
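For readers wondering how a transliterator can arrive at "hindī", the following Python sketch illustrates the general idea only. It is not the code of Module:hi-translit (Wiktionary modules are written in Lua); the names `CONSONANTS`, `MATRAS`, and `transliterate` are hypothetical, the character tables are deliberately tiny, and the sketch assumes no schwa deletion, nasalization, or conjunct handling beyond the virama.

```python
import unicodedata

# Hypothetical, deliberately incomplete tables for illustration only.
CONSONANTS = {"ह": "h", "न": "n", "द": "d", "क": "k", "म": "m", "ल": "l"}
MATRAS = {
    "\u093e": "ā",  # vowel sign AA
    "\u093f": "i",  # vowel sign I
    "\u0940": "ī",  # vowel sign II
    "\u0941": "u",  # vowel sign U
    "\u0942": "ū",  # vowel sign UU
}
VIRAMA = "\u094d"  # suppresses the inherent vowel of the preceding consonant


def transliterate(text):
    """Romanize Devanagari character by character (no schwa deletion)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in MATRAS:
                # An explicit vowel sign replaces the inherent "a".
                out.append(MATRAS[nxt])
                i += 1
            elif nxt == VIRAMA:
                # Virama: bare consonant, no vowel at all.
                i += 1
            else:
                # No sign follows: emit the inherent vowel.
                out.append("a")
        else:
            # Anything outside the tiny tables is passed through unchanged.
            out.append(ch)
        i += 1
    return unicodedata.normalize("NFC", "".join(out))
```

With these assumptions, `transliterate("हिन्दी")` yields "hindī", while `transliterate("कमल")` yields "kamala" rather than a schwa-deleted "kamal", which is one reason a real module needs more than a character table.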

ISO 15919


Why not just use ISO 15919, which is the international standard for all Indic scripts, rather than doing a custom modified IAST? Getsnoopy (talk) 05:14, 15 September 2020 (UTC)Reply

@Getsnoopy: What do you mean by "the"? It isn't the international standard in linguistic scholarship or in dictionaries for Indian languages. I have certainly not seen it in really good scholarship (though I do catch it every now and then in work that mentions Hindi/other Indian languages but isn't primarily about them). It only purports to be a standard, and it's not a very good one because it uses so many duplicate letters that can cause confusion to users (e.g. is "ee" ई or ए?) and it has so many ugly punctuation marks in running text, which will conflict with actual punctuation. —AryamanA (मुझसे बात करेंयोगदान) 03:28, 16 September 2020 (UTC)
@AryamanA: It is a superset of IAST, and can losslessly/unambiguously represent all Indic languages. And the fact that it's released by ISO, an international organization of which India is a member, is what makes it the standard. There's also Hunterian, which the Indian government uses officially, but that's lossy. I don't know what you mean by duplicate letters. "ee" would never occur in ISO 15919 for ई (represented by ī) nor ए (represented by ē); the only way it would occur is if the underlying text was एए, which is an impossible string of characters in Devanagari, and even then it wouldn't be transliterated as "ee" because the proper transliteration would be "ēē". As for "ugly punctuation marks", if you're referring to the diacritics, then IAST and the current custom implementation uses them as well. I don't see any other way of unambiguously representing the Devanagari script. Something tells me you're thinking of some other transliteration scheme and not ISO 15919. Getsnoopy (talk) 07:11, 17 September 2020 (UTC)Reply
@Getsnoopy: Ah, I see, I accidentally looked at 7-bit ISO 15919, not the actual encoding; my apologies. Either way, I don't think there's any problem with our system as is. —AryamanA (मुझसे बात करेंयोगदान) 14:05, 17 September 2020 (UTC)
@AryamanA: There are a few places where it deviates from ISO, so we might as well fix those to be compliant with an international standard: ऋ (should be r̥, not ŕ), ए (should be ē, not e), ओ (should be ō, not o), ख़ (should be k͟h, not x), and झ़ (should just settle on ž instead of also listing ź, which I also don't know how it would work). Getsnoopy (talk) 16:57, 17 September 2020 (UTC)Reply
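The deviations listed in the comment above can be summarized as a small mapping. The following Python sketch is an illustration only: the names `CURRENT_TO_ISO` and `to_iso` are hypothetical, the table contains just the four replacements named above (झ़ is left out because the proposal there is only to settle on ž), and it assumes the input is already a romanization in the current scheme.

```python
import unicodedata

# Current Wiktionary romanization -> ISO 15919 form, as listed in the comment.
CURRENT_TO_ISO = {
    "\u0155": "r\u0325",  # ŕ -> r with ring below (for ऋ)
    "e": "\u0113",        # e -> ē (for ए)
    "o": "\u014d",        # o -> ō (for ओ)
    "x": "k\u035fh",      # x -> k͟h (for ख़)
}
_TABLE = str.maketrans(CURRENT_TO_ISO)


def to_iso(romanization):
    """Rewrite a current-scheme romanization with the four proposed changes."""
    # Normalize first so that a decomposed "r" + acute still matches U+0155.
    text = unicodedata.normalize("NFC", romanization)
    return unicodedata.normalize("NFC", text.translate(_TABLE))
```

For example, `to_iso("xāk")` returns "k͟hāk". The sketch only shows how few characters are involved; it says nothing about manual transliterations already stored on entry pages, which would still need review.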
@AryamanA: Given the grace period, I will be changing this in a few days' time to be compliant with ISO 15919. Getsnoopy (talk) 17:40, 13 December 2020 (UTC)
@Getsnoopy: You need to get a broader agreement, since the current transliteration has been a compromise and was used for a long time. User:AryamanA may have been too busy to respond. I suggest that you revert your changes to transliteration policy pages for now. (Notifying AryamanA, Benwing2, DerekWinters, Kutchkutch, Bhagadatta, Msasag, Inqilābī.) --Anatoli T. (обсудить/вклад) 05:49, 26 December 2020 (UTC)
@Getsnoopy, Atitarev I completely agree. Getsnoopy, changing the transliteration system is a big task because there may be manual transliterations used in several pages and you have to find all of them and change them. For this reason alone, please don't try to change anything before there's general agreement to do so. Benwing2 (talk) 06:43, 26 December 2020 (UTC)Reply
@Getsnoopy I have reverted your changes as there is disagreement from several people. Please don't edit war to try to get them back. I'm not opposed per se to changing the transliteration system, but we need good reasons for it and general agreement. Benwing2 (talk) 06:50, 26 December 2020 (UTC)Reply
@Getsnoopy, Atitarev For reference, the following page probably lists all the pages with manual Hindi transliteration; all would need to be reviewed if any change is made to the transliteration system: Category:Terms with manual transliterations different from the automated ones/hi Benwing2 (talk) 05:19, 27 December 2020 (UTC)Reply
@Getsnoopy, Benwing2 Thanks, and another side effect is that Urdu transliteration largely follows the Hindi transliteration, so there is probably going to be more resistance and it will be harder to implement, e.g., "k͟h" instead of "x" for خ (x), the Urdu equivalent of ख़. --Anatoli T. (обсудить/вклад) 05:25, 27 December 2020 (UTC)
@Benwing2 Thanks for the list. It doesn't seem like more than 30% of those pages will be affected by the changes.
@Atitarev Why do you think that is? ISO applies to Hindi and Urdu both. Getsnoopy (talk) 07:02, 27 December 2020 (UTC)Reply
@Getsnoopy: Cause you need their (Urdu editors) agreement as well. E.g. both ख़ाक (xāk) and خاک‎ (xāk) are romanised identically. --Anatoli T. (обсудить/вклад) 07:39, 27 December 2020 (UTC)Reply
@Atitarev: Sure, but I understood your comment to mean that doing that for Urdu will be harder than it is for Hindi, so I wasn't sure why the Urdu editors wouldn't accept switching to comply with an international standard. Getsnoopy (talk) 07:51, 28 December 2020 (UTC)Reply
@Getsnoopy: You're throwing around "international standard", but this is hardly the standard for Indological dictionaries. In Hindi-Urdu, there is no native short e or o, so ē and ō are quite redundant. The macronless standard is also used in works like Cardona and Jain (2003), one of the most comprehensive overviews of the Indo-Aryan languages. I agree the r with undercircle is more in line with the usual standards though. I dislike k͟h because it is way inconsistent with how ग़ is treated, and it's a weird digraph.
I don't understand at all what the problem with the current system is. It's unambiguous and has a good basis in literature. —AryamanA (मुझसे बात करेंयोगदान) 08:31, 28 December 2020 (UTC)Reply
@Getsnoopy: As with Hindi, Urdu transliteration has been around for a long time, first as a de facto standard, then as a standard proper. Even if it's only a Wiktionary standard, it is unchallenged, as far as I can tell, by the majority of editors, even if Urdu transliteration has been less consistent. I'm not saying they won't accept your proposal, but the impact of the transliteration change there is much bigger, since everything is manual at the moment. There is some work on automation by @Taimoorahmed11, Kushalpok01, though. --Anatoli T. (обсудить/вклад) 10:27, 28 December 2020 (UTC)
@Atitarev: Ah, I see; that makes sense. I guess we'll have to wait and watch for the Urdu side of things.
@AryamanA: It's an international standard as in it's an ISO standard. At least 165 countries of the world are members of ISO, and therefore accept any standard that's released by it, ISO 15919 included. The reason ISO distinguishes between ē & ō and e & o is because it's meant to be a solution for all Indic languages; whether it's redundant is neither here nor there because the point of the standard (rightly so) is to be unambiguous in any Indic-language situation. The issue with "Indological dictionaries" and such is that they seldom seem to follow standards, or they try to shoehorn IAST into the Hindi-Urdu lexicography and, thus, invent their own standard ad-hoc to fill in gaps in either case. ISO solves all of these problems, is the most backwards-compatible with IAST, and is not an ad-hoc solution. I don't know why we're trying to reinvent the wheel here when the standard already exists, and is used on other Wiki projects such as Wikipedia.
"I dislike k͟h because it is way inconsistent with how ग़ is treated, and it's a weird digraph." I don't know how much mileage that statement is going to have, as the discussion isn't about the aesthetic merits of certain characters. Getsnoopy (talk) 23:25, 28 December 2020 (UTC)
@Getsnoopy: We're hardly reinventing the wheel, this has been the system here for at least a decade--well before I started editing. For the record, the Indian government doesn't use ISO, it uses Hunterian. And I would absolutely hate using ISO for say Punjabi (no tones) or Bengali (it doesn't capture ô~o alternation of the schwa) or Sinhala (again schwa alternations, pre-nasalised consonants, etc). It's a great unifying system in that it is absolutely unsuited for language-specific transliteration in South Asia; it's equally bad for all of them. Which is why we have discretion in using our own systems.
As for k͟h, digraphs are bad because we should strive for a one-to-one mapping, which is in my experience the most readable. k͟h makes no sense for Urdu, where there is no relation between ख and ख़ orthographically. And yes, aesthetic concerns matter because our transliterations should make sense for the reader. —AryamanA (मुझसे बात करेंयोगदान) 23:52, 28 December 2020 (UTC)Reply
@AryamanA: That may be, but it was a reinvention of the wheel in that it decided to chart its own path rather than use a pre-existing standard, an internationally accepted one at that. Yes, the Indian government uses Hunterian for internal purposes, but that's due to the convenience of the 26-letter English alphabet that it is restricted to and its 100+-year-long pedigree. When it comes to the ISO standard, however, it's clear that India voted to have that as opposed to anything else. Neither of the Punjabi scripts (Gurmukhi or Shahmukhi) indicates tones itself, so ISO doesn't either. Same thing with the Bengali ô vs o. The Sinhala pre-nasalized consonants are listed in ISO 15919 because they are represented in the script. After all, it is for transliteration, not transcription. I think you may be confusing the two.
The digraph logic would also exclude all the aspirates that are represented as digraphs currently, so that argument seems unconvincing. In the same way that most readers aren't familiar with IPA, yet we aren't trying to invent our own IPA symbols for each sound in order to "make sense for the reader", the point is to use the most widely-used standard and let readers figure out what they mean by reading around. They have the most likelihood of already being exposed to said standard seeing as they likely also read about / see the standard on Wikipedia. That's not to mention that what makes sense for the reader is what is the most widely used convention/standard; inventing an ad-hoc convention is ill-equipped to satisfy that condition. Most readers are likely already familiar with Hunterian, which uses "kh" for both ख and ख़, so that's what readers would be familiar with. "x" has been used in myriad situations as a stand-in for "ks" or "kṣ", which would seem to only confuse, not clarify, the situation for readers. As for Urdu, even if one ignored the phonetic relation of ख (کھ) and ख़ (خ) despite their lack of visual resemblance, the Arabic transliteration also uses a "k" (albeit one with a different diacritic). Getsnoopy (talk) 01:03, 29 December 2020 (UTC)Reply
@Getsnoopy: We don't actually do transliteration, we are indeed transcribing. Otherwise we wouldn't show schwa deletion. —AryamanA (मुझसे बात करेंयोगदान) 01:07, 29 December 2020 (UTC)Reply
@AryamanA: Schwa deletion is a separate special case. If we were doing transcription (ignoring all of the places where we call it "transliteration", the "transliteration" feature, and the fact that we could've just done away with this article and related ones like it), यह and पहला would be written as yeh and pehla; those, they are not. I'd like to get some of the others' opinions on this, as it seems like we're straying from the topic. Getsnoopy (talk) 01:29, 29 December 2020 (UTC)Reply
@Getsnoopy: To get a broader community involved, you can post on WT:BP, since this is about a language policy. Regarding यह (yah) as "yeh" and the like, "yah" may match older or regional pronunciation, so that's fine but we do supply phonetic respellings and manual transliteration, so these as well, can have alt. transliterations, IMO. --Anatoli T. (обсудить/вклад) 01:45, 29 December 2020 (UTC)Reply
The nice things about standards are that there's so many to choose from. There are a lot of ISO standards that have gotten completely ignored and for which there are no good reasons to use.--Prosfilaes (talk) 05:24, 10 January 2021 (UTC)Reply
I don't know why you're trying to get more input, Getsnoopy. It seems pretty clear that our active Hindi editors are not interested (except perhaps for ŕ, which does seem odd to me). By the way, ISO standards are not intended for our specific use-cases and are often pretty bad — that's why we try to follow the scholarly literature instead. —Μετάknowledgediscuss/deeds 01:26, 10 January 2021 (UTC)Reply
@Metaknowledge "except perhaps for ŕ, which does seem odd to me": This is exactly my point; it seems odd because it deviates from the standard, ISO 15919, that you're probably used to. Which specific use cases are you referring to? I don't see how a standard specifically made for transliteration would not be intended for use-cases of transliteration. Scholarly literature doesn't really have a standard for non-Sanskrit languages and is wildly inconsistent, as IAST (the standard oft-used in literature) is not comprehensive enough to encompass the other Indic languages (Hindi included), which is why I've been insisting on the ISO standard. Besides, the transliteration method we have essentially already corresponds to that of ISO. The only changes would be to the romanizations for 3 or 4 characters; I don't understand what the fuss is about. Regardless, it seems like there are a few vocal ones who are not interested, while others are either interested or indifferent.
@Prosfilaes "so many to choose from": There aren't that many (at least, ones that properly and comprehensively represent the language at hand), but yes, indeed, it would be prudent for one to choose an existing standard that everyone knows and follows (including other Wiki sites) rather than create a new one that serves no purpose other than to introduce more overhead for editors and a needless learning curve for readers. Getsnoopy (talk) 04:19, 18 January 2021 (UTC)
Dictionary creators, in this case, Wiktionary editors are free to choose a transliteration method they like, as long as it's described and is consistent. And, you don't really need to fix what is not broken, right? We had a number of customised transliteration policies based on standards but tweaked to suit something else. BTW, I am not voting against any specific changes. It's just a general statement. And it's going to be even more difficult now, since it seems Urdu policies (ignored for a long time) are currently undergoing some possible changes (subject to the success of the automated module work). --Anatoli T. (обсудить/вклад) 05:09, 18 January 2021 (UTC)Reply
@Atitarev "And, you don't really need to fix what is not broken, right?" Not only do I completely disagree with this unfortunately popular sentiment, but it often assumes that there are no problems when there are. I've described many of them in this discussion (e.g., there's no way for us to gauge whether an average reader who encounters this custom transliteration scheme is confused by it or not). "as long as it's described": It's described on a page that is hard to find; even I have trouble finding it many times. "to suit something else": May I ask what this something else was? "And it's going to be even more difficult now": I'm not sure I follow: if they're working on an automated module, then it should be easier to standardize, right? Getsnoopy (talk) 23:36, 3 February 2021 (UTC)
@Getsnoopy, Atitarev, AryamanA I'll repost the same thing I posted on a parallel discussion about the same: I do not think ISO 15919 is a good romanisation considering the fact that different IA and Dravidian phonologies are different and trying to merge them under a single romanisation is ridiculous. I don't think we should transliterate at all, rather transcribe them phonemically. I would suggest getting rid of length in transcriptions in languages that don't make a length distinction for certain vowels. Moreover, I'm completely against using ⟨k͟h⟩ for ⟨خ⟩ and ⟨ख़⟩. Like I said, transcribe, don't transliterate. And there's no point distinguishing Urdu and Hindi transcriptions or having different letters for the same sound that's written with different letters. People can already see the spelling there. As for ⟨x⟩ being read as [ks], Persian transcription still uses ⟨x⟩, even though laypeople wouldn't directly make an association in chats and stuff and read it as [ks]. Wiktionary and Wikipedia are academic spaces, albeit for common people. If they come to Wiktionary enough, they'll get used to the usage of ⟨x⟩. Let's not unify all IA and Dravidian transliterations under one umbrella. They have differences and those differences matter. RonnieSingh (talk) 08:15, 8 February 2021 (UTC)
@Getsnoopy, RonnieSingh: Right, I think it's abundantly clear that ISO 15919 is not what we want to use. ISO standards are not for high-quality linguistic work, they're for convenience. We have a nice system for Hindi, and we are going to stick with it. This is trying to "fix" a non-existent problem.
Getsnoopy, once again, ISO is not a standard in linguistic work. I have not seen any work on Hindi using it, instead they use an IAST-inspired system like ours or IPA. —AryamanA (मुझसे बात करेंयोगदान) 19:50, 8 February 2021 (UTC)Reply
I am working with Bengali transliteration and occasionally lurking around Hindi transliteration. While we may not fully implement the ISO 15919 system in Hindi transliteration, we may at least replace ŕ with r̥, as the latter looks similar to ṛ in Sanskrit. Sbb1413 (he) (talk • contribs) 14:36, 4 April 2023 (UTC)
Pinging @AryamanA in case the user hasn't responded yet. Sbb1413 (he) (talkcontribs) 15:44, 5 April 2023 (UTC)Reply
That I second RonnieSingh (talk) 14:37, 12 April 2024 (UTC)Reply