Wiktionary:Transliteration and romanization


Many languages use writing systems other than the Latin alphabet (or Roman script), which is used for English. There are various methods of romanizing text in these writing systems, with varying degrees of standardization. For the English Wiktionary, it is important to have certain standards of romanization, particularly for languages with non-phonetic writing systems.

The intent of this and its derivative articles is to provide a guide to English-language Wiktionarians attempting to deal with foreign scripts.

Transliteration policy

Foreign scripts
A foreign term written in a language with a non-Roman phonetic alphabet should be accompanied by a transliteration in most places it appears, including:
  • Main inflection words of an entry
  • Inflection, conjugation, or declension table or listing
  • In any list of related terms, including homophones, rhymes, synonyms, antonyms, derived terms, related terms, coordinate terms, descendants, translations, etc.
  • In prose, like in an etymology or usage notes section
Other Latin alphabets
For a foreign term written in a language that uses the Roman alphabet, there is no need for any transliteration. If a transliteration of one of these terms is especially common, then a soft redirect (e.g. a link from ===See also=== or using {{also}}) should be generated that points to the non-transliterated version of the term.
Borrowings into English
Some foreign words are partly or wholly naturalized in the English language. These should be considered simply as English terms, with their own English entries according to the criteria for inclusion, even though they may resemble or match a romanized version.

Wiki-romanization

Because most languages have multiple systems for romanization, any language that sees frequent romanization in Wiktionary should have a language considerations page defining the romanization standard to be used in Wiktionary.

Pages documenting the romanization systems used on Wiktionary should be placed in Category:Transliteration policies.

Using transliterations in entries

Many of Wiktionary's templates, both general and language-specific, have a parameter named |tr= which is used to specify the transliteration of the term into the Latin alphabet. Simply specifying this parameter on any template that has it enabled will display the transliteration next to the word in question.

Many templates — in particular {{l}}, {{m}}, {{t}} and {{head}} — are able to provide automatically generated transliterations. If these templates detect that they are being used for a term in a non-Latin script, they will attempt to generate a transliteration themselves, even without a |tr= parameter. The transliteration rules have to be defined for each language, and not all languages have this feature, so it will only work for the languages that have a transliteration module available. This is defined with the translit setting for each language in Module:languages.
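The lookup order described above can be sketched in Python. This is only an illustration under stated assumptions, not Wiktionary's implementation: the `TRANSLIT` registry, `translit_ru_demo`, `is_latin`, and `display_transliteration` are hypothetical names, and the toy table covers just four Cyrillic letters.

```python
import unicodedata
from typing import Callable, Dict, Optional


def translit_ru_demo(text: str) -> str:
    """Toy transliterator: handles only the letters needed for the demo."""
    table = {"с": "s", "л": "l", "о": "o", "в": "v"}
    return "".join(table.get(ch, ch) for ch in text)


# Hypothetical stand-in for the per-language translit setting.
TRANSLIT: Dict[str, Callable[[str], str]] = {"ru": translit_ru_demo}


def is_latin(text: str) -> bool:
    """True if every alphabetic character is a Latin-script letter."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )


def display_transliteration(
    lang: str, term: str, tr: Optional[str] = None
) -> Optional[str]:
    """Return the transliteration to show, or None if there is none."""
    if tr is not None:
        return tr  # a manually supplied value always wins
    if is_latin(term):
        return None  # Latin-script terms need no transliteration
    generate = TRANSLIT.get(lang)
    # No generator registered for this language: nothing can be shown.
    return generate(term) if generate else None
```

In this sketch, `display_transliteration("ru", "слово")` falls through to the registered generator, while passing `tr="slóvo"` returns the manual value unchanged. A language with no entry in the registry yields `None`, mirroring the case where no transliteration module is available.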

Note that for some scripts, automatic transliteration can be difficult or impossible to implement practically. This can be because the script in its native written form does not provide enough information for an accurate transliteration, as with abjad scripts such as Arabic or Hebrew, which are normally written without vowels. Transliteration of Hindi suffers from similar limitations. Another reason may be that the writing system is very complicated and it would be very hard and time-consuming to define rules for every case, as for the many thousands of Chinese characters. For such scripts, manual transliteration by a person who understands the language and its script remains the only viable option.
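The abjad problem can be made concrete with a small Python sketch. It assumes a toy three-letter table and uses the unvocalized Arabic spelling كتب, which can be read as kataba ("he wrote") or kutub ("books"); the names `CONSONANTS`, `naive_translit`, and `consonant_skeleton` are hypothetical and exist only for this illustration.

```python
# Toy consonant table for three Arabic letters (kaf, ta, ba).
CONSONANTS = {"\u0643": "k", "\u062a": "t", "\u0628": "b"}


def naive_translit(text: str) -> str:
    """Letter-by-letter mapping of the written form only."""
    return "".join(CONSONANTS.get(ch, ch) for ch in text)


def consonant_skeleton(romanized: str) -> str:
    """Drop the short vowels that the unvocalized spelling omits."""
    return "".join(ch for ch in romanized if ch not in "aiu")


WRITTEN = "\u0643\u062a\u0628"  # كتب, written without vowel marks
READINGS = ["kataba", "kutub"]  # two readings of the same spelling

# The written form alone yields only the consonants.
skeleton = naive_translit(WRITTEN)

# Both readings share that skeleton, so a rule-based function of the
# spelling has no information with which to choose between them.
ambiguous = all(consonant_skeleton(r) == skeleton for r in READINGS)
```

Because both readings collapse to the same consonant string, any automatic rule that sees only the spelling must either omit the vowels or guess them, which is why a manual transliteration is needed here.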

Criteria for romanization systems

A transliteration system is a balance between several, often contradictory, goals. Different systems may be better suited for different purposes, including lexicography, linguistic study, geographic naming, bibliographic cataloguing, diplomatic communication, etc., based on how well they meet particular goals.

Guidelines for transliteration systems:

United Nations Group of Experts on Geographical Names
  1. Reversible
  2. Simple and clear-cut
    • Table and notes to be sufficient; not requiring dictionaries, etc.
    • Not allowing for variations in the romanization
  3. Easy to write, read, and memorize, as well as store electronically
    • Minimizing diacritics, unusual character sequences, etc.
    • Systematically representing phonology
UNGEGN (2007), Technical Reference Manual for the Standardization of Geographical Names, p 4
US Board on Geographic Names and Permanent Committee on Geographical Names for British Official Use
  1. Reversible and parsimonious (economical)
  2. Minimizing diacritical signs
  3. Not a guide to pronunciation nor a language treatise
BGN/PCGN (2008), Romanization Systems and Roman-script Spelling Conventions, p iii
US Library of Congress
  1. Transliteration and not pronunciation
  2. Enabling machine transliteration and reversible
  3. In line with international and native standards
LoC (2010), “Procedural Guidelines for Proposed New or Revised Romanization Tables”
Unicode Consortium
  1. Standard: follow established systems
  2. Complete: every sequence of characters should transliterate
  3. Predictable: letters themselves should be sufficient, allowing mechanical transliteration
  4. Pronounceable: having reasonable pronunciations in the target script
  5. Reversible: it is possible to recover the text
Unicode Consortium (2013), “Transliteration Guidelines”
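Reversibility, which appears in all four lists above, can be checked mechanically for a given table. The Python sketch below is a minimal illustration rather than a method prescribed by any of these bodies: the two tables are toy examples invented for the demonstration, and `romanize`, `deromanize`, and `round_trips` are hypothetical helper names.

```python
from typing import Dict


def romanize(text: str, table: Dict[str, str]) -> str:
    """Map each source character through the table."""
    return "".join(table.get(ch, ch) for ch in text)


def deromanize(text: str, table: Dict[str, str]) -> str:
    """Invert the table using greedy longest-match decoding."""
    inverse = {latin: src for src, latin in table.items()}
    keys = sorted(inverse, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(inverse[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no table entry: copy the character through
            i += 1
    return "".join(out)


def round_trips(word: str, table: Dict[str, str]) -> bool:
    """True if the original text is recovered from its romanization."""
    return deromanize(romanize(word, table), table) == word


# Toy table using a digraph: "sh" collides with "s" followed by "h".
DIGRAPH = {"с": "s", "х": "h", "ш": "sh"}
# Toy table using a diacritic instead, which avoids the collision.
DIACRITIC = {"с": "s", "х": "h", "ш": "š"}

digraph_ok = round_trips("сх", DIGRAPH)      # False: decoded as "ш"
diacritic_ok = round_trips("сх", DIACRITIC)  # True
```

The example also shows why the goals conflict: the digraph table minimizes diacritics but loses reversibility, while the diacritic table is reversible at the cost of a character that is harder to type.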

Established romanization systems

There are many established systems of romanization, including a few broad sets of systems for many languages. This is not to say that any established way of doing this is the only way, or even the best way for a particular purpose.

Some standardized, multi-language systems:

  • Scholarly, international, or scientific transliteration, a loose collection of romanization systems used in linguistics.
  • UNGEGN (United Nations Group of Experts on Geographical Names, specifically its Working Group on Romanization Systems) creates and chooses standards for romanization in international relations.
  • ISO Romanizations (International Organization for Standardization) are a series of international romanization standards, including codifications of international systems, available at a cost.[1]
  • ALA-LC Romanization Tables (American Library Association–Library of Congress) for bibliographic reference, used in English-language libraries, bibliographies, and publications.
  • BGN/PCGN Romanization Systems (United States Board on Geographic Names and Permanent Committee on Geographical Names for British Official Use), used primarily by the US and UK governments for worldwide place names.

Key terms

Writing system, language script, script
A native representation of a language in writing or print. Types of writing systems include alphabets, abugidas, abjads, syllabaries, and pictographic, logographic, and ideographic writing systems. Examples of scripts include Latin, Cyrillic, Canadian Aboriginal Syllabics, Arabic, and Hanzi.
Transformation
Transformation of written text includes translation and conversion.
Conversion
Conversion of scripts comprises transcription and transliteration.
Transcription
Literally “writing across.” Transcription has several meanings that overlap with transliteration. In linguistics and lexicography (dictionary-making), it means phonological or phonetic transcription, the written representation of spoken utterances. See Wiktionary:Pronunciation.
Transliteration
Literally “lettering across.” Rendering of written text from one writing system into another, letter-by-letter, or character-by-character for non-alphabetic scripts. In English Wiktionary we are mainly concerned with romanization.
Romanization
Transliteration from a foreign writing system into the Latin (Roman) alphabet, possibly supplemented by diacritical marks or additional characters.
Romanization system
Standardized romanization systems exist for most languages, used in linguistics, library science, geography, publishing, government and legal documentation, and other fields. For a list, see w:romanization.
Wiki-romanization
A romanization system chosen for Wiktionary. It is usually a common standard of romanization, or based on one and modified for Wiktionary's specific needs.
Source, original, or donor
The script or language from which text is to be transformed.
Target, or receiver
The script that text is to be transformed into, or the language of the transformation’s intended audience.

Transliteration and romanization are not pronunciation. They relate to the written languages, not to the spoken languages. Although these systems will often approximate the pronunciation of a language, that remains a secondary consideration in their development. Thus, the very common Russian genitive singular ending -ого is normally transliterated as -ogo but is pronounced /-ovo/.
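The -ого example can be sketched in Python to show that a transliteration is computed from the letters alone. The two-letter `TABLE` and the `transliterate` helper are hypothetical and cover only this ending; the pronunciation string is the one given in the text above.

```python
# Toy table covering only the two letters in the ending.
TABLE = {"о": "o", "г": "g"}


def transliterate(text: str) -> str:
    """Letter-by-letter mapping; unknown characters pass through."""
    return "".join(TABLE.get(ch, ch) for ch in text)


ENDING = "-ого"
PRONUNCIATION = "/-ovo/"  # as stated in the text; not derivable from TABLE

romanized = transliterate(ENDING)

# The romanization follows the spelling, so it differs from the
# pronunciation wherever spelling and speech diverge.
differs = romanized != PRONUNCIATION.strip("/")
```

Here `romanized` is "-ogo": the letter г is mapped to g because that is what is written, regardless of how the ending is spoken.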
