Wiktionary:Khmer romanization

Shortcut:
WT:KM TR

These are the rules concerning transliteration in Khmer entries.

The Khmer language is written in the Khmer script, an Indic-based alphasyllabary. There are many ways to romanise the Khmer script; the most common schemes are the United Nations Group of Experts on Geographical Names (UNGEGN) scheme, the Geographic Department scheme (which is based on the UNGEGN scheme), the BGN/PCGN scheme and the ALA-LC scheme. All of these schemes mix transcription and transliteration principles in different proportions, and as a consequence it is difficult to generate these romanisations accurately by algorithm. Monolingual Khmer dictionaries, such as the renowned Chuon Nath Dictionary, traditionally use ‘respellings’ to indicate irregular pronunciations, much as Thai dictionaries do, though their use of respellings is less consistent. The sections below introduce the intricacies of the Khmer script and its romanisations.

Consonants

Consonants	Subscript form	Class	IPA (letter)	IPA (before vowel)	IPA (first in cluster)	IPA (final)	UNGEGN (letter)	Wiktionary Transliteration	Wiktionary Transcription
ក	្ក	1	/kɑː/	/k/	/k/	/k/	kâ	k	k
ខ	្ខ	1	/kʰɑː/	/kʰ/	/k/	/k/	khâ	kh	kh
គ	្គ	2	/kɔː/	/k/	/k/	/k/	kô	g	k
ឃ	្ឃ	2	/kʰɔː/	/kʰ/	/k/	/k/	khô	gh	kh
ង	្ង	2	/ŋɔː/	/ŋ/	―	/ŋ/	ngô	ng	ng
ច	្ច	1	/cɑː/	/c/	/c/	/c/	châ	c	c
ឆ	្ឆ	1	/cʰɑː/	/cʰ/	/c/	―	chhâ	ch	ch
ជ	្ជ	2	/cɔː/	/c/	/c/	/c/	chô	j	c
ឈ	្ឈ	2	/cʰɔː/	/cʰ/	/c/	―	chhô	jh	ch
ញ	្ញ	2	/ɲɔː/	/ɲ/	―	/ɲ/	nhô	ñ	ñ
ដ	្ដ	1	/ɗɑː/	/ɗ/	/ɗ/	/t/	dâ	ṭ	d
ឋ	្ឋ	1	/tʰɑː/	/tʰ/	/t/	/t/	thâ	ṭh	th
ឌ	្ឌ	2	/ɗɔː/	/ɗ/	―	/t/	dô	ḍ	d
ឍ	្ឍ	2	/tʰɔː/	/tʰ/	―	/t/	thô	ḍh	th
ណ	្ណ	1	/nɑː/	/n/	/n/	/n/	nâ	ṇ	n
ត	្ត	1	/tɑː/	/t/	/t/	/t/	tâ	t	t
ថ	្ថ	1	/tʰɑː/	/tʰ/	/t/	/t/	thâ	th	th
ទ	្ទ	2	/tɔː/	/t/	/t/	/t/	tô	d	t
ធ	្ធ	2	/tʰɔː/	/tʰ/	/t/	/t/	thô	dh	th
ន	្ន	2	/nɔː/	/n/	―	/n/	nô	n	n
ប	្ប	1	/ɓɑː/	/ɓ/	/p/	/p/	bâ	p	b
ផ	្ផ	1	/pʰɑː/	/pʰ/	/p/	/p/	phâ	ph	ph
ព	្ព	2	/pɔː/	/p/	/p/	/p/	pô	b	p
ភ	្ភ	2	/pʰɔː/	/pʰ/	/p/	/p/	phô	bh	ph
ម	្ម	2	/mɔː/	/m/	/m/	/m/	mô	m	m
យ	្យ	2	/jɔː/	/j/	―	/j/	yô	y	y
រ	្រ	2	/rɔː/	/r/	―	/Ø/	rô	r	r
ល	្ល	2	/lɔː/	/l/	/l/	/l/	lô	l	l
វ	្វ	2	/ʋɔː/	/ʋ/	―	/w/	vô	v	v
ឝ	្ឝ	1	―				shâ	ś	s
ឞ	្ឞ	2	―				ssô	ṣ	s
ស	្ស	1	/sɑː/	/s/	/s/	/h/	sâ	s	s
ហ	្ហ	1	/hɑː/	/h/	/Ø/	―	hâ	h	h
ឡ	្ឡ	1	/lɑː/	/l/	―	―	lâ	ḷ	l
អ	្អ	1	/ʔɑː/	/ʔ/	/ʔ/	―	qâ	ʾ	ʾ
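
The difference between the two Wiktionary columns can be illustrated with a small lookup. The following is only a sketch: the dictionary holds a handful of rows copied from the table above, and the names `CONSONANTS`, `lookup` and `letters_sharing_transcription` are hypothetical helpers, not part of any Wiktionary module.

```python
# A few rows copied from the consonant table above:
# letter -> (class, Wiktionary transliteration, Wiktionary transcription).
CONSONANTS = {
    "ខ": (1, "kh", "kh"),
    "គ": (2, "g", "k"),
    "ឃ": (2, "gh", "kh"),
    "ង": (2, "ng", "ng"),
    "ដ": (1, "ṭ", "d"),
    "ត": (1, "t", "t"),
    "ទ": (2, "d", "t"),
    "ប": (1, "p", "b"),
    "ព": (2, "b", "p"),
    "ម": (2, "m", "m"),
    "ស": (1, "s", "s"),
}


def lookup(letter):
    """Return the table row for one consonant letter as a dictionary."""
    cls, translit, transcr = CONSONANTS[letter]
    return {"class": cls, "transliteration": translit, "transcription": transcr}


def letters_sharing_transcription(transcription):
    """List the letters (in code point order) that share one transcription."""
    return sorted(
        letter
        for letter, (_, _, transcr) in CONSONANTS.items()
        if transcr == transcription
    )
```

The transliteration keeps every letter distinct (ខ is kh, ឃ is gh), whereas the transcription follows pronunciation, so `letters_sharing_transcription("kh")` returns both letters.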

Digraph consonants	Subscript form	Class	IPA (letter)	IPA (before vowel)	IPA (first in cluster)	IPA (final)	UNGEGN (letter)	Wiktionary Transliteration	Wiktionary Transcription
ហ្គ	―	1	/ɡɑː/	/ɡ/	/ɡ/	/k/	gâ	h˳g	g
ហ្គ៊	―	2	/ɡɔː/	/ɡ/	/ɡ/	/k/	gô	h˳g′	g
ហ្ន	―	1	/nɑː/	/n/	―	―	nâ	h˳n	n
ប៉	―	1	/pɑː/	/p/	/p/	/p/	pâ	p″	p
ប៊	―	2	/ɓɔː/	/ɓ/	―	―	bô	p′	b
ហ្ម	―	1	/mɑː/	/m/	―	―	mâ	h˳m	m
ហ្ល	―	1	/lɑː/	/l/	―	―	lâ	h˳l	l
ហ្វ	―	1	/fɑː/ /ʋɑː/	/f/, /ʋ/	/f/	/f/	fâ, vâ	h˳v	f, v
ហ្វ៊	―	2	/fɔː/ /ʋɔː/	/f/, /ʋ/	/f/	/f/	fô, vô	h˳v′	f, v
ហ្ស	―	1	/ʒɑː/ /zɑː/	/ʒ/, /z/	―	―	žâ, zâ	h˳s	ž, z
ហ្ស៊	―	2	/ʒɔː/ /zɔː/	/ʒ/, /z/	―	―	žô, zô	h˳s′	ž, z
Used in phonetic respellings
ញ៉	―	1	/ɲɑː/	/ɲ/	―	―	nhâ	ñ″	ñ
ម៉	―	1	/mɑː/	/m/	―	―	mâ	m″	m
យ៉	្យ៉	1	/jɑː/	/j/	―	―	yâ	y″	y
រ៉	្រ៉	1	/rɑː/	/r/	―	―	râ	r″	r
ល៉	្ល៉	1	/lɑː/	/l/	―	―	lâ	l″	l
វ៉	្វ៉	1	/ʋɑː/	/ʋ/	―	―	vâ	v″	v
ស៊	្ស៊	2	/sɔː/	/s/	/s/	/h/	sô	s′	s
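
How the two series-shifting diacritics change a consonant's class, as shown in the rows above, might be sketched as follows. This is a simplified illustration: `effective_class` and `BASE_CLASS` are hypothetical names, only a few base consonants are listed, and the substitution of ុ for the shifters under superscript vowels is not handled.

```python
MUUSEKATOAN = "\u17c9"  # muusekaʾtŏən: marks the a-series (class 1)
TREYSAP = "\u17ca"      # trəysap: marks the o-series (class 2)

# Inherent classes of a few base consonants, copied from the consonant table.
BASE_CLASS = {"ញ": 2, "ម": 2, "យ": 2, "រ": 2, "ល": 2, "វ": 2, "ប": 1, "ស": 1}


def effective_class(cluster):
    """Return 1 or 2 for a base consonant followed by an optional shifter."""
    marks = cluster[1:]
    if MUUSEKATOAN in marks:
        return 1
    if TREYSAP in marks:
        return 2
    # No shifter present: fall back to the inherent class of the base letter.
    return BASE_CLASS[cluster[0]]
```

For example, `effective_class("ញ" + MUUSEKATOAN)` gives 1 and `effective_class("ស" + TREYSAP)` gives 2, matching the class column above.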

‘Syllabic configurations’

a-series = 1st class; o-series = 2nd class.
Note that the combination of diacritics may not be displayed as desired; please consult the column of examples.

Diacritics	Example (a-series)	Example (o-series)	IPA (a-series)	IPA (o-series)	UN romanization (a-series)	UN romanization (o-series)	Wiktionary transliteration	Wiktionary transcription (a-series)	Wiktionary transcription (o-series)
(none)	ក	គ	/ɑː/ /ɑ/ (when unstressed in some words)	/ɔː/ /ɔ/ (when unstressed in some words)	â	ô	a	ɑɑ, ɑ	ɔɔ, ɔ
់	កត់	ទប់ យល់	/ɑ/	/u/ (before labial finals) /ŭə/ (elsewhere) /ɔ/ (elsewhere, in codaless nonfinal syllables)	á	ó	á	ɑ	u, ŭə, ɔ
័	ស័ក	ល័ខ ទ័ព	/a/	/ĕə/ (before velar finals) /ŏə/ (elsewhere)	ă	eă oă	ă	a	ĕə, ŏə
័យ	សម័យ	ជ័យ	/aj/	/ɨj/			ăy	ay	ɨy
័រ		ជ័រ		/ɔə/			ăr		ɔə
ា	តា	ជា	/aː/	/iə/	a	éa	ā	aa	iə
ា់	កាត់	ទាក់ គាត់	/a/	/ĕə/ (before velar finals) /ŏə/ (elsewhere)	ă	eă oă	ā́	a	ĕə, ŏə
ិ	មតិ កិរិយា	លទ្ធិ និទាន	/eʔ/ (in stressed syllables) /e/ (elsewhere)	/iʔ/ (in stressed syllables) /i/ (elsewhere)	ĕ	ĭ	i	eʾ, e	iʾ, i
ិ (with non-glottal coda)	ចិត្ត	ជិត	/ə/	/ɨ/			i	ə	ɨ
ិយ	ចេតិយ	ឥន្ទ្រិយ	/əj/	/iː/			iy	əy	ii
ិះ	តិះដៀល	ជិះ	/eh/	/ih/			iḥ	eh	ih
ី	បី	ពីរ	/əj/	/iː/	ei	i	ī	əy	ii
ឹ	ដឹក	ទឹក	/ə/	/ɨ/	œ̆	œ̆	ẏ	ə	ɨ
ឹះ	ឆ្កឹះ	គន្លឹះ	/əh/	/ɨh/			ẏḥ	əh	ɨh
ឺ	ដឺ	គឺ	/əɨ/	/ɨː/	œ	œ	ȳ	əɨ	ɨɨ
ុ	វត្ថុ កុមារ	វិទ្យុ គុលិកា	/oʔ/ (in stressed syllables) /o/ (elsewhere)	/uʔ/ (in stressed syllables) /u/ (elsewhere)	ŏ	ŭ	u	oʾ, o	uʾ, u
ុ (with non-glottal coda)	កុន	គុណ	/o/	/u/	ŏ	ŭ	u	o	u
ុះ	ចុះ	ពុះ	/oh/	/uh/	ŏh	ŭh	uḥ	oh	uh
ូ	កូរ	គូ	/ou/	/uː/	o	u	ū	ou	uu
ូវ	ត្រូវ	នូវ	/əw/	/ɨw/			ūv	əw	ɨw
ួ	កួរ	គួរ	/uə/	/uə/	uŏ	uŏ	ua	uə	uə
ើ	បើ	ឈើ	/aə/	/əː/	aeu	eu	oe	aə	əə
ើះ	ចង្កើះ		/əh/				oeḥ		əh
ឿ	តឿ	ជឿ	/ɨə/	/ɨə/	œă	œă	ẏa	ɨə	ɨə
ៀ	តៀប	ទៀប	/iə/	/iə/	iĕ	iĕ	īa	iə	iə
េ	កេរ្តិ៍	គេ	/eː/	/ei/	é	é	e	ee	ei
េច (េ before palatals)	ម៉េច ចេញ	ភ្លេច ពេញ	/ə/ (before palatals)	/ɨ/ (before palatals)			e	ə	ɨ
េះ	សេះ	នេះ	/eh/	/ih/	éh	éh	eḥ	eh	ih
ែ	កែ	គែ	/ae/	/ɛː/	ê	ê	ae	ae	ɛɛ
ែះ	កែះ		/eh/				aeḥ		eh
ៃ	ប្រៃ	ព្រៃ	/aj/	/ɨj/	ai	ey	ai	ay	ɨy
ោ	កោរ	គោ	/ao/	/oː/	aô	oŭ	o	ao	oo
ោះ	កោះ	គោះ	/ɑh/	/ŭəh/	aôh	ŏăh	oḥ	ɑh	ŭəh
ៅ	តៅ	ទៅ	/aw/	/ɨw/	au	ŏu	au	aw	ɨw
ុំ	ដុំ	ទុំ	/om/	/um/	om	ŭm	uṃ	om	um
ំ	ចំ	ទំ	/ɑm/	/um/	âm	um	aṃ	ɑm	um
ាំ	ចាំ	ជាំ	/am/	/ŏəm/	ăm	ŏăm	āṃ	am	ŏəm
ាំង	តាំង	ទាំង	/aŋ/	/ĕəŋ/	ăng	eăng	āṃng	ang	ĕəng
ះ	តះ	ទះ	/ah/	/ĕəh/	ăh	eăh	aḥ	ah	ĕəh
ៈ	វណ្ណៈ	ជីវៈ	/aʔ/	/ĕəʔ/	ă	eă	à	aʾ	ĕəʾ
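
The way the consonant class selects the vowel value can be sketched for the simplest case, an open syllable of one consonant plus one dependent vowel. Only rows without conditional readings are included, and `ONSETS`, `VOWELS` and `transcribe_cv` are hypothetical names used for illustration; a real implementation would also need codas, clusters, stress and the context-dependent rows.

```python
# Consonant -> (class, Wiktionary transcription), from the consonant table.
ONSETS = {
    "ត": (1, "t"),
    "ជ": (2, "c"),
    "ប": (1, "b"),
    "គ": (2, "k"),
    "ទ": (2, "t"),
}

# Dependent vowel -> (a-series, o-series) transcription, from the table above.
VOWELS = {
    "\u17b6": ("aa", "iə"),  # ា
    "\u17b8": ("əy", "ii"),  # ី
    "\u17bc": ("ou", "uu"),  # ូ
    "\u17c2": ("ae", "ɛɛ"),  # ែ
    "\u17c3": ("ay", "ɨy"),  # ៃ
    "\u17c4": ("ao", "oo"),  # ោ
    "\u17c5": ("aw", "ɨw"),  # ៅ
}


def transcribe_cv(syllable):
    """Transcribe a syllable of exactly one consonant and one dependent vowel."""
    consonant, vowel = syllable  # raises ValueError unless two code points
    cls, onset = ONSETS[consonant]
    # Class 1 takes the a-series value, class 2 the o-series value.
    return onset + VOWELS[vowel][cls - 1]
```

With these tables, តា and ជា share the vowel sign ា but come out as taa and ciə respectively, because ត is 1st class and ជ is 2nd class.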

Independent vowels

Note that words spelt with independent vowels should always have respellings in entries, for example ឩកា (ʼuukaa) should be respelt as អ៊ូកា.
Also note that the independent vowel អ (ʼâ) is different from the consonant sign អ (ʼɑɑ). On Wiktionary, only the latter should be used in entries.

Independent vowels	UN romanization	IPA
អ	â	/ʔɑʔ/
អា	a	/ʔa/
ឥ	ĕ	/ʔe/
ឦ	ei	/ʔəj/
ឧ	ŏ	/ʔo/
ឨ
ឩ	ŭ	/ʔu/
ឪ	ŏu	/ʔɨw/
ឫ	rœ̆	/ʔrɨ/
ឬ	rœ	/ʔrɨː/
ឭ	lœ̆	/ʔlɨ/
ឮ	lœ	/ʔlɨː/
ឯ	é	/ʔeː/
ឰ	ai	/ʔaj/
ឱ, ឲ	aô, aôy	/ʔaːo/
ឳ	âu	/ʔaw/
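
A check for entries that need a respelling could look roughly like the sketch below. It assumes that the independent vowels occupy the Unicode range U+17A3 to U+17B3 (U+17A3 being the independent vowel that resembles the consonant អ, U+17A2). The only respelling included is the one given in the note above; `needs_respelling`, `respell` and `RESPELLINGS` are hypothetical names.

```python
# Assumed Unicode range of the Khmer independent vowels (U+17A3..U+17B3).
INDEPENDENT_VOWELS = range(0x17A3, 0x17B4)

# Only the respelling stated in the note above: ឩ -> អ៊ូ.
RESPELLINGS = {"\u17a9": "\u17a2\u17ca\u17bc"}


def needs_respelling(word):
    """True if the word contains any independent vowel letter."""
    return any(ord(ch) in INDEPENDENT_VOWELS for ch in word)


def respell(word):
    """Replace independent vowels that have a known respelling."""
    return "".join(RESPELLINGS.get(ch, ch) for ch in word)
```

Here `needs_respelling` is true for ឩកា and false for its respelling អ៊ូកា, since the latter starts with the consonant អ (U+17A2) rather than an independent vowel.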

Diacritics

Diacritics	Name	Notes
ំ (ំ)	nɨkkĕəʾhət (និគ្គហិត)	niggahita; nasalizes the inherent vowels and some of the dependent vowels (see anusvara); sometimes used to represent [aɲ] in Sanskrit loanwords
ះ (ះ)	rĕəh muk (រះមុខ)	"shining face"; adds final aspiration to dependent or inherent vowels, usually omitted; corresponds to the visarga diacritic; it may be included as a dependent vowel symbol
ៈ (ៈ)	yukuəl pintuʾ, yukĕəʾlĕəʾ pintuʾ (យុគលពិន្ទុ)	yugala bindu ("pair of dots"); adds a final glottal stop to dependent or inherent vowels, usually omitted
៉ (៉)	muusekaʾtŏən (មូសិកទន្ត)	mūsikadanta ("mouse teeth"); used to convert some o-series consonants to the a-series
៊ (៊)	trəysap (ត្រីសព្ទ)	trīsabda; used to convert some a-series consonants to the o-series
ុ (ុ)	kbiəh kraom (ក្បៀសក្រោម)	also known as bok cəəng (បុកជើង); used in place of the diacritics trəysap and muusekaʾtŏən when they would collide with superscript vowels
់	bɑntɑk (បន្តក់)	used to shorten some vowels
៌ (៌)	rɔbaat (របាទ), reiphaʾ (រេផៈ)	rapāda, repha; behaves similarly to the tŏəndĕəʾkhiət; corresponds to the Devanagari diacritic repha, but has lost its original function, which was to represent a vocalic "r"
៍ (៍)	tŏəndĕəʾkhiət (ទណ្ឌឃាដ)	daṇḍaghāta; used to render some letters as unpronounced
៎ (៎)	kaak baat, kaakaʾ baat (កាកបាទ)	kākapāda ("crow's foot"); more a punctuation mark than a diacritic; used in writing to indicate the rising intonation of an exclamation or interjection; often placed on grammatical particles such as /na/, /nɑː/, /nɛː/, /vəːj/, and the feminine response /cah/
៏ (៏)	ʾahstaa (អស្តា)	denotes stressed intonation in some single-consonant words[1]
័ (័)	sangyook saññaa (សំយោគសញ្ញា)	represents a short inherent vowel in Sanskrit and Pali words; usually omitted
៑ (៑)	viriəm (វិរាម)	a mostly obsolete diacritic, corresponds to the virāma
្ (្)	cəəng (ជើង)	also written coeng; a sign introduced by Unicode for inputting subscript consonants; its appearance varies among fonts
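
Because these signs often render poorly or stack on one another, it can help to inspect a string code point by code point. The sketch below uses Python's standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(text):
    """List each code point in text as a (U+XXXX, Unicode name) pair."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]
```

For instance, `describe("\u17cf")` identifies the sign as U+17CF, KHMER SIGN AHSDA, which is the name given in the reference below.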

References

[1] Unicode Character 'KHMER SIGN AHSDA' (U+17CF)