Wiktionary:About Korean/Historical forms

This is a Wiktionary policy, guideline or common practices page. This is a draft proposal. It is unofficial, and it is unknown whether it is widely accepted by Wiktionary editors.

Proto-Koreanic (?—1300)

Proto-Koreanic here refers to historical stages of Korean which are reconstructed solely on the strength of internal Middle Korean evidence, modern dialectal forms, or comparative evidence from Korean borrowings into other languages. For more on how this is done, see Appendix:Koreanic reconstructions. However, this is an extremely uncertain field. Proto-Koreanic should not normally be lemmatized, due to the paucity of well-understood sound laws and Korean's lack of dialectal diversity. The only exceptions are:

A non-productive morpheme in Old/Middle Korean where it makes sense to concentrate discussion in a single, reconstructed lemma, such as Reconstruction:Proto-Koreanic/hoL
A word which has been borrowed into Japanese or Manchu at some ancient stage, such as Reconstruction:Proto-Koreanic/Pwutukye

Old Korean (600—1300)

Periodization

In the traditional linguistic periodization of Korean established in the 1970s, Old Korean is the stage of the language up to the tenth century, with the fall of the Silla kingdom and the unification of the country by the Goryeo dynasty marking the transition to Middle Korean. But because very few primary sources on pre-fifteenth-century Korean were known in the 1970s, this periodization rested almost entirely on conjecture, namely the assumption that a change of dynasty probably brought linguistic change with it.

The discovery in the 1990s of a sizable corpus of "interpretive gugyeol" (석독구결 / 釋讀口訣) texts, Korean-language glosses to the Buddhist canon written between the tenth and thirteenth centuries, greatly expanded the primary source material available to linguists of Korean. As the interpretive gugyeol data was analyzed in more detail, it became clear that the language of these glosses was orthographically and grammatically much closer to what survives of first-millennium Old Korean than to fifteenth-century Middle Korean. Accordingly, the growing consensus in South Korean academia is to classify all interpretive gugyeol texts as "Late Old Korean".^[1] Some Western scholars, such as Alexander Vovin, have also recently adopted the new schema. John Whitman, who as of 2015 continued to follow the traditional periodization, nonetheless noted that "much of our knowledge of OK comes from materials dating from the Koryŏ (918–1392) period".^[2]

Wiktionary follows the newer South Korean consensus and classifies all pre-fourteenth-century Korean-language texts as sources of Old Korean. This has a number of advantages:

it better reflects recent findings in the field
it avoids the issue of having duplicate entries for the many terms that are attested with identical orthography and meaning in both Silla-era sources and interpretive gugyeol texts
it avoids the issue of having Middle Korean entries for lemmas that are archaic even in the mid-fifteenth century and never attested since (or sometimes never attested in Hangul form at all) but are found throughout interpretive gugyeol sources
it allows us to distinguish Middle Korean lemmas in the Hangul script and Old Korean lemmas in Chinese characters

Criteria for entries

Only terms attested in texts written in Old Korean by Koreans are considered uncontroversially valid as mainspace entries. These sources include:

Hyangga poems
Interpretive gugyeol glosses
Korean grammatical elements in Idu texts from before the fourteenth century
Korean text in wooden tablets discovered by archaeologists

When citing hyangga, take care to note that Cheoyong-ga (처용가 / 處容歌), Seodong-yo (서동요 / 薯童謠), and Pung-yo (풍요 / 風謠) are all believed to be from the twelfth or thirteenth century, not their claimed date of composition. This must be noted in the quotation. The claimed date of composition may be taken as largely factual for the other twenty-one hyangga.

Chinese wordlists

There are a few wordlists for Old Korean written by Chinese visitors, the most significant of which is the twelfth-century Jilin leishi. However, these Chinese transcribers were simply transcribing what they heard, unaware of the rules of Old Korean orthography, and as a result produced orthographically invalid forms. For instance, the Jilin leishi writes the Old Korean words for "one" as 河屯 and "two" as 途孛, but we know that the actual way Koreans wrote these words was 一等 and 二尸.

It is not clear whether the words given in these lists can be used as the basis of their own entries. For now, the following guidelines stand:

References to these wordlists are strongly recommended in otherwise attested Old Korean entries. See 有叱 (*Is-) and 無叱 (*EPs-) for examples of the use of Leishi data in Old Korean entries.
Terms first attested in these sources may be discussed in the Etymology sections of the corresponding Middle Korean or Modern Korean entries.
It has not yet been decided whether terms attested only in these sources should be lemmatized.

Proper noun reconstructions

Many Old Korean morphemes are reconstructed from proper nouns given in the traditional histories. For example, the twelfth-century history Samguk sagi gives a large number of placenames, personal names, and titles in two forms. One form generally appears to be a translation into Classical Chinese of the meaning of the name, and the other seems to be a transcription of the pronunciation using Chinese characters. Linguists have reconstructed non-Chinese morphemes by comparing the translation form to the transliteration form.

However, this is not considered sufficient attestation for an independent Wiktionary entry. As the Wikipedia article on the placenames of the Samguk sagi discusses, many of the morphemes reconstructed in such a way may not have been Korean at all, but reflect a Japonic or other substratum.

As with Chinese wordlists, references to such reconstructions are strongly recommended in the Phonology sections of otherwise attested Old Korean entries, and in the Etymology sections of likely Middle and Modern Korean reflexes. See 거칠다 (geochilda) for an example.

Non-Silla terms

There are a few terms attributable to the languages of the ancient Korean kingdoms of Baekje and Goguryeo. These languages have their own ISO 639-3 language codes and are not suitable as Old Korean entries.

Reconstructed forms

Reconstructions should not normally be made for Old Korean.

Controversial hyangga lemmas

Due to the opaque nature of hyangga orthography and the lack of a canonical translation in the primary sources, the language of the hyangga poems is difficult to parse. In Wiktionary, only some lemmas attested solely in hyangga works are considered suitable for inclusion:

Entries may be created without comment for terms that have been interpreted identically by virtually all scholars. Examples:

川理 (“stream”) in Changiparang-ga (찬기파랑가 / 讚耆婆郞歌)

慕理 (“to long for”) in Mojukjirang-ga (모죽지랑가 / 慕竹旨郞歌)

Normally, these terms are content words that 1) consist of a Chinese character translating the native word and a subsequent phonogram and 2) are clearly ancestral to a Middle Korean form.

Entries may be created for terms that have been segmented as an independent, semantically meaningful unit by virtually all scholars, but for which the meaning is disputed. Use {{unk|okm}}. Examples:

阿冬音 (“beauty?”) in Mojukjirang-ga

惱叱古音 (“words informing to a superior?”) in Wonwangsaeng-ga (원왕생가 / 願往生歌)

Entries may not be created for terms on which no scholarly consensus on the segmentation exists. Examples:

將來 (some kind of suffix or suffixes, or an auxiliary verb?) in Chamhoeopjang-ga (참회업장가 / 懺悔業障歌)

For the purposes of scholarly opinion, only interpretations that postdate the late 1970s need be considered, as the principles of Old Korean orthography were not correctly understood before then. In particular, the readings of Shinpei Ogura and Yang Chu-dong are not considered valid.

Orthography and romanization

Old Korean forms are given in the Chinese characters of the original attestations, not their reconstructed phonetic value.

Old Korean forms transcribed only by Chinese logograms, without any phonographic element, should not be included. An example is 我 (“I; me”) in the gugyeol glosses. There is simply nothing one can say about these forms other than their semantic meaning, which is in any case identical to the meaning of the Chinese characters with which they are written.

Gugyeol glosses usually write their characters in drastically abbreviated forms, e.g. 隱 is written as 𠃍. However, the source Chinese characters are used for entry titles because:

The hyangga poems and gugyeol glosses share forms, but the orthography of the former uses the original Chinese characters, not the glossing abbreviations
There is a great deal of variation in gugyeol abbreviations even for the same character, which is avoided by using the source character instead
Some gugyeol abbreviations are not included in Unicode

When quoting primary sources, the actual gugyeol abbreviations are preferred. The source characters may be used instead if this is not feasible, but the fact that the abbreviations have been replaced by their sources should be explicitly noted.

Reconstructed romanizations are conventionally given in the Yale Romanization of Korean and preceded by an asterisk. Per scholarly convention, the parts of an Old Korean form that are written with a logogram are romanized in capital letters. Example:

慕理 (*KUli, “to long for”)

Only the second syllable, *li, is written phonographically (with 理). The first syllable is written only logographically (with 慕), so its value is filled in from the Middle Korean reflex and given in capitals.
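The capitalization convention can be illustrated with a minimal sketch. It assumes the editor has already segmented the spelling and assigned each unit a Yale value and a logographic/phonographic flag; `Segment` and `romanize` are hypothetical names used only for illustration, not part of any Wiktionary module.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One orthographic unit of an Old Korean spelling (hypothetical helper)."""
    yale: str          # reconstructed Yale value supplied by the editor
    logographic: bool  # True if the unit is written with a logogram


def romanize(segments: list[Segment], stem: bool = False) -> str:
    """Build a starred Yale romanization from pre-segmented units.

    Logographically written units are capitalized and phonographic units
    stay lowercase. A trailing hyphen marks a bound verbal stem.
    """
    body = "".join(
        s.yale.upper() if s.logographic else s.yale.lower() for s in segments
    )
    return "*" + body + ("-" if stem else "")


# 慕理: 慕 is logographic (value taken from the Middle Korean reflex), 理 is phonographic.
print(romanize([Segment("ku", True), Segment("li", False)]))  # *KUli
```

The same function reproduces the other romanizations used on this page, e.g. 有叱 as `romanize([Segment("i", True), Segment("s", False)], stem=True)`. The hard part, deciding the segmentation and values, is deliberately left to the editor.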

Given our poor understanding of Old Korean phonology, IPA pronunciations must not be added.

Middle Korean (1300—1600)

Middle Korean after the invention of Hangul is very well-attested and well-understood. Per scholarly consensus, Korean-language texts produced up to 1600 are considered Middle Korean, and subsequent texts are considered Early Modern Korean. Many late sixteenth-century texts, especially informal ones such as personal letters, show Early Modern Korean features. But for the sake of consistency, the year 1600 is used as a definitive boundary date on Wiktionary. Take care to note that many texts published in the seventeenth and eighteenth centuries are attributed to the Middle Korean period, but are linguistically clearly Early Modern. For example, almost all known sijo poems are linguistically Early Modern works, even though many are ascribed to poets who would have written in Middle Korean.

The entry titles for Middle Korean terms should be written in the Hangul script as invented by Sejong, without tone marks. However, the tone should ideally be marked within the entry itself.

A phonemic IPA pronunciation may be added for Middle Korean based on the scholarly consensus on fifteenth-century Korean phonology (see Middle Korean#Script and phonology). It is not clear whether a phonetic transcription is appropriate, given the ongoing dispute over the exact vowel qualities of Middle Korean, although one would help readers unfamiliar with Korean understand that, for example, intervocalic /l/ is actually [ɾ].

In addition to Hangul, Middle Korean was also written in Sinographic systems such as Idu and "consecutive gugyeol". Entries for Middle Korean terms in these scripts should use the template {{spelling of}} to link back to the Hangul form. For an example, see 遣 (Yale: -kwo).

Forms attested only in 칠대만법 / 七大萬法 must be marked with {{lb|okm|Gyeongsang}}.

Lemmatizations

Per a discussion in October 2020, nouns are lemmatized at their connective forms, and verbs, adjectives, and verbal suffixes at the allomorph that actually occurs before 다 (Yale: -ta). Soft redirects for the alternative forms are strongly recommended. Examples:

ᄀᆞᅀᆞᆶ (Yale: kozolh) instead of ᄀᆞᅀᆞᆯ (Yale: kozol)
졀ᄯᅡ빛 (Yale: cyelsta pich) instead of 졀ᄯᅡ빗 (Yale: cyelsta pis)
깃다 (Yale: kista) instead of 깇다 (Yale: kichta)
ᄃᆞᆺ다 (Yale: tosta) instead of ᄃᆞᇫ다 (Yale: tozta)

The asymmetry exists because nouns are lemmatized at their theoretical stem, whereas verbs are (by tradition) lemmatized at an actually occurring inflected form.

Conjugations

The following templates give detailed conjugations. However, several dozen parameters currently have to be entered manually. It is hoped that the templates can eventually be converted into a module, which should remove this problem.

{{okm-conj/L}}
{{okm-conj/L!}}
{{okm-conj/H}}
{{okm-conj/H!}}
{{okm-conj/R}}
{{okm-conj/R!}}
{{okm-conj/HH}}
{{okm-conj/됴타}}

Early Modern Korean (1600—1900)

Per the discussion at Wiktionary:Language treatment requests/Archives/2020-24 § RFM discussion: January–February 2022, Early Modern Korean is now listed separately from Modern Korean under the code "ko-ear". For terms that have remained in continuous use from Early Modern times to the present, there is no explicit need to add a separate Early Modern listing; however, this is up to the entry creator.

Editors should also bear in mind that many Middle Korean texts were reprinted in the Early Modern era. Usually, some of the language was modernized while other parts were left in their Middle Korean state. One example is 횩다 (Yale: hyokta), which was already considered archaic by the seventeenth century and is nowadays given as a citation example of a distinctively Middle Korean form,^[3] but nonetheless continued to exist in print into the nineteenth century. Ideally, only terms that appear in an original composition of the Early Modern era should be considered Early Modern and be used with the "ko-ear" language code.

Idu and gugyeol texts continued to be produced into the Early Modern era. But as their highly formulaic phrases did not undergo any real shift during the transition from Middle Korean to Early Modern Korean, all post-thirteenth century idu and consecutive gugyeol forms are grouped as orthographic variants of Middle Korean instead of Early Modern Korean.
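The date boundaries and the idu/gugyeol exception described on this page can be condensed into a small decision sketch. `language_code` is a hypothetical helper, not an existing Wiktionary tool. Treating 1300 and 1600 as the last years of Old and Middle Korean is an assumption read off the section headings. "oko" and "ko" are the standard codes for Old and Modern Korean, which this page does not spell out. The year passed in should be the date the language was actually written down: the original composition date for a reprint, and the linguistic date of a sijo rather than its traditional attribution.

```python
# Hypothetical helper: suggests a Wiktionary language code for an attested
# historical Korean form. Boundary years (1300, 1600, 1900) are assumptions
# based on the period headings of this page.

# Scripts whose post-13th-century forms are treated as Middle Korean
# spelling variants, whatever their actual date.
SINOGRAPHIC_SCRIPTS = {"idu", "consecutive gugyeol"}


def language_code(year: int, script: str) -> str:
    """Return "oko", "okm", "ko-ear" or "ko" for a text dated to `year`."""
    if year <= 1300:
        return "oko"     # Old Korean: hyangga, interpretive gugyeol, early idu
    if script in SINOGRAPHIC_SCRIPTS or year <= 1600:
        return "okm"     # Middle Korean, incl. later idu / consecutive gugyeol
    if year <= 1900:
        return "ko-ear"  # Early Modern Korean
    return "ko"          # Modern Korean


print(language_code(1750, "idu"))     # okm
print(language_code(1750, "hangul"))  # ko-ear
```

The function cannot decide the harder questions discussed above, such as whether a form in a reprint reflects the original composition. It only encodes the date and script rules once those questions are settled.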

References

[1] "차자표기 자료의 연구가 진행될수록 고려시대의 언어 현상들은 중세국어 쪽이 아닌, 고대국어의 범주로 포함시키려는 경향이 강하다." ["As research on materials written in borrowed-character (chaja pyogi) systems progresses, there is a strong tendency to classify the linguistic phenomena of the Goryeo period as Old Korean rather than Middle Korean."]
김지오 (Kim Ji-o) (2019) “고대국어 연결 어미의 현황과 과제 [Godae Gugeo yeongyeol eomi-ui hyeonhwang-gwa gwaje, The conditions and research tasks for Old Korean connective suffixes]”, in Gugyeol Yeon'gu, volume 43, →DOI, pages 55–87
[2] Whitman, John B. (2015) “Chapter 24: Old Korean”, in Lucien Brown, Jaehoon Yeon, editors, The Handbook of Korean Linguistics, John Wiley & Sons, →ISBN, pages 421–439
[3] Lee, Ki-Moon, Ramsey, S. Robert (2011) A History of the Korean Language, Cambridge University Press, →ISBN