Wiktionary talk:About Old Japanese

Old Japanese on Wiktionary

It seems to me that Old Japanese has been shamefully neglected, and when it has been paid attention to, it has been shoehorned into Japanese entries in a way that will confuse learners (plus, it's a different language). Category:Old Japanese lemmas currently has just 21 entries, but {{RQ:Man'yōshū}} is linked to by way more entries than that. I'd like to go about porting these over into actual Old Japanese entries, but before that can be done, we need to figure out what a good Old Japanese entry looks like, what orthography it should use, and what Japanese templates need Old Japanese versions to be made. I'm going to start an About page, but I'd really like input from more knowledgeable people. @Cnilep, Eirikr, Mellohi!, Poketalker, suzukaze-c, TAKASUGI Shinji (and please ping anyone else you can think of who'd be interested) —Μετάknowledge^{discuss/deeds} 04:45, 6 February 2019 (UTC)

@Dine2016, Huhu9001, 荒巻モロゾフ —Suzukaze-c ◇◇ 04:48, 6 February 2019 (UTC)

I've now created the About page, some of which codifies existing practice, and some of which is new. For one thing, I think we shouldn't have hiragana entries at all, and that {{ojp-def}} should be deleted. I don't think the entries need to have hiragana in the headword line either. —Μετάknowledge^{discuss/deeds} 05:26, 6 February 2019 (UTC)

Actually Japanese sections contain a lot of Old Japanese entries without distinguishing the two languages, which is bad. (also @Atitarev, Haplology, Wyang, エリック・キィ) — TAKASUGI Shinji (talk) 05:38, 6 February 2019 (UTC)

That was my point above; I want to work out some standards before moving all of it into Old Japanese sections. —Μετάknowledge^{discuss/deeds} 05:40, 6 February 2019 (UTC)

Support this move, but I'm not familiar with Old Japanese. I've been confused why this has been included into the modern Japanese section for quite some time. My only comment is, the derived terms in our present Japanese section will need to be sorted with care. KevinUp (talk) 06:36, 6 February 2019 (UTC)

Maybe I'm too uneducated and unaware of the complexity of the matter, but I currently support combining all stages of Japanese into "Japanese". We can use context labels and etymology-only language codes as necessary. This might present complications regarding the presentation of romanization; perhaps Wiktionary:Grease pit#support for multiple transliterations in templates needs to be considered. The Man'youshuu is quoted in modern Japanese dictionaries; treating the language of the Man'youshuu and the modern national language as some sort of coherent entity seems to be acceptable.

As for lemmatization, I am reminded of this discussion on Eirikr's talk page. —Suzukaze-c ◇◇ 06:51, 6 February 2019 (UTC)

Thank you for the link! I strongly disagree with Eirikr's main thesis there. He says: "Man'yōgana spellings are so variant that there is little value in including this: readers of OJP are expected to be adequately familiar with kanji readings and with man'yōgana in general. For similar reasons of excess variation, Wiktionary does not include all divergent spellings of Middle English or Old English words -- we generally settle on one canonical spelling." His statement about Middle and Old English is untrue — we choose one common spelling to lemmatise on, but we do create soft redirect entries for all spellings that were used. He seems to think that OJP variation is such that "all words in all languages" should be thrown out the window. Editors of paper dictionaries have agreed, and for good reason, but we are not paper, and there is no reason that we should not be able to document OJP as it was written. —Μετάknowledge^{discuss/deeds} 15:39, 6 February 2019 (UTC)

Thanks for looking into this.

Orthography: I think it's a good idea to use phonographical spelling (e.g. 須流 instead of 為流 or 為 for suru, the attributive form of se- “to do”), because all templates can work out a romanization from the page title, without the need of additional parameters. However, the most common or standard spelling of a word is not necessarily a phonographical (man'yōgana) spelling. And it takes editors effort to find which spelling is the most common, or to correct it if the mistaken-as-the-most-common spelling has already propagated to other pages that cite the word. I suggest we use romanizations, because they are unique and predictable for each word. In addition, by using romanizations, we can present verbs such as “to exist, to be” by their stem (ar-) rather than any particular conjugated forms (阿理/安利/安里 ari). Please see also the last reply in Template talk:ja-usex#Middle Japanese header.
Romanization: I agree that the index notation should be used. On the other hand, we can give other notations in headword templates. It should be easy for the templates to generate the other notations once we have the index notation.
Pronunciation: we can give reconstructions like how we do with Middle Chinese. --Dine2016 (talk) 07:07, 6 February 2019 (UTC)
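
To illustrate the point above about generating other notations from the index notation, here is a minimal sketch in Python (for illustration only, not a proposed template or module). The two target tables, a glide-style scheme and a circumflex/diaeresis scheme, are assumptions and should be checked against whichever notations are actually adopted; the names `GLIDE_STYLE`, `DIACRITIC_STYLE` and `from_index_notation` are hypothetical.

```python
import re

# Assumed mappings from index notation (₁ = 甲類, ₂ = 乙類) to two other
# notations. Both tables are illustrative, not an agreed standard.
GLIDE_STYLE = {"i₁": "i", "i₂": "wi", "e₁": "ye", "e₂": "e", "o₁": "wo", "o₂": "o"}
DIACRITIC_STYLE = {"i₁": "î", "i₂": "ï", "e₁": "ê", "e₂": "ë", "o₁": "ô", "o₂": "ö"}

# A vowel letter immediately followed by a subscript one or two.
INDEXED_VOWEL = re.compile(r"[ieo][₁₂]")


def from_index_notation(text, table):
    """Replace every indexed vowel in `text` using `table`, in a single pass."""
    return INDEXED_VOWEL.sub(lambda match: table[match.group(0)], text)


if __name__ == "__main__":
    for word in ["ake₂", "sake₁ri", "aki₁"]:
        print(word,
              from_index_notation(word, GLIDE_STYLE),
              from_index_notation(word, DIACRITIC_STYLE))
```

The conversion only runs one way in this sketch: a notation that reuses plain letters (such as plain "e" in the glide-style table) cannot always be mapped back to the index notation without extra information.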
Speaking as a native speaker of Japanese, I think enumerating all the man’yōgana spelling patterns is necessary. Apart from spelling, Japanese has stayed closer to its past (there have been no large grammatical changes since Old Japanese) than the Western languages have (e.g. English vs Old English, Romance languages vs Latin, Modern North Germanic vs Old Norse etc.), so I think it would be fine to treat it the way Old Chinese and Old Greek are currently treated (qualifiers such as "dated" and "obsolete" would be enough).--荒巻モロゾフ (talk) 11:54, 6 February 2019 (UTC)

@荒巻モロゾフ, I don't understand the point you're trying to make. Old Chinese is treated as Chinese, and this works because the writing system is the same. Ancient Greek is treated as a separate language from Greek. Old Japanese has a different writing system than Japanese, such that treating it as Japanese has thus far led us to be forced to document it rather poorly. —Μετάknowledge^{discuss/deeds} 15:43, 6 February 2019 (UTC)

Excuse me, I misunderstood the policy for Ancient Greek. For your information, large Japanese dictionaries include ancient words and usages belonging to Old Japanese alongside the entries for modern vocabulary. If Old Japanese entries are made completely independent of the existing Japanese entries, they will repeat much of the same content. For that reason, if you want Old Japanese lemmas, I recommend making them a kind of soft redirect containing a list of the man’yōgana spellings, a link to the existing modern Japanese lemma, and a romanized spelling. If Old Japanese lemmas become auxiliary to the existing Japanese lemmas, both man’yōgana and romanized entries can coexist, like the rōmaji transcriptions.--荒巻モロゾフ (talk) 16:50, 6 February 2019 (UTC)

@荒巻モロゾフ: I think I understand now. You want to keep Old Japanese within Japanese, so that man'yōgana spellings point to modern spellings. If we do that, where would information like Old Japanese verbal conjugation go? Your last sentence still confuses me; I don't know what you mean by "auxiliary". —Μετάknowledge^{discuss/deeds} 18:39, 6 February 2019 (UTC)

The turning point between the classical and modern conjugations is in the early Edo period (circa 1600 CE), not at the end of the Old Japanese period (ca. 800 or ca. 1200 CE). Classical conjugations are used even in modern times in elegant contexts. If you count Japanese up to WW2 as Japanese, then classical grammar is undoubtedly a subset of Japanese, because various public texts (such as legal provisions) were written in classical grammar at that time. (See the Constitution of the Empire of Japan; you can find classical adnominal forms such as "避クル sakuru (modern 避ける sakeru)", "受クル ukuru (受ける ukeru)", etc.)--荒巻モロゾフ (talk) 20:49, 6 February 2019 (UTC)

I can get behind having OJP lemmas be at romanizations, as long as we can agree to which romanization (i.e. how do we represent 甲類 and 乙類?) mellohi! (僕の乖離) 12:46, 6 February 2019 (UTC)

If Old Japanese were to be added, I would suppose orthography with man'yogana because I believe that was how it was mainly written in its time.-- Huhu9001 (talk) 13:51, 6 February 2019 (UTC)

Chiming in after a long absence. My initial responses:

Man'yōgana: While I am not opposed to having soft-redirect entries from man'yōgana spellings, I think we'd be insane to choose man'yōgana as the lemma forms. Which spelling would we use? How do we determine frequency? Based on what corpus, of which works, and which editions of those works? What about cases where a single kanji stands in phonetically for multiple phonemes, such as using 庭 to spell the particle combination に (ni) + は (ha)? Etc., etc.

If we are to include every known man'yōgana spelling of a given word in the entry itself, we must do so in a collapsible section, and we must also ideally indicate which source(s) each is from.

Re: Μετάknowledge's comment that "[Eiríkr] seems to think that OJP variation is such that "all words in all languages" should be thrown out the window." -- not at all. I am fully supportive of including all words. I think we differ in that I view different spellings of a single word to be just that -- one word with multiple renderings. I want to catalog the words themselves, and I'm less bothered by the umpteen variant spellings possible. As I stated, I see "little value in including this [all the man'yōgana spellings]": I'm not opposed to others adding them, I just don't see the value myself. And if we are to add them, I would like to see that done in a sensible manner.

Lemma spellings: Romaji appears to be the best path forward -- I'm opposed to man'yōgana as above, and I am also opposed to kana, as the phonetic values of kana do not express the full range of reconstructed OJP phonemes, due for instance to the collapse of the /e/, /ye/, /we/ distinctions, or the collapse of the 甲類・乙類 distinctions.

However, even there, we need to hammer out how we'd indicate those linguistically-important 甲類・乙類 distinctions. The rough consensus I see in the literature is subscripted ₁ for 甲類, and subscripted ₂ for 乙類. That said, do we use these subscripted number glyphs in page titles? How do we ensure that users can access such pages? Does the MediaWiki back-end even support using these codepoints in page titles? Etc., etc.

(Dine2016 mentions "index notation" with no explanation -- is that a reference to subscripts for 甲・乙?)

Lemma forms: As best I can tell, Japanese dictionaries have used a specific conjugated form for verb entries since Japanese dictionaries have existed. I propose that we follow that long-established and expected convention, rather than locating lemma entries at theoretical root forms like ar-.

As for which conjugated form, I'd like to propose the 終止形 (shūshikei, “terminal form”), again in line with long-standing practice. We would presumably also have (or at least allow) entries for all conjugated forms, so a user who, for instance, looks up the OJP verb form aru (equating to modern ある (aru, “to be, to have”), but in OJP, the 連体形 (rentaikei, “attributive form”)) would still be able to get to the lemma page.

‑‑ Eiríkr Útlendi │^{Tala við mig} 17:42, 2 April 2019 (UTC)

Ah, sorry, I changed my mind. I now prefer that Old Japanese terms be lemmatized at the actual used writing system (possibly man’yōgana), not romanizations, for consistency with the rest of Wiktionary. --Dine2016 (talk) 01:49, 3 April 2019 (UTC)
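
On the questions above about subscripted ₁ and ₂ in page titles, a small Python sketch may help make the code points concrete. It only shows which Unicode characters are involved, how such a title percent-encodes in a URL, and how a plain-digit fallback could be derived; it makes no claim about MediaWiki's own behaviour, and using a title such as "ake2" as a redirect is purely a hypothetical suggestion.

```python
import unicodedata
from urllib.parse import quote, unquote


def describe_non_ascii(title):
    """List the code point and Unicode name of each non-ASCII character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}"
            for ch in title if not ch.isascii()]


def url_form(title):
    """Percent-encode a title as UTF-8, as it would appear in a URL path."""
    return quote(title)


def plain_digit_fallback(title):
    """Hypothetical typing-friendly variant: NFKC turns ₁/₂ into 1/2."""
    return unicodedata.normalize("NFKC", title)


if __name__ == "__main__":
    title = "ake₂"
    print(describe_non_ascii(title))
    print(url_form(title), unquote(url_form(title)) == title)
    print(plain_digit_fallback(title))
```

The fallback is one possible answer to the accessibility question: a user without an easy way to type subscripts could type a plain digit and be redirected.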

I think it's worth pointing out that native Japanese dictionaries do not use man'yōgana spellings for lemma forms of OJP terms. They use modern kanji and historical kana spellings -- so modern 思(おも)う (omō) is listed in OJP forms as 思(おも)ふ (omofu), and 絵師(えし) (eshi) is listed with the historical kana ゑし (eshi). Hentaigana are not used at all, from what I've seen, even in quotes, despite the fact that historical texts that used kana used hentaigana extensively, as kana usage was not standardized until the spelling reforms of the 20th century.

By way of reference, see these various OJP or Classical entries at Weblio, in their 古文 section:

If we are to use kanji + kana as the lemma spellings for OJP entries, I insist that we hew to long-established practice as demonstrated by Japanese monolingual dictionaries. ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:43, 3 April 2019 (UTC)

Analysis of Old Japanese

I don't like the traditional, kana-based analysis of Old Japanese (or Japanese in general). While the nomenclature of the conjugational classes is helpful, the segmentation of verb forms can be cumbersome when the morphological boundary occurs within a syllable (or “within a kana”). For example, the verb “to exist, to be” is あり with six “stem forms” ―ら・―り・―り・―る・―れ・―れ in the traditional analysis, while it is simply ar- in a phonemic analysis. The stative auxiliary り has the same “stem forms” and attaches to the 命令形 (!) of 四段活用 verbs and 未然形 of サ変 verbs in the traditional analysis, but it is simply -e₁r- (e.g. 咲く sak-u → 咲けり sak-e₁r-i) in a phonemic analysis. Citing verbs and auxiliaries by their stem rather than the dictionary form also makes the development of words clearer, because the formant of the dictionary form may change over time. For example, the causative auxiliary remains /-(s)ase-/ over time, but the dictionary form is /-(s)asu/ → /-(s)asuru/ → /-(s)aseru/. --Dine2016 (talk) 05:04, 8 February 2019 (UTC)

Re: analysis, I agree that an alphabet makes things easier than a syllabary when the morphological roots themselves manifest the abstract concept of final consonants.

Re: the stative auxiliary り, the variation in conjugated forms is an artifact of vowel fusion and not a syntactic or grammatic outcome -- see り#Etymology_3. So for 咲(さ)けり (sakeri) as 咲く + り (ri), this derived as saki + ari → sake₁ri, where e₁ is the fusion of -i + a-.

Re: the causative auxiliary, it starts out as the incomplete verb stem ending in -a + regular "to do" verb su, reanalyzed as a regular yodan conjugation rather than the irregular サ変活用. We still see this in everyday modern Japanese with forms like 飲ます. The se- vowel form appears historically later, when the su causative ending was reanalyzed from the regular yodan pattern to a shimo nidan pattern. Some sources I've seen describe this as su in the 連用形 of si + ari → seri. This jives with man'yōgana usage, such as MYS poem 2354, showing 隠在 as the spelling for かくせる. This suggests the oddity that the -r- in ari instead disappears from other conjugated forms, which leads me to think that something else might be going on instead -- the -r- from ari is very persistent in other places, and this might be more of a semi-regular shimo nidan kind of shift (just much earlier than the later Muromachi-era one). At any rate, the only unchanging portion of the causative across both history and conjugated forms is the s. FWIW, the longer form saseru is analyzed as sa-, the incomplete form of auxiliary su, + seru also from su -- bringing us back to s as the unchanging stem.

Cheers, ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:20, 2 April 2019 (UTC)
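
The vowel-fusion account of the stative auxiliary given above (saki + ari → sake₁ri, with e₁ as the fusion of -i + a-) can be sketched as a tiny Python function. This is only an illustration under stated assumptions: the table holds just the one fusion mentioned in the comment, the inputs are written without indices exactly as in that comment, and the names `FUSION` and `join_with_fusion` are hypothetical.

```python
# Only the fusion mentioned in the comment above; other fusions are
# deliberately left out rather than guessed.
FUSION = {("i", "a"): "e₁"}


def join_with_fusion(first, second):
    """Join two forms, fusing the boundary vowels when the table says so."""
    boundary = (first[-1], second[0])
    if boundary in FUSION:
        return first[:-1] + FUSION[boundary] + second[1:]
    # No known fusion at this boundary: plain concatenation.
    return first + second


if __name__ == "__main__":
    print(join_with_fusion("saki", "ari"))
```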

@Eirikr: Ah, you're right. The causative auxiliary wasn't stable in Old Japanese. I should have used another verb as an example (such as “to open”: stem: /ake₂-/ → /ake-/, dictionary form: /aku/ → /akuru/ → /akeru/). --Dine2016 (talk) 01:56, 3 April 2019 (UTC)

"Open" in OJP was aku at its most basic form, which was ambitransitive. Transitivity only became morphologically (as opposed to syntactically) evident when the verb was conjugated into different forms. The theoretical root would thus be /ak-/. Likewise for most of the other modern transitive verbs ending in /-Ceru/: the historically stable root stem just ends in the /-C/ consonant, with valency shown syntactically for plain present tense / aspect, and morphologically in other conjugations when the /-Ce-/ pattern appears.

There are interesting patterns in valence switching, where root stem /CVC-/ appears as plain form /CVCu/ as an ambitransitive verb, and then the yodan and shimo nidan conjugation patterns split, where one is transitive and the other intransitive. However, from what I've read, there doesn't appear to be any consensus yet on how to predict whether a given verb will be transitive in yodan and intransitive in shimo nidan, or the other way around. In modern Japanese, these patterns have evolved to where the yodan (now godan) plain form is treated as its own verb, and the shimo nidan became a shimo ichidan independent form that is also treated as its own verb, removing the ambitransitive ambiguity. Cf. modern verb pairs tsuku (intransitive) ↔ tsukeru (transitive), nuku (transitive) ↔ nukeru (intransitive), etc. ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:54, 3 April 2019 (UTC)

Thanks. Maybe we are referring to different analyses of Old Japanese. In the analysis of A History of the Japanese language, ak- (yodan) and ake- (shimo nidan) are distinct lexical items derived from the same root (ak-), except that the latter has a derivative -e- “opposite transitivity”. “[R]oot + derivative constitute the lexical base” (page 52), so they are different verbs. They just happen to have the same conclusive form aku: the former has the formant -u attached directly, while the latter involves a morphophonemic rule V1 + V2 => V2 (page 40): ake + u => aku. --Dine2016 (talk) 01:51, 4 April 2019 (UTC)
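
The morphophonemic rule cited above (V1 + V2 => V2, giving ake + u => aku) can be sketched as follows. This is a minimal Python illustration of that one rule as quoted in the comment, not an implementation of the whole analysis; the name `attach` is hypothetical, and the optional ₁/₂ index on a vowel is treated as part of the vowel.

```python
import re

# A vowel letter with an optional 甲/乙 index (₁ or ₂).
VOWEL = r"[aiueo][₁₂]?"


def attach(stem, formant):
    """Attach a formant; a stem-final vowel is dropped before a vowel-initial formant."""
    if re.match(VOWEL, formant):
        stem = re.sub(VOWEL + r"$", "", stem)
    return stem + formant


if __name__ == "__main__":
    print(attach("ak", "u"))    # consonant-final stem: formant attached directly
    print(attach("ake₂", "u"))  # vowel-final stem: V1 + V2 => V2
```

Both calls produce the same conclusive form, which is the coincidence the comment points out.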

Sorry, ak- “to open (intr.)” was not found in Old Japanese. The earliest example given in Nikkoku is from the Taketori Monogatari (9C末 - 10C初). --Dine2016 (talk) 08:47, 4 April 2019 (UTC)

That said, ak- still appears to be the root form rather than ake-, albeit difficult to source. See one such rare example from MYS poem 3034, where we encounter the conjugated form あくれば, pointing to root form あく in the 已然形 of あくれ, aligning with the standard conjugation paradigm for shimo nidan. So even if we cannot find intransitive examples of aku until later, we can clearly infer that transitive aku existed.

(after reading through the struck text) The verb paradigm for the transitive might include the -e-, but it's only apparent in conjugated forms, and thus is not part of the root, as I understand it, as in "that part of a verb that is constant and unvarying across different conjugated forms". ... Perhaps you and I are talking past each other in terms of what constitutes a "root"? ‑‑ Eiríkr Útlendi │^{Tala við mig} 00:28, 5 April 2019 (UTC)

Sorry, but I think we're using different terminology. In Frellesvig's usage, the "root" is the unchanging part across etymologically related verbs. For example, in modern Japanese, 生きる and 生かす share the same root ik-, and 喜ぶ and 喜ばしい share the same root yorokob-. The "stem" is the unchanging part across different conjugated forms of the same verb. Thus 喜ぶ has the stem yorokob- and 喜ばしい has the stem yorokobashi-. Turning back to Old Japanese, assuming we have

	未然形	連用形	終止形	連体形	已然形	命令形
“to open” (intr.)	aka	aki₁	aku	aku	ake₂	ake₁
“to open” (tr.)	ake₂	ake₂	aku	akuru	akure	ake₂

The "stem" of the intransitive verb is obviously ak- and the "stem" of the transitive verb depends on how you analyse it. Frellesvig analyses it as ake₂- because there is a morphophonemic rule that allows you to attach -u to ake₂- to get aku. (This is the same morphophonemic rule that turns 吾が妹 into 吾妹（わぎも）.) However, if you analyse the stem of the transitive verb as ak-, then it can be justified to consider “to open (intr.)” and “to open (tr.)” the same verb, and the transitivity is exhibited by the different conjugational patterns. --Dine2016 (talk) 01:21, 5 April 2019 (UTC)
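
As a hedged illustration of the purely surface-based definition used in this thread ("the unchanging part across different conjugated forms"), the Python sketch below computes the longest shared prefix of each row of the table above. The forms are copied from that table; the function name `invariant_part` is hypothetical, and the sketch says nothing about which analysis is preferable.

```python
# Forms copied from the table above, in the order
# 未然形, 連用形, 終止形, 連体形, 已然形, 命令形.
PARADIGMS = {
    "to open (intr.)": ["aka", "aki₁", "aku", "aku", "ake₂", "ake₁"],
    "to open (tr.)": ["ake₂", "ake₂", "aku", "akuru", "akure", "ake₂"],
}


def invariant_part(forms):
    """Return the longest prefix shared by every form in the list."""
    prefix = forms[0]
    for form in forms[1:]:
        # Shorten the candidate until the current form starts with it.
        while not form.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


if __name__ == "__main__":
    for gloss, forms in PARADIGMS.items():
        print(gloss, invariant_part(forms))
```

On surface forms alone both rows share only ak, which is why a stem of ake₂- for the transitive verb depends on accepting the morphophonemic rule rather than on the table itself.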

Interesting. I find Frellesvig's contention to be odd, as /e₂/ is generally reconstructed as a glide like /əj/ or /we/. See w:Old_Japanese#Vowels, for instance. Such a /we/ would suggest /u/ + /e/, not /e/ + /u/. (The example of 吾が妹 is explainable as contraction similar to the modern shift we see in terms like takai → takē, where the following vowel causes raising and then flattening into a monophthong. We see the same effect in 我家: wagie (ancient wagipe) from wa ga ie (ancient wa ga ipe).)

That said, thank you for describing where you're coming from. And I apologize for introducing confusion to the thread re: "stem" vs. "root". I'm used to "stem" referring to the conjugated form of the verb before adding any other elements, such as kaeshi as the "-masu stem" and kaesa as the "negative / passive / causative stem" of kaesu. Cheers! ‑‑ Eiríkr Útlendi │^{Tala við mig} 19:53, 5 April 2019 (UTC)

Revised `{{ux|ojp}}` template

Refer to 心 for the normal {{ux|ojp}} and 庭つ鳥 for {{ja-usex}}; the latter might have its language changed to OJP. Any ideas for an {{ojp-usex}}? The 甲類/乙類 distinction is important. ～ POKéTalker（═◉═） 20:12, 7 March 2019 (UTC)

The current infrastructure is certainly unnecessarily ja-centric. —Suzukaze-c ◇◇ 03:30, 9 March 2019 (UTC)

Classical VS old

There seems to be a lack of distinction between Old Japanese and Classical Japanese. Many of the words categorized as "Old Japanese" are instead 古典日本語 (Classical Japanese / Early Middle Japanese). In their corresponding Japanese Wiktionary articles, they are correctly listed as such, but not in the English Wiktionary. XS2003 (talk) 18:47, 22 March 2020 (UTC)

@XS2003, do you have a list of any such entries? ‑‑ Eiríkr Útlendi │^{Tala við mig} 20:25, 23 March 2020 (UTC)

Problems of Old Japanese entries half-independent from Japanese

See also User_talk:Poketalker#日に日に. I found some ill effects in making Old Japanese entries independent from Japanese ones.

The first is that most pages are merely dead copies of Japanese entries. As long as Classical Japanese is regarded as a subset of (non-Old) Japanese, there are no words that exist only in Old Japanese. In large Japanese dictionaries, modern and old words are mixed together, and the OJP vowel distinction is noted for some words. I have not confirmed the existence of any dictionary specialized for OJP. For old words and meanings, it is sufficient to use {{lb|archaic}} and {{defdate}}. As I posted above, Old Japanese conjugation (except the 連体形 of the Eastern dialect) is identical to that of Classical Japanese.

The second problem is that OJP entries are lemmatized in forms that are not used in actual OJP texts. The kana scripts had not been invented before 800 AD, nor were they made for the 8-vowel system. And yet some OJP words are lemmatized in spellings that include kana. Such entries half depend on non-Old Japanese. That is not consistent with how other languages are handled (such as Egyptian and Coptic, Old Persian and Persian, or Irish, Middle Irish and Old Irish). If modern Japanese did not exist, those forms would make no sense.

Third, when old usage examples of a word are shown in man'yōgana, moving them to OJP and removing the modern spelling reduces the quality of the article. It would be very inconvenient for someone who wanted to know how far back the use of a Japanese word goes.

The fourth is a problem of comparison with another East Asian language, Chinese. The oldest written sources of Chinese date back to 1200 BC; over those 3200 years, the language has undergone major changes and branched into various dialects. The history of written Japanese is only half as long, and its changes over time have also been gradual. Mandarin and Min branched off in the 3rd century AD, and Japanese and Ryukyuan branched off at about the same time. As long as the Ryukyuan languages are not included in Japanese, the scope of our Japanese will never go back further than that. Please compare: a collection of many dialects spanning 3200 years that is treated as one language, and a less diverse 1800 years that is divided into two languages. The two are out of balance.

In my opinion (based on @Poketalker's proposal), OJP should be incorporated into Japanese entries (like Chinese), and the creation of independent OJP lemmas should be restricted, to solve these problems. @Dine2016, Huhu9001, Eirikr, Mellohi!, suzukaze-c, TAKASUGI Shinji, Metaknowledge--荒巻モロゾフ (talk) 18:40, 21 June 2020 (UTC)

I supported the split between JA and OJP above, but I am also fine with a merger. To me, the important part is for someone reading, say, the Kojiki in its original orthography to be able to look up words in their man'yōgana spellings, and find an entry to tell them the OJP vowel distinctions and verbal inflection, which Japanese entries will not necessarily have. If you can make this happen, then I support any effort to that end. —Μετάknowledge^{discuss/deeds} 19:26, 21 June 2020 (UTC)

I do not take Chinese as an example for any other languages. Because Chinese is a very unique shit in that: It is actually a language family, but thanks to the logogrammatic Chinese characters, all of its languages except Standard Mandarin Chinese are either poorly recorded, or dying really fast, which renders separating them almost meaningless. -- Huhu9001 (talk) 21:05, 21 June 2020 (UTC)

Oppose merger on the basis that OJP is sufficiently different from modern Japanese phonologically, grammatically, and orthographically in the first place to be listed separately on this wiki. cf. separating Arabic from Moroccan Arabic and the like. Chinese is not to be taken as a model for criteria of language inclusion, as per Huhu's comments, it's really a language family that is the exception, not the rule.

As a long aside, I would chime in with my analogical experiences with Old, Middle, and Early Modern Irish, since you brought those languages up. We treat those three languages separately, and only Early Modern Irish gets merged into "Irish". The major authoritative dictionary of the three languages, the Dictionary of the Irish Language, treats all three languages together despite their blatant differences. Many texts considered Old Irish are infested with typos and other intrusions introduced by Middle Irish or Early Modern-era scribes. We also try our best to keep the lemmas to the ones attested (so if the DIL headword is not the Old Irish form, we move it to one that is if attested; if we extend this analogy to Old Japanese, we move each and every OJP entry to a man'yōgana spelling or whatever logography used, which is fine by me). There are also many words inherited into Middle Irish and even Modern Irish with unchanged spellings but with hilariously different pronunciations, e.g. /korʲe/ and /kɛɾʲə/ being both spelled coire. Also, in 1800 years we went from:

Proto-West Germanic to Old English, Old High German, Old Dutch, Old Saxon, etc. to English, Dutch, and German.
Proto-Slavic to Russian, Ukrainian, Polish, Czech, Serbo-Croatian, Bulgarian, etc.
Classical Arabic to Maltese and various other Arabic descendants
Primitive Irish to Old Irish to Middle Irish to Irish
Vulgar Latin to the Romance languages

This brings us to rather simply moving the OJP entries to the attested spellings and leaving them be. mellohi! (僕の乖離) 21:17, 21 June 2020 (UTC)

One serious challenge in lemmatizing OJP is spelling.

Much of the Kojiki and Nihon Shoki were written in 漢文 (kanbun), a kind of dialect of written classical Chinese. Limited portions of these, primarily poems and songs, were written in 万葉仮名 (man'yōgana), as was most of the Man'yōshū. Broadly speaking, man'yōgana spellings were vaguely like Chaucer's Middle English orthography: loose, and varied, with the same word potentially appearing with multiple spellings even in the same sentence, let alone across an entire work.

Thus, for most OJP terms, there is unfortunately no matter of simply moving the OJP entry to an attested spelling.

I have no opposition to the creation of entries for man'yōgana spellings.

I have serious reservations about lemmatizing at man'yōgana spellings.

How do we choose which one of many spellings to use as the lemma form? Even if we select, say, usage frequency as a criterion, what corpus do we use as the baseline for analysis, and which editions of the selected works? How do we treat spellings using archaic kanji forms? How would we ensure that users could still find the OJP entries?

Notably, even Japanese dictionaries that focus on so-called 古文 (kobun, “classic literature”) or 古語 (kogo, “old words”) lemmatize at modern spellings (such as the Benesse 古語辞典 here, recommended to me years ago by my Japanese teacher at the time when I evinced an interest in the Kojiki, or the kobun version of Weblio at https://kobun.weblio.jp/).

Lemmatizing at the modernized spellings strikes me as a compelling approach. In the absence of any other clear route forward, and considering the various usability and discontinuity issues raised by lemmatizing at man'yōgana spellings, I contend that we should follow the trends in Japanese lexicography and lemmatize OJP terms at modernized spellings. ‑‑ Eiríkr Útlendi │^{Tala við mig} 05:19, 22 June 2020 (UTC)

@Metaknowledge: As for the individual man'yōgana forms, it would be better to make those entries something like "man'yōgana form of—", as a kind of soft redirect to the Japanese lemmas. In that case, it might be okay to create them in an Old Japanese section to distinguish them from modern spellings.

@Mellohi!: Phonologically, OJP changed very regularly, with simple mergers of some vowels and simple changes to a few consonants. Grammatically, the basic structure of OJP is identical to that of modern Classical Japanese, and has not changed since the split from Ryukyuan. Orthographically, even after the OJP era, many variant hiragana called hentaigana, which followed the man'yōgana method, were used without standardization until the Meiji Restoration. For that reason it is difficult for modern Japanese people to read old documents from before the Edo period. The changes from Middle to Modern Japanese are much larger than those from Old to Middle. The language pairs you gave as examples are all inflectional languages that have undergone severe historical changes under the influence of large ethnic migrations. Japanese, an agglutinative language that has not experienced conquest by foreign empires, is far more stable by comparison. For Japanese, one can list hundreds of words that have not changed in 1,300 years, but for English that may be impossible (due to changes in the inflection system, the Danelaw, the Norman Conquest and the Great Vowel Shift). Still, as mentioned above, attested OJP forms need to have articles. Even if I call it a merger, that does not mean there would be no lemmas in the OJP category. What I meant by "like Chinese" is automatic categorization into OJP by means of a new template.

@Eirikr: If it is difficult to decide on one form as the lemma, then it would be better to make all of them soft redirects to the Japanese entries. In fact, the set of calques used to read kanbun in Japanese is called "訓点語 (kuntengo)", and there are specialized dictionaries for it[1][2]. The earliest examples date back to the late Nara period, when OJP was still in use. This tradition of reading kanbun is also one of the factors that make it difficult to separate OJP from Japanese.--荒巻モロゾフ (talk) 10:22, 22 June 2020 (UTC)

The Ryukyuan languages themselves have their own headaches with their non-existence of a standardized orthography for any of those languages.

Anyhow, @Eirikr: we could try to use the strategy of how we handle Egyptian entries: Egyptian entries use romanization as the lemmas and list the spellings in the entry itself, albeit for an entirely different reason (technical problems). This means we can redirect the attested kanji spellings (whether man'yōgana or logographic) to these romanizations. (I distinctly recall also discussing a similar solution to modern given names, where the kana was used as the lemma instead of the attested spellings due to the spellings providing way too many headaches.) That brings the problem of trying to fish out words only attested logographically; but that can be dealt with by having a special exemption.

And no reason has yet been given why Old and Classical Japanese should be treated as the same language as modern vernacular Japanese when they clearly are not. It was even admitted that Middle and Modern are very different (with Classical being essentially Early Middle Japanese, but kept as a literary language the way Latin was used all over Europe for ages after Classical Latin ceased to be the usual vernacular). mellohi! (僕の乖離) 15:58, 22 June 2020 (UTC)

Rethinking of the Old Japanese entries being dead copies of the Modern Japanese ones


Hello folks, I'd like to revisit the problematic Old Japanese entries. It is not desirable for Old Japanese entries to be lemmatised in a form that mixes kanji and kana. That style of writing is how Old Japanese words appear when they are embedded or quoted in the classical form of non-Old Japanese as a subset of it; it is not Old Japanese itself. Currently, only those who understand the non-Old Japanese words can easily reach the Old Japanese lemmas, which makes this inconvenient as a dictionary of an independent language. So if Old Japanese is to be made independent of non-Old Japanese, I think it might be good to lemmatise the main entries in the Latin alphabet. This is the English Wiktionary, so optimising for English speakers would be the better way. Pali is an example of a language whose entries are lemmatised in Latin characters even though its native scripts are already displayable in Unicode. Taking the Pali lemmas as a precedent, it might be better to link from the main entries in Latin script (with ₁ and ₂) to sub-entries for the alternative forms (man'yōgana/kanji, as well as katakana with 甲・乙 annotation where necessary). @Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria--荒巻モロゾフ (talk) 11:08, 3 January 2023 (UTC)

As time has passed, I find myself thinking more and more along similar lines to Aramaki's post above -- locating OJP entries at modernized kanji + kana spellings is not ideal, as this is a "lossy" transcription that obscures the 甲 (kō) and 乙 (otsu) vowel distinctions.

Consequently, I am increasingly open to the idea of indexing OJP entries by romaji (that is, moving existing entries to, and creating new entries at, the respective romanized spellings), using ₁ and ₂ subscripted numerals (the dedicated characters) to indicate 甲 (kō) and 乙 (otsu) vowel categories respectively.

Within such entries, we could list known man'yōgana spellings. Where appropriate, we could even create stub entries for such spellings, perhaps even copying {{ja-see}} to something like {{ojp-see}} to use on those stub entries.

→ I am curious what others think. ‑‑ Eiríkr Útlendi │^{Tala við mig} 23:34, 6 January 2023 (UTC)
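As a rough illustration of the indexing idea above, the following Python sketch turns a plain-ASCII romanization, typed with 1 and 2 after vowels, into a title that uses the dedicated subscript characters ₁ and ₂. The function name and the assumption that each digit directly follows the vowel it marks are hypothetical and only for illustration; this is not an existing Wiktionary module.

```python
import re

# Map ASCII digits to the dedicated subscript characters.
SUBSCRIPTS = {"1": "₁", "2": "₂"}


def to_entry_title(ascii_romaji: str) -> str:
    """Replace a 1 or 2 that directly follows a vowel with ₁ or ₂."""
    return re.sub(
        r"(?<=[aeiouAEIOU])[12]",
        lambda m: SUBSCRIPTS[m.group(0)],
        ascii_romaji,
    )


print(to_entry_title("ume2"))     # ume₂
print(to_entry_title("iso1pe1"))  # iso₁pe₁
```

Digits that do not follow a vowel are left untouched, so ordinary numbers elsewhere in a string would not be converted by accident.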

I tried to do this: removing all kana from the verbal entries (and probably adjectives?) and moving them to man'yōgana spellings. However, @Poketalker undid all of my work, claiming "there's no consensus". For くぢら that might be reasonable, but reversions like those at 出 and 淺? No! These are clearly dead (that is, non-existent) copies of Classical and/or Modern Japanese. Old Japanese has been severely neglected, and I agree with @荒巻モロゾフ, who unfortunately disappeared from WT for some very weird reason. It seems that @Poketalker doesn't understand that even the Jidai-betsu Kokugo Daijiten (Jodai-hen), the most comprehensive (or unabridged, or whatever you call it) Old Japanese dictionary, uses just the kanji spelling with no okurigana. In any case, kana spellings are completely inappropriate, and we must lemmatize at the earliest spelling if there are no semantic spellings; use the spellings in the earliest manuscripts or ONCOJ if you can (or we could use Latin, but this is inappropriate for lemmatizing languages that never use the Latin script). Second, Old Japanese texts do not always use the kyūjitai forms of kanji; sometimes they use the shinjitai forms. Can we do something about this?

Also, I suggest that kanbun not be used, especially for phonographic attestation.

Items known only from mokkan can be allowed, as long as they are phonographically attested. Chuterix (talk) 20:00, 15 January 2024 (UTC)

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Mcph2): This. Chuterix (talk) 20:44, 4 March 2024 (UTC)

I think more discussion is needed about how to proceed with Old Japanese. As I described above in this same thread, I don't think that man'yōgana spellings should be the main or lemma form for Old Japanese, for many reasons. Some of these:

Man'yōgana spellings are too varied. We would need to identify a specific corpus (set of Old Japanese documents, and specific editions of those as well), and then do complicated analysis to figure out what spellings exist, and how frequent they are.
Man'yōgana spellings are difficult to enter, even for native Japanese speakers. This is even more difficult if we use kyūjitai characters.
No other Japanese resource that I'm aware of lemmatizes at man'yōgana spellings.

Whatever else, we should discuss this more, and with more people. The Beer parlour is probably the best venue for such a thread. ‑‑ Eiríkr Útlendi │^{Tala við mig} 01:41, 5 March 2024 (UTC)

Logographic spellings and lemmatization


The Man'yōshū uses logographic spellings, which scholars such as Vovin capitalize in transcription (see the latest example for an instance). A kungana reading does not count as a logographic spelling.

Another example from ONCOJ:

打歌山乃

ututu (NO₂) YAMA no₂

mountain of reality

Mixed spellings should be spelled with the semantic component capitalized but the phonetic component left lower case. For example: 烏梅 (uME₂), 苦流思 (KUrusi), and 苦流之美 (KUrusimi₁).
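The capitalization convention just described could be sketched as follows, assuming each character of a spelling has already been tagged by hand as semantic or phonetic. The function name and the (reading, is_semantic) data layout are hypothetical; the readings are the ones given above.

```python
def transcribe(parts):
    """Join per-character readings, upper-casing the semantic ones.

    parts: list of (reading, is_semantic) pairs, one per character.
    """
    return "".join(
        reading.upper() if is_semantic else reading
        for reading, is_semantic in parts
    )


# 烏梅: 烏 is phonetic (u), 梅 is semantic (me₂)
print(transcribe([("u", False), ("me₂", True)]))  # uME₂

# 苦流之美: 苦 is semantic (ku); 流, 之 and 美 are phonetic
print(transcribe([("ku", True), ("ru", False),
                  ("si", False), ("mi₁", False)]))  # KUrusimi₁
```

The subscript digits have no upper-case form, so the 甲・乙 markers pass through unchanged.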

Also, as @荒巻モロゾフ says, the Old Japanese entries are just dead copies of the Japanese ones, mixing kanji and kana, even for a hapax legomenon like *磯邊 (OSIPE₁, OSUPI₁). (Note the asterisk: this spelling is not attested in Eastern Old Japanese, nor is any logographic spelling. The only remotely similar spelling, ISO₁PE₁, is attested in a place name in MYS.11.2444, but with 石 (ishi < isi) in place of 磯 (iso), showing that iso₁ originally meant rock or stone: 石邊山 (ISO₁PE₁ (NO₂) YAMA (NO₂), “the mountain of Isope”).) The same goes for stuff like EOJ 火 (pu), also a hapax legomenon, which is really spelled 布 /pu/ (but there's Middle Japanese 陽炎 (kagerofu > kagerō)). Being a high school freshman is tough, so I cannot be online every single hour. One day I'll at least get the Jidai Betsu Kokugo Daijiten through interlibrary loan (ILL), depending on how busy and willing my mother is. I've also done some serious work in the Japonic field, talking to professors such as Ms. Nakagawa about the Shodon dialect and the production of a dictionary. Chuterix (talk) 00:13, 8 November 2023 (UTC)

About logographic spellings: 打歌 has no entry anywhere I can find, and no ututu reading seems to fit — the closest I can divine would be utu + uta based on regular kun'yomi. 打 wasn't used in man'yōgana, and 歌 was used to spell si, so that doesn't fit either. How would 打歌 result in a reading of ututu? This seems to be neither kungana nor logographic usage.

About your mention of "mixed spellings": me is a goon reading of 梅, and me₂ is a known man'yōgana use of this character, so 烏梅 read as ume₂ can be viewed as a purely phonogramic spelling. Similarly, 苦 is used in man'yōgana spellings with a phonetic value of ku, so 苦流思 read as kurusi is again interpretable as a purely phonogramic spelling.

About MYS 11 poem 2444: FWIW, the version on the University of Virginia Japanese Text Initiative site glosses the line 石邊山 as いしへのやまの. Our copy at Wikisource (https://ja.wikisource.org/wiki/%E4%B8%87%E8%91%89%E9%9B%86/%E7%AC%AC%E5%8D%81%E4%B8%80%E5%B7%BB#:~:text=11/2444) uses the same gloss. ‑‑ Eiríkr Útlendi │^{Tala við mig} 01:12, 5 March 2024 (UTC)

The "mixed spellings" are due to the coincidentally semantic nature of the characters. One could also hypothetically spell this completely phonographically as 烏米, 于米, 宇米, etc. Chuterix (talk) 12:13, 5 March 2024 (UTC)

Also, ututu is a typo. It should be ututa no yama no, "the mountain of Ututa". Chuterix (talk) 16:47, 7 March 2024 (UTC)

Lemmatization


I'd like us to consider not using okurigana at all if entries are lemmatized at kanji. I want either full kana or just the kanji associated with the word, e.g. 天照 (AMATERASU) but never *天照らす, or 忘 (wasuru) but never *忘る. For particles, I'm fine with kana being used; alternatively, see the very controversial paragraph below.

I'm fine with abolishing kana from all OJP entries, but many editors consider removing kana extremely shaky at best ("which editors?" you ask: literally any editor who is fine with Aramaki Morozov being non-existent). But what about Old Korean? No one is going to lemmatize at a script that came into use hundreds of years after Old Korean; hangul is never used for Old Korean. Old Korean is just a mixture of Chinese characters with coda syllables, mostly void of any phonographic information. What do we do about Old Korean, if we're going to let this happen? Chuterix (talk) 18:14, 27 May 2024 (UTC)

What do Japanese references do?

For terms like wasuru, they adopt the same basic orthography for Old Japanese entries as for classical and (with some caveats) modern: kanji for the unchanging portion of the verb's morphology, and kana for the changing portion (inflecting endings). For wasuru, the wasu- never changes, while the -ru does. So dictionaries index this at the 忘る spelling. This spelling is the common spelling for Old and classical Japanese both (上代 and 古文/文語).
For terms like ama-terasu (a verbal phrase used as a modifier), the same convention applies. The noun portion ama never changes, so this is spelled with the kanji 天. For the verb portion terasu, the te- is the only moraic element that never changes, so this is spelled with the kanji 照. The -rasu portion is an inflecting ending, so this is spelled in kana as らす.
For terms like Amaterasu (a name), the whole string is unchanging, and it is spelled in its entirety in kanji, as 天照.

→ If we want to entirely ignore how Japanese references index entries, I would recommend that we discard both kanji and kana from our Old Japanese headwords, and index at romanized spellings, using the "1" and "2" after vowels to specify 甲・乙 variants.

That said, I do not recommend entirely ignoring how Japanese references index entries. Principle of least surprise, not reinventing the wheel, etc.

What editors decide upon for Old Korean is not all that germane to Old Japanese. Different language, different texts, different conventions.

And seriously, please stop bringing up Aramaki Morozov. They could be dead for all we know. It is not helpful nor appropriate to continue with that. ‑‑ Eiríkr Útlendi │^{Tala við mig} 21:57, 5 June 2024 (UTC)

RFC discussion: May 2018


The following discussion has been moved from Wiktionary:Requests for cleanup (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Middle Japanese

Since nothing has been done, I am putting these here: かめ, かへる, かへす, かはる, かはす, かふ. DTLHS (talk) 22:44, 18 May 2018 (UTC)