Wiktionary talk:About Indonesian
Latest comment: 4 years ago by Metaknowledge in topic RFD discussion: April 2019–February 2020
Guidelines for lemma
@Xbypass Thanks for creating this guideline on how to create entries for Indonesian. I've added more information to it recently.
- Would you mind checking Wiktionary:About Indonesian#Language consideration to see if there are any issues/concerns?
- I've removed the section on Jawi script as an alternative form because the Jawi script is generally used for Malay, and not Indonesian.
- I have some concerns over the consideration of affixed forms, e.g. membelikan as a non-lemma. I think that affixed words (kata berimbuhan) are valid lemmas as well. For comparison, the following forms in English (which have irregular patterns of word formation, i.e. not all adjectives can have such forms) are considered as English lemmas:
- On the other hand, the following words (which have regular patterns of word formation, i.e. almost every verb/adjective can have such forms) are considered as non-lemmas:
- Likewise, for Indonesian, I think that these words which have regular patterns of word formation can be considered as non-lemmas:
- bukuku (“my book”)
- bukumu (“your book”)
- bukunya (“his/her/their book”)
- bukulah (“emphatic form of book”)
- bukukah (“questioning form of book”)
- dirajai (“passive form of merajai”)
- dirajakan (“passive form of merajakan”)
- kurajai (“1st person passive form of merajai”)
- kurajakan (“1st person passive form of merajakan”)
- On the other hand, affixed forms of words (kata berimbuhan) such as bercerita and membelikan, which do not have a predictable pattern of word formation (the affixed word can sometimes have new meanings), should be considered lemmas. What do you think? KevinUp (talk) 01:18, 10 November 2019 (UTC)
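The regular patterns proposed above could be checked mechanically. The following is only a rough sketch, not an existing Wiktionary tool: the function name `regular_marker` and the `stems` set are hypothetical, and the set is assumed to already contain the base each marker attaches to (for example `rajai` for `dirajai`).

```python
# Hypothetical sketch: flag forms built with the regular markers listed above.
# Assumption: `stems` holds the bases that the markers attach to.
ENCLITICS = ("ku", "mu", "nya", "lah", "kah")  # -ku, -mu, -nya, -lah, -kah
PASSIVE_PREFIXES = ("di", "ku")                # di-, ku-


def regular_marker(word, stems):
    """Return the regular marker deriving `word` from a known base, or None."""
    for suffix in ENCLITICS:
        if word.endswith(suffix) and word[:-len(suffix)] in stems:
            return "-" + suffix
    for prefix in PASSIVE_PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in stems:
            return prefix + "-"
    return None


stems = {"buku", "rajai", "rajakan"}
for form in ("bukuku", "bukunya", "dirajai", "kurajakan", "bercerita"):
    print(form, regular_marker(form, stems))
```

A form for which the function returns a marker would be a candidate non-lemma under this proposal; a form such as bercerita, which returns `None`, would still need to be judged individually.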
- @KevinUp Thanks for adding more information to it, while I had some issue with my schedule. I will answer based on my personal consideration, as far as I remember, when I wrote it.
- While the criterion of Wiktionary:About Indonesian#Language consideration is straightforward ("[...] are not considered lemmas in Indonesian unless they are attested in spoken or written forms of Indonesian"), it runs into a problem with the situation of Indonesian speakers, as most of them (if not all, as Malay is considered different from Indonesian) are, in fact, L2 speakers. This situation gives rise to code-switching or language alternation, diglossia and heteroglossia. Thus, it is not an easy task to define the boundary between Indonesian and those regional languages (including Malay).
- This section was written based on the principle "While Indonesian is written in Latin script officially, it is possible to write it in other scripts (transliteration)". Although it is uncommon, a real-life example is Indonesian written in Balinese script on the front plates of government buildings.
- This section was written based on the general rule about lemmas, which is "In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (headword)", and on the Japanese consideration that inspired it: "Wiktionary may be used by students who are not proficient". On those terms, in order to avoid misunderstanding by non-proficient users, the lemma definition should be based on how current Indonesian dictionaries behave and are conceived. Thus, the lemma entry in Indonesian is the kata dasar (“lexical stem”) in Latin script using the latest spelling (2015 Ejaan Bahasa Indonesia). Everything else, including the affixed terms, is a non-lemma. This rule can be confirmed in the dictionary as beli » membelikan and cerita » bercerita.
- In relation to the previous point, most affixed terms are regular and can be predicted with {{id-der}}, which is based on {{ms-der}}. However, there are some irregular affixed terms where the same affix yields different affixed terms, such as pengajian and pengkajian through peng--an. As I don't understand the code of those templates, I will not edit them, as they are used in Malay. Perhaps they can be modified to include three classes of derived terms, i.e. the regular affixed terms (which have a derivation rule and can be predicted), irregular affixed terms (which are the same as regular affixed terms, but the affixed term must be entered) and compound terms. In the current situation, I don't recommend the usage of {{id-der}}.
- Thanks. Xbypass (talk) 15:52, 11 November 2019 (UTC)
- @Xbypass: Thanks for replying.
- I've reworded Wiktionary:About Indonesian#Language consideration so that priority is given to the creation of entries in these regional languages. Yes, I'm aware of the situation of diglossia or heteroglossia in the daily life of Indonesian speakers. For Wiktionary, the criterion used is that if the word can be found in three independent examples of permanently recorded media (e.g. newspapers, video, audio) of works in Indonesian, then the word can be considered a borrowing. For example, if there exists an audio recording that can be understood by Indonesian speakers from other regions, but has one or two words that cannot be understood, then these words can be considered borrowings, as long as two other independent usages of these words in different sources can also be found.
- This is interesting. Theoretically, this is possible for any language, e.g. using Hangeul to transliterate Indonesian words in a "Learn Indonesian for Koreans" book or video, so it is not ideal to include such usages on Wiktionary.
- If the canonical form or dictionary form is taken as a lemma, then words listed in KBBI such as beli, membelikan would be considered as lemmas while words that exist in Indonesian but are not listed such as dibeli, dibelikan, kubelikan, belilah would be considered as non-lemmas.
- Note that the current English definition for lemma on Wiktionary is:
The canonical form of an inflected word; i.e., the form usually found as the headword in a dictionary, such as the nominative singular of a noun, the bare infinitive of a verb, etc.
- However, I think the headword criterion is not suitable to decide what qualifies as an Indonesian lemma because:
- Indonesian dictionaries have a different method of arranging words compared to English, i.e. lexical stem first, followed by an exhaustive list of affixed forms, whereas English words are arranged A to Z regardless of whether they are affixed or not.
- Unlike European languages, an Indonesian verb does not have a bare infinitive ("to be") form which also happens to be an unaffixed stem. Simple verbs such as bekerja (“to work”) can be formed out of nouns such as kerja (“work”).
- In Indonesian, not all affixes can be applied to the same lexical stem. Affixed words such as kemakanan simply do not exist and would appear ungrammatical when encountered. In contrast, other languages such as Japanese or Korean have a fixed set of affixes that can be applied to any lexical stem.
- Affixed terms often have figurative senses that need to be learned individually. For example, compare:
- a. raja (“king”)
- b. hujan (“rain”)
- menghujankan (“to let rain fall upon; (figuratively) to release”)
- menghujani (“to release bullets, arrows; to lash out words, statements; to perform cloud seeding”)
- Regarding beli » membelikan and cerita » bercerita, the arrows in the online version of KBBI are used to create a hyperlink that links to the lexical stem. They do not indicate that the affixed word is a non-lemma. On the other hand, words with nonstandard spelling such as bapao [1] and ceritera [2] use a different type of arrow to redirect to their standard forms and have no definitions. Such entries are the non-lemmas in KBBI as they contain no definitions.
- 4. Regarding {{id-der}}, I would like to point out that it is the affix (morphemes such as ber-, meng-) that has regular meaning, and not the affixed terms which may contain figurative senses. Yes, the code is not able to handle irregular affixed terms, which is why I would include these irregular spellings after the numbered forms. The template is not easy to use but you can list out the regular spellings and I will attempt to convert them to the numbered forms when I see them. KevinUp (talk) 02:14, 12 November 2019 (UTC)
- My explanation above is rather lengthy, but my main point is that all words listed in KBBI are legit lemmas, whereas those not included in KBBI such as the following:
- are to be considered as non-lemmas. I hope you will consider my proposal regarding non-lemmas and consider affixed terms such as merajai, merajakan as lemmas, otherwise a lot of cleanup has to be done. KevinUp (talk) 02:14, 12 November 2019 (UTC)
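The proposal above (listed in KBBI means lemma, otherwise non-lemma) amounts to a simple membership test. A minimal sketch follows, assuming a hypothetical stand-in set `KBBI_HEADWORDS` filled only with words this discussion describes as listed; a real check would need the actual KBBI word list, which is not reproduced here.

```python
# Hypothetical stand-in for the KBBI word list; it contains only words that
# this discussion describes as listed there.
KBBI_HEADWORDS = {"beli", "membelikan", "cerita", "bercerita", "merajai", "merajakan"}


def classify(words, headwords=KBBI_HEADWORDS):
    """Split `words` into (lemmas, non_lemmas) by dictionary listing."""
    lemmas = [w for w in words if w in headwords]
    non_lemmas = [w for w in words if w not in headwords]
    return lemmas, non_lemmas


lemmas, non_lemmas = classify(["beli", "membelikan", "dibeli", "dibelikan", "kubelikan", "belilah"])
print("lemmas:", lemmas)
print("non-lemmas:", non_lemmas)
```

Under this sketch the unlisted forms dibeli, dibelikan, kubelikan and belilah fall out as non-lemmas, matching the examples given earlier in the thread.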
RFD discussion: April 2019–February 2020
The following information passed a request for deletion (permalink).
This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.
No useful content. The one example given does not correspond with reality. SemperBlotto (talk) 06:31, 6 April 2019 (UTC)
- Keep. There is a bit of useful content, and the one example can easily be fixed, as I am about to do. —Μετάknowledge (discuss/deeds) 03:19, 27 May 2019 (UTC)
- RFDO-kept. Heavily improved in the meantime, it seems. —Μετάknowledge (discuss/deeds) 06:41, 24 February 2020 (UTC)