Wiktionary talk:About Vietnamese

From Wiktionary, the free dictionary

As there were several requests for this page, I created it based on what I feel seems to be common consensus. Please expand where necessary. One issue I still see is where to draw the line between modern and Middle Vietnamese. I took "from the 19th century" from Wikipedia, but we may want to draw the line differently. MuDavid 栘𩿠 (talk) 09:15, 5 June 2019 (UTC)

Spelling

This article currently states:

Wiktionary uses the "modern" spelling, always writing i instead of y in monophthongs (mĩ instead of mỹ) and putting the tone accent on the second vowel of oa, uy, oe (khoẻ instead of khỏe). Other spellings should be added as "alternative spelling of".

This is an accurate statement of the English Wiktionary's current practice, owing to efforts by Wyang and others to ensure page naming consistency. There's some discussion of it at Talk:tuỵ đạo. However, I'm concerned that it would be considered a policy around here based on this article. I understand the desire for consistency, but Wiktionary should not pick sides when it comes to either orthographic reform. The tone mark placement reform is largely ignored outside Vietnam, and the i/y reform is inconsistently applied even within Vietnam.

More practically, I'm concerned that our strict application of this policy would lead to misleading entries. For example, Tòa Bạch Ốc implies that Toà Bạch Ốc is the more mainstream spelling. But this term is primarily used by overseas communities that largely ignored the tone mark placement reform; Nhà Trắng is preferred in Vietnam where the reform has taken hold. In more extreme cases, this policy has required us to use unattested spellings of words that became obsolete prior to the reform, such as tuỵ đạo.

I created {{vi alternative spelling of}} with the ability to autodetect the spelling used; I'd like to see it used instead of {{alternative spelling of}} on entries like tụy đạo. But to more fully address the concerns above, I think we should look to our treatment of European versus Brazilian Portuguese – making sure to always qualify items in "Alternative forms" sections and with {{term-label}} – as a way to remain a little more neutral and less proscriptive as a project.

 – Minh Nguyễn 💬 20:08, 4 April 2020 (UTC)

Hi Minh, how are you? (BTW, I think I wrote you some more messages over at vi.wikt.)
It's quite interesting to see there are people who actually care about Vietnamese orthography, even though they're living at the other side of the world. :-) (In Vietnam, people happily type cuả and qui, and I've seen signs saying GÔ˜ and worse.)
Concerning attestation, it is a general (but apparently not unanimous) understanding here at en:wikt that only terms need to be attested, not spellings. So the fact that tuỵ đạo is not attested as a spelling is actually no problem, as it is the same term as tụy đạo, which is attested. Many other languages (Middle English, for example) have similar issues, and also solve it by (arbitrarily) picking one standard, regardless of whether it is consistently attested. The template {{vi alternative spelling of}} is otherwise a nice idea, so as far as I'm concerned it can be implemented.
Another issue you raised (in an edit for Vy, I think) is the reform in names. I guess you're right that names weren't reformed and that the traditional spelling should therefore remain the main lemma. But when it comes to names, different spellings could actually be considered different names, so attestation becomes more of an issue. Attestation criteria at en.wikt for proper names are messy at the moment, so idiosyncratic spellings like Thy and Sỹ could become an issue.
There's been some discussion in the beer parlor about picking sides when it comes to orthography, which is an issue in many other languages (including English), but it seems to be a difficult problem to solve.
MuDavid 栘𩿠 (talk) 09:01, 29 April 2020 (UTC)
tiếng Mĩ/tiếng Mỹ is another example. In this case, "Mỹ" was not exempted from the spelling reform. I would think there's some value in giving priority to a very common way that Vietnamese speakers in America refer to English over a theoretical way in which Vietnamese speakers in Vietnam could refer to English if they wanted to sound like Americans. – Minh Nguyễn 💬 21:37, 14 January 2021 (UTC)

Duplication in adverbs identical to adjectives

As in German and Dutch, Vietnamese adjectives can be used as adverbs without change, leading to a lot of duplication (see for example chỉn chu, đơn phương). Would it not be better if we, like those languages, introduced a rule not to create a separate adverb section in such cases? (For example: Wiktionary:About_Dutch#Adjectives_and_adverbs.) (Pinging @PhanAnh123, Mxn.) MuDavid 栘𩿠 (talk) 03:50, 16 February 2022 (UTC)

No reaction yet? Pinging @ColePeltier93 who seems to be active in Vietnamese as well. MuDavid 栘𩿠 (talk) 02:25, 2 March 2022 (UTC)

As nobody objected, I just plunged ahead and wrote something more general. If anyone disagrees, please say so before I do mass cleanups. ☺ MuDavid 栘𩿠 (talk) 03:51, 29 March 2022 (UTC)

RFD discussion: November 2023

The following discussion has been moved from Wiktionary:Requests for deletion (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


The Vietnamese SOP Problem

Is no one of a high authority going to sort this problem out? There are roughly 80 articles on the "Requests for deletion in Vietnamese entries" page, most of them nominated either as SOP or as ad hoc constructions. I understand that, from the perspective of an English speaker, "words" such as "hạt dẻ" (chestnut), "kẻ giết người" (murderer) or "người khuyết tật" (disabled) may seem reasonable, since they are single words in English, but to a Vietnamese speaker they can clearly be broken down and reassembled without any change in meaning. Something like "xe đạp" or "hạt dẻ" may be more debatable, since I haven't heard anyone say "dẻ" on its own and "xe đạp" means something more than just "xe" + "đạp". Other entries, however, are rather laughable, e.g. "hổng dám đâu", "rắn lại" or "được kính trọng". Maybe my point of view is wrong or short-sighted, and these are indeed "words"; nonetheless, they're piling up, and in my opinion someone should do something about it. 2402:800:B180:4872:61D3:7621:B1BC:C358 13:07, 4 November 2023 (UTC)

Okay, maybe if "không dám đâu" means something more than the sum of its parts, I guess it's not so laughable. 2402:800:B180:4872:61D3:7621:B1BC:C358 13:12, 4 November 2023 (UTC)
The process is that we discuss the request and reach consensus on whether a term meets the criteria for inclusion; if not, it is deleted. All users have equal authority in the discussion. The problem is that only people who are competent speakers of Vietnamese are able to present reasoned arguments, and we have only seven active editors who are native speakers of Vietnamese, perhaps two of whom occasionally visit this page.  --Lambiam 13:37, 7 November 2023 (UTC)
The problem is not specific to Vietnamese. Editors working on languages they don’t know well enough are a site-wide problem, but we’re working on it. “Roughly 80 articles” aren’t that much of a problem, which is why we’ve been quite blasé about it and it’s only now I’ve decided to do some mass nominations (see higher on this page). And if you think “someone should do something about it”, don’t forget you can be that someone. MuDavid 栘𩿠 (talk) 01:56, 8 November 2023 (UTC)
This problem is not unique to Vietnamese; it also affects languages fitting the w:scriptio continua description. Even though Vietnamese uses spaces between syllables (rather than no spaces at all), it is still difficult to determine what a word is.
Editors for Mandarin Chinese or Thai (to take two languages with more active groups of editors than most) face the same problem but seem to reach agreement on individual cases.
I haven't seen many discussions on Vietnamese, but a separate WT:CFI specific to Vietnamese would make it easier to decide whether a word should be included.
Regarding kẻ giết người, I am afraid no rule has been laid down, e.g. on whether words of the "kẻ + verb" type should be included. Try checking with native speakers, like User:PhanAnh123.
I agree with @MuDavid, and since we don't have any Vietnamese-specific CFI, it goes by a vote.
I am OK with deleting the term and splitting translations into components, as in murderer#Translations like this {{t|vi|kẻ giết người}} to get kẻ giết người. Anatoli T. (обсудить/вклад) 03:30, 8 November 2023 (UTC)
That’s what we’ve been doing. (But there’s a gazillion noobs who don’t.) And it’s User:PhanAnh123 who nominated kẻ giết người. MuDavid 栘𩿠 (talk) 06:56, 8 November 2023 (UTC)
As a native Vietnamese speaker, I agree. Penn Zero MSSJ (talk) 17:33, 11 November 2023 (UTC)
One problem I see with this viewpoint is that you're saying that Vietnamese natives would be able to deconstruct these words but English speakers would not. These words are on the English Wiktionary site, thus they're meant for English speakers. It would follow that keeping compounds makes sense so long as an English speaker could not reliably deconstruct them. I also think some of these "SOP" nominations are overzealous.
For example, I created an article for tạo ra because I couldn't find it on Wiktionary, but my Vietnamese dictionary (Từ điển Lạc Việt) had it. I can find many other Vietnamese dictionaries that list this as an entity separate from its parts, yet my page was listed as an SOP. If Vietnamese dictionaries list these as separate entities, then why shouldn't Wiktionary? LeChatParle (talk) 19:00, 27 November 2023 (UTC)
As I told you, different dictionaries have different criteria for inclusion. Tạo ra is two words; you should look up tạo and ra separately. If anything is still unclear after that, then it’s the pages tạo and ra that need editing. And there’s no need for “deconstructing”, whatever that means. MuDavid 栘𩿠 (talk) 01:28, 28 November 2023 (UTC)
"as I told you" - this is the first time we've ever talked, so you haven't said that to me before.
"whatever that means" - it's actually clear what it means, which is ironic given the circumstances. It's defined as: "1. analyze (a text or a linguistic or conceptual system) by deconstruction. 2. reduce (something) to its constituent parts in order to reinterpret it or present it differently".
It doesn't seem like you're interested in a genuine discussion on the topic based on the way you've replied, so I won't continue this conversation. LeChatParle (talk) 01:59, 28 November 2023 (UTC)
I did talk to you before, but maybe you’ve been ignoring me. The point remains that nonidiomatic expressions should not be included in Wiktionary. If you don’t agree, there are other forums (such as the beer parlour) where you may make your case for a change to our criteria for inclusion, but good luck with that. We don’t have entries for will jump or had spoken, no matter how difficult it may be to “deconstruct” these for speakers of languages without tenses, and there’s no reason for us to include nói ra, suy nghĩ ra, phát ra, sản xuất ra, or any such, as these can be “deconstructed” by anyone with a basic grasp of Vietnamese grammar. If you believe tạo ra is different, that’s what you should be arguing. (And by the way, the “many Vietnamese dictionaries” you refer to are mostly plagiarized from one and the same dictionary, and they usually copy its errors and all. Don’t take those as your guiding compass.) MuDavid 栘𩿠 (talk) 03:18, 28 November 2023 (UTC)


Reduplication

I propose to add the following section:

Reduplication

Vietnamese has many reduplication patterns with varying levels of productivity. For a full treatment, see our appendix on the subject. For purposes of inclusion in this Wiktionary:

  • Words formed through -iếc and -ủng reduplication should not be included.
  • Full reduplication of adjectives and verbs without modification (of type đỏ → đỏ đỏ) should not be included.
  • Diminutive reduplication of adjectives (of type đỏ → đo đỏ) should be included as soft redirects to the main lemma ({{form of|vi|[[Appendix:Vietnamese reduplication#Adjectives|diminutive reduplication]]|đỏ}}), provided they meet the criteria for inclusion (such as three independent uses).
  • Full reduplication (with or without modification) of other parts of speech usually gives a different lemma (such as người người or nhền nhện).
  • Other types of reduplication are considered wholly independent lemmas.

In the first three cases, cites, quotes, example sentences, etc. should be put at the main lemma (so quotes of đỏ điếc, đỏ đỏ, and đo đỏ should go at đỏ), unless they are to prove the existence of the reduplicated form.

@PhanAnh123, Duchuyfootball, Mxn Opinions? I edited lạ and là lạ in accordance with my proposal. If you agree with the above, I’ll create some templates. MuDavid 栘𩿠 (talk) 03:42, 20 February 2024 (UTC)

Agree. Duchuyfootball (talk) 09:25, 20 February 2024 (UTC)
Why not allow the creation of a soft redirect for nghỉ nghiếc and other formations like it, as long as they’re attested? Is it because -iếc can be formed out of just about any word? How will readers learn what “nghiếc” is doing in a sentence, since it’s written as if it’s a word or part of a compound word, rather than a suffix in the Western sense? Minh Nguyễn 💬 01:24, 21 February 2024 (UTC)
Because it can, indeed, be formed out of just about any word. Entries with English schm- are also not included, see Talk:work schmerk and Talk:rational-shmational. The only exceptions are where the reduplicant has taken on an independent life, as with schmexy and schmancy. (That’s our current policy, so my proposal is not to change that. If you want to change that part, you’re welcome to open a new discussion.) MuDavid 栘𩿠 (talk) 03:12, 21 February 2024 (UTC)
That’s fair. It’s still a bit awkward for those unaware of the morphological rule, but perhaps linking -iếc, as you did above, is the best we can do. Is it possible to represent this kind of reduplication in chữ Nôm as well? If so, how should we document the relationship between the quốc ngữ and Nôm forms? Minh Nguyễn 💬 15:25, 21 February 2024 (UTC)
I don’t know much about Nôm script, but as far as I know -iếc reduplication is rather informal, so I wouldn’t expect it to be very widely attested in Nôm manuscripts. Are many reduplications of the đo đỏ or lung ta lung tung type even attested in Nôm script at all? MuDavid 栘𩿠 (talk) 02:55, 23 February 2024 (UTC)
@MuDavid: Yes, this paper discusses the various forms of từ láy that are attested in one 1887 work written in chữ Nôm. This page from Nguyễn Trãi's Quốc Âm Từ Điển discusses the rules for transliterating từ láy into quốc ngữ. I haven't fully digested either resource yet, but I have a feeling that we've just scratched the surface. Minh Nguyễn 💬 02:37, 27 February 2024 (UTC)
Okay, so the first paper you link to gives nontrivial Nôm characters for đo đỏ type reduplication (儒𡮈 for nho nhỏ), which gives an extra argument for the creation of entries for this type of reduplication. I’ll add the section I proposed, and if you find instances of -iếc type reduplication in Nôm script, we can refine our approach.
BTW, @PhanAnh123, Duchuyfootball, Mxn, currently our appendix says of lung ta lung tung type reduplication that as it is “(generally) productive, the reduplicatives are not considered lemmas.” The arguments for inclusion of đo đỏ type as soft redirects apply here as well, so should we not include those as soft redirects also? MuDavid 栘𩿠 (talk) 02:32, 4 March 2024 (UTC)
I'm confused. Are you saying we should include "lung ta lung tung type reduplication" as soft redirects also? Duchuyfootball (talk) 14:19, 4 March 2024 (UTC)
I think, by your own arguments, they should be included. ☺ Soft redirects (in the style of “-a reduplication of lung tung”) would do the job, I think. Or do you think they should be full-fledged lemmas? MuDavid 栘𩿠 (talk) 00:53, 5 March 2024 (UTC)
Soft redirects seem reasonable to me. We also have templated soft redirects for orthographical variations, such as xòe (which I suppose can be reduplicated too). Minh Nguyễn 💬 05:38, 7 March 2024 (UTC)

Done, and templates created. I'm awaiting your input concerning -a reduplication (lung ta lung tung type). MuDavid 栘𩿠 (talk) 07:30, 5 March 2024 (UTC)

It appears nobody objects, so I’ll edit policy as I proposed above. MuDavid 栘𩿠 (talk) 02:30, 22 March 2024 (UTC)

RFD discussion: October 2023–March 2024

The following discussion has been moved from Wiktionary:Requests for deletion (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Vietnamese, reduplication in the style of schm-. Should not be included according to our appendix. MuDavid 栘𩿠 (talk) 03:48, 19 October 2023 (UTC)

It would be unfair to every other reduplication if this word were deleted. Its meaning is more nuanced than plain "lạ". Duchuyfootball (talk) 15:20, 16 November 2023 (UTC)

@Duchuyfootball, how is this “unfair”? We’re just following our rules. And how is the nuance of this instance of full adjective reduplication different from other instances of full adjective reduplication? Did you read our appendix? MuDavid 栘𩿠 (talk) 03:46, 17 November 2023 (UTC)
I would argue that only a limited number of adjectives have a reduplicated form. Nobody uses "loàng loãng", "dề dễ", "mầm mập", and the list goes on. Even then, the extra word used for reduplication varies from word to word, so why should they be omitted altogether? Maybe, if extra pages are redundant, someone should make a table of the different forms of the word, like the cases in Latin? Duchuyfootball (talk) 08:36, 17 November 2023 (UTC)
@Duchuyfootball You’d be surprised at what people use:
@PhanAnh123 I’ve been thinking about this, and I believe it may be better to treat this kind of reduplication as a derivation, such that là lạ and its ilk would be converted to soft redirects (as English -ing forms are, for example). What do you think? MuDavid 栘𩿠 (talk) 02:18, 1 December 2023 (UTC)
I admit I was taken aback, only because the first two examples are typos.
Example 1: I believe the correct usage is "làm loãng" (to make diluted).
Example 2: Should be "Đề dễ" (the test was easy) if you read the whole paragraph. The sentence means "The test was easy but was a bit long."
Example 3: OK, there is usage of this word, but it is really outdated.
As for the derivation part, no opposition. Nonetheless, if somebody creates a whole page just to redirect the phrase back to its root word, then why not just give a definition anyway? Duchuyfootball (talk) 06:11, 1 December 2023 (UTC)
P/S: I just read the criteria, and I think "mầm mập" is not widely used enough to be attested as a new expression. Duchuyfootball (talk) 08:51, 1 December 2023 (UTC)

Same. MuDavid 栘𩿠 (talk) 03:48, 19 October 2023 (UTC)

Same. MuDavid 栘𩿠 (talk) 03:48, 19 October 2023 (UTC)

Resolved through modification of policy. MuDavid 栘𩿠 (talk) 02:44, 6 March 2024 (UTC)

RFM discussion: August 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Moving Vietnamese river names to name without sông

All Vietnamese river names can be preceded by sông. In some cases (like sông Hồng), this sông is part of the name (in which case official rules decree an uppercase S), but in most it is not. Many entries (see Category:vi:Rivers in Vietnam for example) include this sông. I think we should not, as all those names can be attested without, making those entries SoP. (In some cases the form without sông is rare, such as for the Hàn river, which is homophonous with Korea, but it’s possible.)

I propose that all those entries (except sông Hồng) be moved to the name without sông (and also that sông Hồng be moved to Sông Hồng). I also think it’d be nice to create a template in the style of {{zh-div}} to indicate to beginners that the sông is usually there. @PhanAnh123, Mxn, Duchuyfootball MuDavid 栘𩿠 (talk) 03:45, 21 August 2024 (UTC)

Can we really exclude sông? The word is there to specify that this is a river and not anything else. In the case of Chinese, river names have suffixes like "河" (river) or "江" (river). Could you give us some instances where Vietnamese river names are attested without it? Duchuyfootball (talk) 04:34, 21 August 2024 (UTC)
I gave this link with an example. ☺ Did you read the decision 4148-QĐ/VPTW I linked to above? MuDavid 栘𩿠 (talk) 06:50, 21 August 2024 (UTC)
I missed the example, sorry. The omission of sông is possible in this particular context because the word is already used once in the sentence, so adding it again with the names would cause a lot of repetition. Also take into account the literary context of the whole post. The removal of sông is intentional, not because the name can be used separately, but because it helps with the tone, the format, etc.
The point is that the word sông must appear somewhere so that readers can get the context. I argue that beginners MUST know that sông is always there. When they take advanced Vietnamese literature lessons, they might understand that the omission is possible in context.
By the way, when you type the name in the search box, the whole phrase appears anyway. So I don't think we need to worry if someone encounters a river name and tries to find it with the search box.
I've read the decision, and agree that we should create Sông Hồng. Duchuyfootball (talk) 03:32, 23 August 2024 (UTC)
But I digress. Let's say we go ahead and remove sông. How do we decide which names can be used without sông? Duchuyfootball (talk) 03:43, 23 August 2024 (UTC)
If contexts exist where a part of a word can be removed, then that part is not truly a part of the word. For example, the word Netherlands is almost always preceded by the, but it is, in certain convoluted examples, possible to leave it out. That’s why our entry is at Netherlands and not at the Netherlands (which is a redirect). The same goes for rivers in Vietnamese. As I proposed above, we can create a template in the style of {{zh-div}} to tell beginners that sông is (almost) always there, for example:
Hàn
  1. Korea
  2. (sông ~) the Hàn River (a river in Danang, Vietnam)
As to deciding whether sông is part of the name: if the name is an adjective (like hồng) then sông can, obviously, not be removed. MuDavid 栘𩿠 (talk) 07:07, 23 August 2024 (UTC)
I think it's fine if there are redirects. As to your method of deciding, I think the first word of the name is capitalized anyway, so why adjectives? Duchuyfootball (talk) 08:59, 23 August 2024 (UTC)
@MuDavid I'm leaning towards no. This is a style guide for government publications, not a law binding on Vietnamese speakers writ large. It doesn't necessarily dictate usage in other contexts and certainly isn't intended to be a linguistic treatise. The prefixed toponyms can certainly get awkward sometimes, but I'm thinking this might be a step too far. Mỹ is a slang shortening of châu Mỹ and Mỹ châu, but that doesn't necessarily mean châu Mỹ and Mỹ châu (or tiếng Mỹ and người Mỹ) are sums of parts in modern usage. The ~ definitions are in keeping with print dictionary conventions, including in Vietnamese, but Wiktionary already goes well beyond those conventions with separate entries for plurals, conjugations, and the like. I'm not sure how overloading the shortened entry with the original terms will help with clarity, especially in situations where the same "root" with a qualifier can refer to a number of loosely related or unrelated concepts. – Minh Nguyễn 💬 01:57, 28 August 2024 (UTC)
By analogy (ignoring different case conventions across languages), we have an entry for Red River, and Red does not mention the sense of any river by that name, even though one can shorten it to “Red” in conversation, for example: “I’ve crossed the Red before.” [1] The same is possible in Vietnamese, though more in literary contexts than in conversation. – Minh Nguyễn 💬 02:52, 28 August 2024 (UTC)
Erm, wot? Did you even read what I’m proposing?
This is a style guide for government publications, not a law binding on Vietnamese speakers writ large. What does that have to do with the fact that sông Vàm Cỏ is sum of parts?
Mỹ is a slang shortening I’m not talking about removing sông from exceptions like sông Hồng, so why bring up even more spectacular exceptions?
that doesn't necessarily mean châu Mỹ and Mỹ châu (or tiếng Mỹ and người Mỹ) are sums of parts in modern usage. Nobody says they are. Châu, tiếng, and người are completely unlike sông. (The first is not even an independent word, and the other two change the meaning.) What do they have to do with the issue at hand?
I'm not sure how overloading the shortened entry with the original terms will help with clarity, How will creating the entry Vàm Cỏ lead to overloading?
we have an entry for Red River, and Red does not mention the sense of any river by that name, Please read what I actually wrote. Sông Hồng will not be moved to Hồng.
I have the impression you’re actually arguing in favour of (part of) what I’m proposing while saying you’re arguing against it. MuDavid 栘𩿠 (talk) 03:42, 28 August 2024 (UTC)
@MuDavid: OK, my apologies, I was confused by your example of “Hàn”, which is hopefully just illustrative. The mere fact that a type word can be omitted from a toponym in some contexts does not necessarily mean that we can analyze the full form as an SOP; we also need to consider the manner of derivation, which in the case of both sông Hàn and Hàn Quốc is more complex. Otherwise, I agree with trimming type words that only serve to introduce proper nouns. As for capitalization, “sông Hồng” and “biển Đông” are both common enough that they should at least have soft redirects as alternative forms, as we have for alternative tone mark placement. It may be idiosyncratic, but that’s what we’re here for. 🙂 – Minh Nguyễn 💬 15:48, 28 August 2024 (UTC)
Okay, I understand the confusion. I should’ve given more illustrative examples. I’ll leave the more complex ones alone for now, then. MuDavid 栘𩿠 (talk) 02:36, 30 August 2024 (UTC)