Wiktionary:Language treatment

The distinction between languages and dialects is not clear-cut. A lect that some regard as a dialect of a certain language may be regarded as a full, separate language by others. This page contains a list of languages and their (ISO-code-having) dialects, with notes on whether or not the dialects are treated as separate languages on Wiktionary. If there is no note about the status of a particular language+dialect group, the situation is not yet regulated. If multiple dialects are treated as a single language on Wiktionary, but there is no ISO code that represents all of them, the code of one of the dialects is used as the code for the whole language, or an exceptional code is created (for more, see Wiktionary:Languages).
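
The code-selection rule described above can be pictured as a lookup from each merged code to the single code under which the whole language is filed. What follows is only a minimal Python sketch, not the way Wiktionary itself stores this data: the names `MERGED_INTO` and `resolve_code` are hypothetical, and the handful of mappings are copied from the table further down this page.

```python
# Hypothetical lookup: merged code -> code used for the whole language.
# The pairs come from the treatment table on this page; the names are
# illustrative and do not correspond to any real Wiktionary module.
MERGED_INTO = {
    "acf": "gcf",      # Saint Lucian Creole French -> Antillean Creole
    "doc": "kmc",      # Northern Dong -> Gam
    "hlu": "xlu",      # Hieroglyphic Luwian -> Luwian
    "tsz": "pua",      # Eastern Purepecha -> Purepecha
    "nkv": "nkt",      # Nyika of Malawi and Zambia -> Nyika
    "xtg": "cel-gau",  # Transalpine Gaulish -> Gaulish (exceptional code)
}


def resolve_code(code: str) -> str:
    """Return the code entries are filed under; unmerged codes map to themselves."""
    return MERGED_INTO.get(code, code)


print(resolve_code("acf"))  # gcf
print(resolve_code("xtg"))  # cel-gau
print(resolve_code("scf"))  # scf
```

The first two calls show the two outcomes the rule allows: reusing the code of one dialect (gcf) and falling back on an exceptional code (cel-gau). A code that has not been merged, such as scf, is returned unchanged.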

For the most part, this page documents cases where Wiktionary’s treatment of lects deviates from that of the ISO/SIL, e.g. cases where we have merged lects that they have not. Cases where an ISO code has been excluded from Wiktionary altogether (typically because it was too vague to be meaningful) are also documented. Cases where the ISO/SIL itself has merged lects which they formerly granted separate codes, and we have followed suit, are not necessarily documented here.

Discussions about splitting, merging, deleting, adding or renaming lects may be conducted at Wiktionary:Language treatment requests (WT:LTR).

List of languages and dialects

Colour coding:

  • gcf: This code is treated as a language. Entries in this language are allowed.
  • acf: This code is not treated as a separate language. See the "Treatment" column to understand which code should be used instead. In certain cases, these codes may be permitted in etymologies (so-called "etymology-only" languages).
  • ber: This code represents a language family. It is not itself a language, although it may have a proto-language (ber-pro).
  • cmn: The disposition of this code is complex or unclear. See the "Treatment" column.
Code Macrolanguage Subdivisions Treatment
ak Akan fat (Fanti), tw (Twi) Only the macrolanguage is treated as a language. (discussion)
sq Albanian aae (Arbëreshë Albanian), aat (Arvanitika Albanian), aln (Gheg Albanian), als (Tosk Albanian) Only the macrolanguage is treated as a language. (discussion 1, discussion 2)
gsw Alemannic wae (Walser) Only the macrolanguage is treated as a language. (discussion)
gcf Antillean Creole gcf, acf, scf Antillean Creole is treated as a single language with the code gcf. The ISO had coded two dialects separately, using acf for “Saint Lucian Creole French” and gcf for “Guadeloupean Creole French”. (discussion) San Miguel Creole French, which is also often considered part of Antillean Creole, is treated as a separate language with the code scf.
ar Arabic abh (Tajiki Arabic), abv (Baharna Arabic), acm (Iraqi Arabic), acw (Hijazi Arabic), acx (Omani Arabic), acy (Cypriot Arabic), adf (Dhofari Arabic), aeb (Tunisian Arabic), afb (Gulf Arabic), ajp (South Levantine Arabic), apc (North Levantine Arabic), apd (Sudanese Arabic), arb (Standard Arabic), arq (Algerian Arabic), ars (Najdi Arabic), ary (Moroccan Arabic), arz (Egyptian Arabic), auz (Uzbeki Arabic), ayl (Libyan Arabic), ayn (Yemeni Arabic), ayp (North Mesopotamian Arabic), pga (Juba Arabic), shu (Chadian Arabic), ssh (Shihhi Arabic) Both the macrolanguage and its subdivisions are treated as languages, though not every subdivision with an ISO code or Ethnologue entry is deemed distinct enough for lexicographic division, and the macrolanguage code is used in place of the code (arb) which the ISO gave the standard variety of the language.
Jewish varieties of Arabic, for which some spotty ISO codes have been introduced (ajt, aju, jye, yhd, yud, jrb), are included within the general regional dialects or within the standard variety, of which the bulk of medieval texts falling under the “Judeo-Arabic” label turn out to be Hebrew-script forms. Other sectarian or tribal language forms are treated the same way. NOTE: As of 2023, the ISO has merged apc (North Levantine Arabic) and ajp (South Levantine Arabic), and a request has been made to do the same here at Wiktionary. There is no conceptual objection to this merger, but it is awaiting resources on the part of its proposers, as it will take significant effort.
sem-ara Aramaic arc (Imperial Aramaic), oar (Old Aramaic), aii (Assyrian Neo-Aramaic), aij (Lishanid Noshan), amw (Western Neo-Aramaic), bhn (Bohtan Neo-Aramaic), bjf (Barzani Jewish Neo-Aramaic), cld (Chaldean Neo-Aramaic), hrt (Hértevin), huy (Hulaulá), jpa (Jewish-Palestinian Aramaic), kqd (Koy Sanjaq Surat), lhs (Mlahsô), lsd (Lishana Deni), mid (Modern Mandaic), myz (Classical Mandaic), sam (Samaritan Aramaic), syc (Syriac; Classical Syriac), syn (Senaya), tmr (Jewish Babylonian Aramaic), trg (Lishán Didán), tru (Turoyo), xrm (Armazic) Some varieties are treated as languages, others are not:
The code oar for “Old Aramaic” (up to 700 B.C.E.) is not used; it has been superseded by arc and syc.
“Jewish Babylonian Aramaic” (circa 200–1200 C.E.) is also not allowed as an L2, as it has been superseded by arc, but its code tmr is allowed in etymologies.
Assyrian Neo-Aramaic (aii) and Chaldean Neo-Aramaic (cld) are currently treated as languages, as are aij, amw, bhn, bjf, hrt, huy, kqd, lhs, lsd, mid, myz, sam, syn, trg, tru, xrm, and arc and syc. jpa and syr (the latter, “Syriac”, being a macrolanguage covering aii and cld; see discussion) are not currently treated as languages.
hy Armenian hyw (Western Armenian) Only the macrolanguage is treated as a language. (discussion)
ay Aymara ayc (Southern Aymara), ayr (Central Aymara) Only the macrolanguage is treated as a language. (discussion)
az Azeri azb (South Azerbaijani), azj (North Azerbaijani), qxq (Qashqai, Kashkay) Only the macrolanguage is treated as a language; dialects, including Afshar and Sonqor, are not. (discussion 1, discussion 2)
sit-bai Bai bca (Central Bai), bfc (Panyi Bai), bfs (Southern Bai), lay (Lama Bai) Not yet discussed.
bal Baluchi bcc (Southern Baluchi/Balochi), bgp (Eastern Baluchi/Balochi), bgn (Western Baluchi/Balochi) Only the macrolanguage is treated as a language. (discussion)
lod Berawan zbc (Central Berawan), zbe (East Berawan), zbw (West Berawan) Only the macrolanguage is treated as a language. (discussion)
ber Berber auj (Awjilah), swn (Sawknah), siz (Siwi), cnu (Chenoua), jbe (Judeo-Berber), shi (Tashelhit), tzm (Central Atlas Tamazight), zgh (Standard Moroccan Tamazight), kab (Kabyle), gha (Ghadamès), jbn (Nafusi), sds (Sened), gho (Ghomara), oua (Tagargrent), tjo (Temacine Tamazight), grr (Taznatit), mzb (Tumzabt), sjs (Senhaja Berber), rif (Tarifit), shy (Tachawit), tia (Tidikelt Tamazight), thv (Tahaggart Tamahaq), ttq (Tawallammat Tamajaq), thz (Tayart Tamajeq), taq (Tamasheq), zen (Zenaga) Only the subdivisions are treated as languages. (discussion 1 and discussion 2, discussion 3, discussion 4)
bik Bikol agk (Isarog Agta), agz (Mount Iriga Agta), atl (Mount Iraya Agta), bcl (Bikol Central) (Central Bikol), bln (Southern Catanduanes Bicolano), bto (Iriga Bicolano), cts (Northern Catanduanes Bicolano), fbl (West Albay Bikol), lbl (Libon Bikol), rbl (Miraya Bikol), ubl (Buhi'non Bikol) Only the subdivisions are treated as languages, the macrolanguage is not. (discussion)
bnc Bontoc rbk (Northern Bontoc), vbk (Southwestern Bontoc), lbk (Central Bontoc), ebk (Eastern Bontoc), obk (Southern Bontoc) Only the subdivisions are treated as languages, the macrolanguage is not. (discussion)
bua Buryat bxm (Mongolian Buriat), bxr (Russian Buriat), bxu (Chinese Buriat) Only the macrolanguage is treated as a language. (discussion 1, discussion 2, discussion 3)
zh Chinese cdo (Eastern Min), cjy (Jin), cmn (Mandarin), cnp (Northern Pinghua), cpx (Puxian Min), csp (Southern Pinghua), czh (Huizhou), czo (Central Min), dng (Dungan), gan (Gan), hak (Hakka), hnm (Hainanese), hsn (Xiang), ltc (Middle Chinese), luh (Leizhou Min), lzh (Classical Chinese), mnp (Northern Min), nan (Southern Min), nan-hbl (Hokkien), nan-tws (Teochew), och (Old Chinese), wuu (Wu), wxa (Waxiang), yue (Cantonese), zho (inclusive code), zhx-sht (Shaozhou Tuhua), zhx-sic (Sichuanese), zhx-tai (Taishanese) Unique treatment: for the purpose of language headers only, only the macrolanguage is treated as a language (superseded discussion; vote), but the romanizations of Standard Mandarin (Hanyu Pinyin) and Standard Cantonese (Jyutping) are allowed as non-lemmas, while the romanization of Hokkien (Pe̍h-ōe-jī) and Dungan terms in Cyrillic script and Xiao'erjing are allowed as lemmas (see this guideline). For all other purposes, the constituent codes are treated as full languages.

Southern Min is treated as a language family, and its members are given exceptional codes.

cr Cree atj (Atikamekw), crj (Southern East Cree), crk (Plains Cree), crl (Northern East Cree), crm (Moose Cree), csw (Swampy Cree), cwd (Woods Cree), moe (Montagnais), nsk (Naskapi) Not yet discussed.
dih Diegueño nai-ipa (Ipai), nai-kum (Kumeyaay), nai-tip (Tipai) Only the subdivisions are treated as languages. (discussion)
dif Dieri dit (Dirari) Only the macrolanguage is treated as a language.
din Dinka dib (South Central Dinka), dik (Southwestern Dinka), dip (Northeastern Dinka), diw (Northwestern Dinka), dks (Southeastern Dinka) Not yet discussed.
doi Dogri dgo (Hindi Dogri), xnr (Kangri) Not yet discussed.
kzh Dongolawi (= Kenuzi-Dongola) dgl (Andaandi / Dongolawi), xnz (Kenzi / Mattoki) Only the macrolanguage is treated as a language. (discussion)
ddr Dhudhuroa xjt (Yaitmathang) Only ddr is treated as a language. (discussion)
gmw-ecg East Central German sli (Silesian), sxu (Upper Saxon) Only the macrolanguage is treated as a language. (It also includes Thuringian, Lusatian German, Erzgebirgisch, and High Prussian.) (discussion 1, discussion 2)
en English pld (Polari) Only the macrolanguage is treated as a language. (discussion of pld)
et Estonian vro (Võro) Both the macrolanguage and its subdivision vro are treated as languages, but the macrolanguage code is used in place of the code (ekk) which the ISO gave the standard variety of the language.
evn Evenki evn (Evenki), tuw-sol (Solon) Both Evenki and Solon are treated as languages. (discussion)
ff Fula ffm, fub, fuc, fue, fuf, fuh, fui, fuq, fuv Only the macrolanguage is treated as a language. (discussion)
fr French roa-gal (Gallo), nrf (Norman – see separate entry in this table), frc (Cajun French / Louisiana French), fr-aca (Acadian French) Both the macrolanguage fr and its subdivisions roa-gal and nrf are treated as languages. frc and fr-aca are etymology-only languages (and the former is not to be confused with Louisiana Creole French, which is treated as a full language). (discussion of Acadian French)
gmw-fri Frisian frr (North Frisian), stq (Saterland Frisian), fy (West Frisian) The Frisian languages are treated as separate languages.
kmc Gam doc (Northern Dong), kmc (Southern Dong) Gam is treated as a single language with the code kmc. The ISO had coded two dialects separately, using doc for “Northern Dong” and kmc for “Southern Dong”. “Cao Miao” (cov) is tentatively still treated as a separate language, pending further discussion. (discussion)
cel-gau Gaulish xtg (Transalpine Gaulish), xcg (Cisalpine Gaulish) The two varieties of Gaulish have been merged under the code cel-gau, though they may still be separated in etymologies. (discussion)
gba Gbaya bdt, gbp, gbq, gmm, gso, gya Not yet discussed.
gio Gelao aou (A'ou), giq (Hagei, Green Gelao), gir (Vandu, Red Gelao), giw (Telue, White Gelao, Duoluo), giu (Mulao), gqu (Qau) Not yet discussed.
ka Georgian jge (Judeo-Georgian) Only the macrolanguage is treated as a language. (discussion)
gon Gondi gno (Northern Gondi), ggo (Southern Gondi, retired), esg (Aheri Gondi), wsg (Adilabad Gondi) Only the macrolanguage is treated as a language. (discussion)
grb Grebo gbo, gec, grj, grv, gry, ktj, oub, pye, ted Not yet discussed.
grc Greek, Ancient gkm (Byzantine Greek), gmy (Mycenaean Greek), xmk (Ancient Macedonian) Among the ancient languages, gmy and xmk are treated as separate languages. gkm is an etymology-only language, as are the various ancient dialects (see the family tree table at Category:Ancient Greek language). (discussions of gkm: discussion 1, discussion 2)
el Greek, Modern cpg (Cappadocian Greek), grk-cal (Calabrian Greek), grk-ita (Italiot Greek), grk-mar (Mariupol Greek), tsd (Tsakonian) Among the modern languages, both the macrolanguage and its subdivisions are treated as languages.
gn Guaraní gnw, gug, gui, gun, nhd Not yet discussed.
hai Haida hax (Southern Haida), hdn (Northern Haida) Not yet discussed.
he Hebrew hbo (Biblical Hebrew) Biblical Hebrew does not have an L2 separate from Hebrew; the code he is used for both. In etymologies, however, the two may be distinguished. (See WT:AHE.)
hmn Hmong cqd (Chuanqiandian-cluster Miao), hea (Northern Qiandong Miao), hma (Southern Mashan Hmong), hmc (Central Huishui Hmong), hmd (A-Hmao / Large Flowery Miao), hme (Eastern Huishui Hmong), hmf (Hmong Don), hmg (Southwestern Guiyang Hmong), hmh (Southwestern Huishui Hmong), hmi (Northern Huishui Hmong), hmj (Ge), hml (Luopohe Hmong), hmm (Central Mashan Hmong), hmp (Northern Mashan Hmong), hmq (Eastern Qiandong Miao), hms (Southern Qiandong Miao), hmv (Hmong Do), hmw (Western Mashan Hmong), hmy (Southern Guiyang Hmong), hmz (Hmong Shua), hnj (Hmong Njua), hrm (Horned Miao), huj (Northern Guiyang Hmong), mmr (Western Xiangxi Miao), muq (Eastern Xiangxi Miao), mww (White Hmong), sfm (Small Flowery Miao) Tentatively, all subvarieties are accepted.
The old macrolanguage code blu and the newer macrolanguage code hmn are not used.
cqd (Chuanqiandian-cluster Miao) is an umbrella term for various varieties of Hmong in China. (discussion 1, discussion 2)
huv Huave hve (San Dionisio del Mar Huave), hvv (Santa María del Mar Huave), hue (San Francisco del Mar Huave), huv (San Mateo del Mar Huave) Only the macrolanguage is treated as a language, using the code huv. (discussion)
iu Inuktitut ike (Eastern Canadian Inuktitut), ikt (Western Canadian Inuktitut) Only the macrolanguage is treated as a language. (discussion)
ik Inupiak esi (North Alaskan Inupiatun), esk (Northwest Alaskan Inupiatun) Only the macrolanguage is treated as a language.
ill Iranun ilm, ilp In 2015, the ISO split ill into ilm (the Iranun of Malaysia) and ilp (the Iranun of the Philippines). Wiktionary has not made this split at this time.
kdv Kado zkd (Kadu proper), zkn (Kanan) In 2012, the ISO split kdv into zkd and zkn. Wiktionary has not made this split at this time. (discussion)
kln Kalenjin enb, eyo, niq, oki, pko, sgc, spy, tec, tuy Only the macrolanguage is treated as a language.
kr Kanuri bms (Bilma Kanuri), kau, kbl (Kanembu), kby, knc, krt Only the macrolanguage is treated as a language. (discussion)
kzk Kazukuru drr (Dororo), gli (Guliguli) Only kzk is treated as a language.
kel Kela-Yela kel, yel Kela-Yela is treated as a single language with the code kel. The ISO had coded two dialects separately, using kel for “Kela” and yel for “Yela”.
kca Khanty kca-eas (Eastern Khanty), kca-nor (Northern Khanty), kca-sou (Southern Khanty) Eastern (kca-eas), Northern (kca-nor) and Southern Khanty (kca-sou) are treated as separate languages with exceptional language codes. (discussion)
km Khmer khm, kxm Not yet discussed.
cnk Khumi Chin cek (Eastern Khumi) Only cnk is treated as a language.
ktu Kituba mkw Kituba is treated as a single language with the code ktu. The ISO had coded two dialects separately, using ktu for the variety spoken in the Democratic Republic of the Congo and mkw for the variety spoken in the Republic of the Congo. (discussion)
kv Komi koi, kpv Only the subdivisions are treated as languages. (discussion)
kg Kongo kng (Koongo), kwy (San Salvador Kongo), ldi (Laari), yom (Yombe) Only the macrolanguage is treated as a language. (discussion)
kok Konkani gom, knn Only the macrolanguage is treated as a language.
kpe Kpelle gkp, xpe Not yet discussed.
khi-kun ǃKung mwj (Sekele / Maligo), knw (Ekoka ǃKung), oun (ǃOǃKung), gfx (Mangetti Dune ǃXung) Only the macrolanguage is treated as a language. (discussion 1, discussion 2, discussion 3)
kjn Kunjen olk Kunjen is treated as a single language with the code kjn. The ISO had coded two dialects separately, using kjn for “Uw Oykangand” and olk for “Uw Olkola”, and had not given codes to the varieties Ogh-Undjan, Kawarrangg, or Athima.
ku Kurdish ckb (Central Kurdish), kmr (Northern Kurdish), sdh (Southern Kurdish) Central (ckb), Northern (kmr) and Southern (sdh) Kurdish are treated as separate languages. (discussion, follow-on discussion) A 2023 proposal to rename Northern Kurdish to Kurmanji and Central Kurdish to Sorani ended in no consensus.
unn Kurnai ihw (Bidhawal, Birrdhawal) Only the macrolanguage is treated as an individual language.
lah Lahnda hnd, hno, jat, phr, pmu, pnb, skr, xhe Not yet discussed.
lv Latvian ltg (Latgalian) Both the macrolanguage and its subdivision ltg are treated as languages, but the macrolanguage code is used in place of the code (lvs) which the ISO gave the standard variety of the language.
del Lenape (= Delaware) umu (Munsee), unm (Unami) Only the subdivisions are treated as languages. (discussion)
lt Lithuanian sgs (Samogitian) Both the macrolanguage and its subdivision sgs are treated as languages.
lrk Loarki gda (Gade Lohar) Only the macrolanguage is treated as an individual language. (discussion)
nds Low German
  • nds-de (German Low German) – wep (Westphalian), frs (East Frisian)
  • nds-nl (Dutch Low Saxon) – act (Achterhoeks), drt (Drents), gos (Gronings), sdz (Sallands), stl (Stellingwerfs), twd (Twents), vel (Veluws)
  • pdt (Plautdietsch)
The code nds is deprecated. nds-de is used for German Low German varieties. nds-nl is used for Dutch Low Saxon varieties. Plautdietsch (pdt) is a separate lect. (discussion of language name, discussion of drt, gos, twd, general discussion 1, general discussion 2, discussion of nds-de, general discussion 3 (permalink))
luy Luhya bxk, ida, lkb, lko, lks, lri, lrm, lsm, lto, lts, lwg, nle, nyd, rag Not yet discussed.
xlu Luwian hlu Luwian is treated as a single language with the code xlu. Luwian is written in two scripts, and the ISO had coded each separately, using xlu for “Cuneiform Luwian” and hlu for “Hieroglyphic Luwian”. (discussion, permalink)
mg Malagasy bhr (Bara Malagasy), bjq, bmm (Northern Betsimisaraka Malagasy), buc (Bushi), bzc (Southern Betsimisaraka Malagasy), mlg, msh, plt, skg, tdx, tkg, txy, xmv, xmw Only the macrolanguage is treated as an individual language. (discussion 1, discussion 2)
ms Malay bjn, btj, bve, bvu, coa, dup, hji, id, jak, jax, kvb, kvr, kxd, lce, lcf, liw, max, meo, mfa, mfb, mhp, min, mqg, msi, mui, orn, ors, pel, pse, tmw, urk, vkk, vkt, xmm, zlm, zmi, zsm The codes zsm and zlm are not used; ms is used instead. The status of the remaining lects remains unclear. (discussion 1, discussion 2, discussion 3, vote 1, vote 2 (failed votes on unification of Malay and Indonesian), discussion 4, discussion 5)
man Mandingo emk, mku, mlq, mnk, msc, mwk, myq Not yet discussed.
mns Mansi mns-cen (Central Mansi), mns-nor (Northern Mansi), mns-sou (Southern Mansi) Central (mns-cen, including Western and Eastern Mansi), Northern (mns-nor) and Southern Mansi (mns-sou) are treated as separate languages with exceptional language codes. (discussion)
Mantharta djl (Jiwarli (macrolanguage code)), dze (Jiwarli (proper)), iin (Thiin), dhr (Tharrgari), wri (Warriyangga) Jiwarli is treated as a single language with the code djl; the ISO’s split of that code into dze for Jiwarli proper and iin for Thiin has not been followed. However, dhr (Tharrgari) and wri (Warriyangga) have tentatively been retained as languages, rather than being merged, with Jiwarli, into the single language Mantharta. (discussion)
chm Mari mhr (Eastern Mari), mrj (Western Mari) Only the subdivisions are treated as languages, the macrolanguage is not. (discussion 1, discussion 2, discussion 3)
mwr Marwari dhd, mtr (Mewari), mve, rwr, swv, wry Only the macrolanguage Marwari (mwr) and Mewari (mtr) are treated as languages. (discussion 1, discussion 2)
mnt Maykulan wnn (Wunumara), xyj (Mayi-Yapi), xyk (Mayi-Kulan), xyt (Mayi-Thakurti) Only the macrolanguage is treated as a language. (discussion)
mn Mongolian khk (Khalkha Mongolian), mvf (Peripheral Mongolian) Only the code mn is used for Mongolian; khk is redundant to it and mvf is not usable. Note that Kalmyk (xal) and Buryat (see its entry in this table), which some scholars consider dialects of Mongolian, are treated as independent languages on Wiktionary. (discussion)
mjg Monguor mjg-mgr (Mangghuer), mjg-mgl (Mongghul) Only the subdivisions are treated as languages, the macrolanguage is not. (discussion)
nbf Na nru (Narua), nxq (Naxi) Only the subdivisions are treated as languages, the macrolanguage is not.
nah Nahuatl azd (Eastern Durango Nahuatl), azn (Western Durango Nahuatl), azz (Highland Puebla Nahuatl), naz (Coatepec Nahuatl), nch (Central Huasteca Nahuatl), nci (Classical Nahuatl), ncj (Northern Puebla Nahuatl), ncl (Michoacán Nahuatl), ncx (Central Puebla Nahuatl), ngu (Guerrero Nahuatl), nhc (Tabasco Nahuatl), nhe (Eastern Huasteca Nahuatl), nhg (Tetelcingo Nahuatl), nhi (Zacatlán-Ahuacatlán-Tepetzintla Nahuatl), nhk (Cosoleacaque Nahuatl), nhm (Morelos Nahuatl), nhn (Central Nahuatl), nhp (Pajapan Nahuatl), nhq (Huaxcaleca Nahuatl), nht (Ometepec Nahuatl), nhv (Temascaltepec Nahuatl), nhw (Western Huasteca Nahuatl), nhx (Mecayapan Nahuatl), nhy (Northern Oaxaca Nahuatl), nhz (Santa María La Alta Nahuatl), nlv (Orizaba Nahuatl), nuz (Tlamacazapa Nahuatl), ppl (Pipil), xpo (Pochutec) The macrolanguage code nah is no longer used for entries; its subdivisions are treated as languages. A number of nah translations remain and need to be reassigned to the appropriate Nahuatl language. (discussion)
mff Naki jms (Mashi), buz (Bukwen) Only the macrolanguage is treated as a language. (discussion)
Nambu nmx (Nama), nkm (Namat), ncm (Nambo), mxw (Namo, Dorro), nex (Neme), nqn (Nen) Not yet discussed.
Nenets yrk (Tundra Nenets), syd-fne (Forest Nenets) Only the subdivisions are treated as languages, the macrolanguage is not. (The code the ISO gave the macrolanguage, yrk, is used for Tundra Nenets.) (discussion) “Yurats” (rts), a barely-attested extinct lect which was probably just a variety of Enets, or a transitional variety between Enets and the Nenets varieties, is excluded for now. (discussion)
ne Nepali npi Only the code ne is used.
Ngbandi ngb, deq, ... deq is not included.
nrf Norman roa-nor (Norman), nrf-grn (Guernésiais), nrf-jer (Jèrriais) Only the macrolanguage Norman is treated as a language, and the code nrf (which the ISO assigned to Guernésiais and Jèrriais) is used for all varieties of it, which had previously been granted separate exceptional codes. Compare fr. (discussion)
no Norwegian nb, nn In practice, both the macrolanguage and its subdivisions are treated as languages. There has been discussion of either treating only the macrolanguage as a language, or of only treating the subdivisions as languages, but there is no consensus about which of these to do. (discussion 1, discussion 2, discussion 3, discussion 4, discussion 5, discussion 6, stalemated vote)
nkt Nyika nkt, nkv Nyika is treated as a single language with the code nkt. The ISO had coded two regional varieties separately, using nkt for the Nyika of Tanzania and nkv for the Nyika of Malawi and Zambia.
oc Occitan prv (Provençal), sdt (Shuadit) Only the macrolanguage is treated as a language. prv is an etymology-only language.
or Odia ori, ory, ort (Adivasi Odia), dso (Desiya), spv (Sambalpuri, Kosli, Kosali) Neither of the three-letter codes the ISO assigned to Odia proper (ori and ory) is used; instead, only the two-letter code or is used. Of the three other codes the ISO assigned to Odia varieties, spv is currently not used, while dso and ort are.
oj Ojibwe ciw, ojb (Northwestern Ojibwa), ojc (Central Ojibwa), ojg (Eastern Ojibwa), ojs (Severn Ojibwa), ojw (Western Ojibwa), otw (Ottawa) Both the macrolanguage and its subdivisions are treated as languages, except that the code ciw is not used, having been merged into oj.
fro Old French xno (Anglo-Norman), zrp (Judeo-French) Only the macrolanguage is treated as a language. xno is an etymology-only language. (discussion)
mgx Omati-Mini jbk (Barikewa), jmw (Mouwase) Only the subdivisions are treated as languages, the macrolanguage is not.
om Oromo gax, gaz, hae (Eastern / Ittu / Qottu / Harar Oromo), orc Only the macrolanguage is treated as a language. (discussion)
kpp Paku Karen jkp (Paku Karen), jkm (Mobwa Karen) In 2012, the ISO split kpp into jkp and jkm. Wiktionary has not made this split at this time. (discussion)
ps Pashto pbt (Southern Pashto), pbu (Northern Pashto), pst (Central Pashto), wne (Waneci) Only the macrolanguage ps and the variety wne (Waneci) are treated as languages. (discussion of pbt, pbu, pst)
fa Persian (= Farsi) aiq (Aimaq), bhh (Bukhari), deh (Dehwari), haz (Hazaragi), jdt (Judeo-Tat), jpr (Judeo-Persian), pes (Western Persian), phv (Pahlavani), prd, prp, prs (Dari), tg (Tajik), ttt (Tat) Persian (fa), Tajik (tg), Judeo-Persian (jpr), Bukhari (bhh), Judeo-Tat (jdt), Tat (ttt) and Hazaragi (haz) are treated as separate languages. Western Persian (pes), Eastern Persian / Dari (prs) and Aimaq (aiq) are subsumed into fa. “Parsi” (prp) and “Parsi-Dari” (prd) are spurious and are excluded. The status of deh and phv is unresolved. (discussion of Tajik, of Judeo-Persian and Bukhari, and of Tat, discussion of pes, prs, aiq, haz, deh and phv)
pih Pitcairn-Norfolk cpe-pit (Pitcairn), cpe-nor (Norfolk) For a time, Wiktionary split Pitcairn-Norfolk into two varieties, granting each an exceptional code: Pitcairn was cpe-pit, Norfolk was cpe-nor. That split has been undone; only pih is now treated as a language. (discussion 1, discussion 2)
pl Polish csb (Kashubian), zlw-pom (Pomeranian), zlw-slv (Slovincian) In practice, both the macrolanguage and its subdivisions are treated as languages, although this has not been discussed.
pua Purepecha tsz Purepecha is treated as a single language with the code pua. The ISO had coded two dialects separately, using pua for “Western Purepecha” and tsz for “Eastern Purepecha”. (discussion)
qu Quechua cqu, inb, inj, qub, qud, quf, qug, quh, quk, qul, qup, qur, qus, quw, qux, quy, quz, qva, qvc, qve, qvh, qvi, qvj, qvl, qvm, qvn, qvo, qvp, qvs, qvw, qvz, qwa, qwc, qwh, qws, qxa, qxc, qxh, qxl, qxn, qxo, qxp, qxr, qxt, qxu, qxw, qwe-kch Only the macrolanguage Quechua (qu), Inga (inb), Jungle Inga (inj), Classical Quechua (qwc) and the standardized variety Kichwa (qwe-kch) are treated as languages. (discussion of Kichwa) NOTE: This is subject to change. A 2023 discussion about splitting Quechua ended in a general consensus that it should be split, but with no consensus on how to split it.
raj Rajasthani bgq, gda, gju, hoj, mup, wbr Not yet discussed.
rom Romani rmc, rmf, rml, rmn, rmo (Sinte Romani), rmw, rmy Only the macrolanguage rom and the subdivision rmo have L2 headers, but all of the subdivisions can have nested lines in translations tables. (discussion)
ro Romanian mo (Moldavian) Only the macrolanguage (ro) is treated as a language. (vote)
jya rGyalrong sit-sit (Situ), sit-jap (Japhug), sit-tsh (Tshobdun), sit-zbu (Zbu) Only the subdivisions are treated as languages, the macrolanguage is not. (discussion)
rue Rusyn rsk (Pannonian Rusyn) Rusyn is treated as two separate languages: Carpathian Rusyn (rue) and Pannonian Rusyn (rsk). The ISO originally had only the former code, which was understood to comprise both Rusyn languages. The latter code was approved in 2022. The official designations for these codes are somewhat ambiguous, but are usually understood as rue for Carpathian Rusyn and rsk for Pannonian Rusyn. The English Wiktionary follows this interpretation. (discussion)
rw Rwanda-Rundi rn (Rundi proper), haq (Ha, Giha), suj (Shubi), han (Hangaza), vin (Vinza) Rwanda-Rundi is treated as a single language with the code rw. The ISO had coded the dialects separately, using rw for “(Kinya)rwanda” and rn for “(Ki)rundi”, etc. (discussion)
nai-spt Sahaptin uma, waa, yak, tqn Both the macrolanguage and its subdivisions are treated as languages.
sc Sardinian sdc (Sassarese), sdn (Gallurese), src (Logudorese), sro (Campidanese) Sassarese (sdc) and Gallurese (sdn) are treated as separate languages; Logudorese (src) and Campidanese (sro) are not. (discussion 1, discussion 2)
seh Sena bwg (Barwe), swk (Malawi Sena) Only the macrolanguage and the Barwe variety are treated as languages, while Malawi Sena is not.
sel Selkup sel-nor (Northern Selkup), sel-sou (Southern Selkup, including Central Selkup) Northern (sel-nor) and Southern Selkup (sel-sou, including Central Selkup) are treated as separate languages with exceptional language codes. (discussion)
sh Serbo-Croatian bs (Bosnian), hr (Croatian), sr (Serbian), cnr (Montenegrin), kjv (Kajkavian) Only the macrolanguage is treated as a language. (See discussion 1, discussion 2, discussion 3, discussion 4, discussion 5, this discussion from late 2023, more discussion, this old vote, this discussion of kjv, and many other discussions.)
kqu Seroa gku In 2014, the ISO restricted kqu to ǁKuǁe and split off gku ǂUngkue / ǂKunkwe. Wiktionary has not made this split at this time. (discussion)
den Slavey scs, xsl Not yet discussed.
sw Swahili sta, swc, swh, bnt-cmw (Chimwiini) Only the macrolanguage and Chimwiini bnt-cmw are treated as languages. (discussion of sta)
syr Syriac See the entry for “Aramaic”.
tl Tagalog fil (Filipino) Only the macrolanguage (tl) is treated as a language. (vote)
tmh Tamashek taq, thv, thz, ttq Not yet discussed.
xtz Tasmanian xpb (Pyemmairre), xpd (Paredarerme), xpf (Southeast Tasmanian), xph (Tyerrernotepanner), xpl (Port Sorell), xpv (Tommeginne), xpw (Peerapper), xpx (Toogee), xpz (Bruny Island), aus-lsw (Little Swanport) There is no single "Tasmanian language"; it is a geographic grouping of an unknown number of possibly unrelated families. See this 2024 discussion.
bo Tibetan bod (Lhasa, Ü, Dbus), adx (Amdo, Panang), khg (Khams), kbg (Khamba), tsk (Tseku), dre (Dolpo), hut (Humla, Limi), lhm (Lhomi, Shing Saapa), muk (Mugom, Mugu), kte (Nubri), ola (Walungge, Gola), thw (Thudam), loy (Lowa, Loke, Mustang), tcn (Tichurong) Only the macrolanguage is treated as a language, as well as xct (Classical Tibetan) and otb (Old Tibetan). Other Tibetic languages, such as Dzongkha (dz), are separate.
tid Tidong itd, ntd Only the subdivisions are treated as languages, the macrolanguage is not.
uz Uzbek uzn, uzs Only the macrolanguage is treated as a language.
xwk Wangkumara xpt (Punthamara), eaa (Karenggapa) Only the macrolanguage Wangkumara (xwk) is treated as a language; it is treated as a language in its own right rather than as a dialect of Ngura (which the ISO used to consider a single language with the code nbx).
xww Wemba-Wemba rbp (Baraba-Baraba), rnr (Nari-Nari), weg (Wergaia), xwt (Wotjobaluk) Only the macrolanguage is treated as a language. (discussion)
wnw Wintu nol (Nomlaki), pwi (Patwin), wnw (Wintu) Both the macrolanguage and its subdivisions are treated as languages. (discussion)
yxl Yarli wdk, yga Only yxl is treated as a language.
yen Yendang ynq (Yendang proper), yot (Yotti) Only the macrolanguage is treated as a language. (discussion)
yi Yiddish ydd, yih Only the macrolanguage is treated as a language. (discussion)
yiy Yir-Yoront yyr (Yir-Yoront), yrm (Yirrk-Mel / Yirrk-Thangalkl) Only the macrolanguage is treated as a language. (discussion)
yok Yokuts yok-ply (Palewyami), yok-bvy (Buena Vista Yokuts), yok-tky (Tule-Kaweah Yokuts), yok-kry (Kings River Yokuts), yok-gsy (Gashowu), yok-svy (Southern Valley Yokuts, including Yawelmani), yok-nvy (Northern Valley Yokuts, including Chukchansi), yok-dly (Delta Yokuts) Only the subdivisions are treated as languages, the macrolanguage is not.
zap Zapotec zaa, zab, zac, zad, zae, zaf, zai, zam, zao, zaq, zar, zas, zat, zav, zaw, zax, zca, zoo, zpa, zpb, zpc, zpd, zpe, zpf, zpg, zph, zpi, zpj, zpk, zpl, zpm, zpn, zpo, zpp, zpq, zpr, zps, zpt, zpu, zpv, zpw, zpx, zpy, zpz, zsr, zte, ztg, ztl, ztm, ztn, ztp, ztq, zts, ztt, ztu, ztx, zty Not yet discussed.
zza Zazaki diq, kiu Only the macrolanguage is treated as a language.
za Zhuang zch (Central Hongshuihe Zhuang), zeh (Eastern Hongshuihe Zhuang), zgb (Guibei Zhuang), zgm (Minz Zhuang), zgn (Guibian Zhuang), zhd (Dai Zhuang), zhn (Nong Zhuang), zlj (Liujiang Zhuang), zln (Lianshan Zhuang), zlq (Liuqian Zhuang), zqe (Qiubei Zhuang), zyb (Yongbei Zhuang), zyg (Yang Zhuang), zyj (Youjiang Zhuang), zyn (Yongnan Zhuang), zzj (Zuojiang Zhuang) Only the macrolanguage is treated as a language. (discussion)

Excluded codes

The following ISO 639-3 codes have been excluded without being subsumed into a single other code:

  • bpw (“Bo”, “Po”, “Sorimi”) is excluded for now because its existence as a distinct language is unconfirmed and undocumented, and if it were included, a naming conflict would exist with bgl (discussion).
  • bzt (“Brithenig”), a minor constructed language (see WT:CFI § Constructed languages).
  • dgu (“Degaru/Dhekaru”), spurious according to Ethnologue (see w:Dhekaru); it’s a caste, not a language.
  • dws (“Dutton World Speedwords”), a minor constructed language (see WT:CFI § Constructed languages).
  • ekc, called “Eastern Karnic” by the ISO, is excluded from Wiktionary because it is not clear that there is any data for this “language”, or even that it is a single real language.
  • ghc, called “Hiberno-Scottish Gaelic” by the ISO, is treated as Irish (ga) or Scottish Gaelic (gd) according to whether the word in question is from an Irish text, a Scottish one, or both (discussion).
  • gok (“Gowli”), spurious according to Glottolog (see w:Spurious languages); it’s a caste, not a language.
  • jrt (“Jorto”), spurious (discussion).
  • luw (“Luo”), an unclassified extinct language, is excluded for now because no includable content in it has been put forth, and if it were included, a naming conflict would exist with luo.
  • myi (“Mina (India)”) is excluded because it is spurious. Furthermore, if it were included, a naming conflict would exist with hna.
  • vms (“Moksela”) is extinct and unattested (see Moksela language).
  • wai (“Wares”), spurious (discussion).
  • wwb (“Wakabunga”), an unclassified extinct language, is excluded for now because no includable content in it exists: the only wordlist which was labelled Wakabunga turned out to be Kalkatungu.
  • yrs (“Yarsun”), spurious (discussion).
  • zba (“Balaibalan”), a minor constructed language (brief discussion).
  • zbl (“Blissymbols”, “Blissymbolics”, “Semantography”), a minor constructed language / script (see WT:CFI § Constructed languages).
  • zxx, the code for “No linguistic content”.
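
For anyone handling these codes programmatically, the list above amounts to a simple membership check. The snippet below is a hedged Python sketch under that assumption; `EXCLUDED_CODES` and `is_excluded` are hypothetical names introduced for illustration, and the set contains only the codes listed in this section.

```python
# Codes listed under "Excluded codes" on this page. The constant and function
# names are hypothetical, chosen for this illustration only.
EXCLUDED_CODES = frozenset({
    "bpw", "bzt", "dgu", "dws", "ekc", "ghc", "gok", "jrt", "luw",
    "myi", "vms", "wai", "wwb", "yrs", "zba", "zbl", "zxx",
})


def is_excluded(code: str) -> bool:
    """Return True if the ISO 639-3 code is on the excluded list."""
    return code.lower() in EXCLUDED_CODES


print(is_excluded("luw"))  # True  (excluded; its name would clash with luo)
print(is_excluded("luo"))  # False (not on the excluded list)
```

A check like this says nothing about why a code is excluded; the reasons (spurious, unattested, constructed, and so on) remain those given in the list.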