Wiktionary:About Albanian

This is a Wiktionary policy, guideline or common practices page. Specifically it is a policy think tank, working to develop a formal policy.

Shortcut:
WT:ASQ

See also: Entry layout explained and Category:Albanian language

Creating Albanian entries

See also WT:ELE.

Etymologies of Slavic loans

See also WT:ETYM.

Albanian contains many Slavic loans. Due to various factors, including identical forms in several Slavic languages, poor or nonexistent dialectological studies of both Slavic and Albanian, the scarcity of written evidence before the 16th century, and highly variable time depth, it is not always possible to pinpoint the exact donor language. At best, it can be said that the loan is South Slavic; beyond that, nothing is certain. The preferred way of formatting etymologies of such words is as follows (adapted from the etymology of brazdë):

===Etymology===
{{bor+|sq|zls}}, compare {{cog|mk|бразда}}, {{cog|sh|brázda//бра́зда}}.

In this case, all South Slavic languages have the exact same form as the potential source word. Macedonian and Serbo-Croatian, the two closest neighboring South Slavic languages, are given for comparison.
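
As a rough illustration only, the layout above can be assembled mechanically. The sketch below is in Python; the function name `slavic_loan_etymology` and its argument format are hypothetical and not part of any Wiktionary tooling. It only reproduces the wikitext shown above from a list of comparison forms.

```python
def slavic_loan_etymology(cognates):
    """Build the preferred etymology wikitext for a South Slavic loan.

    `cognates` is a list of (language code, term) pairs given for comparison.
    """
    # One {{cog}} template per comparison form, separated by commas.
    comparisons = ", ".join(
        "{{cog|%s|%s}}" % (code, term) for code, term in cognates
    )
    return "===Etymology===\n{{bor+|sq|zls}}, compare " + comparisons + "."


# The brazdë example: Macedonian and Serbo-Croatian given for comparison.
print(slavic_loan_etymology([("mk", "бразда"), ("sh", "brázda//бра́зда")]))
```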

Definiteness of given names

Given names are to be lemmatised in the indefinite form whenever possible. The exception is female given names ending in -a, which are lemmatised in the definite form to reflect the tendency to refer to these names chiefly in that form. For example, Agim is lemmatised at its indefinite form, while Drita is lemmatised at its definite form.

Inflected forms

Inflected forms should not have an etymology section, or should have it simply as {{nl}} in cases of homography with other terms. The definition should be built with {{infl of}}. Please use dative rather than genitive; mediopassive rather than passive, reflexive or middle; and aorist rather than past simple or past historic. Do not specify a form as vocative if it is also nominative, and do not specify a form as singular or plural if the lemma is a singulare tantum, a plurale tantum or a proper noun.

Historical orthographies

Albanian has been written in many orthographies and different scripts. When creating an entry, it is preferable, whenever possible and deemed appropriate in the particular case, to convert the term into the modern orthography. The form written in the alternative orthography is then at best kept as an alternative spelling, whenever considered particularly valuable, or simply left unmentioned. Quotes, however, should always be given in their original script, with the modern orthography given as the normalisation parameter.

Latin script

Catholic orthography

Early authors of the Counter-Reformation as well as later Italian missionaries, albeit with their differences, used more or less the same orthography. Due to poor support by Unicode for this orthography, we have to deal with some compromises.

The character which stands alone for /ð/ and doubled for /θ/, and which resembles a barred zed, should be encoded as the Greek ⟨Ξξ⟩, even though it does not look like it.
The letter ⟨Ɛɛ⟩ standing for /z/ should be encoded as ɛ U+025B LATIN SMALL LETTER OPEN E and Ɛ U+0190 LATIN CAPITAL LETTER OPEN E, not as a Greek epsilon or a Cyrillic reverse ze.
The letter ⟨Ȣȣ⟩ should be encoded as ȣ U+0223 LATIN SMALL LETTER OU and Ȣ U+0222 LATIN CAPITAL LETTER OU.
The Serbian ⟨Ћћ⟩ used by Buzuku should be encoded at its Cyrillic codepoint, not as the Latin pharyngeal.
What may appear as ⟨ÿ⟩ is actually most likely ⟨ij⟩ and should be encoded as such.
Long s ⟨ſ⟩ and r rotunda ⟨ꝛ⟩ should be encoded as simple ⟨s⟩ and ⟨r⟩, and the same should be done about scribal variants and ligatures of this nature.
The distinction between ⟨Uu⟩ and ⟨Vv⟩ should be inserted wherever it is possible to assume it. The letters ⟨Ii⟩ and ⟨Jj⟩, even where they are mere variants of each other, should be encoded distinctly, depending on how they appear.
Whether to resolve the scribal abbreviation ⟨ꝑ⟩ as ⟨per⟩ (with ⟨e⟩ being ambiguous for both /e/ and /ə/) may remain undecided for now. If it indeed always stands for ⟨per⟩, it would make sense to encode it as the three separate letters. If, however, it is discovered to be ambiguous for any ⟨pVr⟩ sequence, the preferred conclusion would be to keep the abbreviation as is.
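
The unambiguous substitutions in the list above can be sketched as a small clean-up step. This is a minimal Python sketch, not an established tool: the names `SIMPLIFY`, `PREFERRED` and `clean_catholic` are hypothetical, and reading "the Latin pharyngeal" as ⟨ħ⟩ (U+0127) and the "Cyrillic reverse ze" as U+0510/U+0511 is an assumption. Cases that need editorial judgment (⟨ÿ⟩ versus ⟨ij⟩, ⟨u⟩ versus ⟨v⟩, the abbreviation ⟨ꝑ⟩) are deliberately left untouched.

```python
# Scribal variants that the guidelines say to simplify.
SIMPLIFY = {
    "\u017F": "s",  # LATIN SMALL LETTER LONG S
    "\uA75B": "r",  # LATIN SMALL LETTER R ROTUNDA
}

# Look-alikes at a non-preferred codepoint, mapped to the preferred one.
# Assumption: these are the look-alikes meant by the guidelines.
PREFERRED = {
    "\u03B5": "\u025B",  # Greek small epsilon -> LATIN SMALL LETTER OPEN E
    "\u0511": "\u025B",  # Cyrillic small reversed ze -> LATIN SMALL LETTER OPEN E
    "\u0510": "\u0190",  # Cyrillic capital reversed ze -> LATIN CAPITAL LETTER OPEN E
    "\u0127": "\u045B",  # Latin small h with stroke -> CYRILLIC SMALL LETTER TSHE
    "\u0126": "\u040B",  # Latin capital H with stroke -> CYRILLIC CAPITAL LETTER TSHE
}

_TABLE = str.maketrans({**SIMPLIFY, **PREFERRED})


def clean_catholic(text):
    """Apply the character-for-character substitutions listed above."""
    return text.translate(_TABLE)
```

A transcription should still be proofread against the source afterwards, since a table like this cannot tell an intended Greek letter from a mistyped one.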

Other

Girolamo De Rada, and after him all the poets of the "Calabrian school", used a Latin alphabet enriched with many Greek letters. When encoding texts in this orthography, make sure that the script (i.e. either Latin or Greek) of the accent, be it acute or grave, matches the script of the vowel it is on. Note also that some letters which may look the same when uppercase, e.g. ⟨Ζ⟩, the zeta, and ⟨Z⟩, the zed, must be encoded according to the sound they are supposed to stand for, despite looking identical both in the printed source and here in the project.
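
Because such look-alikes cannot be told apart by eye, it may help to inspect the codepoints directly. The following Python sketch uses the standard `unicodedata` module; the helper name `letter_scripts` is hypothetical. It reads the script off the first word of each character's Unicode name, which works for Greek and Latin letters but is only a rough heuristic in general.

```python
import unicodedata


def letter_scripts(text):
    """Return (character, script) pairs for every alphabetic character.

    The script is taken from the first word of the Unicode character name,
    e.g. "GREEK" or "LATIN". Combining marks are not alphabetic and are skipped.
    """
    return [(ch, unicodedata.name(ch).split()[0]) for ch in text if ch.isalpha()]


# Greek capital zeta (U+0396) next to Latin capital Z (U+005A).
for ch, script in letter_scripts("\u0396Z"):
    print("U+%04X" % ord(ch), script)
```

Precomposed accented vowels carry their script in the name as well, so a Greek vowel with tonos and a Latin vowel with acute are reported differently.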

Many obsolete Latin-script orthographies which were not tied to any particular orthographical tradition opted for heavy use of diacritics and other visually questionable approaches. However ugly they look, they should be fairly straightforward to encode.

Greek script

The Greek script enjoyed (and reportedly still enjoys, although on a much smaller scale) great use for writing Albanian. As far as Albanian is concerned, the code Grek is used exclusively, regardless of whether the text is monotonic or polytonic. Some remarks about encoding:

The vowel /ə/ is often expressed as an epsilon with a tilde or a horizontal line below. This is preferably encoded as ⟨Ε̰ε̰⟩, i.e. an epsilon + ◌̰ U+0330 COMBINING TILDE BELOW. The combining tilde should be employed even when it looks like a horizontal line.
Earlier, Theodore Kavalliotis and his student Daniel Moscopolites used ⟨ᾼᾳ⟩ for the schwa. It should be encoded as a simple alpha with iota subscript.
Kostandin Kristoforidhi in his 1904 Albanian–Greek dictionary used ⟨Ε̯ε̯⟩ for the schwa. It should be encoded as an epsilon + ◌̯ U+032F COMBINING INVERTED BREVE BELOW.
The overdot was occasionally used on ⟨π̇ τ̇ δ̇ γ̇ χ̇ λ̇⟩ etc. It is encoded as ◌̇ U+0307 COMBINING DOT ABOVE.
The letter ⟨σ̈⟩ should be encoded as a sigma + ◌̈ U+0308 COMBINING DIAERESIS.
Latin letters ⟨Bb Dd Ee⟩ are occasionally used in the Greek script for Albanian; they should be encoded simply at their Latin codepoints.
1. Whenever the Cyrillic ⟨Б⟩ is employed for the capital ⟨b⟩, to distinguish it from the capital ⟨β⟩, the Cyrillic codepoint should be used for the capital form only.
2. Whenever a capital ⟨Β⟩ or ⟨Ε⟩ is visually ambiguous, it should be encoded at its Greek or Latin codepoints depending on which sound we can assume it is supposed to stand for.
Note, however, that ⟨Ϳϳ⟩ have their own Unicode codepoints at ϳ U+03F3 GREEK LETTER YOT and Ϳ U+037F GREEK CAPITAL LETTER YOT, and should hence be encoded as such, not as the Latin Jj.
The ligatures ⟨ȣ⟩ and ⟨ϛ⟩ should be untied into ⟨ου⟩ and ⟨στ⟩ respectively. The same should be applied to other ligatures of this sort.
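
Several of the remarks above are mechanical enough to sketch in code. The Python example below is illustrative only: `clean_greek` and the constant names are hypothetical, only the lowercase ligatures are handled, and returning the result in Unicode normalisation form NFC is an assumption. Visually ambiguous capitals and the Cyrillic ⟨Б⟩ case still require an editor's judgment and are not touched.

```python
import re
import unicodedata

TILDE_BELOW = "\u0330"   # COMBINING TILDE BELOW (preferred schwa mark)
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW (the "horizontal line")

REPLACEMENTS = {
    "\u0223": "\u03BF\u03C5",  # ou ligature -> omicron + upsilon
    "\u03DB": "\u03C3\u03C4",  # stigma -> sigma + tau
    "j": "\u03F3",             # Latin j -> GREEK LETTER YOT
    "J": "\u037F",             # Latin J -> GREEK CAPITAL LETTER YOT
}


def clean_greek(text):
    """Apply the unambiguous Greek-script encoding rules listed above."""
    # Decompose first so that the mark below directly follows the epsilon,
    # even when the vowel also carries an accent above.
    text = unicodedata.normalize("NFD", text)
    # Epsilon with a line below is encoded with the tilde below instead.
    text = re.sub("([\u0395\u03B5])" + MACRON_BELOW, r"\1" + TILDE_BELOW, text)
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return unicodedata.normalize("NFC", text)


# Confirm the names of the combining marks mentioned in the list.
for mark in ("\u0330", "\u032F", "\u0307", "\u0308"):
    print("U+%04X" % ord(mark), unicodedata.name(mark))
```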

For the correct display of Greek-script quotes, especially of diacritics, which may not line up correctly in standard pre-installed fonts, we advise installing a font with proper Greek support, such as Gentium Plus.

Perso-Arabic script

The Perso-Arabic script was used extensively by the bejtexhinj, Islamic Albanian poets. For the encoding, follow the guidelines for Ottoman Turkish. The script code used is, as it stands, ota-Arab.

Elbasan script

The code is Elba. No normalisation is needed, as the automatically generated transliteration is enough. If the text doesn't appear correctly, we suggest installing Noto Sans Elbasan. To type it with ease you can use this virtual keyboard.
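
To check whether a quotation is really encoded in the Elbasan script before tagging it as Elba, the codepoints can be tested against the Unicode Elbasan block (U+10500 to U+1052F). This is a minimal Python sketch; the helper name `is_elbasan` is hypothetical, and ignoring punctuation, digits and spaces is a design assumption.

```python
# Unicode block "Elbasan": U+10500..U+1052F.
ELBASAN_BLOCK = range(0x10500, 0x10530)


def is_elbasan(text):
    """Return True if the text has letters and all of them are Elbasan."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(ord(ch) in ELBASAN_BLOCK for ch in letters)
```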

Normalisation

For the normalisation, specified as the |norm= parameter in quotations, note the following guidelines:

It is not a literal transliteration, nor a translation into the modern language, but rather a transcription in the modern orthography. To clarify, think not "what did the author write", but rather "what sounds was the author most likely trying to express, given what he wrote".
When a vowel is written twice in the original orthography (including ⟨ij⟩ whenever the ⟨j⟩ is simply a visual variant of ⟨i⟩), the normalisation should write the vowel only once with a macron, ˉ, on it. Note that vowels which we know must have been long but are not spelled twice should not use the macron.
The circumflex, ˆ, should be used whenever a nasal vowel is explicitly noted, in one way or another, in the original orthography. As with the macron in the note above, vowels which we know must have been nasal but are not explicitly written so (i.e. most of the time) should not use the circumflex.
Schwas, or any other similarly lost phonemes, should not be inserted in the normalisation wherever they are not actually written in the original text.
The sound pair /l/ and /ʎ/, found in many Tosk dialects, should usually be normalised as ⟨ll⟩ and ⟨l⟩ respectively. There may be some dialects in which this adaptation would make little sense, so the final choice is ultimately left to the editor.
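
The doubled-vowel rule above can be sketched as follows. This is a hedged Python illustration, not an official tool: the function name `mark_long_vowels` and the flag `ij_is_long_i` are hypothetical, and the sketch assumes the text has already been transcribed into modern letters. Since ⟨ij⟩ counts as a doubled vowel only when the ⟨j⟩ is a visual variant of ⟨i⟩, that case is off by default and left to the editor. The sample strings are placeholders, not attested words.

```python
import re
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON

# A modern Albanian vowel letter immediately followed by itself.
_DOUBLED = re.compile(r"([aeiouyë])\1", re.IGNORECASE)


def mark_long_vowels(text, ij_is_long_i=False):
    """Replace doubled vowels with a single vowel carrying a macron."""
    # Compose first so that ë is a single character for the pattern.
    text = unicodedata.normalize("NFC", text)
    if ij_is_long_i:
        # Only when the editor judges <j> to be a mere variant of <i>.
        text = text.replace("ij", "i" + MACRON)
    # Keep the first vowel of each pair and add the macron to it.
    text = _DOUBLED.sub(lambda m: m.group(1) + MACRON, text)
    return unicodedata.normalize("NFC", text)


print(mark_long_vowels("taa"))
print(mark_long_vowels("tij", ij_is_long_i=True))
```

Single vowels are never given a macron by this function, in line with the rule that length known only from other evidence is not marked.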