User talk:Isomorphyc
The adjective is derived specifically from the participle. Technically, participles are adjectives, and kind of act like lemmas in their own right. They can have new words formed from them independent of the rest of the verb. This is one of the reasons why I like the part-of-speech header "Participle", rather than labelling them "Verb" like English does. —CodeCat 18:08, 25 May 2016 (UTC)
- Hi @CodeCat:
- I appreciate your stopping by. My sense is the vowel change from [a] in praeditus is made necessary by the existing PP for praedor, but it also felicitously shares the vowel and most of its spelling with dīs, the contraction of dīves so that there is a bit of a pun with a spurious non-reduplicated PP of dītō, that is *dītus rather than dītātus, that helped a few of these senses become conflated.
- To my mind the participle arrangement for Latin here is good and bad. We have any number of examples like vigilāns, sublātus, and temperāns with both adjective and participle when the line between them ought to be much blurrier. I would have preferred if the participle nominatives had also been non-lemma forms, because the meaning inheres in the verb at least 95% of the time, and the bot-generated semantics can drive one crazy, but are beyond human scale to correct beyond egregious cases. Not that I would propose a change, however.
- Could you please clarify a few things?
- You removed my sense markers for the -or and ēs suffixes because there is only one sense in each. Is it not necessary to distinguish between Etymology 1 and Etymology 2 because the other etymology is non-lemma?
- For the genitive stem for -ō and PP/supine stem for -tō: I was trying to be friendly, but you are right it is too verbose. Increasingly I find all the detail you are giving in the stem entries obviates the need to gloss so much.
- For -tiō/-ātiō in obsecrātiō and others: I don't understand how this is different from ōrō/ōrāre/ōrātum -> ōrātiō. I'm sceptical this suffix should exist at all if it is just a surface form of -tiō for first conjugation. Perhaps I misunderstand. I agree with all of your other changes. I was so sloppy in enough cases that it is a bit mortifying.
- Thank you for all of your help!
- Isomorphyc (talk) 19:22, 25 May 2016 (UTC)
- The idN= parameters on {{affix}} also change the name of the category. So they should be used when the category needs to be disambiguated, like for -ō. When there is only one suffix at a given entry, there's no need for disambiguation.
- I agree with you on ōrātiō, so I've changed that to -tiō too. The suffix -ātiō, if it is indeed a separate suffix the way -ēscō is, was very clearly formed from the stem suffix -ā- of the first conjugation, and the more basic suffix -tiō. So just as I converted all cases of -ēscō to -scō where they derive from second conjugation verbs, I think we should also convert all cases of -ātiō from first conjugation verbs to -tiō. Once that's done, we can look at what remains, to evaluate whether there is actually a distinct -ātiō or not.
- I hope I'm not annoying you too much with my "corrections". Your work is very much valued so far! —CodeCat 20:08, 25 May 2016 (UTC)
- I forgot to reply to the first part of your message. The change of a to i in praeditus is from the Old Latin vowel reduction, so it's a regular sound change and it also attests to the relative age of the word: according to Wikipedia, the initial-syllable stress which triggered the reduction remained until 250 BCE. So words which show reduction must be older, or must be analogical.
- I agree that the distinction between participle and adjective is kind of artificial at times. My own preferred arrangement would be to list them all under either participle or adjective headers, and have the first sense line be "participle of x" or similar. If there are additional senses that do not follow from the senses of the verb, those can be added as additional senses. If we want to adopt that practice, we may want to ask other (Latin) editors about it first. —CodeCat 20:21, 25 May 2016 (UTC)
- Hello @CodeCat:, your help and corrections are not at all annoying, but indeed very welcome. I would only be willing to make much more minor changes if I did not know you were watching. Thank you for moving the other three items from the -ātiō list. I moved everything that needed less than a second of thought. I also agree about penetrabilis. I linked it, as in the past, because I wanted to link to the existing stem that is the closest match, but I do think this is not a helpful surface form.
- In the participle/adjective scheme you are suggesting, would the first step be separating out the hand-edited participle entries from the unaltered machine-derived text? I'd certainly be willing to support some kind of renovation if you wanted to advocate it in public, though it is not something I would want to propose on my own at this time.
- I am surprised the word praeditus exists recognisably in Ennius-and-earlier Latin in a way that can be distinguished from praedatus. I also admit I have trouble trying to get a feel for optimality-type issues in languages that have both stress and vowel length distinctions, though if the stress rule is simple, like initial stress, then it doesn't add a lot of information and so doesn't confuse the vowel quantities, which are so necessary for keeping the inflectional house of cards from falling apart, as it did in proto-Romantic.
- Isomorphyc (talk) 03:18, 26 May 2016 (UTC)
Disambiguation
Please only provide the disambiguation parameter for categories where there are multiple homonymous affixes. There is only one -ārium suffix. —CodeCat 19:51, 26 May 2016 (UTC)
- @CodeCat: I created a second, and I was going to ask you about it before I populated it. It is clearly different from `place of,' because it is a substantive of the adjective. But that's also not exactly an affix relationship. I wanted to correctly categorise the following nouns which I believe are not or may not be `place of': millarium, sudarium, albarium, fustarium, probably cellarium, alvarium, possibly vaporarium, caldarium. I didn't edit those etymologies because I wanted to ask you first whether it was an appropriate second sense. I'm not really satisfied with it. I'm about halfway through the etymologies which are obviously `place of.' Thanks Isomorphyc (talk) 19:59, 26 May 2016 (UTC)
- Or, am I misunderstanding what you mean. I was confused when it turned out to be inappropriate to use the senseids to disambiguate two homonymous stems under different etymology headings each with one sense. This is the error that I have been making which confuses me most. Isomorphyc (talk) 20:04, 26 May 2016 (UTC)
- The substantive of the adjective can't really be -ārium, because the adjective itself is already -ārius, which has the same stem. There's no suffixation involved when creating the noun from the adjective. So it shouldn't really have its own category. We don't currently have any category system set up for derivations which change the part of speech without changing the stem. It could be useful for many languages, like in English where this kind of derivation is very common.
- I'm not sure what you are confused about though. Can you give an example that confused you? —CodeCat 20:05, 26 May 2016 (UTC)
- Perhaps I will express these etymologies: Substantive of -arius adjective X from noun Y, possibly because Z. I'm not trying to create non-orthogonal categories, though I realise I could create more templates to do that; I just don't really know what the axes are.
- For the sense ids: I have used them four times, and three were not quite right. Aside from the current instance, the two which I do not understand are -ēs and -or. Is it unnecessary to label because there is only one etymology which is a lemma, and therefore the reference is obvious? Also, an etiquette question: under what circumstances is the `reply to' template on talk pages generally used? Isomorphyc (talk) 20:26, 26 May 2016 (UTC)
- "Is it unnecessary to label because there is only one etymology which is a lemma, and therefore the reference is obvious?" Yes, pretty much. The id is only needed if we want multiple categories to separate different affixes that coincide in form.
- When nouns are substantivised from adjectives, we usually just write "Substantivisation of X" as the etymology.
- Some people add reply to to talk pages, others don't. I don't think there's much of an etiquette about it yet. —CodeCat 20:32, 26 May 2016 (UTC)
- I'm going to move my second sense to usage notes for the time being to avoid the id if possible. However, I suspect I will be advocating a two-sense entry when all of this class of etymologies are labelled, because all -arium is Substantivisation of X, and while it is customary to regard it as the suffix for Substantivisation of Place, because about 80% of this class is this way, it is helpful to give a proper sense to non-place substantivisations to avoid confusion when the other sense is labelled. I think this is especially helpful because when a suffix is highly productive, as -arium for place is, one needs to know why certain spaces are `reserved,' that is, why sudarium can't be used productively to mean `sauna' because it already means `handkerchief.' This is quite similar to the two uses of -tus (the one which derives from PI *-tos), in which one usage is infinitely productive (the participle process) while the other is limited to three noun/adjective pairs which lost their intermediate verbs. But I had hoped to have a list of non-place -arium words at this hour, and I do not yet. Isomorphyc (talk) 00:41, 27 May 2016 (UTC)
- As is the case with -ēscō, it may be that the suffix -ārium has been used to derive nouns directly, without any intermediate adjective. In that case, it can certainly have a category and an entry. However, there is only one -ārium suffix, so disambiguation isn't necessary. —CodeCat 12:44, 27 May 2016 (UTC)
- I think I see what I am misunderstanding. I apologise for being so dense about this. One needs to disambiguate -tus not because there are two senses in Etymology 1, but instead because there are three Etymology headings. The fact that the disambiguation template happens to highlight the first sense is irrelevant; what is really being linked is the appropriate etymology of the suffix, not the appropriate sense. I can differentiate the sense I intend within a given etymology simply by glossing appropriately. Am I understanding now? Thanks. Isomorphyc (talk) 12:57, 27 May 2016 (UTC)
- It's really much simpler. If we want two categories, then we use idN= to keep them apart. If we want just one, we don't. —CodeCat 13:05, 27 May 2016 (UTC)
- I don't know what it is about this, but I simply don't get it, because I don't know when to want two categories. I'm going to do this: I'll stay away from making new categories until I figure it out. Isomorphyc (talk) 13:08, 27 May 2016 (UTC)
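The rule discussed in this thread (an id is supplied only when two separate categories are wanted for affixes that coincide in form) can be illustrated with a small sketch. The function name `suffix_category` and the category naming pattern are assumptions made for illustration only; they are not the actual output of {{affix}}.

```python
from typing import Optional


def suffix_category(language: str, suffix: str, sense_id: Optional[str] = None) -> str:
    """Build a hypothetical category name for terms formed with a suffix.

    The naming pattern is an assumption for illustration, not the real
    template logic. The id is appended only when the caller supplies one,
    so a suffix without an id always maps to a single category.
    """
    name = f"{language} words suffixed with {suffix}"
    if sense_id:
        name += f" ({sense_id})"
    return name


# Only one -ārium suffix exists, so no id is given and there is one category.
print(suffix_category("Latin", "-ārium"))

# Homonymous suffixes that should be kept apart each get their own id
# (the id strings here are hypothetical).
print(suffix_category("Latin", "-tus", "participle"))
print(suffix_category("Latin", "-tus", "action noun"))
```

Under this reading, the question "do I need an id?" reduces to "do I want these derivations to land in different categories?".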
- Hi @CodeCat, sorry to bother you. Would it be correct if I wanted to make a new senseid to link to the second etymology of mānus (not `hand', but Old Latin for `good') from the etymology of immānis? Thanks, Isomorphyc (talk) 17:39, 19 June 2016 (UTC)
- Yes. —CodeCat 17:42, 19 June 2016 (UTC)
- Thank you! Isomorphyc (talk) 17:50, 19 June 2016 (UTC)
- Hi again, @CodeCat; thanks for fixing my bad hierarchical mark-up. Did you really mean to make the genitives a third etymology? We inconsistently include macron-varying inflections of lemma forms (habitus), I noticed, but when we do it is usually under the same etymology (abactus). Does the pronunciation necessitate a new etymology heading for type-related reasons? Isomorphyc (talk) 18:29, 19 June 2016 (UTC)
- I prefer treating lemmas and nonlemmas as separate etymologies in all situations. I even once proposed making a specially-named etymology section just for nonlemmas, but that wasn't accepted. —CodeCat 18:38, 19 June 2016 (UTC)
- I think it should perhaps be a collapsed sub-heading of Inflection, and if somebody wants to record the pronunciation of the genitive, for some reason, it is always possible to write something like:
* {{audio|la-cls-manus.ogg|Audio (Classical, genitive)|lang=la}}
and make a second bullet point in the main Pronunciation section. I do agree the inconsistency of the non-lemma clutter is unfortunately one of the uglier things in the Latin section. Isomorphyc (talk) 18:59, 19 June 2016 (UTC)
- It's pretty rare for a lemma form to coincide with its own nonlemma form in spelling, and not in pronunciation. In Dutch and German, the lemma of verbs (infinitive) coincides with the present plural form, but these are pronounced the same so we don't create any entry or definitions for them. The verb read doesn't get any entry for its past tense, just a note in the pronunciation, but there are no macrons to indicate pronunciation in English the way there are in Latin. So Latin is kind of a unique situation that way. Maybe something to ask on the BP? —CodeCat 19:23, 19 June 2016 (UTC)
- Do you have a link to the discussion you had last time? I wouldn't mind implementing a prototype and asking around in the BP, if we can agree on something. But neither of our solutions is quite satisfactory. (Mine is littering and yours is a landfill.) Isomorphyc (talk) 22:23, 19 June 2016 (UTC)
One of the bits you wrote mentioned an "Aureate" period. Keep in mind that Wiktionary is intended for a general audience, not just those who study Roman history. So it's best to keep things as free of jargon and in-knowledge as possible. Using actual years/centuries is better, or if there's no way else to explain it, at least wikilink the term so that people can find what it means easily. —CodeCat 20:35, 27 May 2016 (UTC)
- I'll use centuries; I will go back and change a previous instance which I think I used too. Thanks! Isomorphyc (talk) 20:41, 27 May 2016 (UTC)
- Quick further question for you: since this is an independent suffix of -ārius, the majority of adjectives through which the derivation theoretically takes place were never attested. I didn't want to link them as words, because in most cases there are other forms that would be preferred, but they're also not quite reconstructions. Is using an asterisk with non-linking text in italics a reasonable compromise? Isomorphyc (talk) 20:48, 27 May 2016 (UTC)
- Yes, that's good. We use non-linking text when we don't think we actually want the entry to exist. That can change with time, though; at one point, no reconstruction entries were desirable, but we allow them now. —CodeCat 21:27, 27 May 2016 (UTC)
LSJ reference bot
If/when your bot gets approved, it needs to abide by the botting policy, which requires following our formatting guidelines (WT:NORM). I noticed in this edit your bot did not leave a blank line before the new section heading (however, in this edit it did). Make sure your bot always adds a blank line (see WT:NORM#Headings). --WikiTiki89 14:41, 3 June 2016 (UTC)
- Thanks, I had also made a note of that from the trial run. I think the current version will fix that on the next run through the articles. If the bot isn't approved, I will do it manually. Isomorphyc (talk) 14:57, 3 June 2016 (UTC)
- Also, normally bots' userpages link to their operator's user page. Since you don't have a user page, could you either create one and link to it, or link to your talkpage? --WikiTiki89 15:00, 3 June 2016 (UTC)
- I just added a link to this page; thank you again! Isomorphyc (talk) 15:05, 3 June 2016 (UTC)
- Thanks! --WikiTiki89 15:13, 3 June 2016 (UTC)
- Not sure if there's a reason you don't want to have a userpage, but having a BabelBox (WT:BABEL) would be very helpful. —Μετάknowledgediscuss/deeds 04:20, 7 June 2016 (UTC)
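The formatting point raised at the top of this thread (always leave a blank line before a new section heading) can be sketched as a small helper. This is a minimal illustration, not the bot's actual code: the function name `append_section` is hypothetical, it only handles appending at the end of the page text, and the heading and reference line in the example are placeholders.

```python
def append_section(page_text: str, heading: str, body: str) -> str:
    """Append a section so that exactly one blank line precedes its heading.

    Trailing newlines on the existing text are normalised first, so the
    result is the same whether or not the page already ended with blank
    lines. Assumption: the new section goes at the very end of the text.
    """
    existing = page_text.rstrip("\n")
    section = heading + "\n" + body.strip("\n") + "\n"
    if not existing:
        # Nothing precedes the heading, so no blank line is needed.
        return section
    return existing + "\n\n" + section


# Placeholder page text and reference line, for illustration only.
page = "==Ancient Greek==\n===Noun===\nSome definition.\n"
print(append_section(page, "===References===", "* {{R:LSJ}}"))
```

Normalising the trailing whitespace before appending means the helper gives the same result on repeated trial runs over differently formatted pages.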
Babel
Could you add {{Babel}} to your user page? I'd appreciate it. --Dan Polansky (talk) 06:53, 19 June 2016 (UTC)
.gz.b64
I'm glad I'm not the only crazy person who's created files with that extension. --WikiTiki89 02:11, 14 July 2016 (UTC)
- @Wikitiki89: Just one more and we will meet attestation requirements. Isomorphyc (talk) 02:49, 14 July 2016 (UTC)
Module memory usage
I reverted OrphicBot's addition of the reference templates to a because they used too much memory. There's a limit of 50 MB for Lua, and the references section with your templates used 18 MB. Please figure out a way to be more selective about which modules are loaded: the ones I looked at had nothing to do with the entry, and there were over a dozen of them in use. Chuck Entz (talk) 13:59, 19 July 2016 (UTC)
- Hi @Chuck Entz: The reason for the large number of module transclusions is that each phrase produced by R:M&A is potentially found in one of a hundred different data modules having a name from Module:data tables/dataC0 to Module:data tables/dataC99. Approximately 99% of the words in Latin and Greek have fewer than a dozen phrases, so memory use is on the order of 2-3 MB for that module (or R:Woodhouse in Greek), because the number of modules required will always be less than or equal to the number of phrases. The very worst offender is habeo with 102 phrases found in 42 data modules. There are two solutions to this problem: I already sharded the data out to 100 modules, and it would take me a few minutes to shard it out to 1000 modules. Then habeo, a, and in will transclude about 100 modules each about 5 KB in size, for total memory use of about 3 MB (including about 7x runtime inflation). I didn't want to do this earlier because I'm not sure if it would flood Recent Changes, and I made the 100 shards before I had a robot account. The other option is to simply remove the most common words from the module. My personal experience with Woodhouse and Meissner/Auden is that they are most valuable for idiomatic uses of common words. Hence, I would prefer to try the 1000 shards solution first to see if it works and is acceptable to others. Would that be all right with you, provisionally speaking? I would be willing to reverse it later if this seems disliked. I apologise for the technical detail in this note; since you did look into the modules, if you click on either of the two links above, you will find a line such as "local la_RMA_index_to_phrases". Inside is a mapping from numbers to phrases. At least a few phrases in this list appear in the pull-down list, which is why the module was transcluded. If I made the change I am suggesting, the modules would be much smaller and would be more focused on relevant information. Thanks for taking a look at this.
I would have done this earlier myself, except I felt I needed permission to create 1000 shards, especially before my robot account was approved. Isomorphyc (talk) 16:43, 19 July 2016 (UTC)
- @Chuck Entz: I made the change I suggested. The results are: R:M&A (a): 18.6 MB -> 6.92 MB. R:M&A (habeo): 18.6 MB -> 7.13 MB. There is no longer a memory overflow on a using my template, but the total is 44 MB. In case this is cutting it too close, I also tried special-casing "a", in which case the memory consumption is 1.7 MB, and "a" falls to 39 MB. It is ugly, but it works. Which do you think is preferable? Isomorphyc (talk) 02:26, 20 July 2016 (UTC)
- I'm not really qualified to judge the merits of your solution, except to point out that organizing your data better would seem to me to make it less necessary to load everything every time. When I see data from several different templates in the same data module, it makes me wonder why you would need to load data for all those other templates when you're just using the one. But that's not why I'm here. The entry at nilus has had a module error since the changes you mentioned above. Please fix it. Thanks! Chuck Entz (talk) 22:33, 24 July 2016 (UTC)
- @Chuck Entz: Fixed; thank you for noticing. I had apparently made an error in creating a data table a month ago, with an extra comma, like this: ["nilus"]={",299"}, preventing the lookup from resolving. Strangely, this is the only error of this type. It appeared when OrphicBot added references to 'nilus,' which was almost the same time I fixed the other problem. As for the module performance: the modules never load more than 4% of the shard set, and the mean, median and mode cases involve loading 0.2% of the data. 95% of the time, less than 1% of the data is loaded. Unfortunately, 4% is about 40 data modules, and the high traffic cases are important words with the most stringent memory requirements. The cost of good mean, median, and mode performance is bad outlier performance. I will think this over; I think the current solution is decent, but if the outliers are the most high traffic words, this is indeed a problem. Isomorphyc (talk) 00:10, 25 July 2016 (UTC)
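The sharding trade-off described in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual module layout: phrases are assumed to be assigned to shards by index modulo the shard count, the data is synthetic, and the names `build_shards`, `shards_needed` and `loaded_entries` are hypothetical. It shows the two points made above: a word never needs more shards than it has phrases, and with more shards each loaded shard carries less unrelated data.

```python
from typing import Dict, Iterable, List, Set


def shard_of(phrase_index: int, shard_count: int) -> int:
    """Assumed assignment rule: phrase index modulo the number of shards."""
    return phrase_index % shard_count


def build_shards(phrases: Dict[int, str], shard_count: int) -> List[Dict[int, str]]:
    """Split one index-to-phrase table into shard_count smaller tables."""
    shards: List[Dict[int, str]] = [dict() for _ in range(shard_count)]
    for index, phrase in phrases.items():
        shards[shard_of(index, shard_count)][index] = phrase
    return shards


def shards_needed(phrase_indices: Iterable[int], shard_count: int) -> Set[int]:
    """Shards that must be loaded to resolve every phrase of one word."""
    return {shard_of(index, shard_count) for index in phrase_indices}


def loaded_entries(phrase_indices: List[int], shards: List[Dict[int, str]]) -> int:
    """Total entries pulled in, including phrases unrelated to the word."""
    needed = shards_needed(phrase_indices, len(shards))
    return sum(len(shards[shard_id]) for shard_id in needed)


# Synthetic data: 10,000 phrases and one hypothetical word with 102 phrases.
all_phrases = {index: f"phrase {index}" for index in range(10_000)}
word_phrases = list(range(0, 10_000, 97))[:102]

for shard_count in (100, 1000):
    shards = build_shards(all_phrases, shard_count)
    needed = shards_needed(word_phrases, shard_count)
    print(shard_count, "shards:", len(needed), "loaded,",
          loaded_entries(word_phrases, shards), "entries in memory")
```

The outlier problem mentioned above shows up here too: a word with many phrases still touches many shards, so finer sharding reduces the size of each load but not the number of loads.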
Reference-Module Errors
You left things kind of half... done. I was able to fix Module:R:Autenrieth, since it was just a matter of changing the name of the collision table. That got rid of the module errors in 36 Ancient Greek entries. There are still a number of errors at various templates whose names don't match the names in your template's data, but I'll leave that for you. Then there's an invisible module error at castus, though it doesn't seem to be from any of those templates. Chuck Entz (talk) 04:34, 2 August 2016 (UTC)
- Autenrieth shouldn't have been using LSJ's collision table (my mistake in the first place). I was going to upload the respective collision tables tomorrow. The template page errors originated because template pages don't call modules through a template, so there's no name, and the null result needs to be a special case. Castus was my misuse of the senseid template. Thank you for watching out for these; I apologise for the work I left you to do. The good thing is that I have just learned about this page: [Category:Pages with module errors]. I will watch it in the future. Isomorphyc (talk) 05:22, 2 August 2016 (UTC)
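Module errors caused by malformed lookup data, such as the stray comma in the ["nilus"]={",299"} entry mentioned in the previous section, could in principle be caught before upload with a simple check. The sketch below is hypothetical: it assumes each table value is a comma-separated list of numeric indices, which is inferred from that single example rather than from the real data format, and the "habeo" value is invented for the demonstration.

```python
import re
from typing import Dict, List

# Assumed value format: one or more numbers separated by single commas.
INDEX_LIST = re.compile(r"\d+(,\d+)*")


def malformed_entries(table: Dict[str, List[str]]) -> List[str]:
    """Return the headwords whose index strings do not match the assumed format."""
    bad_words = []
    for word, values in table.items():
        if any(INDEX_LIST.fullmatch(value) is None for value in values):
            bad_words.append(word)
    return bad_words


# "nilus" reproduces the malformed entry quoted above; "habeo" is made up.
sample_table = {"nilus": [",299"], "habeo": ["12,305"]}
print(malformed_entries(sample_table))
```

Running a check like this over the generated tables before upload would complement watching the module error category afterwards.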
Not sure what this experiment is, but don't forget to use num=sg! —Μετάknowledgediscuss/deeds 05:07, 28 August 2016 (UTC)
- @Metaknowledge: I don't think I had ever looked at the markup for a singular-only inflection before, though I have certainly seen them. Thank you! Apologies for the mess. I have finally got OrphicBot to add references to new entries as they are created. I needed to do some testing for a few corner cases of the logic which keeps it out of edit wars with people. Isomorphyc (talk) 15:43, 28 August 2016 (UTC)
- Woo, more palindromes! —JohnC5 05:18, 28 August 2016 (UTC)
- @JohnC5: I was pretty happy to find that one. Under what circumstances do you add locatives? I would be a bit surprised if Aballaba has ever been used that way. Isomorphyc (talk) 15:43, 28 August 2016 (UTC)
- There has been an ongoing habit, most recently by @Samubert96 and others, to add locatives to all new city and river entries (I should point out that (s)he has been doing yeoman work adding so many of late). I'm not sure it necessarily hurts to have them, but a lot of them are definitely unhistorical. —JohnC5 18:40, 28 August 2016 (UTC)
- I have enjoyed Samubert96's many new entries, especially the words from Pliny. It was largely to accommodate these that I wanted to keep a daemon process for references. For locatives, the forms are entirely unhelpful to add, and to offer them ahistorically is at least modestly misinformative. I'm not particularly sure I would want to be a curmudgeon about this, however. Isomorphyc (talk) 19:02, 28 August 2016 (UTC)
- Up to you. —JohnC5 19:18, 28 August 2016 (UTC)
- That probably came out wrong. I think of the locative as similar to how some places can take definite articles in English -- the Hague, the Netherlands, the Punjab, the Sudan, etc. It is rare enough one is pleasantly self-conscious even when using it correctly. If I had a computational way to make a list of permitted and required locatives, I would; but failing that, I don't see any harm in being broad-church about obscure places, because the very short lists are in all the grammars anyway. Good to have you back, by the way. Isomorphyc (talk) 19:38, 28 August 2016 (UTC)
- Thanks! —JohnC5 20:06, 28 August 2016 (UTC)
Minerals References (moved from Sandbox --Isomorphyc (talk) 14:23, 14 September 2016 (UTC))
- David Barthelmy (1997–2024) “Laurelite”, in Webmineral Mineralogy Database.
- “Laurelite”, in Mindat.org[1], Hudson Institute of Mineralogy, 2000–2024.
- Mindat.org[2], Hudson Institute of Mineralogy, 2000–2024.
Is there any way to get (at least mindat) to refer to a glossary page instead of a mineral page? For instance, [3] is not recalled even from the Marlstone page because it's in a different library. Mindat.org[4], Hudson Institute of Mineralogy, 2000–2024. EI at10s (talk) 23:28, 31 August 2016 (UTC)
- @EI at10s: I noticed this. I wasn't sure how desirable it was to include the glossary because it had been only linked four times. Please see my glossary examples here: User:OrphicBot/Sandbox/mindat glossary links and at the links from that page whose references I have updated. I also noticed you have changed some of my References headers to External links. Do you prefer this treatment? I can change the headings in the sections I created if you think it is preferable that way. I used References to follow the convention we have used for a long time in Latin and Greek, which is actually probably not totally appropriate. Also, just so you know, I will be automatically notified if you contact me on my talk page or if you use the {{ping}} template, but not if you use one of the sandbox pages. That said, feel free to experiment in my sandbox pages if you would like. Your recent contributions to the mineralogy section are much appreciated! Isomorphyc (talk) 17:04, 1 September 2016 (UTC)
- Edit: Does this answer your question about changing the hyperlink text? That is, did you envision another use-case for this than the glossary? It is preferable to keep the templates as structured as possible, so if all of the relevant links can be generated with the relevant mineral name and a glossary true/false value, this would be preferred to allowing arbitrary hyperlinks from arbitrary templates. Isomorphyc (talk) 17:10, 1 September 2016 (UTC)
- @Isomorphyc Gotcha! The story with the References/External links thing was, when I started editing these articles, I changed the initial ====References==== with the ===External links=== headers because it made more sense to me, at least, in the way it wasn't using reference templates that seemed to all be dictionaries; these mineralogy databases aren't dictionaries. Along the way I began to second guess the decision, and reverted some of the header tags, but Codecat reverted those reversions, so I took up changing them to External links. Even with a reference template, which is very useful, they're still not dictionaries but databases. In any case I'm going to use Codecat's practice as proof, so maybe going with the bot to change all those header tags would be very helpful. To me, at least.
- That does also answer my question enough. There being a gallery parameter makes things a lot easier and I think that'd be satisfactory for all cases. Thanks a ton. EI at10s (talk) 20:02, 1 September 2016 (UTC)
- @EI at10s: Great, and you're welcome. I'll rename the External links sections I created soon. (I won't rename any which I didn't create in case someone else had a good reason.) Isomorphyc (talk) 20:21, 1 September 2016 (UTC)
'See also' regarding Thai language
Your bot is adding 'see also' to Thai entries. But I think it is working improperly. For example, the bot adds อนุญาติ (à-nú-yâat) to อนุญาต (à-nú-yâat) when the former is merely a misspelling of the latter, and adds สั่ง (sàng), สิ่ง (sìng), สูง (sǔung), and ส่ง (sòng) to สิง (sǐng) when these terms are completely unrelated. Does an entry need to contain a see-also link to an unrelated entry?
Moreover, spelling variants are generally added in the 'alternative spellings' section and, I think, need not be put at the top of the page as 'see also' again.
This topic might concern @Atitarev, @Octahedron80.
--YURi (talk) 03:26, 12 September 2016 (UTC)
- The {{also}} template is used to link between entries that have the same base form but different diacritics. It has nothing to do with semantic relations or any particular language. DTLHS (talk) 03:32, 12 September 2016 (UTC)
- What about misspellings (as อนุญาติ (à-nú-yâat) vs อนุญาต (à-nú-yâat))? I don't think a page really needs to contain a see-also link to a misspelling (esp when there won't be anything to see on the misspelling page). --YURi (talk) 03:41, 12 September 2016 (UTC)
- I don't see a problem with linking to misspellings. I think we should keep the rules as simple as possible. DTLHS (talk) 03:52, 12 September 2016 (UTC)
- I have no objection to linking to misspellings. But I think, on a misspelling page, adding a see-also link to the correct spelling should be avoided, because there already is a link to the correct spelling elsewhere on that page. See this page อนุญาติ (à-nú-yâat) for example. --YURi (talk) 05:54, 12 September 2016 (UTC)
- @YURi: I hope you don't mind, but I have mentioned your suggestion in the Beer Parlour: Wiktionary:Beer_parlour/2016/September#Centralization_of_also-information. I personally do not favour this idea, because I hesitate to create dependencies between templates which makes Wiktionary potentially much more complicated, even if the dependencies are merely conventional. But I will implement this if users broadly disagree with me about this. Isomorphyc (talk) 21:21, 12 September 2016 (UTC)
- Hi @YURi:, the wording is slightly confusing. I was about to say what User:DTLHS said. However, if Thai or any other language needs to be treated differently than I am treating it, I will be glad to make the relevant changes. Right now the diacritics treatment is entirely language-agnostic and per the Unicode handling of diacritics. I don't know yet if any languages will have local issues in which the Unicode treatment is incorrect. I imagine it is possible, but I was not able to think of any examples. Isomorphyc (talk) 03:38, 12 September 2016 (UTC)
I'm wondering what's going on here, where the bot added red links. DTLHS (talk) 03:42, 12 September 2016 (UTC)
- @DTLHS: Sorry for the delay. I think I can tell you what it did: it copied the red links from here: trừ. It is not supposed to do that, however. This affects less than 1% of the total run, and I will fix all of the red links soon. Thank you for pointing this out. Isomorphyc (talk) 04:24, 12 September 2016 (UTC)
- @DTLHS: Do you think it would be better to delete the original red links too, or just my copies of them? That is, do you think I should assume someone left the original ones there for a good reason and leave them alone? Isomorphyc (talk) 21:45, 12 September 2016 (UTC)
- Yes I do think they should be deleted. If someone wants to request an entry there are plenty of places to do so. The top of a potentially unrelated (except for spelled the same way) entry is no place for it. DTLHS (talk) 23:46, 12 September 2016 (UTC)
- @DTLHS: I agree; there should be no red links now. I'm still working on some smaller parts around the edges (especially appendices). Thanks for your involvement in this. Isomorphyc (talk) 02:28, 13 September 2016 (UTC)
OrphicBot reverts unrelated changes
Hi, OrphicBot made this questionable change. It meant to add 'also', but it reverted my recent change after ~40 min. A bot is supposed to read and write articles in short succession. Does OrphicBot cache content and later write based on the previously cached data? I suspect this is what happens. A bot isn't supposed to use cached data. Yurivict (talk) 06:41, 12 September 2016 (UTC)
- It also did this here. Redboywild (talk) 10:15, 12 September 2016 (UTC)
- The process is the following: 1) It caches pages and revision ids and queues entries. 2) The updated entry from the cached data is sent to the server with the revision id of the edit on which it was based. 3) If a more recent revision exists than the stated revision id, the server rejects the change. 4) If the change is rejected, a fresh page copy is loaded and the robot tries again.
- This is what is supposed to happen. Thank you for pointing these out to me. I will look into what went wrong in these cases and if there are any others. These revisions should not have happened this way. Isomorphyc (talk) 11:17, 12 September 2016 (UTC)
- @Chuck Entz, Redboywild, Yurivict: It turns out there were 26 cases in which this happened. The list of errors (now fixed) is here: User:OrphicBot/EditLogs/12September2016 Incorrect Overwrites. The reason for this problem was that, although I incorrectly said above that a revision id is sent to the server, a timestamp of the last revision is sent instead, and in a few cases it was not unpacked correctly from a list and so was not sent. Per sandbox testing, this mistake is now fixed. Many thanks for bringing this to my attention, and apologies that you had to. Isomorphyc (talk) 16:30, 12 September 2016 (UTC)
- Thanks! Yurivict (talk) 17:25, 12 September 2016 (UTC)
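For readers following the thread, the four-step process described above (cache, send with a base timestamp, reject on conflict, reload and retry) can be sketched roughly as follows. This is only an illustration, not OrphicBot's code: `FakeWiki`, `safe_edit` and `add_also` are hypothetical names, and the in-memory "server" here refuses any save that arrives without a timestamp, which is stricter than the behaviour reported in the thread.

```python
from dataclasses import dataclass, field


class EditConflict(Exception):
    """Raised when the page changed after the copy the edit was based on."""


@dataclass
class FakeWiki:
    # Hypothetical in-memory stand-in for the server: title -> (timestamp, text).
    pages: dict = field(default_factory=dict)
    clock: int = 0

    def read(self, title):
        return self.pages[title]

    def save(self, title, text, base_timestamp):
        latest_timestamp, _ = self.pages[title]
        # Assumption: fail closed. A missing or stale base timestamp is a conflict.
        if base_timestamp is None or base_timestamp != latest_timestamp:
            raise EditConflict(title)
        self.clock += 1
        self.pages[title] = (self.clock, text)


def add_also(text, target):
    """Prepend an {{also}} line unless the page already starts with it."""
    line = "{{also|" + target + "}}"
    return text if text.startswith(line) else line + "\n" + text


def safe_edit(wiki, title, cached, transform, max_tries=3):
    """Apply transform to a cached copy; on conflict, reload and try again."""
    timestamp, text = cached
    for _ in range(max_tries):
        try:
            wiki.save(title, transform(text), base_timestamp=timestamp)
            return True
        except EditConflict:
            timestamp, text = wiki.read(title)  # fresh copy replaces the cache
    return False
```

The point of the sketch is that the transform is re-applied to the fresh text after a conflict, so an intervening human edit is preserved rather than overwritten.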
Thank you!!!
Thank you so much for your bot edits with Template:also. I actually was thinking about suggesting to the community myself that we do a bot procedure like this. I can't thank you enough. I love this. (This isn't sarcasm, I'm being dead serious) I've been waiting for something like this for a very long time! PseudoSkull (talk) 02:10, 13 September 2016 (UTC)
- @PseudoSkull: You are very welcome! Isomorphyc (talk) 02:16, 13 September 2016 (UTC)
Tibetan "also" links
Hi, could you please remove the Tibetan subjoined letters (ྐྑྒྔྕྖྗྙྟྠྡྣྤྥྦྨྩྪྫྮྯྭྰྱྲླྴྶྷྸ) from the list of diacritics? Only the vowel signs (ཱིེོིུཷླཹཾཿ) are diacritic symbols for Tibetan. The bot added rku ba, rna ba, rmu ba and rtsa ba as also-see items at rtsi ba. Thanks, Wyang (talk) 10:10, 13 September 2016 (UTC)
- @Wyang: I will remove them soon; thank you for letting me know. I posted a list of diacritic marks here: User:OrphicBot/diacritics table. I hope it did not send you a very large number of messages, for all of the Tibetan rows. If you have any other concerns with what I have included, I would appreciate if you could take a look! Isomorphyc (talk) 14:30, 13 September 2016 (UTC)
- Update: It turns out all but one of the Tibetan also-links I had created were spurious: please see my change log here User:OrphicBot/EditLogs/13September2016_Incorrect_Tibetan_Subjoined_Consonant_Links. The only {{also}} retained was at ར. I'm not sure how sensible the Katakana and the Hanzi links are, but they were inherited from the Katakana entry. Isomorphyc (talk) 21:37, 13 September 2016 (UTC)
- Thanks for taking care of this; it is not easy to come up with the best algorithm. The other diacritics in the diacritics table seem okay to me. Most of the removed also-see items are spurious, but not all. For instance, it is fine to list ལྡག་པ (ldag pa) and ལྡེག་པ (ldeg pa) at ལྡོག་པ (ldog pa), but not ལག་པ (lag pa) (which was rightly removed). I think the Katakana and Hanzi links at ra are okay. Wyang (talk) 01:44, 14 September 2016 (UTC)
- @Wyang: You are very sweet; it is not algorithmically difficult, but I did it wrong. Is this an improvement? User:OrphicBot/EditLogs/13September2016_Incorrect_Tibetan_Subjoined_Consonant_Links_Corrected? Thanks, Isomorphyc (talk) 03:28, 14 September 2016 (UTC)
- Yes, definitely. Most of the cases are perfect. Only example that needs to be changed is ཁེ (khe), which should only have ཁ (kha) as an also-term. The other added also-terms (ཁྱི (khyi) and ཁྲི (khri)) contain subjoined letters, which are subscripts in this case. Fantastic work so far; I will add some similar-looking Tibetan letters to the equivalents table once that is ready, e.g. ཆ (cha) vs ཚ (tsha). Thank you! Wyang (talk) 03:42, 14 September 2016 (UTC)
- @Wyang: Thank you! I've committed the fixes now; it looks like ཁེ (khe) is correct; should I also remove ཁྱི (khyi) and ཁྲི (khri) from ཁ (kha), I think all of the items from (ཁྱི (khyi), and ཁྱི (khyi) from ཁྲི (khri))? I think this is working now-- it is mostly not anything I have done, but the Unicode specification is well organised enough that these changes are quite economical to make. If you want to add anything to the equivalence classes, that part is now almost fully operational also. Isomorphyc (talk) 04:07, 15 September 2016 (UTC)
- That's exactly right. Thank you so much for this! I added some Tibetan equivalents to the table. Wyang (talk) 08:13, 15 September 2016 (UTC)
Some more equivalencies
I would like to suggest some equivalencies for the {{also}} bot in Cyrillic (also the capitalized versions of everything I list):
- е = ё = є = э = ѥ
- а = ꙗ = я = ѧ = ѩ
- ѫ = ѭ
- у = ꙋ
- з = ѕ = ꙁ
- и = і = ї = ѵ = ѷ
- ь = ъ = ѣ
- ы = ꙑ = ъі = ъи = ьі = ьи
- о = ѡ = ѽ = ѿ
- ц = џ
If possible, also something like this:
And is there some page where this can be edited, or should I just come to your talkpage with these suggestions every time? --WikiTiki89 21:38, 13 September 2016 (UTC)
- @Wikitiki89: Thank you -- I was almost going to ask you, in fact. You can see or add to my equivalence table here: User:OrphicBot/diacritics_table. It is directly from the Unicode specification, but I am adding exceptions since Unicode's concerns are more typographical than orthographic. I'm not sure how easy the table is to use, but please feel free to write there or here, whichever is easier. Also, if you could have a glance over the Hebrew equivalences too I would appreciate it. Isomorphyc (talk) 21:42, 13 September 2016 (UTC)
- Edit: for clarity, the robot does not actually read the table. I will use it to edit the parameters. Isomorphyc (talk) 21:46, 13 September 2016 (UTC)
- Edit again: I should be clear about what I am doing currently. Of Wiktionary's 5m entries, about 500k belong to families with more than one member. The root of a family is calculated, currently, by removing punctuation and whitespace, setting all letters to lowercase, and removing everything considered in Unicode to be a diacritic, except (currently) for Tibetan subjoined consonants. It is easy to add your equivalences to my equivalence function, but I realised it is perhaps not so easy to add them to my table. Isomorphyc (talk) 21:51, 13 September 2016 (UTC)
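A rough sketch of the family-root calculation described in this comment, for anyone who wants to experiment. It is a guess at the approach rather than the bot's actual function: `family_root`, `families` and `KEEP` are hypothetical names, "diacritic" is approximated by the Unicode general categories beginning with M after NFD decomposition, and the Tibetan exception is approximated by the code point range U+0F90 to U+0FBC.

```python
import unicodedata

# Assumption: Tibetan subjoined consonants (U+0F90..U+0FBC) are combining marks
# in Unicode, but are kept here, following the request in the Tibetan thread.
KEEP = {chr(cp) for cp in range(0x0F90, 0x0FBD)}


def family_root(title):
    """Strip marks, punctuation and whitespace, then lowercase."""
    out = []
    for ch in unicodedata.normalize("NFD", title):
        if ch in KEEP:
            out.append(ch)
            continue
        category = unicodedata.category(ch)
        # M* = combining marks, P* = punctuation
        if category[0] in "MP" or ch.isspace():
            continue
        out.append(ch)
    return "".join(out).lower()


def families(titles):
    """Group titles by root and keep only families with more than one member."""
    groups = {}
    for title in titles:
        groups.setdefault(family_root(title), []).append(title)
    return {root: members for root, members in groups.items() if len(members) > 1}
```

Under these assumptions, สิง and ส่ง fall into one family because Thai vowel and tone signs are combining marks, which is the language-agnostic behaviour discussed in the Thai thread above.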
- Ok, so you don't currently have an equivalence table, only a diacritic stripping table, if I understood correctly. Anyway, the way you explain it, it should be easy to handle these equivalencies by conversion of each character in the groups above to one chosen as "canonical", but it might be difficult to handle the multi-character sequences and the two rules in my second block above without accidentally breaking other rules. --WikiTiki89 22:17, 13 September 2016 (UTC)
- @Wikitiki89: You did a bit of work for this-- these are not modern characters. I think your pure equivalences in the top group will not present any problems, provided the modern Cyrillic forms are at the root of each family. The lower group can be accomplished by adding the words with њ and љ to two different families without uniting the families. I will see how these work in practice and may post edit logs per-rule for you to look at before saving the edits.
- I suppose using a whitelist changeable table for this as a dependency is possible, but I'm not sure that's any easier than posting here, unless you think there are a lot of intricate historical ones you would like to test. Is this what you meant by a table? Isomorphyc (talk) 22:58, 13 September 2016 (UTC)
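The single-character equivalences proposed at the top of this section can be handled the way WikiTiki89 suggests, by converting every member of a group to one character chosen as canonical. The sketch below assumes the first character of each group is the canonical one and leaves out the multi-character group (ы = ꙑ = ъі ...); `GROUPS`, `build_table` and `canonicalize` are hypothetical names, not the format of equivalences.txt.

```python
import unicodedata

# Single-character groups from the list above; first character is canonical.
GROUPS = ["еёєэѥ", "аꙗяѧѩ", "ѫѭ", "уꙋ", "зѕꙁ", "иіїѵѷ", "ьъѣ", "оѡѽѿ", "цџ"]


def build_table(groups):
    """Build a str.translate table mapping each group member to its canonical form."""
    table = {}
    for group in groups:
        # NFC keeps precomposed letters such as ё as a single code point.
        group = unicodedata.normalize("NFC", group)
        canonical = group[0]
        for ch in group[1:]:
            table[ord(ch)] = canonical
    return table


TABLE = build_table(GROUPS)


def canonicalize(word):
    """Lowercase, then replace every group member with its canonical character."""
    return unicodedata.normalize("NFC", word).lower().translate(TABLE)
```

Two titles then belong to the same family when `canonicalize` gives the same result for both. Multi-character sequences would need a different mechanism, such as ordered string replacement, which is where rules could start to interfere with one another.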
- What I meant by a table is a place where I can just add rules without having to bother you (or at least bother you as little as possible). I actually didn't do much work for the above list, that's what I came up with off the top of my head. In general I work with old languages a lot, so all these characters are actually on my custom Cyrillic keyboard. And I just happened to be currently doing things with Old Church Slavonic entries, which is how I noticed that these rules would be nice to have. And not all those characters are actually old, many of them are still used in modern languages. Also, I already remembered some characters I forgot to add above, and I can give you a list for Hebrew as well (which is much shorter), but I'm waiting for you to tell me where to put them. --WikiTiki89 23:05, 13 September 2016 (UTC)
- @Wikitiki89: Please put them here for now; hopefully I will have something serialisable to which I can give you a link, but I will need to try a few examples first. Your recent edit of a moment ago made me laugh. I did not say all the non-Russian characters were also not modern-- I should have said there were synchronic and diachronic variations in your list alike. Isomorphyc (talk) 23:14, 13 September 2016 (UTC)
- I added more Cyrillic ones to the list above. Here are some Hebrew ones:
- Also, punctuation like ״, ׳, ־ can be dropped (does that need to be added to User:OrphicBot/diacritics table?). --WikiTiki89 23:28, 13 September 2016 (UTC)
- @Wikitiki89: I have made a new file for equivalences here: User:OrphicBot/equivalences.txt. I am not sure if the format will support everything we need, but please feel free to edit it. I haven't tested it much, but if you would like to take a look at the new equivalence classes it has produced, please see here: User:OrphicBot/Sandbox3. I will work on this more tomorrow. I think there is no need for punctuation, as I am simply omitting everything that Unicode considers to be punctuation. If this needs to be more finely grained we can think about a way to notate this. Isomorphyc (talk) 02:41, 14 September 2016 (UTC)
- @Wikitiki89: Thank you for all of your recent changes! I have put a new catalogue of updates, given the equivalence classes, here: User:OrphicBot/14September2016 Proposed Changes with Equivalence Classes. I haven't committed anything yet, and I also haven't fully implemented a few features: multiple equivalence class membership (for multiple LHS entries), some punctuation, and appendices, but this file worked surprisingly smoothly, and the total number of changes is only ~10,000. Let me know if anything doesn't look quite right. I also think this file format is cleaner than a data module, at least for the time being. Isomorphyc (talk) 04:12, 15 September 2016 (UTC)
- I suggest you make a data module for this: other people will easily be able to edit it and you can export it to JSON when you want to run your bot. DTLHS (talk) 00:18, 14 September 2016 (UTC)
- @DTLHS: I will think about making a data module; for now there is a text file: User:OrphicBot/equivalences.txt. One will need to think of a maintainable way to give users incremental control without producing hundreds of thousands of edits by accident. I think this can be done. Isomorphyc (talk) 02:41, 14 September 2016 (UTC)
- Just letting you know that I fixed another typo after looking at User:OrphicBot/15September2016 Multiple Equivalence Class Membership List. --WikiTiki89 22:17, 15 September 2016 (UTC)
- Also, may I recommend using bullets rather than HTML line breaks on your bot's user page? --WikiTiki89 22:19, 15 September 2016 (UTC)
- I think you are right, the bullets are an aesthetic improvement to the presentation also. But I have found the lack of line breaks in native wikitext quite frustrating.
- The mismatched characters in the multi-class production made me wonder briefly, but I didn't know enough to see the problem. All of the new changes will probably commit tomorrow or the day after; then mainly the 8+ item appendices and the real-time client will remain. The current version reads the most recent whitelisted-user's revision of equivalences.txt each run, so any changes will eventually be committed, even if I don't notice them. Isomorphyc (talk) 03:49, 16 September 2016 (UTC)
- @Wikitiki89, Wyang, Suzukaze-c Thank you all for your work on the equivalence list. Your changes have made this system much more flexible than it was when I had first implemented it. My current edit list at User:OrphicBot/16September2016 Multiple Class Equivalence Edit Log - additional or changed should take all of your changes into account. Could you please look over your respective language sections to see that everything looks approximately correct? The equivalence list can evolve over time, but if the current changes are an accurate reflection of the list as it is, I will go ahead and post them. Isomorphyc (talk) 02:30, 17 September 2016 (UTC)
- I skimmed through the hanzi entries and they look OK. —suzukaze (t・c) 02:51, 17 September 2016 (UTC)
- @Wyang : I don't know if this is a really bad idea, but I catalogued the traditional -> simplified mappings and it resulted in another 40,000 also template updates. Do you think these are worth keeping? My Mandarin is probably about HSK 4-5, so I have a bit of a sense about what I am looking at, but my only experience with traditional characters has been with etymologies. I hope I am not in over my head here. The links (and the second equivalence file for simplified to traditional) are linked from User:OrphicBot. Thanks. Isomorphyc (talk) 19:58, 17 September 2016 (UTC)
- Edit: looking over the links, it seems to me you all but did this yourself about two years ago in a more thorough way. Maybe the also links are superfluous, given this? But I did find 855 simplified entries which are not redirected. I might be misreading, however. They are here: User:OrphicBot/Simplified Chinese Entries without Redirects Isomorphyc (talk) 20:50, 17 September 2016 (UTC)
- Thanks for all your hard work Isomorphyc. I believe this should be discussed among the major Chinese editors - @Suzukaze-c, Justinrleung, Atitarev, Tooironic, Hongthay, Mar vin kaiser. (Sorry for hijacking your user talkpage for this; we can move this to another location if you prefer.) If I understand it (User:OrphicBot/Hanzi Simplified to Traditional Also Mappings Test 1/2.txt) correctly, the proposed change is that the {{also}} template is added universally to all Chinese Hanzi and Japanese Kanji pages, using a set of simplified-traditional-shinjitai correspondences. What are everyone's thoughts on this? The advantage of this approach is that it is systematic and an easy-to-apply rule, and the disadvantage is that it may duplicate the content of the {{zh-see}} template on simplified Chinese pages. Personally, I am fine with this proposal. Wyang (talk) 04:59, 18 September 2016 (UTC)
- With regard to the unredirected simplified entries, I'm a little confused by how User:OrphicBot/Simplified Chinese Entries without Redirects was generated. There are some entries needing attention (痴騃), but some of the other traditional forms are incorrect (一瞭百瞭, 一粒老鼠屎壞瞭一鍋粥, 萬俟), and some Hanzi entries haven't been converted to unified Chinese (皚, 礱). Wyang (talk) 04:59, 18 September 2016 (UTC)
- @Wyang: I appreciate your help and patience with this. Greetings to the Chinese editors linked above. It is fine to have this conversation here, but if you wanted to move it to a place where relevant editors would see it, please feel free to do that also. Regarding the {{also}} template for simplified to traditional mappings, I am a little bit embarrassed that I proposed this change without being aware of the {{zh-see}} template. Most of my experience with the Chinese entries was from the time when we had independent entries for simplified and traditional forms, a few years ago. Wyang, you did an extraordinary amount of work to clean this up, and I have only just realised this; thank you for doing this. I am concerned that the change I am suggesting is mostly redundant and contributes unnecessary clutter to these pages. The only advantage in adding the {{also}} templates is that it is a very simple orthographic template to maintain, and I think the simplified -> traditional mappings fall under the original intent of this template.
- Regarding the unredirected simplified entries: I realised some of my traditional forms were incorrect. I believe I removed most (but not quite all) of them in my new version of the file (it has about 400 entries), here: User:OrphicBot/Simplified Chinese Entries without Redirects - incorrect reductions removed. I simply looked for Chinese words with simplified spellings which lacked the {{zh-see}} template. I made some incorrect substitutions in forming the traditional forms, however, simply out of carelessness. (I made a random substitution rather than looking for correct ones). I could probably make a fully correct list if it were of any help to you -- but I suspect that Wyangbot must be more mature in this task already. I originally made the list to see if I would be adding {{also}} templates to any simplified words which did not have {{zh-see}} templates-- that is, how much value my templates were creating. It seems as though at most 1-2% of Chinese words that should have {{zh-see}} are missing them, so the benefit exists, but it is small. Almost none of the errors in the list are in the also templates, because in making the original list I simply failed to reject spurious forms in favour of correct ones, even when the latter existed.
- My substitutions were based on this list: User:OrphicBot/hanzi_s2t.txt, which is treated simply as another equivalence table, and based on these four pages:
- * zh:简化字总表/第一表
- * zh:简化字总表/第二表
- * zh:简化字总表/第三表
- * wikisource:第一批异体字整理表
- If you feel my amateur work on Chinese entries is creating more work for other editors than it is worth, I will understand; perhaps I should not have been trying to add to the Chinese equivalence list. Isomorphyc (talk) 16:49, 18 September 2016 (UTC)
- This work is what has been absent for years. It was discussed previously, although no one really got around to doing it, so thanks very much for taking it on. Pinging @Hongthay as you may be interested too. The format change was a joint effort and it is still ongoing; User:Suzukaze-c is doing most of the work on Hanzi entries. With regard to {{also}}, I don't really think it is a problem to apply the template to all Chinese and Japanese entries, since it's better to formulate and apply a formatting rule than having inconsistencies. There are a lot of Hanzi entries which use the {{also}} template to list variants of the title already, such as 晩餐 ~ 晚餐 and 會議, and the also-links are beneficial.
- I see what the list of unredirected entries means now. The complexity is that there are many multiple-to-one traditional-to-simplified correspondences (e.g. simplified 了 corresponds to traditional 了 and 瞭), and one of the traditional words may be the same as the simplified word, thus it is not always the case that a simplified page title requires a {{zh-see}} template. There are many valid entries in that list though which require attention, and thanks for generating it. I converted it into a list so that they can be individually checked. Wyang (talk) 01:56, 19 September 2016 (UTC)
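The many-to-one problem Wyang describes can be made concrete with a small sketch. The table `S2T` below is a tiny hypothetical excerpt (only the 了/瞭 pair and 會議 come from this thread), and `traditional_candidates` and `attested_traditional_forms` are illustrative names, not functions from OrphicBot or Wyangbot.

```python
from itertools import product

# Hypothetical excerpt of a simplified -> traditional table; entries are one-to-many.
S2T = {"了": ["了", "瞭"], "会": ["會"], "议": ["議"]}


def traditional_candidates(simplified):
    """Enumerate every traditional spelling the table allows for a simplified title."""
    options = [S2T.get(ch, [ch]) for ch in simplified]
    return ["".join(combo) for combo in product(*options)]


def attested_traditional_forms(simplified, existing_titles):
    """Keep only candidates that differ from the title and already exist as entries."""
    return [
        candidate
        for candidate in traditional_candidates(simplified)
        if candidate != simplified and candidate in existing_titles
    ]
```

Picking one candidate at random is how a spurious form such as 一瞭百瞭 can arise from 一了百了. Filtering against existing titles avoids that, and an empty result suggests the simplified title may need no redirect at all, though a missing traditional entry would look the same.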
Latin references
This is an odd edit; it seems that your bot didn't recognise the case distinction and added references that belong at the (currently nonexistent) entry Nasi to nasi. Is there a way to prevent this from happening, so that we don't get spurious references? —Μετάknowledgediscuss/deeds 23:42, 17 September 2016 (UTC)
- @Metaknowledge: I like to put links to known lemmas in non-lemma entries-- the results look a bit strange but I think it is ultimately a good idea. But several resources use mono-case orthography. In these cases, I link both forms. However, it probably makes more sense to link only the uppercase forms for {{R:Smith's Persons}}, {{R:Smith's Geography}}, perhaps {{R:Smith's Antiquities}}, probably {{R:PersEnc}} and {{R:Stillwell}}, and possibly {{R:Peck}}. But this dual linking should likely be retained for {{R:du Cange}}, where there is quite a mixture of proper and common entries which cannot be told apart. What do you think? These would be easy changes to make. I admit this behaviour never bothered me very much, because so many of these reference links would not exist otherwise. But clearly the correct solution to this latter problem is importing stubs. Isomorphyc (talk) 00:57, 18 September 2016 (UTC)
- That makes sense. It's not a terribly bad thing, but it's also not good either (in my book). Your proposed changes would be sufficient for me. —Μετάknowledgediscuss/deeds 01:16, 18 September 2016 (UTC)
- @Metaknowledge: I would like to propose something slightly milder than what I proposed above-- but it might be what I implied. I would like to delete the mainly-proper reference entries for non-capitalised, non-lemma forms, but not for lemma forms. For example, Liberator is an epithet of Zeus, but I propose it continue to be linked under liberator, even though as an epithet it is proper. My contention is that when non-proper lemma forms are also proper lemma forms, the association is usually relevant. A bit of sampling suggests this is the case much more often than not in these works. Here are the two lists:
- * User:OrphicBot/Sandbox2 (725 non-lemma forms - references to be deleted)
- * User:OrphicBot/Sandbox3 (925 lemma forms - likely proper references to be retained)
- What do you think? I'll go ahead with these if it looks decent, and we could do another round if either of us sees anything else with a deterministically removable pattern. Isomorphyc (talk) 20:12, 18 September 2016 (UTC)
- I'd forgotten about that. That's certainly appropriate. —Μετάknowledgediscuss/deeds 20:25, 18 September 2016 (UTC)
- @Metaknowledge: Here is the edit log: User:OrphicBot/EditLogs/18September2016 Latin Proper Name Homonym Reference Removals. There were 690 changes, in addition to thirty-five null changes due to the presence of more ambiguously non-proper references. In the case of nasi, I think the reason the null change was registered is that you may have left a triple newline in your edit, which is perfectly harmless and acceptable from non-robot editors, but contrary to WT:NORM. I will look over these 35 items manually (edit: they are at User:OrphicBot/Sandbox4). Isomorphyc (talk) 21:48, 18 September 2016 (UTC)
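The removal rule agreed in this thread can be sketched in a few lines of Python. This is only an illustration, not OrphicBot's actual code: the function name, the `is_lemma` flag, and the `MAINLY_PROPER` set are assumptions, and the set lists only the references the discussion treats most firmly as mainly proper.

```python
# Hypothetical set of reference templates treated as "mainly proper".
# {{R:du Cange}} is deliberately absent: the thread says its dual linking is retained.
MAINLY_PROPER = frozenset({
    "R:Smith's Persons",
    "R:Smith's Geography",
    "R:PersEnc",
    "R:Stillwell",
})


def references_to_remove(title, is_lemma, references):
    """Return the references the rule would delete from one entry.

    Only non-capitalised, non-lemma pages lose their mainly-proper references;
    lemma pages (e.g. liberator) and capitalised pages keep everything.
    """
    if is_lemma or title[:1].isupper():
        return []
    return [ref for ref in references if ref in MAINLY_PROPER]
```

Under these assumptions, a lowercase non-lemma page such as nasi would lose a {{R:Smith's Persons}} link but keep {{R:du Cange}}, while liberator, being a lemma, would keep both.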
Unfortunate error
See god förmiddag (diff): unfortunately, you forgot to include the redirect {{see also}}. Might have been prudent to bypass that redirect before doing this run. Hard lines, not like I've never done it. Renard Migrant (talk) 23:43, 17 September 2016 (UTC)
- @Renard Migrant: Thank you! I never noticed {{see also}} before. I will look through all of my old edits to repair these. Just so you know, I normally respond faster if you write on my talk page rather than OrphicBot's. But writing on OrphicBot's talk page will stop on-going automatic edits within about one second. I hope you don't mind, but I moved your message to make this conversation easier for me to track. Isomorphyc (talk) 21:30, 18 September 2016 (UTC)
- Thank you for creating the clean-up category. I have fixed all of the items. The edit log is here: User:OrphicBot/EditLogs/18September2016/Multiple Also Templates Fixed. Looking over your changes, I realise I may also have perpetuated a misuse in which users enumerate synonyms with this template. I will check for this soon. Isomorphyc (talk) 01:19, 19 September 2016 (UTC)
Overkill
This bot is adding numerous pointless instances of {{also}} to entries for terms having simple alternative forms (which are often already listed in the "Alternative forms" sections of those entries). This creates redundant information in the entries and belabours the obvious. See, for example, the history for lap dog or trade wind from which I have now removed {{also}}. -- · (talk) 07:29, 18 September 2016 (UTC)
- @Talking Point: I hope you don't mind that I have moved your note from OrphicBot's talk page to mine-- posting there will stop on-going automatic edits within about one second, but I might also take longer to notice, since I don't usually check there. I don't disagree with you about the value of the template for lapdog, but I think the cumulative value in consistency outweighs the clutter cost of the template. {{also}} is orthographic and multilingual while {{alternative form of}} is lexical and intra-lingual. {{also}} tells by omission what entries in other languages do not exist, which may even be useful to a visitor to lapdog. Isomorphyc (talk) 21:30, 18 September 2016 (UTC)
- It's longstanding practice that see alsos are included regardless of the content of the page, and alternative forms are included regardless of the see alsos. --WikiTiki89 01:37, 19 September 2016 (UTC)
- Longstanding doesn't make it right. Redundancy and clutter in the interest of consistency is still redundancy and clutter. -- · (talk) 02:22, 19 September 2016 (UTC)
- It doesn't matter what's right, it matters what there is a consensus for, and this has been discussed many times. There's reason for it, too. The see alsos are language-independent, while anything within the language section is language-dependent. Removing things from see also will prevent people from seeing it if they weren't looking for the language that happens to appear in the entry. This doesn't exactly apply to "lap dog" and "trade wind" because it's very unlikely that the see alsos will point to anything in any other language, but still I think the consistency is important. Redundancy is not a bad thing in and of itself. And as for clutter, I really don't think the see also creates too much clutter. --WikiTiki89 09:55, 19 September 2016 (UTC)
- More to the point, if we want a bot to do this, we need to have rules that allow a bot to do it. Excluding alternative forms is going to make it way harder on a bot. —CodeCat 12:12, 19 September 2016 (UTC)
Some omissions from the recent bot run
I noticed your bot added {{also}} to вешание and вешать, but not in the reverse direction to вещание and вещать. Is this an error, or are you simply not finished with the bot runs? --WikiTiki89 20:07, 20 September 2016 (UTC)
- @Wikitiki89: Thanks -- I had missed 868 of them by accident. I was about to commit a second batch, but those were all supposed to be Hanzi. Isomorphyc (talk) 21:31, 20 September 2016 (UTC)
- I noticed some more omissions: Азія and Азия don't link to each other. --WikiTiki89 18:38, 21 September 2016 (UTC)
- @Wikitiki89: I apparently made a change to the code which never saved. I would have caught it, but it was using an old check for equivalence class existence from when the tree was shallower. (It was omitting entries whose equivalence class names were not themselves titles, from when title names were stored where class names are now stored). Thanks for finding this. The logs are here: User:OrphicBot/EditLogs/21September2016 UpdateAlso - Missed Items (non-title family names) - 0 and User:OrphicBot/EditLogs/21September2016 UpdateAlso - Missed Items (non-title family names) - 1 (23769 items total). Isomorphyc (talk) 21:38, 21 September 2016 (UTC)
- Still more: Эстония/Эстонія and Естония/Естонія should be linking to each other. --WikiTiki89 15:50, 22 September 2016 (UTC)
- @Wikitiki89: Thanks for double checking all of this. The reason small batches are still trickling in is that I am still taking out some overprotective safeguards while verifying the underlying assumptions were indeed overprotective. Part of the reason for this complexity is that I didn't know initially how feature-rich the equivalences file would have to be, so I am only adding back some unifications now. In this case it is between capital and lowercase letters which have undergone equivalence transformations. The edit log (2612 items) is here: User:OrphicBot/EditLogs/22September2016 UpdateAlso - unifying some lettercase forms under equivalences. I really appreciate your on-going involvement in these changes. Isomorphyc (talk) 17:07, 22 September 2016 (UTC)
- Edit: for some of the others in this iteration, there are a lot of 7s and 8s without equivalence-class capitals being added because some of the class-size ambiguity also shrank, and I am not editing anything currently which may have more than eight items and require an appendix. Isomorphyc (talk) 17:12, 22 September 2016 (UTC)
- @Wikitiki89 : In a semi-related matter, I don't think the use of categories and appendices in these two entries' also sections is correct: -eg- and -et-, but I'm not sure how to reformat them. What do you think? Isomorphyc (talk) 18:58, 22 September 2016 (UTC)
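The missed-items bug described earlier in this thread (entries omitted because their equivalence class name was not itself a page title) can be illustrated with a toy model in Python. Everything here is hypothetical: the function names, the `toy_class_name` reduction, and the data layout are assumptions for illustration, not OrphicBot's code.

```python
from collections import defaultdict


def build_classes(titles, class_name):
    """Group page titles by the name of their equivalence class."""
    classes = defaultdict(set)
    for title in titles:
        classes[class_name(title)].add(title)
    return dict(classes)


def linkable_old(classes, titles):
    """Stale check: a class is used only if its name is also a page title."""
    return {name: members for name, members in classes.items()
            if name in titles and len(members) > 1}


def linkable_fixed(classes):
    """Fixed check: any class with two or more members is used."""
    return {name: members for name, members in classes.items()
            if len(members) > 1}


def toy_class_name(title):
    """Hypothetical reduction: case-fold, then map Cyrillic U+0456 to U+0438."""
    return title.casefold().replace("\u0456", "\u0438")
```

With the two titles Азія and Азия, the toy class name is the lowercase form, which is not itself a title, so `linkable_old` drops the class while `linkable_fixed` keeps both members.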
Wider Latin Equivalences
Hi @Wikitiki89, Metaknowledge, CodeCat: Could I please get your thoughts on whether you have any improvements for my 'Wider Latin' list of equivalences? They are the first commented-out set of equivalences here: User:OrphicBot/equivalences.txt. Mostly these concern the North Germanic languages, and the changes they generate are here: User:OrphicBot/Proposed Wider Latin Equivalences Changes 0 and User:OrphicBot/Proposed Wider Latin Equivalences Changes 1. I have a few specific concerns:
- Is it best to reduce eth and thorn both to [th] and [dh]? Or is just [th] better?
- I feel I have no choice but to reduce the æ grapheme to both a and e, in addition to ae, because it is usually equivalent to a in Old English, but often to e in words derived from Latin. Does this seem okay, or should I omit reducing the ae and oe entirely? Is there a better option?
- Yogh reduces to a lot of things; is it worthwhile to reduce to gh, which I believe is likely the most common?
- Please feel free to add any other reductions you would like. Diacritics are allowed, and they increase the precision of a reduction, but if the only difference between two words is diacritical then no reduction is necessary.
I'll go ahead and skip these if they seem to add too much clutter, but I'm mildly leaning to keep them otherwise.
Thanks! Isomorphyc (talk) 01:49, 21 September 2016 (UTC)
- I think:
- d = ð
- þ = ð = th
- þ = y
- y = ȝ
- ȝ = g
- ȝ = gh
- æ = ae (only)
- œ = oe (only)
- I don't think these will add too much clutter in practice, but we should take a look at your changes file before you "commit" them. --WikiTiki89 02:13, 21 September 2016 (UTC)
- @Wikitiki89: I am glad you like my choice of verb. I agree about ae/oe/dh, and y/þ and the yogh variants are good additions. I will go with your version directly unless there are any changes. I re-notated it here: User:OrphicBot/Wider Latin Equivalences Version V.txt and the resulting changes are here: User:OrphicBot/Proposed Wider Latin Equivalences Changes - Version II. This is an improvement in that there are 2697 changes rather than circa fourteen thousand, many of the latter of which were noisy. Isomorphyc (talk) 12:14, 21 September 2016 (UTC)
- Edit: did you deliberately exclude eszett (ß)/ss? Isomorphyc (talk) 12:27, 21 September 2016 (UTC)
- No, that was an accidental omission, we should definitely have ß = ss. Also, we might want y = ij. And perhaps ä = ae, ö = oe, and ü = ue. --WikiTiki89 13:09, 21 September 2016 (UTC)
- @Wikitiki89: This produces 6636 changes, including the 3853 from yogh, eth, thorn, eszett, etc.; hence only 2783 are new and due to the ij/digraph changes. Some are a little bit borderline, but very many are valuable. The changes are here: User:OrphicBot/Proposed Wider Latin Equivalences Changes - Version IIII (with y/ij, and umlaut/digraph). Note that running these equivalences in the opposite direction from which I ran them produces about sixty thousand changes, mostly not good, due to the transitivity of class membership within differences of diacritical marks. Isomorphyc (talk) 14:56, 21 September 2016 (UTC)
- It seems there are more undesirable changes with y = ij than I would have liked. Also, we should have w = ƿ and å = aa. --WikiTiki89 15:42, 21 September 2016 (UTC)
- The ƿ and å changes look good. (They add 385 new items net, but appear in 4353 entries total; you can see them here: User:OrphicBot/Proposed Wider Latin Equivalences Changes - Version V (with a-ring and wynn)). I will go ahead and commit these. Please feel free to continue to add to equivalences.txt. You can include comments if you have specific concerns with any new additions; for the time being I'll continue to curate changes in general so that nothing absurd or too broad will be added automatically. Isomorphyc (talk) 16:23, 21 September 2016 (UTC)
- Edit: You can also put new changes in User:OrphicBot/equivalences_sandbox.txt. I will get it to produce non-committed edit-log files automatically, though not in real time. Isomorphyc (talk) 17:16, 21 September 2016 (UTC)
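One way to picture how equivalences like those listed in this thread produce candidate {{also}} links is a one-directional reduction, sketched below in Python. This is an assumption-laden illustration: the `REDUCTIONS` table covers only some of the pairs mentioned above, the real format of equivalences.txt is not shown in this discussion, and applying each reduction separately is a choice made here, not a description of OrphicBot.

```python
# Hypothetical reduction table: (special grapheme, plain replacement).
REDUCTIONS = [
    ("ß", "ss"),
    ("æ", "ae"),
    ("œ", "oe"),
    ("å", "aa"),
    ("ƿ", "w"),
    ("þ", "th"),
    ("ð", "th"),
    ("ð", "d"),
    ("ȝ", "gh"),
    ("ȝ", "g"),
]


def reduced_forms(title):
    """Apply each reduction on its own and collect the resulting spellings."""
    forms = set()
    for special, plain in REDUCTIONS:
        if special in title:
            forms.add(title.replace(special, plain))
    return forms


def also_candidates(title, existing_titles):
    """Keep only reduced spellings that already exist as page titles."""
    return sorted(form for form in reduced_forms(title) if form in existing_titles)
```

Reducing only from the special grapheme to the plain spelling keeps the candidate set small; expanding in the opposite direction multiplies the possibilities, which is consistent with the much larger change count reported above for the reverse run.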
Hi. User:OrphicBot seems to be putting redirect pages into {{also}}, which I think shouldn't be done. For example, at Dì-èrcì Shìjiè Dàzhàn, Dì'èrcì Shìjiè Dàzhàn was placed into also, but it actually redirects to the page itself. — justin(r)leung { (t...) | c=› } 02:54, 23 September 2016 (UTC)
- To clarify, Justinrleung is talking about soft redirects. I think we should have see-also links to soft redirects. --WikiTiki89 03:36, 23 September 2016 (UTC)
- @Justinrleung, Wikitiki89: I'm sorry, I don't see any redirects, soft or hard, on either of those pages in the recent history. They both seem to link independently to the simplified and traditional Hanzi forms. Am I misreading? Not understanding the example, a priori I think soft redirects probably ought to get also templates because it informs the reader another valid form exists. In this case especially it is non-trivial because there is no alternative-form template. I'm certainly willing to make changes if this is widely contentious, however. Isomorphyc (talk) 13:18, 23 September 2016 (UTC)
Actually, I was talking about a hard redirect on the pinyin page Dì'èrcì Shìjiè Dàzhàn to Dì-èrcì Shìjiè Dàzhàn. I've reverted the edits by OrphicBot twice. I'm fine with see-also links to soft redirect.— justin(r)leung { (t...) | c=› } 13:22, 23 September 2016 (UTC)
- Never mind, I messed up. I thought Dì'èrcì Shìjiè Dàzhàn was redirecting to Dì-èrcì Shìjiè Dàzhàn. — justin(r)leung { (t...) | c=› } 13:25, 23 September 2016 (UTC)
- Oh I see, you probably thought the edit was at dì'èrcì shìjiè dàzhàn. --WikiTiki89 13:31, 23 September 2016 (UTC)
- @Justinrleung, Wikitiki89: Thanks for the clarifications-- my wikitext parsing code rejects those kinds of pages automatically because they do not parse as wikitext. Isomorphyc (talk) 13:44, 23 September 2016 (UTC)
Which of these things is not like the other?
havin', bihar and havîn. More importantly, why did OrphicBot think they were the same? Chuck Entz (talk) 06:11, 1 October 2016 (UTC)
- @Chuck Entz: The reason is that {{also}} was incorrectly used at havîn for the synonym bihar. My earliest version attempted to enforce a transitive relationship amongst both computed and user-added equivalences. I realised quickly that there are too many idiosyncrasies and errors in the user data to do this. There are still a certain number of the original transitive relationships which I haven't removed yet because I've been slightly undecided about whether I should be removing the original clearly erroneous {{also}} arguments in addition to their copies. I've been leaning in that direction, though I'm not quite there yet. Isomorphyc (talk) 07:36, 1 October 2016 (UTC)
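The effect described in this reply, where one mistaken user-added link spreads through a transitive closure, can be reproduced with a small union-find sketch in Python. It is a generic illustration under assumed inputs (pairs of linked titles), not the bot's implementation.

```python
def transitive_classes(pairs):
    """Merge linked titles into classes by transitive closure (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for title in list(parent):
        groups.setdefault(find(title), set()).add(title)
    return sorted(sorted(group) for group in groups.values())


# Assumed inputs: one computed orthographic link, one erroneous user-added link.
computed = [("havin'", "havîn")]
user_added = [("havîn", "bihar")]
```

Calling `transitive_classes(computed)` yields a single class of the two orthographic variants, whereas `transitive_classes(computed + user_added)` pulls bihar into the same class, which is how the synonym ended up linked from havin'.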
Addition of also template by OrphicBot
Some of the edits by the bot are somewhat stupid, like [5]. Note that dots in Arabic and Arabic-derived scripts are not simply diacritics, so ت and ب are distinct letters, and they are located far from each other on standard keyboards. This means that, although they may be similar in some ways, they are not often mistaken for each other; compare Q and O. ک and گ is a different issue though. --Z 20:24, 5 October 2016 (UTC)
- @ZxxZxxZ: Thanks for pointing this out. This is not a default behaviour but a result of the third and seventh lines in the Arabic section of the equivalences file: User:OrphicBot/equivalences.txt. @Wikitiki89, is this an unwanted side-effect which I should avoid in some cases? Isomorphyc (talk) 22:58, 5 October 2016 (UTC)
- @ZxxZxxZ, Isomorphyc: In my opinion, {{also}} should be used for any characters that are graphically similar, regardless of whether the differences between them are considered "diacritics" or not. I realize that ت and ب are completely different letters, but if someone had forgotten to dot one of the letters, or the dots are illegible, or for whatever other reason, they can easily be confused. --WikiTiki89 14:10, 6 October 2016 (UTC)
- I think this is undesired. In many entries, it actually causes confusion. The dot is an essential part of this script, and it is not likely that someone who works with the script would forget it. --Z 21:39, 9 February 2017 (UTC)
- I support showing these variants in {{also}}. Whether they are an essential part of the script is not relevant; not everyone who has an interest in Arabic entries has a good knowledge of the script. For comparison, A and Ä are totally different letters in Finnish, but we show them under {{also}} as well. Same for E and Ė, which are totally different in Lithuanian. —CodeCat 21:46, 9 February 2017 (UTC)
- I think this is a mistake and needs to be undone. If you take the point about 'dots' then you could similarly make the point that 'dad' and 'bad' should be linked with a 'see also' template because they are just round blobs with sticks coming out of them. There was an established way of dealing with the 'see also' section that made sense and was useful. Having so many 'see also' sections makes that section nearly useless, in reality. Kaixinguo~enwiktionary (talk) 13:07, 10 February 2017 (UTC)
- That's not a good comparison. You are possibly neglecting the fact that Arabic script, unlike Latin and even all other Semitic scripts, is unique in this case, those dots are what actually define most of the letters. What you proposed also applies to υ and ν, or ו and ז (Hebrew). Anyway, I completely oppose any more edit for Arabic-script entries by the current code of OrphicBot, as it causes cases which are confusing instead of being helpful.[6][7] --Z 14:30, 10 February 2017 (UTC)
- It's tricky. People who have a good knowledge of the Arabic alphabet are definitely going to be confused by this. I have been several times, stumbling upon them, thinking "what is the relation between these two"? On the other hand, our users are quite likely not to have a good knowledge of the Arabic alphabet, and they might find this helpful now and then. Still, I agree that the way it's working now isn't right. Having بیژن (Bižan) pop up at ثئرن (ṯuʾirna) is ridiculous. Is there maybe a way to reduce this to cases where only one letter is different? Kolmiel (talk) 21:12, 10 February 2017 (UTC)
- Thanks for all of this clarification. As I understand it, i'jam pointing was added to the rasm script during the latter half of the Abbasid Caliphate. Prior to this, five to eight contemporary letters, depending on word position, were disambiguated by context. Although the i'jam are not considered diacritical marks, a few are indeed omitted today in certain regional script variants, including in Egypt and Iran, in certain word positions. Hence, there are two modest arguments for retaining the cross-links. Conversely, the cross-links are almost never helpful to users with even the most minimal Arabic proficiency (even CEFR A1), and they are annoying to users with more significant competency. The only use I can see for the links are for editors working in trans-lingual sections, such as etymologies, for example in Spanish, who will not have any familiarity with the script. I have found myself in this position occasionally, but generally very rarely. My sense is that the balance of argument as I understand it is not in favour of retaining these links. @Wikitiki89 is there a stronger use case which I am omitting to appreciate? I do not wholly accept the comparison with Finnish and Lithuanian because trans-lingual links can only treat orthography consistently in a language agnostic way. I do apologise that I have been away for some time without notice; some pleasant but time-consuming family matters have kept me away for several months, and much longer than I would have expected; but I will return regularly in the coming week or two. Isomorphyc (talk) 21:13, 11 February 2017 (UTC)
- Note that we all will (I think I can say that) be in favour of keeping links in those cases where signs may be omitted. This would be hamza, final yā’, and maybe some other things. These are valid and useful. Kolmiel (talk) 01:11, 12 February 2017 (UTC)
- Also, less importantly: You're right that in the earliest times the script was written without all the distinguishing dots, but it was hardly usable in that form to any purpose other than recording a text with which one was already acquainted (through oral transmission). Modern readers would be entirely unable to read a book that way. It could be like a riddle, rhguloy lkie tihs tnihg mybae. Kolmiel (talk) 01:24, 12 February 2017 (UTC)
- I've seen many cases in handwriting of native speakers of Arabic that the dots were either omitted by accident on a particular letter or were simply illegible (and I'm not referring to final ي or ة or anything like that). This could provide a way to disambiguate these cases. But perhaps that is not a strong enough argument, so I will concede if the natives don't find that necessary. @Kolmiel: Was the dotless script not used for letters and things even back then? I'm sure people could read it if they were used to it. "Modern readers" simply aren't used to it. --WikiTiki89 14:59, 13 February 2017 (UTC)
- As you will be aware, the problem is the letters that the Persians call "tooth letters", namely initial or internal ي ن ث ت ب (y n ṯ t b). If it weren't for these, it would be simple. But consider that you are unable to distinguish any of taktub, yaktub, naktub, for example. And that any of them could also be takbit, yakbit, nakbit (to subdue). I guess they were able to read it, but the degree of confusion when reading something you didn't already know must have been great. A mixed system of diacritics was developed within the 7th century. I'd like to know how long people wrote letters without any diacritics. (Quran texts are not a good point of reference because there's always the issue of "interfering in the holy text".) Kolmiel (talk) 16:53, 13 February 2017 (UTC)
- You are probably familiar with the story Augustine relates in the Confessions of his surprise at how Ambrose would read without moving his lips. I have imagined that before the printing press, handwritten documents were precious enough that people could afford the time to read them at a vocalised rate, perhaps 60-100 words per minute, rather than 300-400 as is very common today. I am not sure that the lack of i'jam pointing is qualitatively very different from the lack of spacing, punctuation, capitalisation, or in Greek, diacritical marks which are common in older manuscripts. I feel a six-way letter ambiguity every two or three words won't make a text materially more difficult to read in a language one knows well. But adding this to archaic diction and script possibly in a foreign language, the net effect I think is very formidable. Claude Shannon estimates in The Mathematical Theory of Communication that a given letter contributes about one binary bit of information in a Markov model built on letters, whereas in a Markov model built on words, each word contributes something like 1.5 bits of new information. Given five letters per word, this implies a somewhat remarkable 3x redundancy in the spelling system (he was using English in his example). This is why in modern English reading I think this type of ambiguity would be quite easy to ignore almost unconsciously.
- I will try in the coming week or two to write some productions for my equivalences file which will handle this rule and respect the relevant exceptions. But I may ask for help testing some examples if I am unable to draw a clear line. Isomorphyc (talk) 23:55, 14 February 2017 (UTC)
- The thing is that your calculations refer to languages with full vocalization, digraphs, etc. Arabic already leaves out gemination, all short vowels and back then also a good deal of long vowels. Even in its present form, the script is quite defective. While this is fine, it is not unusual that individual words must be vocalized in modern Arabic texts for clarification. Now adding to this a five-way ambiguity of consonant letters makes the script a riddle. Wikitiki's objection was that they may have been able to solve this riddle. I agree with some reserve, but the problem cannot be compared to a lack of spacing, punctuation, capitalisation, and diacritical marks. This is roughly what the Arabic script has already. The consonant ambiguity undoubtedly makes reading much more difficult than that, and equally undoubtedly leads to a much greater amount of ambiguity. Kolmiel (talk) 15:13, 15 February 2017 (UTC)
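Kolmiel's suggestion earlier in this thread, restricting the cross-links to cases where only one letter differs, could be tested with a filter like the Python sketch below. The function names are hypothetical and the comparison is by code point, which is an assumption; cases where a sign is omitted altogether (hamza, final yā’) change the length and would need separate rules.

```python
def differs_in_one_letter(a, b):
    """True when a and b have equal length and differ at exactly one position."""
    if len(a) != len(b):
        return False
    return sum(1 for x, y in zip(a, b) if x != y) == 1


def filter_also_links(title, candidates):
    """Keep only candidates that differ from the title in a single letter."""
    return [c for c in candidates if differs_in_one_letter(title, c)]
```

Under this filter, a pair such as بیژن and ثئرن, which differ in three of their four letters, would no longer be linked, while pairs differing only in one dotted letter would still be.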
For whatever reason, OrphicBot didn't add neal and Neal to the top of néal. Was it only adding {{also}} to undiacritic forms, or to all forms? —Aɴɢʀ (talk) 19:02, 24 October 2016 (UTC)
- @Angr: The reason is that I have been perhaps excessively conservative about wikitext parsing, partly because I know I haven't codified all of the template conventions that affect the wikitext tree. In this case it did not recognise the {{topics}} template just before the interwiki links as a category template, so it rejected the whole page. As a safety precaution, I do not edit pages which have parsing issues. It turns out there were 2016 of these, listed here: User:OrphicBot/EditLogs/22September2016_UpdateAlso_Error_Log.
- In this case I need to add {{topics}} as a valid way to declare categories. But in general I could edit the erroneous pages anyway, since the parsing errors do not in most cases affect the {{also}} templates. Since this was about 0.25% of the total number of pages, I've prioritised them below the real-time client and the variations appendices for words with more than eight also-entries, since in most cases the correct way to do this is probably to fix the bad wikitext. Isomorphyc (talk) 19:47, 24 October 2016 (UTC)
- OK, I'll add it manually then. —Aɴɢʀ (talk) 20:15, 24 October 2016 (UTC)
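The conservative behaviour described in this thread, rejecting a page when a trailing template is not a known category declaration, might look roughly like the Python sketch below. The regular expression, the function names, and the example wikitext are assumptions for illustration; the real parser builds a full wikitext tree and is not shown in this discussion.

```python
import re

# Captures the template name in "{{name|...}}" or "{{name}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def unrecognised_templates(tail_wikitext, category_templates):
    """List template names in the page tail that are not known category templates."""
    names = TEMPLATE_NAME.findall(tail_wikitext)
    return [name for name in names if name not in category_templates]


def safe_to_edit(tail_wikitext, category_templates):
    """Conservative rule: edit only when every trailing template is recognised."""
    return not unrecognised_templates(tail_wikitext, category_templates)


# Hypothetical page tail: a category template followed by an interwiki link.
tail = "{{topics|ga|Clouds}}\n[[fr:néal]]"
```

With a known set that lacks `topics`, `safe_to_edit(tail, {"C"})` is false and the page is skipped; once `topics` is added to the set, the same page passes.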
Spanish inflected forms
The Spanish inflection templates are very dumb. You need to add the plurals manually if they aren't simply -s (frior). DTLHS (talk) 17:30, 27 October 2016 (UTC)
- @DTLHS: I've been slowly learning what to expect from the Spanish templates; I will remember this. Thank you for all of the morphology creation and linking!
- @DTLHS, Derrib9: Why don't we make them less dumb? —Μετάknowledgediscuss/deeds 22:44, 27 October 2016 (UTC)
- I'd recommend using Module:ca-headword as a start, then. —CodeCat 23:00, 27 October 2016 (UTC)
- Yes, I'll probably get around to it eventually. DTLHS (talk) 23:34, 27 October 2016 (UTC)
- @Derrib9 Did you really want a robot to make the Spanish enclitics, as you suggested in an edit comment? Isomorphyc (talk) 23:30, 27 October 2016 (UTC)
- I just wrote some code to do this actually- see Special:Contributions/NadandoBot. DTLHS (talk) 23:34, 27 October 2016 (UTC)
- @DTLHS: Very nice. Are you just about ready with the Italian ones too, or shall I go ahead? Isomorphyc (talk)
- No, I had no plans to do anything with Italian. DTLHS (talk) 23:40, 27 October 2016 (UTC)
- I wouldn't have assumed, but since they are homologous I felt I should ask, in case you were already planning to use the same code for both. Isomorphyc (talk) 00:42, 28 October 2016 (UTC)
- I rewrote the Spanish conjugation templates to make it easy to run bots- just give them the "json" parameter and they will generate all the form titles and form code in a convenient JSON object. Italian would need some work on the back end for that to be possible I think. DTLHS (talk) 00:49, 28 October 2016 (UTC)
- @DTLHS: I was actually just about to do almost the same thing for the Latin conjugation modules to make it easier to generate per-lemma search engine arguments. For Italian I'll probably just use my own stemmer to do it on Wiktionary's lemmas. I should probably do both the way you are doing it, but I have never worked much on the Lua stemmers, so I am taking small steps for now. Isomorphyc (talk) 03:34, 28 October 2016 (UTC)
- First, awesome botting, DTLHS! Secondly, the Spanish templates aren't that dumb. Also, I don't know enough about Modules to help with de-dumbifying them. --Derrib9 (talk) 13:25, 28 October 2016 (UTC)
lacca
You have a source that Italian lacca derives instead from a Latin word? Can you please share? Leasnam (talk) 00:48, 28 October 2016 (UTC)
- @Leasnam: There is some disagreement about this. Please see: [garzanti]. Note [Pianigiani] somewhat confusingly marks this sense as neo-celt., but also identifies it as a doublet with lago. [Treccani] derives it purely from OHG. My interpretation was that this is clearly the same Germanic/Latin conflation in a number of languages. One could drive it either way, with influence from the other, but Latin made a little bit more sense to me. We could reverse the direction of the influence if you disagree with me (i.e., derived from OHG with influence from Latin, or something in between); or I could try to put Italian back into the *lakō tree as an OHG influence. What do you think? Isomorphyc (talk) 02:24, 28 October 2016 (UTC)
- I wouldn't at all have any objection to showing a partial or influenced derivation at *lakō, not at all Leasnam (talk) 02:52, 28 October 2016 (UTC)
- @Leasnam: Done. Isomorphyc (talk) 03:04, 28 October 2016 (UTC)
Datatable modules in uncategorized modules
Your and your bot's accounts have bloated this category rendering it useless. Make a dedicated category for them and have Category:Data modules as their parent cat. Thanks.--Giorgi Eufshi (talk) 06:15, 31 October 2016 (UTC)
Well, in order not to create thousands of documentation pages for categorization purposes only, I have updated Module:documentation so that predefined datamodules can be automatically categorized. Make up a more reasonable name for the category of your modules than "User:Isomorphyc's datatables" and then update the Module:documentation on the 17th line. --Giorgi Eufshi (talk) 07:34, 31 October 2016 (UTC)
- @Giorgi Eufshi: I changed the name to `Reference module sharded data tables;' thank you so much for your work with this. Isomorphyc (talk) 12:38, 31 October 2016 (UTC)
OrphicBot's "See also" links
Next time OrphicBot does a run, could it be programmed to recognize "ɛ" and "ɔ" as variants of "e" and "o"? This would put bɔgɔ in the bogo family, for example. —Aɴɢʀ (talk) 13:10, 11 November 2016 (UTC)
- @Angr: Done. There were 272 changes: User:OrphicBot/EditLogs/11November2016 - ɛ/e and ɔ/o equivalences. Also, please feel free to edit User:OrphicBot/equivalences.txt directly if you would like; it is the result of quite a few people's contributions. The file is automatically read when the robot starts. Isomorphyc (talk) 14:27, 11 November 2016 (UTC)
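For readers following along, the idea of putting bɔgɔ in the bogo family can be sketched roughly as below. This is only an illustrative Python sketch, not OrphicBot's actual code: the `EQUIVALENCES` table, the function names, and the extra case-folding and diacritic-stripping steps are all assumptions, and the real rules live in User:OrphicBot/equivalences.txt, whose format is not shown here.

```python
import unicodedata
from collections import defaultdict

# Hypothetical equivalence table holding only the two pairs requested above.
# The bot's real rules are read from User:OrphicBot/equivalences.txt.
EQUIVALENCES = {"ɛ": "e", "ɔ": "o"}


def family_key(title: str) -> str:
    """Map a page title to a key shared by all of its look-alike variants."""
    # Assumption: case and combining diacritics are ignored as well.
    decomposed = unicodedata.normalize("NFD", title.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace each character by its canonical equivalent, if it has one.
    return "".join(EQUIVALENCES.get(ch, ch) for ch in stripped)


def group_families(titles):
    """Group titles by key; keep only families with at least two members."""
    families = defaultdict(list)
    for title in titles:
        families[family_key(title)].append(title)
    return {key: sorted(members) for key, members in families.items() if len(members) > 1}
```

Under these assumptions, each title in a family would list its siblings in its {{also}} line, and a title with no siblings would get none.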
- Out of curiosity, how did you categorize those modules into this category?
- Please deal with the typo in the category name.
- Please deal with the yet un-created category.
--kc_kennylau (talk) 15:51, 11 November 2016 (UTC)
- @Kc kennylau: Please see User_talk:Isomorphyc#Datatable_modules_in_uncategorized_modules. The categorisation is due to a new name-matching feature implemented by another user for the purpose of categorising these modules. I will create the category, but unfortunately I fail to see the typo, as sometimes happens with typos one makes oneself. Could you please point it out to me? Sorry if this is a bit dense. Isomorphyc (talk) 15:58, 11 November 2016 (UTC)
- Do you have any idea where the name-matching code is located at? The typo is sharded > shared. --kc_kennylau (talk) 16:00, 11 November 2016 (UTC)
- @Kc kennylau: Please see lines 17 and 87 of Module:documentation. I actually meant sharded; the reason is that the modules represent a small number of tables organised into a shard set according to a hash key for reduced memory usage. Isomorphyc (talk) 16:03, 11 November 2016 (UTC)
- Thanks, that is useful. Also, sorry for the false positive typo. --kc_kennylau (talk) 16:05, 11 November 2016 (UTC)
- @Kc kennylau: No need to apologise; I appreciate your looking out for these things. Since the modules are more obviously shared than sharded it is a natural assumption, also. Isomorphyc (talk) 16:11, 11 November 2016 (UTC)
- Thank you for creating the category. --kc_kennylau (talk) 16:12, 11 November 2016 (UTC)
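As a rough illustration of what "sharded" means in this thread (a table split into several smaller tables selected by a hash key, so that only one of them needs to be loaded per lookup), here is a minimal Python sketch. It is not the code of the actual data modules, which are Lua; the shard count, the choice of CRC32 as the hash, and all function names are assumptions made for the example.

```python
import zlib

NUM_SHARDS = 16  # assumed shard count, chosen only for illustration


def shard_index(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard for a key with a hash that is stable across runs."""
    # crc32 is used because Python's built-in hash() for str varies per process.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_shards(table: dict, num_shards: int = NUM_SHARDS) -> list:
    """Split one large table into num_shards smaller tables."""
    shards = [{} for _ in range(num_shards)]
    for key, value in table.items():
        shards[shard_index(key, num_shards)][key] = value
    return shards


def lookup(shards: list, key: str):
    """Consult only the single shard that can contain the key."""
    return shards[shard_index(key, len(shards))].get(key)
```

The memory saving comes from the last function: a page that needs one key touches one shard rather than the whole table.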
"also" and "character info/new"
I don't know if I need to say it, but it may be a good idea anyway, because you are dealing with {{also}} in entries.
Some entries, like ѿ, contain both {{also}} and {{character info/new}}. If you are going to use OrphicBot to add {{also}} in an entry that already has {{character info/new}}, then please (if it's not a lot of trouble) add it below the {{character info/new}}. This is correct:
{{character info/new}}
{{also|...}}
==Translingual==
===Symbol===
If the {{also}} were above the {{character info/new}}, then the entry would have an ugly blank space above the {{character info/new}}. This is what I'd like to avoid. I've been fixing entries with that blank space as I find them. Thanks for your work with {{also}}, I appreciate it. --Daniel Carrero (talk) 03:18, 14 November 2016 (UTC)
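The placement rule requested above could be expressed roughly as follows. This is a hedged Python sketch rather than OrphicBot's real logic: the function name `add_also` is hypothetical, and it assumes the {{character info/new}} call sits on the first line of the page and fits on a single line.

```python
def add_also(wikitext: str, also_line: str) -> str:
    """Insert an {{also}} line at the top of a page, but below a leading
    {{character info/new}} box if the page starts with one."""
    lines = wikitext.split("\n")
    # Assumption: the character box, when present, is a one-line call on line 1.
    if lines and lines[0].startswith("{{character info/new"):
        return "\n".join([lines[0], also_line] + lines[1:])
    # No character box: {{also}} simply goes first.
    return "\n".join([also_line] + lines)
```

A page with a multi-line template call, or with the box lower down, would need real wikitext parsing rather than this line-based check.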
- Hi @Daniel Carrero: sorry for the delay; it looks like you fixed at least 500 of these yourself before I got to them. Here are the remaining 334: User:OrphicBot/EditLogs/15November2016/Character Boxes Moved to Top. I wasn't aware this template needed to be above {{also}}, but I will put {{also}} below it in future. I had all but committed these changes when I saw your message yesterday, and meant to write afterwards, but was unfortunately delayed in both by circumstances. I hope our duplication of efforts did not waste too much of your time. Isomorphyc (talk) 18:21, 15 November 2016 (UTC)
- Thanks for fixing those 334 entries! I edited the documentation of {{also}} and {{character info/new}} to mention their placement in relation to each other. --Daniel Carrero (talk) 21:24, 15 November 2016 (UTC)
- @Daniel Carrero: Speaking of the space before the first language section more generally, should {{wikipedia}} ever belong there? It appears 1861 times in the first line: User:OrphicBot/Sandbox/First Line Wikipedia Templates. What I understand from the documentation is that it might be more correct to move these to the appropriate language sections. What do you think? Isomorphyc (talk) 17:26, 16 November 2016 (UTC)
- I believe you are right, and I checked the documentation of {{wikipedia}} too.
- Sister-project boxes such as {{wikipedia}} and {{wikispecies}} (and images too, incidentally) are dependent on the current languages and sections, to some extent. Apparently, {{wikipedia}} belongs in language sections. For example, the entry 愛 is about the Han character meaning "love". The Japanese section could link to w:ja:愛 and the Chinese section could link to w:zh:愛, which are FL articles about the concept. Many of the entries in User:OrphicBot/Sandbox/First Line Wikipedia Templates have only 1 language section, so the Wikipedia box can probably be safely moved to that section. I'm also okay with adding things like {{wikipedia|kilometer}} (English Wikipedia link) in the Translingual section of km.
- ...as opposed to {{also}} and {{character info/new}}, which appear to be about the whole page. These templates basically don't "care" what are the current languages and senses. And for this reason, placing them at the top, above all language sections, makes sense to me.
- --Daniel Carrero (talk) 22:51, 16 November 2016 (UTC)
Modules
@CodeCat is the only disagreement left that the modules are reading the pronunciations directly from the pages? What if we put them in modules? It wouldn't be impossible to keep them continually updated. I'd like to help if I can implement anything you can both agree on. Isomorphyc (talk) 19:10, 18 November 2016 (UTC)
- I don't know what you're talking about, can you provide context? —CodeCat 19:22, 18 November 2016 (UTC)
- @CodeCat: Maybe I was out of place to mention it-- I was responding to your desysopping comment, but I did not want to do it in the Module:headwords thread since it was so active then. I specifically had in mind this conversation: Wiktionary:Beer_parlour/2016/September#Separating_transcription_from_transliteration. I realise you didn't participate, but I thought the user with whom you disagreed in principle agreed to move the language-specific code out of Module:links. If you think a workable solution could be made out of Chuck Entz's proposal, I would be willing to implement it. I would like to think that if you could both agree on Module:links, nobody could object to restoring the permissions. My initial note above was referring to the mw.title.new(...):getContent() calls in Module:th. My thought was that all of this data could be moved to modules to avoid the self-referentiality; but it's possible I misread your views, and other users objected to that function call more than you did. Isomorphyc (talk) 23:31, 18 November 2016 (UTC)
- My objection is that the code to do this for Thai is placed in Module:links, a general purpose module, rather than in a module for Thai specifically. —CodeCat 23:32, 18 November 2016 (UTC)
- @CodeCat: If I read correctly, he agreed to move it, provided Module:links had a parallel pr= functionality to handle phonetic transcription. That is at least a language agnostic solution, so the Thai code could stay in a Thai module. Is there anything else you would need for this proposal to work for you? Isomorphyc (talk) 23:42, 18 November 2016 (UTC)
- Except that the community at large didn't agree with such a change. The three edits Wikitiki and I made to Module:links, Module:th and Module:th-translit together fix the issue nicely though, but Wyang was opposed to them. Hence the edit war. —CodeCat 23:44, 18 November 2016 (UTC)
- A |pr= parameter, which can be manual or automatic (using transcription modules), was exactly what I had in mind. Wyang (talk) 23:48, 18 November 2016 (UTC)
- @CodeCat, Wyang: So, it looks like some Classical languages editors would also like the functionality of a parallel transliteration system in Module:links. I think it satisfies both of your criteria-- the only difference is that it is more complicated than WikiTiki89's solution. Why don't you let me try to implement it, if there is a chance you'll both accept the result when I'm done? Isomorphyc (talk) 23:53, 18 November 2016 (UTC)
- That would be great. Looking forward to your version! Wyang (talk) 23:54, 18 November 2016 (UTC)
- It would need a vote first. However, Wyang's cooperation should not hinge on whether his proposal is accepted. If there is consensus in favour of these changes, I'll accept them, my only absolute demand is getting the Thai code out of Module:links. The means by which that is done doesn't matter that much, as long as it has consensus. —CodeCat 23:55, 18 November 2016 (UTC)
- @CodeCat: I don't think it needs a vote if you and Wyang agree. There was an impasse because voting is meaningless when too few users are interested. If you think Chuck Entz's proposal is acceptable in principle, with emendations if necessary, I won't be offended if you reject my implementation in the end. Hopefully I can make this so that no changes are needed on the wikitext side. Isomorphyc (talk) 00:07, 19 November 2016 (UTC)
- It needs a vote because it's a big change to our modules, and practices. We need to give people the chance to oppose if they want to, before it's done and over with. —CodeCat 00:29, 19 November 2016 (UTC)
- @CodeCat: If I offer Chuck Entz's proposal for vote, before implementing a module on which people can vote, would you support it? Isomorphyc (talk) 00:30, 19 November 2016 (UTC)
- In some languages, there is a letter-for-letter transliteration and a more pronunciation-based transcription that does not necessarily correspond to the actual pronunciation, which either may not be known or differs widely between dialects. Would using |pr= for this still make sense? Personally, I would prefer if we continued to use |tr= for transcriptions and created a new parameter for letter-for-letter transliterations such as |translit=. The latter would nearly always be automatable, and so using the parameter literally should be pretty rare (except for some languages, such as Akkadian, where it might still require contextual interpretation). --WikiTiki89 14:57, 21 November 2016 (UTC)
- I agree, what you are proposing makes more sense. Right now the implementation is vague as to what pr= will do. We should be sure of the Thai editors' preferences, but it may make the most sense to be backwards compatible with tr= to avoid inconveniencing users unnecessarily (altering 5000 tr= entries to pr= of course is easy, in this case). Indeed, I am not convinced a second parameter is wholly desired. What was really wanted, I believe, was a transliteration process accessible via the lang object which can be either isolated from or combined with the phonetic glossary process as necessary. I'm still thinking about the best way to make a template parameter interface to the two underlying functions, given various types of overriding. Isomorphyc (talk) 18:07, 21 November 2016 (UTC)
zh-translit
Hi @Wyang Could you please elaborate on something you said in passing on your talk page? You wrote this: The reason Module:th-translit has to be bypassed after Module:links is because feeding a non-phonetic respelling into the transliteration will generate erroneous results..., and the reason I made Module:links feed into Module:th rather than Module:th-translit is because the reverse would result in misnomers... i.e. the "translit" module Module:th-translit used for non-translit purposes; which is why Module:zh-translit does not exist. To be more concrete, are you saying that it is impossible to create Module:zh-translit because 大 = {dà, dài}, 正 = {zhèng, zhēng}, etc? Or are you also referring to the fact that the characters have different pronunciations in Guan, Min, Yue, Wu, etc.? It seems to me that both of these problems can be remedied in various ways, much as {{zh-l}} simply offers the first Mandarin pronunciation available. I think the issue must be more subtle than this? Thanks, Isomorphyc (talk) 00:24, 21 November 2016 (UTC)
- Hi. Transliteration for Chinese is inherently infeasible: transliteration is the script conversion by spelling, not concerned with the sound of the original spelling. dà is one of the transcriptions of 大 in Mandarin; it is not a transliteration because it cannot be inferred from 大 without the a priori information that 大 is pronounced dà, and it is not necessarily the transcription whenever 大 occurs. I guess there is a possibility that {{zh-l}} can be incorporated into the general {{l}} template, but that will rely on much modification of the way the {{l}} template works: automatic transcriptions are enabled, and multiple linked-to forms generated for multi-script languages. Wyang (talk) 00:44, 21 November 2016 (UTC)
- @Wyang: This is true, but the purpose of getTranslit() in Module:th is to find the a priori uninferrable information in the entry. That is, what we are calling transliterations in Module:links are the composition of several operations, including a slightly trivial lookup operation. Assuming I am willing to do something similar in Chinese, is the situation still very much more complicated than Thai? I am trying to understand what different architectures will enable outside of Thai, and what they won't. Isomorphyc (talk) 00:52, 21 November 2016 (UTC)
- The Thai extract-phonetic function works very much like the Pinyin lookup in {{zh-l}} - only difference is that it doesn't get the transcription directly, only the respelled form, which is subsequently fed into the {{th-pron}}. Chinese is potentially more complicated than Thai in its link structure; for the moment there are also alternative forms (simplified generated for now) that need to be displayed alongside the traditional word used as input. The fact that zh-l only generates the Mandarin transcription automatically is also something that some editors (me included) are not completely satisfied about; see for example Module:User:Suzukaze-c/zh-l and the talk page. This is probably a separate complex issue on its own. Wyang (talk) 01:05, 21 November 2016 (UTC)
- @Wyang: Thank you; this makes a lot of sense. I noticed in Thai also the regular expression has several options to choose from, although far less often than in Chinese. So essentially, the lookup in Chinese is much more complicated, although once one has the pinyin, the rest of the process is simpler than in Thai. I appreciate your help with this. Isomorphyc (talk)
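The distinction drawn in this thread, that a Mandarin reading has to be looked up rather than computed from the spelling, can be illustrated with a toy Python sketch. This is not how {{zh-l}} or Module:th is implemented: the `READINGS` table is a hypothetical stand-in for reading pronunciations out of entries, populated only with the two examples quoted above, and `first_reading` is an invented name.

```python
from typing import Optional

# Toy stand-in for the pronunciation data stored in entries. The two items are
# the examples from the thread; the order of readings is taken from there too.
READINGS = {
    "大": ["dà", "dài"],
    "正": ["zhèng", "zhēng"],
}


def first_reading(title: str) -> Optional[str]:
    """Return the first listed Mandarin reading for an entry, if any.

    This is a lookup, not a transliteration: nothing in the character's
    written form lets the reading be derived by rule.
    """
    readings = READINGS.get(title)
    return readings[0] if readings else None
```

Taking the first reading is the simple remedy mentioned in the thread; it says nothing about which reading is right in a given word, or about other varieties than Mandarin.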
There should be a vote before this is implemented, so that it's clear there is a consensus for it. —CodeCat 15:24, 21 November 2016 (UTC)
- @CodeCat: Yes, I should have put it in Module:User:Isomorphyc/languages-draft to begin with; the Module:language draft-module slippery slope turned out to be a bit less steep than I realised or I would have done it this way. I agree to the principle that this needs a vote. Also, my two edits to Module:languages/data2 are undone and moved to Module:User:Isomorphyc/languages-draft/data2. A question which is orthogonal to what I am doing: was putting the override_translit data in Module:language/data3(...), per the to-do list, worth doing, or would you prefer I revert that? It's been in the main module for so long I wondered if people liked it there.
- Also, I hope I will be able to convince you that Wyang's feature is worth accommodating, but if we can't agree on that, I still won't propose a draft module for vote if you are uncomfortable with the way it is structured. I'm not really sure what you think of adding the respelling module name as a new parameter in the language structure, as a means to accomplish this end? You can hold judgement if you prefer. Thanks for being willing to be involved in this. Isomorphyc (talk) 16:21, 21 November 2016 (UTC)
- That's why I would like to see a vote. Then it's very clear that there is an agreement and I wouldn't be inclined to go against that. I would like to work with you on the draft, too. —CodeCat 16:44, 21 November 2016 (UTC)
- Please feel free-- I always feel more comfortable when you are looking at my changes in modules I have not worked on much. All the Thai transclusions are copied in User:Isomorphyc/Sandbox6 and a smaller set is in User:Isomorphyc/Sandbox7. Isomorphyc (talk) 16:49, 21 November 2016 (UTC)
- Edit: one of the reasons I am doing this is to be able to offer proposed changes that are user facing, in order to make people comfortable to vote. Isomorphyc (talk) 16:53, 21 November 2016 (UTC)
πολυσύμμαχος
I found this word here and I am curious about it. What can you tell me about it or where you found it? Thanks, IOHANNVSVERVS (talk) 05:14, 15 December 2016 (UTC)
- @IOHANNVSVERVS: It should be in the Lexikon zur byzantinischen Gräzität, which is here: [8]. However, this site is not functioning for me at this time, so I cannot verify this. It ought to mean `having many allies,' of course, but I am not able to tell you where it is attested. I hope this dictionary page will work for you when you read my note. Isomorphyc (talk) 14:08, 15 December 2016 (UTC)
- That's very interesting, thank you for your response. I would be very interested to see where it is attested in literature. I came up with it independently modelled on the word "πολύφιλος" as from the quotation "Εὐτυχία πολύφιλος". I intended to use it to make "Δίκη πολυσύμμαχος". Please let me know if you can find more information on this word. Thank you greatly.
- @IOHANNVSVERVS: This isn't much to offer, but I was able to access the entry at LBG. I am not familiar with either of the citations it gives, but they are:
- Pseudo-Elias (Pseudo-David), Lectures on Porphyry’s Isagoge, ed. L.G.WESTERINK. Amsterdam 1967. [s.VI] (27.12)
- Anecdota Graeca e codd. manuscriptis bibliothecae regiae Parisiensis, ed. J.A.CRAMER. Vol.I–IV. Oxford 1839. (IV.432)
- And here is the text of the entry; it's not much:
- πολυσύμμαχος, ὁ starker Verbündeter: -μαχε σύμμαχε Ῥώμης PsElias 27,12 = CramPar IV 432.
- Isomorphyc (talk) 19:43, 15 December 2016 (UTC)
πολυέχθρος, πολύεχθρος
Hello again, I wonder what would be the antonym of πολυσύμμαχος? I like πολυέχθρος as meaning "having many enemies", but I wonder if it would be ambiguous and could be interpreted as "having much enmity". I think πολύεχθρος would be more proper for "having many enemies" but I don't appreciate it so much aesthetically. Do you have any information or opinion to share here? Thanks, IOHANNVSVERVS (talk) 22:56, 27 May 2017 (UTC)
@IOHANNVSVERVS: Hello again; you can find πολλοὺς δ᾽ ἐχθρούς in Xenophon, as literal enemies, whereas Homer would use it more abstractly, as in your second interpretation of your coined word. Conversely, depending on context, you could write δημεχθής, hated by the people. But your word might be negated in such a variety of ways it is too simplistic to speak of a single antonym. Adding words to the lexicon is a delight of its own, but you might want to see the variety of ways authors you admire express this idea. Most likely this semantic unit will not normally map to a single word in a realistic usage. Isomorphyc (talk) 03:48, 28 May 2017 (UTC)
- Consider this usage:
ὁ δίκαιος πολυσύμμαχος
ὁ ἄδικος πολυέχθρος
- (second line variations: πολύεχθρος ἄδικος, ὁ ἄδικος πολύεχθρος, πολυέχθρος ὁ ἄδικος)
- IOHANNVSVERVS (talk) 14:45, 29 May 2017 (UTC)
Share your experience and feedback as a Wikimedian in this global survey
Hello! The Wikimedia Foundation is asking for your feedback in a survey. We want to know how well we are supporting your work on and off wiki, and how we can change or improve things in the future.[survey 1] The opinions you share will directly affect the current and future work of the Wikimedia Foundation. You have been randomly selected to take this survey as we would like to hear from your Wikimedia community. To say thank you for your time, we are giving away 20 Wikimedia T-shirts to randomly selected people who take the survey.[survey 2] The survey is available in various languages and will take between 20 and 40 minutes.
You can find more information about this project. This survey is hosted by a third-party service and governed by this privacy statement. Please visit our frequently asked questions page to find more information about this survey. If you need additional help, or if you wish to opt-out of future communications about this survey, send an email to surveys@wikimedia.org.
Thank you! --EGalvez (WMF) (talk) 22:25, 13 January 2017 (UTC)
- ^ This survey is primarily meant to get feedback on the Wikimedia Foundation's current work, not long-term strategy.
- ^ Legal stuff: No purchase necessary. Must be the age of majority to participate. Sponsored by the Wikimedia Foundation located at 149 New Montgomery, San Francisco, CA, USA, 94105. Ends January 31, 2017. Void where prohibited. Click here for contest rules.
Comic strips, charges and cartoons (read below)!
[edit]Hello, exist a problem in several articles and verbets of Wikipedia and Wiktionary in Portuguese, English and Spanish!
[edit]Was be saying that comic strip, charge and cartoon are synonymous, when, in really, are different things!
Below, the explanations of that are the comic strips, charges and cartoons:
- Comic strip: comics of short duration with the charts disposed and organized in form of a strip, how the proper name already implies. The comic strips may or may not be humoristic and contains strong critics for the social values. They also can be daily, published in smaller quantities, and, generally, in black and white (although that some are colored) or Sunday, published in big quantities, ever colored and occupying a space equipollent to, in at least, a whole page. The term comes from the American English, comic strip and means comics strip.
- Charge: humoristic comics of short duration and that contains strong critics of the people and things of the contemporaneity. The term comes from the Franco Belgian French, charger and means load or exagere.
- Cartoon: humoristic comics of short duration and that contains strong critics of the daily to daily situations. Because of the similarities between the first animation short films and the cartoons printed and published in newspapers, magazines and books from the epoch, the animated drawing also is called of cartoon (or, unabbreviated, animated cartoon), be or not humoristic. The term comes from the British English, cartoon and that of the Italian, cartone and means piece of big card, stub or study.
Here they here the articles for be revised in the respective idioms: https://pt.wikipedia.org/wiki/Tira_de_banda_desenhada, http://pt.wikipedia.org/wiki/charge, https://pt.wikipedia.org/wiki/Cartoon, https://en.wikipedia.org/wiki/Comic_strip, https://en.wikipedia.org/wiki/Editorial_cartoon, http://en.wikipedia.org/wiki/Cartoon, https://es.wikipedia.org/wiki/Tira_de_prensa, https://es.wikipedia.org/wiki/Exageraci%C3%B3n_burlesca, https://pt.wiktionary.org/wiki/tira_cômica, https://pt.wiktionary.org/wiki/charge, https://pt.wiktionary.org/wiki/cartum, https://en.wiktionary.org/wiki/comic_strip, https://en.wiktionary.org/wiki/charge, https://en.wiktionary.org/wiki/cartoon, https://es.wiktionary.org/wiki/tira_cómica, https://es.wiktionary.org/wiki/charge and https://es.wiktionary.org/wiki/cartón!
Above all, the Wikipedia articles listed above should receive the following names in each language: Tira de banda desenhada, Charge and Cartum (desenho humorístico) in Portuguese; Comic strip, Charge (humoristic drawing) and Cartoon in English; and Tira de historieta, Charge (dibujo humorístico) and Cartón (dibujo humorístico) in Spanish!
Note also that the caricature has nothing to do with the other three, because it is not a form of comic: it is simply a humorous, exaggerated drawing of something or someone, real or not, and it does not even have text!
So, as you can see, the cartoon is not a type of comic strip, nor is the charge a type of cartoon. If possible, please ask your fellow editors to make these changes. Thank you very much in advance for your attention and interest, and a hug!
Saviochristi (talk) 12:13, 21 January 2017 (UTC)
Also
Hello, I am a sysop on the Turkish Wiktionary. Your bot named OrphicBot is adding the "also" template to articles, here. Can a similar application be made on the Turkish Wiktionary? Turgut46 (talk) 13:40, 4 November 2017 (UTC)
- @Turgut46: I'm glad you like the small change which I made in the English Wiktionary. I could possibly add these Şablon:Bakınız templates myself, or if an existing robot operator on the Turkish Wiktionary would like to do this and keep it maintained, I could try to document my code a little bit better. (It is available on the OrphicBot user page). Eventually, I want to host all of my tools on Wikimedia Tools Lab for anyone to use more easily, but it will be some time before I can do this. A caveat is that for personal reasons, there is a possibility that for the next month I won't be able to commit to when, if, or how much I will be able to help. Let me know what you had in mind, though, and I will do my best. Isomorphyc (talk) 03:37, 5 November 2017 (UTC)
- Okay, is this difficult? --Turgut46 (talk) 08:52, 5 November 2017 (UTC)
- @Turgut46: It is not difficult if you are comfortable with the same character equivalence rules which the English Wiktionary community has refined for our use here: User:OrphicBot/equivalences.txt. The only complexity is differences in Wikitext conventions, which may need some parser modifications. I would probably be able to do it in a day or two if you wanted me to do it and were comfortable with my having a robot account on your Wiktionary, though collaboration may be difficult since, regrettably, I cannot read Turkish. In this case it would likely be at least a few weeks from now before I could start. If an existing robot operator wanted to use my code, it might be more work, though I think it would depend on the person. Isomorphyc (talk) 15:55, 5 November 2017 (UTC)
- You don't need to read Turkish. Just add the "also" template. Only this. Please do not forget to mark the changes as small. How long does this take? --Turgut46 (talk) 17:51, 5 November 2017 (UTC)
- @Turgut46: It takes less than a day from when I start. However, I may not be able to start until early December. Is this all right? If my schedule changes (I may know this week) I will tell you as soon as I can. Can you please do two things for me? My robot account page is here: wikt:tr:Kullanıcı:OrphicBot. Can you please translate it into Turkish? You can add any information you think is important. My robot account also needs API access, like this: wikt:tr:Vikisözlük:Botlar. Can you please arrange for this? Thanks; I am excited to be able to help with this! Isomorphyc (talk) 18:20, 5 November 2017 (UTC)
- There is no version control system here. Your changes are automatically approved. Good luck. --Turgut46 (talk) 18:41, 5 November 2017 (UTC)
- @Turgut46: The bot flag is necessary to query the database efficiently and make many changes quickly. Please see this example of the flag: wikt:tr:Özel:KullanıcıHakları/HydrizBot. This robot application was approved in 2016; see here: wikt:tr:Vikisözlük:Botlar/Bot_başvurusu/Arşiv/2017. If I post an application here: wikt:tr:Vikisözlük:Botlar/Bot_başvurusu, could you arrange for one of the Bürokratlar to approve the flag if the vote succeeds? I am sorry, but I cannot update the "also" templates without the correct flag. Thanks. Isomorphyc (talk) 18:59, 5 November 2017 (UTC)
- Look at this page: tr:Vikisözlük:Botlar/Bot başvurusu/OrphicBot --Turgut46 (talk) 19:06, 5 November 2017 (UTC)
- @Turgut46: Thank you! Isomorphyc (talk) 19:08, 5 November 2017 (UTC)
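For readers following the thread, the idea of deriving "also" links from character equivalence rules can be illustrated with a minimal sketch. This is not OrphicBot's code, and the rules in User:OrphicBot/equivalences.txt are not reproduced here. As a stand-in assumption, two titles are treated as equivalent when they match after Unicode decomposition, removal of combining marks, and case folding. The function names `equivalence_key` and `also_candidates` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def equivalence_key(title):
    """Reduce a page title to a comparison key (assumed rule, not OrphicBot's)."""
    # Decompose characters so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, then ignore letter case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def also_candidates(titles):
    """Map each title to the other titles that share its comparison key."""
    groups = defaultdict(list)
    for title in titles:
        groups[equivalence_key(title)].append(title)

    result = {}
    for members in groups.values():
        if len(members) > 1:
            for title in members:
                result[title] = sorted(m for m in members if m != title)
    return result


if __name__ == "__main__":
    pages = ["wolpertinger", "Wolpertinger", "charge", "chargé", "cartoon"]
    for page, others in sorted(also_candidates(pages).items()):
        print(page, "->", others)
```

A real run would also need wiki-specific parsing to read and write the template on each page, which is the part described above as depending on each Wiktionary's Wikitext conventions.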
Also's that don't belong
Hey Iso. I'm not sure how actively interested you are in pursuing new projects with {{also}}, but I've been thinking that it would be useful to purge unwanted redlinks from it. Some, of course, are wanted, in the sense that an entry will be created at that link. Others, however, are left over from entries which were deleted at RFV or RFD and yet still had links pointing to them. How easy would this be to solve? —Μετάknowledgediscuss/deeds 23:19, 14 December 2017 (UTC)
- @Metaknowledge: Thanks for stopping in; I had been looking for an excuse to come say hi to you on your talk page, actually. It is a bit mortifying how long I have been gone, but I'm very interested in the also-templates just now. Just a new iteration will be enough to delete the red-links. Isomorphyc (talk) 00:21, 15 December 2017 (UTC)
- You're always welcome to drop by! Well, the thing is that I don't think all redlinks should go, just the ones that are the result of deletions. For example, wolpertinger had a link to Wolpertinger before the latter page was created, which was all fine and proper. —Μετάknowledgediscuss/deeds 00:46, 15 December 2017 (UTC)
- I believe in previous runs of Isomorphyc's also script all redlinks were deleted. {{also}} is not for requesting entries, and it defeats the purpose of the template (navigation) if it is used in such a manner. DTLHS (talk) 00:50, 15 December 2017 (UTC)
- @Metaknowledge: Thanks! Initially I did delete all the red-links, but that was an overdue accumulation. For new ones, it wouldn't be unreasonable to leave a month grace period. But red-links arising only from deletions is too low a bar; I think the earlier sense was that also-templates are not the correct place to request new entries. @DTLHS: You said this before I could. Isomorphyc (talk) 00:52, 15 December 2017 (UTC)
- Well, that does seem pretty sound, even if it's not really my preference. —Μετάknowledgediscuss/deeds 01:31, 15 December 2017 (UTC)
- @Metaknowledge: I'll see if I can think of anything better. I'm hoping to have a bit of a web UI this time, so maybe there are intermediate options. Isomorphyc (talk) 01:33, 15 December 2017 (UTC)
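The grace-period idea floated in this thread could look roughly like the sketch below. It is an illustration only, not the also script itself: the function name `prune_also_links`, the `first_seen` bookkeeping, and the 30-day default are all assumptions made for the example. Links to existing pages are kept, red-links first seen within the grace period are kept, and all other red-links are dropped.

```python
from datetime import date, timedelta


def prune_also_links(links, existing_titles, first_seen, today, grace_days=30):
    """Return the links to keep in an {{also}} template (hypothetical policy)."""
    kept = []
    for link in links:
        if link in existing_titles:
            # Blue link: the target page exists, so it always stays.
            kept.append(link)
            continue
        # Red link: keep it only while it is younger than the grace period.
        seen = first_seen.get(link)
        if seen is not None and today - seen <= timedelta(days=grace_days):
            kept.append(link)
    return kept


if __name__ == "__main__":
    kept = prune_also_links(
        links=["Wolpertinger", "old redlink", "new redlink"],
        existing_titles={"Wolpertinger"},
        first_seen={"old redlink": date(2017, 10, 1), "new redlink": date(2017, 12, 1)},
        today=date(2017, 12, 15),
    )
    print(kept)
```

Distinguishing red-links caused by deletions from other red-links, as requested at the top of the thread, would need deletion-log data that this sketch does not model.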