User talk:AutoFormat
If you want to add multiple suggestions at once, please make them separate sections; it is easier to discuss.
lang= parameter on pronunciation
Hi there! I would like to ask (maybe again) if your bot could add the lang= parameter to all templates under the Pronunciation header, including {{SAMPA}}, {{X-SAMPA}}, {{rhymes}}, {{homophones}} and {{hyphenation}}, as it does for {{IPA}}. Normally we add this parameter while editing, but I lose a lot of time filling in all the templates previously written without lang=. Do you think this is possible? Many thanks for now, have a nice day. Pharamp 20:07, 29 December 2009 (UTC)
- Per WT:TODO, ideally {{etyl}} and {{context}} labels should have a lang specified as well. Not for things like {{by extension}}, but if someone then adds another context like {{fishing}} it does need a lang parameter, so if it is already there it doesn't do any harm. I already add lang=fr (in my case) to transitive and intransitive, as they may become categorizing in the future. Mglovesfun (talk) 20:40, 18 January 2010 (UTC)
removing empty checktrans tables
Hi, RU, any chance AF could remove
<!--Remove this section once all of the translations below have been moved into the tables above.-->
{{checktrans-top}}
{{trans-mid}}
{{trans-bottom}}
where it appears, or
{{checktrans-top}}
{{trans-mid}}
{{trans-bottom}}
otherwise? (IMO that's reason enough to save a page even if not otherwise doing so.)—msh210℠ 20:36, 18 January 2010 (UTC)
- note also that some sections might be using {{checktrans-mid}} and/or {{checktrans-bottom}}. Thryduulf 21:14, 18 April 2010 (UTC)
- yes, I have to think about the combinations. The comment is irritating ... AF does have a specific code path at that point, so I don't have to use fancy regex. Although the regex would not be hard. Robert Ullmann 11:25, 19 April 2010 (UTC)
- hmm
Regex['remove empty checktrans section'] = (re.compile(r'(<!-- ?Remove this.*-->\n|)' \
    r'\{\{checktrans-top}}\n*\{\{(check|)trans-mid}}\n*\{\{(check|)trans-bottom}}\n', \
    re.M), r'') # must not have re.S!
- ought to do it ... we'll see. Robert Ullmann 11:45, 19 April 2010 (UTC)
At least one case works. Robert Ullmann 12:18, 19 April 2010 (UTC)
- Thanks.—msh210℠ 15:07, 22 April 2010 (UTC)
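As a rough illustration of the rule quoted above, the following sketch applies the same pattern to a sample page. The helper name `remove_empty_checktrans` and the sample text are assumptions made for the example; only the pattern itself comes from the discussion.

```python
import re

# Pattern as quoted in the thread. re.S is deliberately left out so that
# ".*" cannot run past the end of the comment line.
EMPTY_CHECKTRANS = re.compile(
    r'(<!-- ?Remove this.*-->\n|)'
    r'\{\{checktrans-top}}\n*\{\{(check|)trans-mid}}\n*\{\{(check|)trans-bottom}}\n',
    re.M)


def remove_empty_checktrans(text):
    """Drop an empty checktrans table, plus its leading comment if present."""
    return EMPTY_CHECKTRANS.sub('', text)


empty_section = (
    "{{trans-top|gloss}}\n"
    "{{trans-bottom}}\n"
    "\n"
    "<!--Remove this section once all of the translations below have been moved into the tables above.-->\n"
    "{{checktrans-top}}\n"
    "{{trans-mid}}\n"
    "{{trans-bottom}}\n"
)
# A table that still holds a translation must be left alone.
used_section = (
    "{{checktrans-top}}\n"
    "* French: [[mot]]\n"
    "{{trans-mid}}\n"
    "{{trans-bottom}}\n"
)

cleaned = remove_empty_checktrans(empty_section)
untouched = remove_empty_checktrans(used_section)
```

Because the match requires the three templates to follow each other with nothing but newlines in between, a table that still contains translations is not affected.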
Prepositional phrases
Prepositional phrases are now allowed as per the results of Wiktionary:Votes/pl-2010-01/Allow "Prepositional phrase" as a POS header. You don't need to tag them with {{rfc-xphrase}} anymore, so please update your code if you haven't already. Thanks. —Internoob (Disc.•Cont.) 00:24, 21 February 2010 (UTC)
- I've noted this at the control file User:AutoFormat/Headers#Mostly standard POS. Let's see if that is enough. --Bequw → τ 14:56, 21 February 2010 (UTC)
- see below; the table change is correct, but the xphrase thing needs some code tweak. Robert Ullmann 10:49, 19 April 2010 (UTC)
* before ja-readings reversions
I'm curious as to why the bot is reverting these recent changes I made. {{ja-readings}} already adds the * to each line automatically, and the talk page description doesn't require * before the template in entries. Putting * in the actual entry is redundant code, and removing it doesn't change the way the information is rendered. By the same token, keeping it doesn't do anything either, but it seems unnecessary. I'm not planning any mass changes, just those that I've stumbled on where I thought the * was misplaced. If AutoFormat could be changed so it doesn't revert these kinds of minor corrections in the future, that would be great.
I've reviewed the similar discussion with respect to other templates needing leading bullets, but I think this template is different. {{ja-readings}} generates a list of at least two to six lines, so removing the bullets from the template is not an option. Otherwise, the leading bullet in the entry would only bullet the first reading in the list, leaving the others unbulleted. I appreciate your taking the time to consider this request. Thanks. Dcmacnut 17:10, 4 March 2010 (UTC)
- There are a couple of good reasons to leave it (the *) in the wikitext as well. One is to make it easier on both programs reading the wikitext and on users to see that it is part of a list. The ability to do this redundantly is intentional in wikitext, not accidental. Also consider this:
* some reading
* {{ja-readings}}
* some other reading
renders correctly:
- some reading
- some other reading
But:
* some reading
{{ja-readings}}
* some other reading
does not:
- some reading
- some other reading
The difference is only just visible, but affects the HTML:
<pre>
* some reading
* {{ja-readings}}
* some other reading
</pre>
<p>renders correctly:</p>
<ul>
<li>some reading</li>
<li>some other reading</li>
</ul>
<p>But:</p>
<pre>
* some reading
{{ja-readings}}
* some other reading
</pre>
<p>does not:</p>
<ul>
<li>some reading</li>
</ul>
<ul>
<li>some other reading</li>
</ul>
in which the latter is two separate lists. This example is somewhat contrived in this case, as {ja-readings} with no parameters is unusual, but does work correctly. (And listing other readings first is a bit odd, but it might be the most important reading, and not a usual one.) However, it does illustrate the principle. It is easier to have the users (bots etc.) always use * on list items, and not worry about how multiple bullet lines might be generated.
There is a similar case with definition lines, where we always require the # in the wikitext. Robert Ullmann 11:36, 19 April 2010 (UTC)
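The point about programs reading the wikitext can be sketched with a naive scanner. This is not how MediaWiki itself parses lists (templates are expanded first); it only shows what a simple tool sees when it looks at the raw lines. The function name `bullet_runs` is made up for the example.

```python
def bullet_runs(wikitext):
    """Group consecutive lines starting with '*' into runs (one run = one list)."""
    runs, current = [], []
    for line in wikitext.splitlines():
        if line.startswith('*'):
            current.append(line)
        elif current:
            # Any non-bullet line ends the list that was being collected.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


with_bullet = "* some reading\n* {{ja-readings}}\n* some other reading"
without_bullet = "* some reading\n{{ja-readings}}\n* some other reading"

one_list = bullet_runs(with_bullet)
two_lists = bullet_runs(without_bullet)
```

With the explicit `*` the scanner sees a single list of three items; without it, the template line splits the text into two lists, which mirrors the HTML difference shown above.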
kkk
Does this bot still sort language sections? These sections at the current entry "kkk" have been in the incorrect order for approximately a week. --Daniel. 23:26, 18 March 2010 (UTC)
- It will, sooner or later. As in this case. It looks at entries after each (non-bot) edit, so it picks up most quickly. Bots adding sections add the {rfc-auto} tag. Robert Ullmann 11:15, 19 April 2010 (UTC)
As at User:Mglovesfun/Form of of, it would be nice to remove any extra "of" that people put in the first parameter of the template. Conrad.Irwin 11:56, 19 March 2010 (UTC)
- Done, not tested yet. Robert Ullmann 11:15, 19 April 2010 (UTC)
- works, in first case noted edit Robert Ullmann 13:38, 19 April 2010 (UTC)
just a suggestion
This bot should detect and remove duplicate usages of {{count page}}. Otherwise, it piles up for inflected forms shared by multiple languages. -- Prince Kassad 17:34, 20 March 2010 (UTC)
- If the bot(s) are using the recommended code, which adds count page only if [[ is not in the existing text, they won't pile up. In any case, a word that has multiple language forms ought to acquire iwikis quickly, and all of the count page templates will go away. If neither happens, it is at least harmless. (;-) Robert Ullmann 11:15, 19 April 2010 (UTC)
Malabarista has been listed in the request for AutoFormat category for a week. I can only assume it's using a deprecated template but AutoFormat doesn't know what to replace with what. And nor do I; can we track this down? Mglovesfun (talk) 12:39, 25 March 2010 (UTC)
- The template {mf} and a few others can get pages stuck there when AF doesn't have a rule to eliminate them, typically when the call is indirect. I clean these up manually once in a while; meanwhile they cause no harm. Robert Ullmann 11:15, 19 April 2010 (UTC)
- I don't see when that entry was in the cat? Doesn't matter. Robert Ullmann 11:51, 19 April 2010 (UTC)
On a further note, the templates {{ja-attention}} et al. have all been merged into {{attention}}, so AutoFormat shouldn't be adding those to entries. Mglovesfun (talk) 12:39, 25 March 2010 (UTC)
Multiple identical language sections
I was wondering what this bot does if it finds multiple language sections with the same name. For example, two ==Dutch== sections. Does it append the second to the first, or just give up and put a maintenance template on it? I would like my bot to be able to append to an existing Dutch section, but it's harder for me to write all the section splitting code than it is to just put the additional contents at the end and let AutoFormat sort it out. Is this a viable option? —CodeCat 09:04, 27 March 2010 (UTC)
- I understand from your bot doc and tests that you have the answer (;-) Yes, it will merge the sections. Robert Ullmann 11:15, 19 April 2010 (UTC)
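For readers wondering what "merge the sections" could look like, here is a minimal sketch. It is not AutoFormat's code; the function name and the choice to append later sections to the first occurrence are assumptions, and it does not sort the languages or reconcile duplicated subheaders.

```python
import re

# Level-2 headers only: "==Dutch==" matches, "===Noun===" does not.
L2_HEADER = re.compile(r'^==([^=\n].*?)==[ \t]*$', re.M)


def merge_duplicate_languages(text):
    """Append the body of any repeated ==Language== section to the first one."""
    parts = L2_HEADER.split(text)  # [prolog, name1, body1, name2, body2, ...]
    prolog, merged = parts[0], {}
    for name, body in zip(parts[1::2], parts[2::2]):
        merged.setdefault(name.strip(), []).append(body.strip('\n'))
    out = [prolog.rstrip('\n')] if prolog.strip() else []
    for name, bodies in merged.items():
        out.append('==%s==\n%s' % (name, '\n\n'.join(b for b in bodies if b)))
    return '\n\n'.join(out) + '\n'


page = (
    "==Dutch==\n===Noun===\nfoo\n\n"
    "==English==\n===Noun===\nbar\n\n"
    "==Dutch==\n===Verb===\nbaz\n"
)
merged_page = merge_duplicate_languages(page)
```

A bot that simply appends a second ==Dutch== section at the end of the page would, under this sketch, see its content folded into the existing Dutch section on the next pass.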
POS header "Prepositional phrase"
Hi Robert,
AutoFormat has recently edited some entries by converting ===Prepositional phrase=== to ===Preposition phrase=== and adding {{rfc-xphrase}}. This doesn't seem to be specified by User:AutoFormat/Headers; and as of this vote, it's no longer correct behavior. Can it be changed?
Thanks in advance!
—RuakhTALK 23:49, 8 April 2010 (UTC)
- The xphrase bit needs code changed as well. You've got the table entry fine. Will fix presently. Robert Ullmann 10:54, 19 April 2010 (UTC)
- Fixed, not tested. Robert Ullmann 11:15, 19 April 2010 (UTC)
- Thank you! —RuakhTALK 17:33, 20 April 2010 (UTC)
Does the bot check those in this category to see if the issues have been resolved? I fixed one header in the category but didn't remove the template; there could be other problems. Should they be removed, or will the bot do it? -- 124.171.169.189 06:47, 17 April 2010 (UTC)
- (butting in) you should also remove the {{rfc-level}} tag. AutoFormat can re-tag it if you haven't fixed the problem, so don't worry about that. Mglovesfun (talk) 10:54, 19 April 2010 (UTC)
- AF will remove the tag and re-check if it looks at the page; normally when you edit a page it will pick it up in RC and do that. If you'd like to force it, change -level to -auto. It doesn't scan the category, but it does pick up things from the XML after a while. Robert Ullmann 11:15, 19 April 2010 (UTC)
Would it be possible for AF to adjust the following various italicisation methods to use {{sense}}:
* (''word'')
* (''word''):
* ''(word)''
* ''(word):''
* ''(word)'':
* {{italbrac|word}}:
* {{i|word}}:
* {{i-c|word}}
* {{italbrac-colon|word}}
And the same when not preceded by the *.
In all cases, only when they appear at the start of a line in the synonyms or antonyms sections and are followed by one or more wikilinked words. It might be possible/desirable in other sections as well, but I'm not certain.
It's not something I think should be gone through specifically, but fixed as and when AF encounters them in its routine checking. Thryduulf 16:32, 20 April 2010 (UTC)
- Will take a bit of thought, and code to apply to only *nym sections (others?). Will keep it in mind. Robert Ullmann 13:17, 22 April 2010 (UTC)
- Certainly Hypernyms sections should be included, probably Hyponyms and other *nyms too; WT:ELE suggests Coordinate terms should be part of this set. I can see the benefit to them in usage notes, alternative spellings and see also sections as well but I don't know whether this method of marking which sense the links apply to is widely used enough to make it worth coding.
- We generally don't want it in pronunciation sections, as they use a different structure. See the new section below for my incomplete thoughts regarding similar patterns in the translations section. Thryduulf 15:41, 22 April 2010 (UTC)
- Maybe include derived and related terms sections as well. See this edit for example. Thryduulf 22:44, 22 April 2010 (UTC)
- No, Derived and Related terms should be alphabetical. --EncycloPetey 16:26, 8 May 2010 (UTC)
- Err, that doesn't make sense? This isn't about changing the order of the entries, just replacing one type of markup with another. Thryduulf (talk) 17:13, 8 May 2010 (UTC)
- But you've suggested inserting inappropriate markup where it can't be used. If Related terms and Derived terms are arranged alphabetically (as we normally do), then a {{sense}} tag is illogical, because it is meant to group items by sense. Inserting that tag into the Derived terms section (as you have done) is not a step forward, but sideways; it replaces one error with another. --EncycloPetey 19:18, 8 May 2010 (UTC)
- I still don't understand. Where a term has multiple senses, and one or more terms are derived from a specific sense, then the derived terms are grouped by sense and listed alphabetically within that sense; nothing will change except the markup used to note the sense. Where the terms are not grouped by sense, then it is a straight alphabetical list and I'm not proposing to change that; indeed in this situation, nothing will change. Where not all terms in a list are marked as derived from a specific sense, then nothing will change except the markup used to note the sense. I note that you've still not fixed the entry you reverted my edits to, either to remove the error I pointed out or to fix what you say is an error in the structure. Thryduulf (talk) 20:57, 8 May 2010 (UTC)
- EP: he's just talking about the markup syntax, changing the syntax at the top of this section to {sense}. It is NOT about re-arranging things or changing any semantics. The question is: are all uses of (e.g.) {{italbrac}}: at the start of a line in these sections reasonably convertible to {sense}? In whichever cases it (the contents of italbrac) is not a "sense", if any, is it harmful (given that the syntax was already "wrong")?
- and his edit to blauw was correct; the issue is that the others may (or may not) need to be sorted into senses. (WT:ELE#Synonyms ;-) Robert Ullmann 00:43, 9 May 2010 (UTC)
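The conversion requested in this section could be sketched roughly as follows. This is an illustration only, not AutoFormat's rule: the set of section names, the helper names and the decision to skip any line that is not followed by a wikilink are assumptions based on the request text.

```python
import re

# Sections the request names explicitly; extending the set is a policy question.
NYM_HEADERS = {'Synonyms', 'Antonyms'}
HEADER = re.compile(r'^=+\s*(.*?)\s*=+\s*$')

# The nine markup variants listed in the request, with or without a leading "*",
# and only when a wikilink follows.
SENSE_MARKUP = re.compile(
    r"^(\*?\s*)(?:"
    r"\(''(?P<a>[^'()]+)''\):?"
    r"|''\((?P<b>[^'()]+)\):?'':?"
    r"|\{\{(?:italbrac|i|i-c|italbrac-colon)\|(?P<c>[^{}|]+)\}\}:?"
    r")\s*(?=\[\[)")


def _to_sense(match):
    word = match.group('a') or match.group('b') or match.group('c')
    return '%s{{sense|%s}} ' % (match.group(1), word.strip())


def convert_sense_markup(text):
    """Rewrite italicised sense markers to {{sense}} inside *nym sections only."""
    out, in_nym = [], False
    for line in text.split('\n'):
        header = HEADER.match(line)
        if header:
            in_nym = header.group(1) in NYM_HEADERS
        elif in_nym:
            line = SENSE_MARKUP.sub(_to_sense, line)
        out.append(line)
    return '\n'.join(out)


entry = (
    "====Synonyms====\n"
    "* (''colour''): [[azure]]\n"
    "* {{i|mood}}: [[sad]]\n"
    "* [[plain]]\n"
    "====Derived terms====\n"
    "* {{i|mood}}: [[blues]]"
)
converted = convert_sense_markup(entry)
```

Only the markup changes; the order of the lines is left exactly as found, which is the distinction argued over above.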
Similar to this edit, could AF remove colons following {{sense}} (which generates it's own colon). Thanks. Thryduulf (talk) 22:53, 15 July 2010 (UTC)
- done [1] Robert Ullmann 13:22, 16 July 2010 (UTC)
- Cheers. Thryduulf (talk) 13:28, 16 July 2010 (UTC)
- Thank you. Now if we could fix mis-uses of "it's". ("it's own colon"? Indeed. ;-) Rule: read "it's" as "it is", if that sounds wrong, it's. (<smirk>) Yes, "it is" has a verb, in "it's", the "'s" doesn't count as the required verb ... English is crazy. But so is everything else. Robert Ullmann 16:14, 16 July 2010 (UTC)
- /me considers himself suitably chastised. Thryduulf (talk) 16:50, 16 July 2010 (UTC)
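A one-rule sketch of the colon cleanup requested in this thread, assuming the colon sits directly after the closing braces; the names are illustrative and this is not the rule AF actually uses.

```python
import re

# A {{sense|...}} call immediately followed by a redundant colon.
SENSE_COLON = re.compile(r'(\{\{sense\|[^{}]*\}\}):')


def strip_sense_colon(line):
    """Remove the colon after {{sense}}, since the template emits one itself."""
    return SENSE_COLON.sub(r'\1', line)


fixed = strip_sense_colon("* {{sense|small}}: [[tiny]], [[little]]")
```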
Sort enPR, IPA and SAMPA into order
Would it be possible for AF to sort any instances where any two or more of {{enPR}}, {{IPA}}, {{SAMPA}} and {{X-SAMPA}} appear on the same line into this order if they aren't already? E.g. it would do as I did in this edit. Thryduulf 12:54, 21 April 2010 (UTC)
- See also Wiktionary:Grease pit#Italian stress-marking pronunciation, where I've noted my creation of {{it-stress}}. If this is not rejected, it should be treated as per {{enPR}} in the sort order (they should never appear in the same entry, as {{it-stress}} is for Italian and {{enPR}} for English). Thryduulf 13:28, 21 April 2010 (UTC)
- Done, will test presently. Not including {it-stress} at present because you will be putting it in the right place? (;-). Robert Ullmann 13:18, 22 April 2010 (UTC)
- Works here. Had one bug. Robert Ullmann 13:32, 22 April 2010 (UTC)
Is working very well. Thanks, good idea! Robert Ullmann 14:54, 22 April 2010 (UTC)
It seems I'm finding more stuff with this pronunciation cleanup, and {{shavian}} now exists too - see WT:GP#Template:shavian. If you want to include it in your sorting algorithm, it should go at the end of the line (after SAMPA) as that is its alphabetical position (it's also probably the least useful!). AIUI from the WP article, it should only appear on English entries (if it's ever used elsewhere than quadrillion that is!). Thryduulf 15:17, 28 April 2010 (UTC)
- My "sorting algorithm" is 3 regex rules, that can be fired repeatedly on a line. The number of rules required is n(n-1)/2; for 3 things (X- and SAMPA as one pattern) that is 3. For 4 things it is 6, for 5 it is 10 ... at which point I have to write some code to take the line apart and sort it ;-) I'll keep them in mind. Robert Ullmann 14:21, 29 April 2010 (UTC)
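The pairwise-rule idea described above can be sketched like this. The actual rules are not shown on this page, so the patterns below are guesses: they assume comma-separated templates whose arguments contain no braces, and they treat SAMPA and X-SAMPA as one kind.

```python
import re
from itertools import combinations

TEMPLATE = r'\{\{%s\|[^{}]*\}\}'
# Desired left-to-right order; SAMPA and X-SAMPA share one pattern.
ORDER = [TEMPLATE % 'enPR', TEMPLATE % 'IPA', TEMPLATE % '(?:X-)?SAMPA']

# One swap rule per unordered pair: n(n-1)/2 rules for n kinds of template.
RULES = [re.compile('(%s)(,? *)(%s)' % (later, earlier))
         for earlier, later in combinations(ORDER, 2)]


def sort_pronunciation_line(line):
    """Fire the swap rules repeatedly until the line stops changing."""
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            line, count = rule.subn(r'\3\2\1', line)
            changed = changed or count > 0
    return line


sorted_line = sort_pronunciation_line(
    "* {{SAMPA|/dQg/}}, {{IPA|/dɒɡ/}}, {{enPR|dŏg}}")
```

Each rule swaps one adjacent out-of-order pair, so repeated firing behaves like a bubble sort. Adding a fourth kind such as {shavian} would take the rule count from 3 to 6, which is the growth the reply is pointing at.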
Following on from the section above re template:sense I've been wondering about where similar markup appears in translations sections.
Here the template should be {{qualifier}}, not {{sense}}, as the separate trans-tables should separate the different senses of the English word, whereas these markers are used to distinguish different aspects of the foreign word. An example is cousin#Translations where different languages have a greater number of terms (e.g. Arabic has 8 terms to English's one).
The following should be fairly simple to determine what text actually is the qualifier. In all cases where it is just a single letter (e.g. m) then it is most likely this is the grammatical gender and not a qualifier.
- Where the existing entry uses a template like {{italbrac}}, e.g. the Arabic translations at cousin
- Where it is the only part of a line not in a {{t}} template, e.g. the Persian translations at cousin
More complex possibly are these:
- Where it is the only part of a line not an internal link, e.g. the Latin translations at cousin
- Where it is the only part of a line not an internal link or a gender template, e.g. the Icelandic translations at cousin
- Where it is the only part of a line italicised and/or parenthesised, e.g. the Hebrew, Ewe and Russian translations at cousin (for Hebrew here the hyphens should also be removed). Note though that the parenthesised part of the Hebrew translation is not a qualifier but a transliteration.
There appears to be no standard whereby the qualifier comes before or after the translation (although the latter is more common), nor whether one or multiple lines are used. Qualifiers can therefore come before, between and/or after the translations. Some translations are explicitly qualified while others on the line are only implicitly qualified (e.g. the Basque translations at cousin).
A clue is possibly that the qualifier should be (almost?) entirely written in Latin script, so if it isn't then it's not autoformattable as a qualifier. This might be an easyish check? It wouldn't help with other languages that use the Latin script though. Another is that all/the majority of the words should be in English - I'm not a programmer, but I guess checking to see whether the words have English definitions here would be very processor/server intensive? Would there be an easier way? It also isn't guaranteed to work as there might be non-English words with the same spelling, or we might not have an entry yet.
Don't panic! I'm not asking you to add this yet, this is little more than a brain dump of my current vague thinking on the subject in the hope that additional brains will help! Do feel free to move this whole section to somewhere else if you don't want it cluttering the talk page. Thryduulf 16:17, 22 April 2010 (UTC)
- okay, no panic ... is o-dark-thirty ... a few examples would probably help here? Cheers, Robert Ullmann 22:15, 22 April 2010 (UTC)
- The first trans-table at cousin is a good example of various different formats. I'll go through them manually tomorrow if I get time if that would help. This edit at business is an example of a change. Thryduulf 22:42, 22 April 2010 (UTC)
Make minor copyediting edits
Hi there Ullmann. I was wondering if it would be possible for your bot AutoFormat to automatically do the following when looking at entries:
Just English language entries:
- Capitalize the first letter of the definition in the definition line and add a period to the end of the definition line if one is missing.
All entries:
- Put a line between headers and line breaks.
- Put a line between inflection line and definition lines.
- Move any {{wikipedia}} templates anywhere else on the page to right under the appropriate headers for each language.
- Put spaces in between the asterisks and any words in any Alternative spellings/forms, Derived terms, Related terms, and See also sections.
Languages other than English entries:
- Convert any links to other entries under Related terms, Derived terms, and See also sections from plain wikilinks to use the correct {{l}} template.
If you could do any of those, I would be very grateful :) Razorflame 16:21, 28 April 2010 (UTC)
- There has been some discussion recently about making all definitions sentences - WT:BP#capital letters to begin defintions which is only 2 days old and so this request is premature imho. As for your "all entries" section, I believe AF already does items 1, 2 and 4 on your list. Thryduulf 17:14, 28 April 2010 (UTC)
- I believe it doesn't, and I'd support the ideas. Mglovesfun (talk) 17:26, 28 April 2010 (UTC)
- If it doesn't then I support adding them as well (if it already does them, then I support it continuing to do them). Thryduulf 17:30, 28 April 2010 (UTC)
- capitalizing: as mentioned, it is being debated. Yet again. Once again we will have to point out that sentences are only appropriate some of the time; forcing phrases and words to be "sentences" is bad construction, and loses important semantic information. Consider the difference between:
- the all entries things: AF does do 1,2,4 as noted. Moving {wikipedia} is troublesome; if there are multiple languages it must stay in language section, if in "prolog" (section 0) it can only be moved inside a language section if there is only one.
- converting links: maybe ... requires some work to see if there are or are not links that are other things (comments, linked qualifiers or glosses).
Robert Ullmann 14:30, 29 April 2010 (UTC)
rfc-tsort mistag?
Hi. Likely I'm missing something, but I don't see why AF added rfc-tsort here.—msh210℠ 21:48, 28 April 2010 (UTC)
- I suspect it is because whoever added the lines made a typo and used three closing curly brackets rather than two (i.e. {{trans-mid}}} rather than {{trans-mid}}). See User:Robert Ullmann/Mismatched wikisyntax for other problems of this nature. Thryduulf 23:28, 28 April 2010 (UTC)
- Yes, it expects the line to be exactly {{trans-mid}}; any variation is probably an indication of a problem. Robert Ullmann 14:34, 29 April 2010 (UTC)
- Ah, I looked and looked and still didn't see that there were three braces there. Sorry to have bothered you.—msh210℠ 16:22, 29 April 2010 (UTC)
- They can be difficult to spot, which is why there are so many problems to clean-up. I suspect it's because I've been doing a lot of that that I was able to spot it on this occasion, having got my eye in so to speak. Thryduulf 22:18, 29 April 2010 (UTC)
- We don't have an entry for get one's eye in? Hmm, see the tearoom, as I'm not sure how to define it. Thryduulf 22:20, 29 April 2010 (UTC)
- I just heard a commentator on the Windies cricket match yesterday use the expression. Good addition. Robert Ullmann 04:25, 4 May 2010 (UTC)
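The "exactly {{trans-mid}}" check discussed in this thread is easy to illustrate. The sketch below is hypothetical, including the choice to also accept {{checktrans-mid}}; it only shows why a stray third brace is caught even though it is hard to see by eye.

```python
# Lines that may legitimately contain "trans-mid"; the set is an assumption.
EXPECTED_MID_LINES = {'{{trans-mid}}', '{{checktrans-mid}}'}


def suspicious_trans_mid_lines(text):
    """Return lines that mention trans-mid but are not exactly the bare template."""
    return [line for line in text.split('\n')
            if 'trans-mid' in line and line not in EXPECTED_MID_LINES]


table = (
    "{{trans-top|gloss}}\n"
    "* French: [[mot]]\n"
    "{{trans-mid}}}\n"
    "* German: [[Wort]]\n"
    "{{trans-bottom}}"
)
flagged = suspicious_trans_mid_lines(table)
```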
AF reverting itself?
I refer to this edit, and the next one. I think the first was correct, and don't understand the second one. \Mike 17:34, 2 May 2010 (UTC)
- I saw one of those too, and couldn't figure it out. I had introduced a bug fixing the Prep. phrase case above. Had a data structure with ill-defined semantics, now fixed. Robert Ullmann 23:44, 2 May 2010 (UTC)
AF not doing everything it says
With this edit [2] AF said it had done a long list of things, but only actually appears to have done one of them. I haven't looked at the entry in detail, but at first glance it seems the other things didn't actually need to be done? Thryduulf 18:48, 3 May 2010 (UTC)
- Yes, transient bug while I was making an improvement. I had had code to change Pronunciation from level 4 to 3 if it was the first header in a language section; I extended this to any L3 header. It is fairly common for people to use L4 headers in error just after the language header, as it looks okay in the TOC, and seems to render fine unless you notice the small difference in font size.
- Anyway, in that case it had a bug, promoted two headers to L3, then pushed them back to where they belonged (;-). So no net effect. Fixed now, is working properly. (Good to see someone noticing! Thanks!) Robert Ullmann 04:23, 4 May 2010 (UTC)
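The header promotion described here might look something like the sketch below. It is a simplified guess at the behaviour (promote a level 4 header to level 3 only when it is the first header after a language header), not the bot's actual code, and it ignores the question of which headers are valid at L3.

```python
import re

HEADER = re.compile(r'^(=+)\s*(.*?)\s*(=+)\s*$')


def promote_first_header(text):
    """Promote an L4 header to L3 when it directly follows a language (L2) header."""
    out, after_language = [], False
    for line in text.split('\n'):
        match = HEADER.match(line)
        if match and len(match.group(1)) == len(match.group(3)):
            level = len(match.group(1))
            if level == 2:
                after_language = True
            else:
                if after_language and level == 4:
                    line = '===%s===' % match.group(2)
                # Only the first header after the language header is considered.
                after_language = False
        out.append(line)
    return '\n'.join(out)


entry = "==English==\n====Noun====\n# definition\n====Synonyms====\n* [[word]]"
promoted = promote_first_header(entry)
```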
Partially linked language names
At ear, AF tagged "Northern Sami" as a bad language name [3]. Fixing that to "Northern Sami" (based on the Northern Sami entry) seems like something that should be within AF's capability. Thryduulf 08:50, 7 May 2010 (UTC)
- I'll look at the link/unlink code (sometime soon ...) Robert Ullmann 17:29, 20 May 2010 (UTC)
sort audio pronunciations
If there is one line in a pronunciation section with one or more IPA, SAMPA, etc templates and one audio template, could AF sort the lines so they occur:
- enPR / IPA / SAMPA / X-SAMPA
- audio
Things get more complicated when there are multiple pronunciations involved, and writing a specification to cover every possibility would be more hassle than it is worth, so with one exception (below) I think AF should just ignore these.
The one exception is that where an audio template appears on the line above a line that starts with {{a}}, and the audio description matches a parameter of the {{a}} template (e.g. {{a|UK}} and {{audio|example.ogg|UK}} match, as do {{a|Ca}} and {{audio|example.ogg|Canada}}), then this should be flagged for human attention as there could be multiple ways to sort it out.
Does this make sense? Thryduulf (talk) 16:16, 8 May 2010 (UTC)
- That's going to be very tricky to code for, especially since there are {{a}} parameters besides the usual UK and US. How many entries will it help and could you give an example? I'm not sure I know what the problem is that you're hoping to clean up. --EncycloPetey 16:24, 8 May 2010 (UTC)
- For an example, see [4]. If the {{a}} parameters do not match then AF will just not do anything and the finding and sorting will have to be done manually (we're no worse off). The problem is that although transcription before audio is significantly more common and is also what WT:ELE#Pronunciation specifies, the opposite way round still occurs frequently. Fixing them where there is exactly one of each shouldn't be difficult to code, and for the latter it should just be a case of comparing strings and doing one thing if they match and nothing if they don't. Thryduulf (talk) 16:32, 8 May 2010 (UTC)
- OK, that makes sense now. Your original description focused so much on the exceptions, I couldn't see what it was trying to fix. --EncycloPetey 16:44, 8 May 2010 (UTC)
- Probably could be: if there are exactly two lines, and first is "audio" and second is transcription, swap them. In most cases it will be the right thing to do, in the exception cases it isn't going to make anything worse. However, this doesn't fit the model of any of the other transforms that AF does, so it isn't just adding a rule. (There is one simple way to add a rule that might do, since we don't see sections with multiple lists except when they have subheads, and that would work. confused yet? ;-) Robert Ullmann 18:51, 8 May 2010 (UTC)
Being the complicator I am, I've just thought that we could include {{rhymes}} lines in this. The order should be:
- transcription
- audio
- rhymes
When there is exactly one each of two or more of these, AF should sort them into this order. If they occur in combination with exactly one {{homophones}} line, then AF should sort into this order, but keep rhymes and homophones in the same relative order as they were (whatever that is); if there is a homophones line but no rhymes line, then homophones should come last. Thryduulf (talk) 16:33, 9 May 2010 (UTC)
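The simple case proposed in this section (exactly one line of each kind, sorted into transcription, audio, rhymes) can be sketched as below. The classification patterns and function names are assumptions; homophones lines and the {{a}}-matching exception are deliberately left out, so any section containing them is returned unchanged.

```python
import re

# Kinds of line in the requested order.
KINDS = [
    ('transcription', re.compile(r'\{\{(?:enPR|IPA|SAMPA|X-SAMPA)\|')),
    ('audio', re.compile(r'\{\{audio\|')),
    ('rhymes', re.compile(r'\{\{rhymes\|')),
]
RANK = {name: index for index, (name, _) in enumerate(KINDS)}


def classify(line):
    """Return the single kind a line belongs to, or None if unclear."""
    hits = [name for name, pattern in KINDS if pattern.search(line)]
    return hits[0] if len(hits) == 1 else None


def sort_pronunciation_lines(lines):
    """Reorder only when every line is recognised and no kind occurs twice."""
    kinds = [classify(line) for line in lines]
    if None in kinds or len(set(kinds)) != len(kinds):
        return list(lines)  # anything complicated is left for a human
    return sorted(lines, key=lambda line: RANK[classify(line)])


section = ["* {{rhymes|ɒɡ}}", "* {{audio|example.ogg|Audio (US)}}", "* {{IPA|/dɒɡ/}}"]
reordered = sort_pronunciation_lines(section)
```

Leaving every ambiguous section untouched follows the "we're no worse off" reasoning given above.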
* list
Hello. Can AF please add automatically an asterisk before {{list}} when necessary? For instance, please compare these two entries: [5] and [6] . --Daniel. 08:13, 12 May 2010 (UTC)
- hmm, seems like something of a mess. Added {list} to Startemp. will do as requested. Robert Ullmann 20:07, 15 May 2010 (UTC)
Ligature heading
per WT:BP#Ligature=== -> ===Letter. Could you please whitelist the ligature header (if it's just a case of adding it to /Headers, I can do that). Conrad.Irwin 11:03, 15 May 2010 (UTC)
- Done. As you say, editing the Headers page is enough, but you have to poke me to restart AF too. Robert Ullmann 20:05, 15 May 2010 (UTC)
Chinese traditional/simplified
Quite a lot of the recent bad language translation table problems seem to be where there is an entry called: Chinese traditional/simplified. The line format is always exactly the same, so I was wondering if it could be bot-fixed. One example of what needs to be done is [7].
The line format is always:
- *Chinese traditional/simplified: [[traditional characters]], [[simplified characters]] (romanisation)
and needs to be:
- *Chinese:
- *:Traditional: [[traditional characters]] (romanisation)
- *:Simplified: [[simplified characters]] (romanisation)
Anything that isn't in this format should obviously still be reported for human attention. Thryduulf (talk) 09:58, 20 May 2010 (UTC)
ps: If you're looking for an entry to test with, budget and block are two examples. Thryduulf (talk) 10:03, 20 May 2010 (UTC)
- If the romanization is Mandarin (essentially always), then it should be fixed to * Mandarin: (etc.). Really needs to be fixed by a native speaker, or by someone (like me) who can get it right by cross-checking the entries (with or w/o an external reference ;-). Look at the first two trans tables at budget where someone has done it correctly.
- There are apparently about 120 of these. Might be best to do an ordinary bot op; take out the tra/sim part and tag the end of the line with {attention|zh} Will think about it. (Time now is 20:20, time for cricket! Yay!)Robert Ullmann 17:23, 20 May 2010 (UTC)
- In that case it might be an idea to mark {zh|attention} on any of the ones I've fixed recently (this week) (look for edit summaries that mention Chinese). I don't have time atm to find them quickly myself (and any automation that does better than browse will be quicker anyway!) Thryduulf (talk) 18:05, 20 May 2010 (UTC)
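For reference, the purely mechanical split originally requested in this section could be sketched as follows. It implements only that request; it does not decide whether the line should become Mandarin, which the reply above says needs checking by someone who knows the language. The function name is made up, and any line that does not fit the stated format returns None so it can still be reported for human attention.

```python
import re

TRAD_SIMP = re.compile(
    r'^\*\s*Chinese traditional/simplified:\s*'
    r'(\[\[[^\[\]]+\]\]),\s*(\[\[[^\[\]]+\]\])\s*\(([^()]+)\)\s*$')


def split_trad_simp(line):
    """Split a combined traditional/simplified line into the nested form."""
    match = TRAD_SIMP.match(line)
    if match is None:
        return None  # not in the expected format: leave for a human
    traditional, simplified, romanisation = match.groups()
    return [
        '*Chinese:',
        '*:Traditional: %s (%s)' % (traditional, romanisation),
        '*:Simplified: %s (%s)' % (simplified, romanisation),
    ]


result = split_trad_simp(
    "*Chinese traditional/simplified: [[traditional characters]], "
    "[[simplified characters]] (romanisation)")
```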
Basa language
In this edit [8] AF is objecting to the language name "Basa", however from what I can see this is the correct name for the language with ISO-639 code bzw, and is what {{bzw}} outputs. Thryduulf (talk) 10:12, 20 May 2010 (UTC)
- It's actually objecting to "basa\u200c" - see http://en.wiktionary.org/wiki/Wiktionary:GP#na:k Conrad.Irwin 10:31, 20 May 2010 (UTC)
- Ah, cheers. I'd missed that thread. It does seem though that it hasn't been fixed as promised. Thryduulf (talk) 11:03, 20 May 2010 (UTC)
- The template was fixed. The (few) places where it was used (subst'd) have not. Which is why the tagging. Robert Ullmann 17:08, 20 May 2010 (UTC)
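An invisible character such as the U+200C mentioned above is easy to miss by eye but simple to detect in code. The sketch below uses Python's standard `unicodedata` module to list format-class (Cf) characters in a name; the function name is illustrative, and this is not a description of how AF itself found the problem.

```python
import unicodedata


def hidden_characters(name):
    """List invisible format characters (Unicode category Cf) found in a name."""
    return ['U+%04X' % ord(ch) for ch in name
            if unicodedata.category(ch) == 'Cf']


found = hidden_characters('Basa\u200c')
clean = hidden_characters('Basa')
```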
lowercase languages
Can you train AF to auto-change lowercase headers to uppercase, i.e. ==english== to ==English==. My shift key is dodgy --Rising Sun talk? contributions 19:23, 23 May 2010 (UTC)
- As noted below, it does that, but not for language/L2 headers. It doesn't try to correct near-matches on languages and so on as that runs a considerable risk of changing a valid but unknown language into a known but incorrect language.
- But fixing the capitalization shouldn't be too much of an issue, I'll look at that in a little while. Robert Ullmann 09:15, 24 May 2010 (UTC)
- Coded, not tested yet. Robert Ullmann 11:51, 24 May 2010 (UTC)
Similar to above
Proper Noun -> Proper noun would be useful (but non-essential). [9]. Thanks Conrad.Irwin 00:04, 24 May 2010 (UTC)
- It does sentence-case headers at L3+ unless they are completely unknown, in which case they are tagged. And it picks them up in XML screen so anything missed in RC (which sometimes happens) is fixed later. Robert Ullmann 09:11, 24 May 2010 (UTC)
I've just spent the past half-hour or so correcting misuses of {{q}}. They were being used as a substitute for '''{{subst:PAGENAME}}''' to create inflexion lines for initialisms, acronyms, and symbols. When {{q}} occurs at the beginning of a line, could you convert it to '''{{subst:PAGENAME}}''', please? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 01:01, 12 June 2010 (UTC)
- Would you give me a specific example or two? (Not that critical in this case, but always useful.) I'll look at it a bit later today. Robert Ullmann 05:48, 12 June 2010 (UTC)
- I'll give you thirty-three: [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], and [42]. Also note [43] and [44], which are misuses that would not be picked up by AF. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 10:13, 12 June 2010 (UTC)
- should be done; but we really ought to get rid of {q}. (should be short for {{qualifier}} IMHO) Robert Ullmann 05:53, 29 June 2010 (UTC)
- Thanks for the fixing. A short form of {{qualifier}} is already provided by {{i}} (originally abbreviated from {{italbrac}}); making {{q}} do the same would be needlessly redundant. {{q}} is intended to be an abbreviation of quotation, quoted word, or such. For some of the background on why {{q}} is useful and necessary, see Wiktionary:Beer parlour archive/2009/September#.7B.7Bcitedterm.7D.7D. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 01:01, 30 June 2010 (UTC)
- I've been replacing {{i}} and {{i-c}} (as well as their non-abbreviated forms) with {{qualifier}}, {{sense}}, {{gloss}}, {{a}}, etc. so that we can make any changes to the display of any of these at some future time if we want - {{i}} really shouldn't be being used in the main namespace. As it's all context dependent, I don't think this is automatable. Thryduulf (talk) 08:17, 30 June 2010 (UTC)
- I consistently use {{i}} only where {{qualifier}} is appropriate (because the former redirects to, and is therefore equivalent to, the latter); otherwise, I use the more specific templates. Again, since {{i}} = {{qualifier}} already, it would be pointless to make {{q}} = {{qualifier}} too. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 09:38, 30 June 2010 (UTC)
- I'm not suggesting {{q}} should redirect to {{qualifier}} (generally I think we should be using template names that are more descriptive than a single letter - {{qual}} perhaps). {{i}} is really not a good name as it refers to the style it produces rather than anything semantic - other than {{i-c}} and synonyms, I can't think of another example of where this is the case. {{i}} is fine for use in discussion pages where all you want is italics without any semantics, but we should always want semantics in entries. And as others use/used {{i}} differently to you, fixing this is something that can never be automated. I'd quite happily deprecate {{i}} for all namespaces, replacing it with specific templates for the mainspace and something like {{ital}} for discussion pages. Thryduulf (talk) 10:29, 30 June 2010 (UTC)
- I'm indifferent to deprecating {{i}}, just as long as {{qualifier}} be abbreviated (as you suggest, to something like {{qual}}), because fourteen characters is an excessive number to input when, for the vast majority, it is identical in appearance to the six characters (''…''). — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:18, 30 June 2010 (UTC)
- I've created {{qual}} as a redirect to {{qualifier}}, for anyone that wants to use it, regardless of any deprecation or not. I'll probably use it as well as I seem to have an intermittent inability to type "qualifier" without making at least one typo. Thryduulf (talk) 13:12, 30 June 2010 (UTC)
Translation nesting
Has AF been doing any translation nesting formatting? I ask because I'm trying to reformat User:Atelaes/TargetedTranslations.js to work around some of the nesting issues. So, for example, if the user selects "Greek", it might automatically look up "Greek" and "Modern Greek", or if the user selects "Chinese" it might tell them that this won't return any translations, and they might consider "Mandarin". Also, have you seen Wiktionary:BP#Accelerated Nested Translations? Thanks. -Atelaes λάλει ἐμοί 13:21, 13 June 2010 (UTC)
Hi AF/RU. Would it be possible for AutoFormat to add an entry's PAGENAME to {{q}} (as, for example, for wateriness: {{q|wateriness}}) when said {{q}} is empty (i.e., when the first parameter isn't specified)? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 22:41, 14 June 2010 (UTC)
A reason why can be found at User talk:Conrad.Irwin#.7B.7Bq.7D.7D. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 22:42, 14 June 2010 (UTC)
- Thanks. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 01:01, 30 June 2010 (UTC)
Latin g to IPA ɡ
Per WT:GP#IPA symbol ɡ, would it be possible for AF to convert all instances of Latin g (U+67) to IPA ɡ (U+261) in {{IPA}}, {{IPAchar}} and {{rhymes}} templates in the main namespace, please. Thanks, Thryduulf (talk) 21:57, 22 June 2010 (UTC)
- Excluding of course the lang=XXX parameters of {{IPA}} and {{rhymes}} where we'd want the Latin g. --Bequw → τ 03:14, 23 June 2010 (UTC)
- Thirded.—msh210℠ (talk) 17:13, 23 June 2010 (UTC)
Using the same criteria, could AF also find and fix any uses of Greek ε (U+3B5) instead of the Latin (and IPA) ɛ (U+25B). This was noted as being a problem on the French Wiktionary, so it's not unlikely to have happened here also. Thryduulf (talk) 12:04, 27 June 2010 (UTC)
- Again with the same criteria, ǝ (U+1DD "Latin small letter turned e") should be replaced by ə (U+259 "Latin small letter schwa").
- To clarify the whole request: In {{IPA}}, {{IPAchar}} and {{rhymes}} templates (excluding lang= parameters) the following substitutions should be made by AF on an ongoing basis:
from | to |
---|---|
g (U+67) | ɡ (U+261) |
ε (U+3B5) | ɛ (U+25B) |
ǝ (U+1DD) | ə (U+259) |
- Cheers, Thryduulf (talk) 13:03, 28 June 2010 (UTC)
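For illustration only, the substitutions requested above could be sketched roughly as follows. This is not AF's actual code: the function names, the regex, and the simplification that templates are not nested are all assumptions made for the example. Parameters starting with lang= are left alone, as requested.

```python
import re

# Look-alike characters that should not appear in IPA strings,
# mapped to the IPA characters from the request above.
IPA_FIXES = str.maketrans({
    "\u0067": "\u0261",  # Latin g -> IPA script g
    "\u03b5": "\u025b",  # Greek epsilon -> Latin open e
    "\u01dd": "\u0259",  # Latin turned e -> schwa
})

# Assumption: a simple, non-nested call to one of the three templates.
TEMPLATE_RE = re.compile(r"\{\{(IPA|IPAchar|rhymes)\|([^{}]*)\}\}")


def _fix_template(match):
    """Fix every parameter except lang=..., which keeps its Latin letters."""
    name, body = match.group(1), match.group(2)
    fixed = [
        p if p.strip().startswith("lang=") else p.translate(IPA_FIXES)
        for p in body.split("|")
    ]
    return "{{" + name + "|" + "|".join(fixed) + "}}"


def fix_ipa_chars(wikitext):
    """Apply the substitutions inside IPA/IPAchar/rhymes templates only."""
    return TEMPLATE_RE.sub(_fix_template, wikitext)
```

Text outside the three templates is never touched, so an entry title such as "dragon" keeps its ordinary g.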
(I am here, and I've been reading and thinking about it; just haven't had a reply. And there are all kinds of network problems at the present time.) One concern is that AF is supposed to be about all sorts of syntax "errors", and not about making changes that are semantic, or bordering on it. I'd be more inclined to do something separate, that can look at the IPA strings from a current XML dump, and do a number of things. That could include matching to the SAMPA, some comparison to enPR, and so on. Or perhaps not, perhaps these should just be AF rules (which would not be hard, it has a ruleset for pronunciation section lines). Robert Ullmann 13:53, 28 June 2010 (UTC)
- This seems like just the thing AF was designed to do, to be honest. Something small and trivial that's not worth starting a discussion over, but just needs to get fixed. I imagine especially the automatic replacement for g would be welcomed by many people, as it's an easier character to type, so they can be lazy and let AF do the fixing. —CodeCat 14:29, 28 June 2010 (UTC)
- I'd equate these changes to typo fixing, such as the s/See alo/See also/ correction AF made to a header I erred in typing yesterday. Changes like /r/ to /ɹ/ (for English only) that I've suggested previously are closer to semantic (although in that specific instance we have de facto consensus at least), which is why I've not included them in this request. In the table above the characters in the left column are not IPA characters and so should not appear in IPA strings. Thryduulf (talk)
- (hi CodeCat!) (that comment above took me 26 tries to save ... net better now.)
- The "g" case occurs a lot ... you are quite sure? (;-) see [45] and so on. Seems to work. Robert Ullmann 03:40, 29 June 2010 (UTC)
- I don't like this edit: [46] isn't the first one the same case? we sure about these code points? Robert Ullmann 03:45, 29 June 2010 (UTC)
- Ah, typo, fixed, will try again (msh210 already reverted ;-) Robert Ullmann 03:55, 29 June 2010 (UTC)
- much better [47] Robert Ullmann 04:09, 29 June 2010 (UTC)
Related to this, could you check if there are any pages in the Rhymes: namespace using one of these characters in their page name? If so, and there are any, then please could you produce a list of them. I doubt that there's going to be enough to bother with automation to fix, but if I'm wrong we can revisit that. Thryduulf (talk) 16:51, 30 June 2010 (UTC)
- Here you go:
- Rhymes:English:-ægən
- Rhymes:English:-ɒgə(r)
- Rhymes:English:-ɑnəgræm
- Rhymes:English:-eɪʃǝn
- Rhymes:English:-ɛksəgræm
- Rhymes:French:-ɔg
- Rhymes:French:-εːʀ
- Rhymes:French:-εːʀ(ə)
- Rhymes:French:-εr
- Rhymes:Italian:-uglio
- —CodeCat 17:59, 30 June 2010 (UTC)
- Oh good. I'd noted the first case as AF changed dragon and the link wasn't either made or broken. (There are two pages?) A number of them seemed to have been moved to the correct forms and then Conrad snapped the redirects, so AF isn't finding that many corrections to rhymes templates. Robert Ullmann 04:40, 1 July 2010 (UTC)
- I've gone through and fixed/moved/deleted as appropriate. I've changed those pages that linked to them that I felt needed changing - leaving most main namespace pages to AF. It will be worth looking through the Rhymes namespace pages to check that they are consistently using the right character, but as there is a lot of untemplated usage this will be harder. Thryduulf (talk) 12:21, 1 July 2010 (UTC)
- Let's not forget the colon ":" versus the IPA colon "ː". Mglovesfun (talk) 18:14, 16 July 2010 (UTC)
- Ah yes. Should do that. Not right now, it is late ;-) But yes. Robert Ullmann 19:38, 16 July 2010 (UTC)
- The 'standard' colon is used a lot in Old English entries. Mglovesfun (talk) 20:00, 25 July 2010 (UTC)
- Done, testing. Robert Ullmann 08:19, 26 July 2010 (UTC)
- Like this. Good? Robert Ullmann 08:33, 26 July 2010 (UTC)
- Yeah, I replaced about 20 by hand but it drives me insane as I have to use the toolbar each time. Mglovesfun (talk) 08:37, 26 July 2010 (UTC)
IPA ligatures
Related to the above, Wiktionary's (de-facto) policy is not to use the following (deprecated) ligatures in IPA transcriptions, so could AF make the following replacements please:
from | to |
---|---|
ʤ | dʒ |
ʦ | ts |
ʣ | dz |
ʧ | tʃ |
ʨ | tɕ |
ʥ | dʑ |
Cheers, Thryduulf (talk) 14:32, 28 July 2010 (UTC)
- I'd not advise doing this. A t-esh ligature is semantically different from single t + esh, and while the IPA has a tie bar for this purpose, it causes major problems with Arial Unicode MS. -- Prince Kassad 16:02, 28 July 2010 (UTC)
- I don't see that as relevant for -
- We shouldn't continue to use deprecated IPA (and I'm not the only one who converts the ligatures to separate characters when I encounter them)
- We already specify non-broken fonts for IPA (I believe), so tie bars can be added where needed, however
- The tie-bars are not explained in any of the pronunciation keys we link to (at least last time I looked), which thus make no distinction between [tʃ] and [t͡ʃ] (even the Wikipedia article simply says they "may be separate or joined by a tie bar")
- (afaik) none of our pronunciation sections make a distinction between [tʃ] and [t͡ʃ] (or any other such pair)
- So given that the distinction between [tʃ] and [ʧ]/[t͡ʃ] is not made, making this change has no effect but to standardise on the standard IPA throughout Wiktionary.
- Alternatively, if you really feel the need, we could replace the ligatures with the tie-barred characters, however I think we'd need to explain the tie bars on the pronunciation keys, note they aren't used universally and add them to the toolbar (I think they'd need to be precomposed pairs to make them clickable) first. Thryduulf (talk)
- Prince Kassad, can you give us an example (entry) where a distinction between [tʃ] and the ligature or tied characters is made (and needed)? From what I have found, the ligature and the separate characters are considered identical in IPA, with the former deprecated, and the tie optional. (and we apparently opt-out in our keys ;-) Mind you, I'm not going to have AF do something that is questionable here. (and no hurry) Robert Ullmann 08:30, 29 July 2010 (UTC)
- Can we consistently replace the ligatures with the tie bar? Robert Ullmann 14:11, 2 August 2010 (UTC)
- We can, but it causes major display problems with w:Arial Unicode MS, which I suppose a lot of people here use.. -- Prince Kassad 14:14, 2 August 2010 (UTC)
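Purely as a sketch of the two options discussed in this section (separate characters versus tie-barred pairs), and not something AF is known to do, the replacement table could be expressed like this. The function name and the tie_bar switch are hypothetical; the tie bar used is U+0361 (combining double inverted breve).

```python
# Deprecated IPA ligatures and their two-character equivalents,
# following the table at the top of this section.
LIGATURES = {
    "\u02a4": "d\u0292",  # dezh ligature -> d + ezh
    "\u02a6": "ts",       # ts ligature
    "\u02a3": "dz",       # dz ligature
    "\u02a7": "t\u0283",  # tesh ligature -> t + esh
    "\u02a8": "t\u0255",  # tc-curl ligature -> t + c with curl
    "\u02a5": "d\u0291",  # dz-curl ligature -> d + z with curl
}
TIE_BAR = "\u0361"


def expand_ligatures(ipa, tie_bar=False):
    """Replace each ligature with its pair, optionally joined by a tie bar."""
    out = []
    for ch in ipa:
        pair = LIGATURES.get(ch)
        if pair is None:
            out.append(ch)          # not a ligature: keep as-is
        elif tie_bar:
            out.append(pair[0] + TIE_BAR + pair[1])
        else:
            out.append(pair)
    return "".join(out)
```

Keeping the choice as a parameter reflects that the thread had not settled on which form to use.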
others
See User:Robert Ullmann/IPAchars (;-) Robert Ullmann 13:22, 30 July 2010 (UTC)
- enPR and SAMPA in report. AF happily munching on what it has been given. Robert Ullmann 12:51, 4 August 2010 (UTC)
- I'd love some kind of list about what characters are actually permitted in enPR, just like what I did for IPA. For SAMPA it should be pretty obvious, as that takes only ASCII. -- Prince Kassad 14:45, 4 August 2010 (UTC)
Report is using the list for IPA. There are a few corrections to the list, as /, [, and ] are not valid characters within the IPA strings, and are treated as invalid. (Which means there is something to be fixed.)
SAMPA treats everything > 127 as invalid; this isn't quite right, there are several characters that aren't part of SAMPA or X-SAMPA. We use parentheses as in enPR and IPA. There are a couple of others, but I don't know which.
I also fixed the parsing a bit, it should have been stripping spaces at ends of parameters, so now there are many fewer cases of / apparently occurring inside the string. Robert Ullmann 12:30, 9 August 2010 (UTC)
apostrophe
As noted U+0027 apostrophe should not occur anywhere in IPA. A lot of observed use should be the primary stress marker (U+02C8), and a small amount should be the modifier apostrophe (U+02BC) used to form ejectives.
An interesting example is ბიჭი which uses the stress marker correctly, and then uses the 0027 apostrophe incorrectly.
Suppose AF were to always convert 0027 to the stress marker.
Pro: common use of ' for ˈ would be automatically corrected, this makes this easier for editors much as the g->ɡ and :->ː conversions.
Con: editors adding ejectives would be required to know and use the modifier apostrophe.
(and we have to sort the existing cases of ejectives, but that needs to be done anyway ;-)
Thoughts on this? Robert Ullmann 12:02, 2 August 2010 (UTC)
- The apostrophe could be added to the edittools. That way, it would be much more accessible. -- Prince Kassad 13:50, 2 August 2010 (UTC)
- The modifier apostrophe is already there, labelled for ejectives. Robert Ullmann 14:02, 2 August 2010 (UTC)
- I also just noticed there are many uses of the typographical apostrophe (U+2019), which is always wrong in IPA context. You should check that out too. -- Prince Kassad 14:09, 2 August 2010 (UTC)
- IMO the con outweighs the pro. You've been admirably careful, Robert, to avoid your bots' guessing so as to correct ambiguities, and I see no reason you should change that practice now.—msh210℠ (talk) 16:27, 2 August 2010 (UTC)
- Well I think we can automatically correct a subset of the uses - for example where the apostrophe occurs in a position that a modifier apostrophe legitimately can't (for example as the first character of a pronunciation transcription, or for languages that do not use ejectives). Any occurrence that cannot be unambiguously corrected should of course not be so, but should be marked. The reverse occurrences should also be flagged as wrong (e.g. the ejective modifier apostrophe appearing as the first character of a transcription). Thryduulf (talk) 17:37, 2 August 2010 (UTC)
- As the first character of the transcription, or after . (a period/full stop), can be corrected IMO. After a consonant in a language that doesn't have ejectives I'm warier about, as there's always a chance someone transcribed something oddly, as an ejective, in a language that phonemically has none. Maybe I'm being too cautious, though.—msh210℠ (talk) 17:59, 2 August 2010 (UTC)
- It can also be converted after a vowel, or any consonant which can not possibly be a valid ejective (n' or r'). -- Prince Kassad 18:03, 2 August 2010 (UTC)
I added a few columns to the table. In particular "pos eject" is the number that are possibly ejectives, depending on the language. There are, as you see, almost none. "pos error" is the number of possible errors that could occur if a rule converting any 0027 or 2019 followed by alpha was used. The other way to read this is it is the number of cases that would be missed if a rule of (not ptkqsɬʃ) (0027 or 2019) (any alpha) was followed. That may be a usable rule? Robert Ullmann 10:10, 3 August 2010 (UTC)
- It looks okay for me, at least. However, you should also include tɬ' and tʃ', which are common ejectives in Native American languages. Whoops, I got confused by the HTML entities. -- Prince Kassad 10:24, 3 August 2010 (UTC)
- I used the numeric references to make sure I got them right (arrant pedantry: there are "named entities" and "numeric references" ;-) sorry for confusion. Perhaps I'll try this rule after lunch. Robert Ullmann 11:22, 3 August 2010 (UTC)
- Numeric character references, you mean. (As opposed to "named entity references", or "character entity references" as HTML 4.01 calls them.) —RuakhTALK 12:30, 3 August 2010 (UTC)
- :-)
Like this ? Robert Ullmann 13:00, 3 August 2010 (UTC)
Although there is this ... things already in error can't be helped I suppose ;-) Robert Ullmann 13:06, 3 August 2010 (UTC)
- If we keep explicit non-IPA pronunciations, we can go through and fix any non-valid characters in them later, both SAMPA and enPR contain fewer legitimate characters so would be a smaller job anyway. Thryduulf (talk) 13:28, 3 August 2010 (UTC)
Looks very good, shall continue; checking each one now, after a while I'll trust it. Haven't seen 2019 yet. Robert Ullmann 13:27, 3 August 2010 (UTC)
- It just came to my mind you should probably check for palatalized and labialized ejectives (02B2 and 02B7). They're uncommon but can occur. -- Prince Kassad 14:59, 3 August 2010 (UTC) (addendum: pharyngealized ejectives too: 02C1)
- Those followed by the modifier apostrophe? really? or what? Robert Ullmann 15:09, 3 August 2010 (UTC)
- Stuff like qʷ', where ' is the ASCII apostrophe. They may constitute valid ejectives. -- Prince Kassad 15:11, 3 August 2010 (UTC)
- Yuck. Okay. Robert Ullmann 15:14, 3 August 2010 (UTC)
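A rough sketch of the rule discussed in this section, (not ptkqsɬʃ) (0027 or 2019) (any alpha), is given below. It is only an illustration, not the rule as AF implements it: the function name is made up, and adding the modifier letters 02B2, 02B7 and 02C1 to the exclusion set is an assumption based on the last few comments.

```python
import re

STRESS = "\u02c8"  # IPA primary stress marker

# Characters after which an apostrophe might mark an ejective, so it is
# left alone: p t k q s, belted l, esh, plus (assumed) the palatalization,
# labialization and pharyngealization modifier letters.
POSSIBLE_EJECTIVE = "ptkqs\u026c\u0283" + "\u02b2\u02b7\u02c1"

# ASCII or typographic apostrophe, not preceded by a possible ejective
# base, and followed by a letter.
APOSTROPHE_RE = re.compile(
    r"(?<![" + POSSIBLE_EJECTIVE + r"])['\u2019](?=[^\W\d_])"
)


def fix_stress(ipa):
    """Turn unambiguous apostrophes into the primary stress marker."""
    return APOSTROPHE_RE.sub(STRESS, ipa)
```

An apostrophe at the very start of a transcription is always converted, since nothing precedes it; anything after one of the listed characters is left for a human to check.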
another one
Now for something that should be fairly uncontroversial, dotless-i ı (U+0131) should be corrected to small capital i ɪ (U+026A). -- Prince Kassad 10:24, 3 August 2010 (UTC)
Parsing errors
Most of the entries in the parsing errors list are caused by an incorrectly closed template. A one-time run finding the string: {{IPA|lang=fr|/*/} (where * is a wildcard matching any number of characters that are not a slash) and appending } to it would fix these (and remove them from the mismatched syntax workload). The transcriptions should then be checked to see if there are other problems. Thryduulf (talk) 10:39, 3 August 2010 (UTC)
- An additional find and replace on the same line changing "}}}" to "}}" also needs to be done for most of these entries. Thryduulf (talk) 10:44, 3 August 2010 (UTC)
- They are almost all Dawnraybot. Yes, pretty simple to fix. Should run that. It is effectively moving one } from the end to where it belongs. Robert Ullmann 10:49, 3 August 2010 (UTC)
- Fixed these. Robert Ullmann 14:32, 3 August 2010 (UTC)
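The one-time fix described above (close the template properly, then trim the stray brace at the end of the same line) might look something like this. It is a sketch under assumptions, not the script that was actually run: the function name is invented, and the blanket "}}}" replacement assumes entry lines never contain legitimate triple braces.

```python
import re

# {{IPA|lang=fr|/.../ closed by a single } that is not followed by another }.
BROKEN_IPA_RE = re.compile(r"(\{\{IPA\|lang=fr\|/[^/\n]*/)\}(?!\})")


def fix_line(line):
    """Move the misplaced closing brace back to the IPA template."""
    fixed, count = BROKEN_IPA_RE.subn(r"\1}}", line)
    if count:
        # Only touch "}}}" on lines where the IPA template was repaired.
        fixed = fixed.replace("}}}", "}}")
    return fixed
```

For a hypothetical line such as `{{IPA|lang=fr|/ʃa/} {{audio|Fr-chat.ogg|audio}}}`, this yields `{{IPA|lang=fr|/ʃa/}} {{audio|Fr-chat.ogg|audio}}`; correctly closed lines pass through unchanged.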
I think I've now fixed all the remaining parsing errors. The IPAchar inside IPA was seemingly caused by an unauthorised bot run in early 2008 that simply converted every instance of IPA:... to {{IPA|/.../}} in French entries, without taking into account that some entries already used {{IPAchar}}. Thryduulf (talk) 12:59, 4 August 2010 (UTC)
Semicolons
There appear to be two uses of semicolons.
- semicolons where there should be multiple parameters (e.g. at absorbing) should be fixable using the existing code where commas separate the parameters.
- html entities e.g. /ˈɛɾ̃ɹ̩pɹɑjz/ at enterprise. Both &nnn; and &xnnn; forms seem to be used. Could AF convert these to the proper unicode character? I've got a vague feeling that it does/did this elsewhere before? Thryduulf (talk) 16:02, 3 August 2010 (UTC)
- yes, the code can easily handle ; as well as , will do soonish
- see this :-) Robert Ullmann 16:40, 3 August 2010 (UTC)
- given the spirit of pedantry on this page, I feel compelled to point out that the edit summary is inaccurate ;) Thryduulf (talk) 17:07, 3 August 2010 (UTC)
- Yes, I thought about changing the edit summary for the rule to include semicolon. This is better? (Although not semicolon in this instance ;-) Robert Ullmann 12:16, 4 August 2010 (UTC)
- I can't fault that! Thryduulf (talk) 12:48, 4 August 2010 (UTC)
- there was some conversion of both entities and numerics (see above ;-) done a while ago by DoddeBot. AF never did this. Should be more general than IPA of course; to be looked at? Robert Ullmann 16:10, 3 August 2010 (UTC)
- There are some cases where we want HTML entities, for example in cases where MediaWiki screws with characters due to Unicode normalization. However, this should never be the case for IPA. -- Prince Kassad 16:14, 3 August 2010 (UTC)
- Yes, I realize that, and some other issues; takes some care. Robert Ullmann 16:40, 3 August 2010 (UTC)
- Conversion of HTML entities/references/whatevers should IMO, to aid human-readability, not be done for those characters that look like other, more common (especially 7-bit) characters, like dashes (en, em, minus, et al.) and spaces (non-break, et al.) and also (for the same reason) not for invisible characters (zero-width joiner, et al.).—msh210℠ (talk) 14:52, 4 August 2010 (UTC)
- With the exception of invisible characters, I disagree - inclusion of the HTML code significantly decreases human-readability. For example, which is easier to read "јануар" or "јануар"? Also, which glyphs are similar depends on font and size, which vary between systems. Thirdly, "ј" is going to sort differently to "ј" and confuse bots that are looking for that character. Thryduulf (talk) 15:49, 4 August 2010 (UTC)
- Hm, I suppose you're right. Perhaps just dashes and spaces? It's a shame to have a non-break space added by an editor on purpose and another editor replace it by a space, not realizing. (Or, worse, copy it to another page, not realizing it's not a space.) Letters and numbers and things like, that which look like one another, I suspect people are less likely to err on.—msh210℠ (talk) 16:12, 4 August 2010 (UTC)
- Non-breaking spaces, I can see the benefits of, yes. Dashes I'm not so certain about, while – and — look pretty much identical in the editing window, they're easily differentiable on preview and in entry display (- is easily differentiable in both). What do others think? Thryduulf (talk) 18:00, 4 August 2010 (UTC)
- Still, it's pretty confusing if all dashes look the same in the edit window. I've experienced it before. Therefore, it should really be – or — -- Prince Kassad 18:05, 4 August 2010 (UTC)
and see User:Robert Ullmann/HTML entities Robert Ullmann 14:04, 5 August 2010 (UTC)
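As a minimal sketch of converting numeric character references (decimal and hexadecimal) to the characters they stand for, something like the following would do. It is not AF's or DoddeBot's code; the function name and the KEEP set are assumptions, the latter loosely following the suggestion above that non-breaking spaces and invisible characters stay as references.

```python
import re

# &#NNN; (decimal) or &#xHHH; (hexadecimal) numeric character references.
NCR_RE = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Assumed exceptions: no-break space and some zero-width/invisible
# characters are left as references so they stay visible to editors.
KEEP = {0x00A0, 0x200B, 0x200C, 0x200D, 0x200E, 0x200F}


def decode_ncr(text):
    """Replace numeric character references with the actual characters."""
    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code in KEEP or code > 0x10FFFF:
            return match.group(0)  # leave exceptions and invalid values alone
        return chr(code)
    return NCR_RE.sub(repl, text)
```

Whether dashes belong in the exception set was still being debated, so they are not included here.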
script G
I don't know if it's possible to make special rules for SAMPA, but script G should be converted to plain G in SAMPA transcriptions. -- Prince Kassad 18:05, 4 August 2010 (UTC)
- we have special rules for IPA, right? But two things: one is that SAMPA uses =, which makes the syntax nastier; and the other is that this error is more likely to be something mis-identified as SAMPA, and should be looked at. Robert Ullmann 14:06, 5 August 2010 (UTC)
- Essentially, this would do the reverse of what AutoFormat is doing for IPA. The only errors I've seen with SAMPA is enPR being misidentified as SAMPA, and I already cleaned up all or most of them. -- Prince Kassad 15:40, 5 August 2010 (UTC)
- I've seen several instances of IPA misidentified as SAMPA. Thryduulf (talk) 18:14, 5 August 2010 (UTC)
- It mostly seems to be just people who are unfamiliar with SAMPA and thus leave in IPA characters, because they don't know their SAMPA equivalent. -- Prince Kassad 18:34, 5 August 2010 (UTC)
commas to multiple parameters in pronunciation transcriptions
Would it be possible for AF to edit pronunciation templates that contain a comma or space to show multiple transcriptions in a single parameter and convert them to multiple parameters please? The following rules should catch all cases:
- Replace "/,/" or "/, /" or "/ /" with "/|/"
- Do not match "|/ /" but report as an empty parameter.
- Replace "],[" or "], [" or "] [" with "]|["
- Replace "/..., .../" with "/.../|/.../"
- Replace "[..., ...]" with "[...]|[...]"
- Replace "{{enPR|..., ....}}" with "{{enPR|...|...}}"
In all cases ... represents any string of characters that are not one of "|", "/", ",", "[", "]", "{{" or "}}", and the replacement should be able to cope with more than two transcriptions (e.g. /..., ..., .../}}).
If this isn't possible, can you add these cases to the pronunciation exceptions report.
Thanks, Thryduulf (talk) 12:11, 13 July 2010 (UTC)
- It already does this in the possible cases. The cases like "/..., .../" can't be done, as the pronunciation can be for a phrase with a comma in it (I've seen this somewhere!) The first two, which are done, is most of the cases. I'll look at some point at getting others into the exception report. Robert Ullmann 13:38, 13 July 2010 (UTC)
- Oh, in the cases it does do, it also matches " or " ;-) Robert Ullmann 14:05, 13 July 2010 (UTC)
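The cases described as safe above (a separator sitting between a closing and an opening delimiter, including " or ") could be sketched as below. This is an illustration with made-up names, not AF's rule set; the "/..., .../" form is deliberately not handled, since a phrase can legitimately contain a comma.

```python
import re

# Separator between two slash-delimited transcriptions: "," / ", " / " " / " or ".
# The lookbehind skips "|/ /", which is an empty parameter, not a separator.
SLASH_SEP_RE = re.compile(r"(?<!\|)/(?:, ?| or | )/")

# Same idea for bracketed (phonetic) transcriptions.
BRACKET_SEP_RE = re.compile(r"\](?:, ?| or | )\[")


def split_transcriptions(template):
    """Turn several transcriptions in one parameter into separate parameters."""
    template = SLASH_SEP_RE.sub("/|/", template)
    return BRACKET_SEP_RE.sub("]|[", template)
```

Because each match consumes the closing delimiter of one transcription and the opening delimiter of the next, any number of transcriptions in a row is handled in a single pass.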
{{rfp}} now takes a lang= parameter, so it would be useful if AF could add this where it isn't already there. As AF already adds this to {{IPA}}, I'm guessing it wouldn't be too difficult. Thryduulf (talk) 01:24, 20 July 2010 (UTC)
- Now it supports it, the same should be done for {{rfv-pronunciation}} too. Thryduulf (talk) 23:35, 25 July 2010 (UTC)
- And {{homophones}}, please.—msh210℠ (talk) 15:38, 5 August 2010 (UTC)
Given that AF also does this for things like context templates, rhymes, {{plural of}}, etc, would it be perhaps simplest to not mark each template in the code, but to have a page which lists all the templates that AF should add a lang= parameter to if one isn't there already?
Also, does AF do any checks to see if the lang= parameter is correct? e.g. would it do anything about {{plural of|foo|lang=cy}} in a German L2 section? Thryduulf (talk) 11:57, 10 August 2010 (UTC)
- the code to add lang= is already there; but only operative inside a Pronunciation section. So missed the reasonable case where someone adds it w/o the header. I've added it to the general list for adding the parameter, and moved IPA to that list as well. And added rfv-pronunciation and homophones. Could export this list to a page at some point as noted.
- there isn't any check on existing parameters, noted Robert Ullmann 11:52, 11 August 2010 (UTC)
- Thanks.—msh210℠ (talk) 15:50, 11 August 2010 (UTC)
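The list-driven approach described above can be sketched as follows. This is an illustration under stated assumptions, not AutoFormat's code: `LANG_TEMPLATES` and `add_lang` are hypothetical names, the list contents are only the templates mentioned in this thread, and the language code is assumed to have been derived from the L2 header already.

```python
import re

# Assumed list; in the proposal this would be read from a wiki page.
LANG_TEMPLATES = ['IPA', 'rfp', 'rfv-pronunciation', 'homophones',
                  'rhymes', 'plural of']


def add_lang(section, code, templates=LANG_TEMPLATES):
    """Append |lang=code to listed templates that have no lang= yet."""
    names = '|'.join(re.escape(t) for t in templates)
    # Only simple calls are handled: no nested templates in the parameters.
    pattern = re.compile(r'\{\{(' + names + r')((?:\|[^{}]*)?)\}\}')

    def fix(match):
        params = match.group(2)
        if re.search(r'\|\s*lang\s*=', params):
            return match.group(0)  # already has lang=, leave it alone
        return '{{' + match.group(1) + params + '|lang=' + code + '}}'

    return pattern.sub(fix, section)
```

For example, `add_lang('{{rfp}}', 'de')` gives `{{rfp|lang=de}}`, while a call that already carries a lang= parameter is returned unchanged. As noted in the reply above, nothing here checks whether an existing lang= value is correct.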
- Thank you. Thinking a bit more on the mismatch of lang= and L2 headers, perhaps it would be better to flag this for human attention rather than assuming the L2 is correct in every case? I guess that in most cases a mismatch will be a typo or thinko between similar codes (eg de/se, en/enm, etc), but it's possible that the opposite has happened and the L2 is wrong. This is going to be much more obvious though, so perhaps it is safe to do it automatically? Perhaps we should get an idea of the scale of the issue first - if it's only a handful of instances we might as well just do them manually, but if there are lots it might be worth doing at least a subset automatically (e.g. where there are other templates in the L2 section with lang= parameters that match the header). Any thoughts? Thryduulf (talk) 18:49, 11 August 2010 (UTC)
- Scope can't hurt, but I think that in any event the inflection line's using {infl|langcode} or {langcode-foo} (matching the header) suffices to allow automation, as long as the entry isn't in a "...that lack inflection template" category for that language.—msh210℠ (talk) 18:55, 11 August 2010 (UTC) 17:40, 12 August 2010 (UTC)
- Hm, see the recent history of [[Schwachmatikus]], please: AF doesn't seem to be adding lang to rfp.—msh210℠ (talk) 17:22, 7 September 2010 (UTC)
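To get an idea of the scale of lang=/L2 mismatches before deciding on automation, something like the following could list them for human review. It is a sketch only: `LANG_PARAM` and `find_lang_mismatches` are hypothetical names, and the expected code is assumed to come from the L2 header by some mapping not shown here.

```python
import re

# Captures the template name and the value of its lang= parameter.
# Only simple calls are handled: no nested templates in the parameters.
LANG_PARAM = re.compile(
    r'\{\{([^{}|]+)(?:\|[^{}|]*)*?\|\s*lang\s*=\s*([^|{}\s]+)')


def find_lang_mismatches(section, expected_code):
    """Return (template, code) pairs whose lang= differs from expected_code."""
    return [(m.group(1).strip(), m.group(2))
            for m in LANG_PARAM.finditer(section)
            if m.group(2) != expected_code]
```

The function only reports; whether the L2 header or the parameter is the wrong one is left to a human, as suggested above.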
Perhaps less easy, and less important (but a nice to have if it's not too tricky) would be to move {{rfp}} to within a pronunciation section if it isn't in one already. I think the following rules should cover it:
- If {{rfp}} is already in a Pronunciation section (at any level) then leave it where it is
- If {{rfp}} is not in a Pronunciation section:
  - If there is one L3 Pronunciation section, move it to the end of that section
  - If the {{rfp}} is within an Etymology N section, and a pronunciation section exists within the same Etymology N section, move it to the end of that pronunciation section
  - If no pronunciation section exists, and the {{rfp}} is not inside an L3 Etymology N section, create an L3 Pronunciation section immediately before the first L3 POS section or immediately before the first L3 Etymology N section and move the {{rfp}} there.
  - If the {{rfp}} is within an L3 Etymology N section, create an L4 Pronunciation section immediately before the first L4 POS section if one doesn't exist, and move the {{rfp}} there
- In all other circumstances leave the {{rfp}} where it is. Thryduulf (talk) 01:21, 20 July 2010 (UTC)
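The rules above can be read as an ordered decision procedure. The sketch below encodes them in the order listed, assuming the caller has already parsed the L2 section and can supply the five facts as arguments; the function name, parameter names and returned strings are all hypothetical.

```python
def rfp_action(in_pronunciation, in_etymology_n, l3_pron_count,
               etym_has_pron, any_pron_exists):
    """Decide what to do with one {{rfp}}, applying the rules in order.

    in_pronunciation: the {{rfp}} is already inside a Pronunciation section
    in_etymology_n:   the {{rfp}} is inside an L3 Etymology N section
    l3_pron_count:    number of L3 Pronunciation sections in the L2 section
    etym_has_pron:    that Etymology N section has its own Pronunciation
    any_pron_exists:  a Pronunciation section exists at any level
    """
    if in_pronunciation:
        return 'leave'
    if l3_pron_count == 1:
        return 'move to end of the L3 Pronunciation section'
    if in_etymology_n and etym_has_pron:
        return "move to end of that Etymology's Pronunciation section"
    if not any_pron_exists and not in_etymology_n:
        return 'create L3 Pronunciation section and move there'
    if in_etymology_n:
        return 'create L4 Pronunciation section and move there'
    return 'leave'
```

Keeping the decision separate from the page edit makes each rule easy to test on its own. Where two rules could both apply, this sketch simply takes the first one listed, which is an assumption about the intended precedence.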
Strip tab characters
Would it be possible for AutoFormat to strip tab characters [48] [49] [50]. Conrad.Irwin 23:12, 1 September 2010 (UTC)
- ...or convert them to spaces (where they're not line-final)?—msh210℠ (talk) 14:38, 2 September 2010 (UTC)
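Both suggestions can be combined: drop tabs at the end of a line and turn the remaining ones into spaces. A minimal sketch, with `clean_tabs` as a hypothetical name and the choice of one space per run of tabs as an assumption:

```python
import re


def clean_tabs(text):
    """Strip line-final tabs and convert the remaining tabs to spaces."""
    # Drop trailing whitespace that contains at least one tab.
    text = re.sub(r'[ \t]*\t[ \t]*$', '', text, flags=re.M)
    # Replace each remaining run of tabs with a single space.
    return re.sub(r'\t+', ' ', text)
```

Text without tabs passes through unchanged, so a run of this kind would not by itself cause a page to be saved.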
Why has the bot added {{infl}} given that the entry vietor already had a {{sk-noun}} which was sufficient for the categorisation of the entry as Slovak noun? The uſer hight Bogorm converſation 17:39, 7 September 2010 (UTC)
bug: removed whole sections without indication!
Hello,
See this diff without further explanation: [51]
It wasn't easy to notice and not easier to find. Would be wise to check all the possible, similar problems and kindly revert them. Thanks. --grin 11:08, 14 September 2010 (UTC)
- There are not supposed to be two consecutive identical headers. It removed one, as it was designed to do. I have moved the content to Talk:hallo pending some support for the claimed origin. Looking forward to continuing the discussion there. DCDuring TALK 20:34, 24 September 2010 (UTC)
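One way to make such cases easier to notice would be to report repeated headers for review instead of removing them silently. The sketch below is not how AutoFormat works; `duplicate_headers` is a hypothetical name, and "consecutive identical" is interpreted here as a header with the same level and title as the header immediately before it, whatever content lies between them.

```python
import re

# A wikitext header line: the same number of "=" on both sides of the title.
HEADER = re.compile(r'^(={2,})\s*(.+?)\s*\1\s*$')


def duplicate_headers(text):
    """Return (line_number, title) for each header repeating the previous one."""
    found = []
    previous = None
    for number, line in enumerate(text.split('\n'), start=1):
        match = HEADER.match(line)
        if not match:
            continue
        current = (len(match.group(1)), match.group(2))
        if current == previous:
            found.append((number, match.group(2)))
        previous = current
    return found
```

The resulting list could feed an exceptions report, so that any content under the second header is reviewed by a person before anything is deleted.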
Linked parameter
Why did AutoFormat in October 2008 add a link to the parameter to {{plural of}}? For the Finnish plurals, roughly a half have this link, the other half don't. --LA2 18:26, 17 September 2010 (UTC)
- O_O Autoformat adds links to {{plural of}}?? It's not supposed to have a link, linked parameters break the section anchor and accomplish nothing. AF should remove linked parameters, not add them. --Yair rand (talk) 18:51, 17 September 2010 (UTC)
- The edit comment says "...to make page count" and there is a discussion relating to this in December 2008, User talk:AutoFormat/2008#wikilinking lemma terms in form-of templates. Maybe it was considered a useful edit at the time, but why did it only cover some of the template calls? And as we now think the parameter should not be linked, why have these links not been removed? Is this just by mistake or is there still a good reason? --LA2 19:05, 17 September 2010 (UTC)
- The {{plural of}} template is called in 152,000 places (all languages) and 135,000 or 89 percent of these use a linked first parameter. --LA2 20:21, 17 September 2010 (UTC)
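If the links were to be removed, as suggested in this thread, the change for the simple case might look like the sketch below. It assumes a plain `[[word]]` link in the first parameter and leaves piped links alone; `LINKED_FIRST` and `unlink_plural_of` are hypothetical names, and whether the links should be removed at all is the open question above.

```python
import re

# First parameter of {{plural of}} written as a plain [[link]].
LINKED_FIRST = re.compile(r'\{\{plural of\|\[\[([^\[\]|{}]+)\]\]')


def unlink_plural_of(text):
    """Turn {{plural of|[[word]]...}} into {{plural of|word...}}."""
    return LINKED_FIRST.sub(r'{{plural of|\1', text)
```

Calls with an unlinked first parameter are returned unchanged, so the function can safely be run over every entry.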
AutoFormat replaced ''(generally)'' with {{generally|lang=nl}}. Only problem is, this has been deleted. Mglovesfun (talk) 12:18, 18 September 2010 (UTC)
Automatic "Translations" sections
Hi. May I ask that, when an entry does not contain a "Translations" section, AutoFormat automatically adds it? Apparently your bot still does not have that helpful function, and I would appreciate if it were introduced. --Daniel. 22:07, 7 October 2010 (UTC)
Has AF gone on Wikibreak?
AFAICT AF hasn't run since October 5. Why? Can it be restarted? How can such a long absence be avoided henceforth? DCDuring TALK 20:25, 23 October 2010 (UTC)