Wiktionary talk:Scribunto
What can we do with scripts?
I'm not sure how useful or flexible Lua is for us, so I'd like to discuss some possible applications. Please feel free to add more sections. —CodeCat 22:03, 4 September 2012 (UTC)
Replacing code templates
It's been said that it's very slow to transclude templates from within Lua, so that means our vast database of language codes is essentially unavailable to Lua scripts. That is certainly a problem. Since the barrier is one way, it's presumably more efficient to keep this information in Lua form, or perhaps in both forms for the time being? —CodeCat 22:03, 4 September 2012 (UTC)
- Yes, I think we'll want it in both forms for now. In fact, I could readily imagine our permanently keeping language templates like {{fr}} and so on, since a lot of other Wiktionaries have them as well, and visitors from those Wiktionaries seem to expect them. (Templates like {{etyl:he-IL}} and {{he/script}} and {{proto:gem-pro}}, of course, can be jettisoned as soon as we're done with them.) —RuakhTALK 02:02, 5 September 2012 (UTC)
- I suppose it would be possible for each of these templates to 'redirect' to the appropriate Lua function? —CodeCat 14:43, 5 September 2012 (UTC)
What would be the best way to do this? Presumably we would want to store in a table what we currently store in subtemplates, so for the code en there would be the table:
en.name = "English"
en.type = "regular" -- should probably change this to 'language' while we're at it
en.names = {"English", "Modern English"} -- maybe merge this into .name, and assume the first value in the list is the default?
en.script = "Latn"
en.family = "gmw"
But if we store all of this information in one module, will that slow down loading of that module a lot? We want these codes to be fast to use, after all. —CodeCat 22:20, 5 September 2012 (UTC)
- Re: structuring the data: I assume that by en.name you mean something like languagedata.en.name?
- Re: slowing down the module: Good question. I had assumed that a module would be cached (as implied by §5.3 of the Reference Manual), but I just tested in two different ways:
- Creating a table-entry named foo initialized to zero, together with an exported function incrementFoo that incremented foo and returned its new value. I figured that if the module was reloaded every time, then this would always return 1, and if it wasn't, then this would return steadily increasing values.
- Creating a table-entry named foo initialized to math.random(), together with an exported function getFoo that returned its value. I figured that if the module was reloaded every time, then this would return a new random value every time, and if it wasn't, then this would always return the same value.
- Both tests gave results indicating that the module is reloaded every time.
- The Scribunto documentation does mention that "Each {{#invoke:}} call runs in a separate environment. Variables defined in one {{#invoke:}} will not be available from another. This restriction was necessary to maintain flexibility in the wikitext parser implementation", but I guess I didn't realize just what that meant.
- I really don't know quite what to make of this. We frequently load script-template and language-name information dozens of times, even hundreds of times, on a single page. If there's really no caching going on on the server-side, then I'm surprised to hear that this is considered to be a good idea. Are memcached-backed database calls really so expensive?
- —RuakhTALK 23:03, 5 September 2012 (UTC)
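A minimal sketch of the two tests described above, for illustration only (the names follow the description; the second entry is renamed bar here so that both tests fit in one module):

local p = {}

p.foo = 0
function p.incrementFoo()      -- test 1: counts upward only if the module is cached
    p.foo = p.foo + 1
    return p.foo
end

p.bar = math.random()          -- test 2: stays constant only if the module is cached
function p.getFoo()
    return p.bar
end

return p

If the module were cached between {{#invoke:}} calls, incrementFoo would count upwards and getFoo would keep returning the same random number; both tests showed the opposite.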
- I'm actually not quite sure about the languagedata part. The standard idiom seems to be to export a table containing functions and variables from a module, but who says we can't export the data directly instead, by returning it rather than a table containing it as a variable? If our module ends up being named Module:lang, then after using require(lang) a script could refer to English as just lang.en.name, lang.en.script and so on. As for the slowdown... I think we should ask Uncle G about this as it could be quite a serious issue for us. —CodeCat 23:13, 5 September 2012 (UTC)
- I think we're actually on the same page: I say tomato, you say lang, I say languagedata, you say tomahto. —RuakhTALK 23:39, 5 September 2012 (UTC)
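A sketch of what returning the data directly could look like, assuming a module named Module:lang (the field set and the require form below are illustrative only):

local lang = {}

lang.en = {
    name   = "English",
    names  = { "English", "Modern English" },
    script = "Latn",
    family = "gmw",
}

return lang

Another module could then do local lang = require("Module:lang") and read lang.en.name, lang.en.script and so on.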
Simplifying the use of inflection templates
Most of our templates now require parameters like the stem, certain other settings, endings and so on. There is also quite a large number of them, one for each declensional class. If we have scripts, we could do away with all of that, and allow string functions to auto-detect any variations in the headword (such as which infinitive ending it has) and use them to generate the correct inflected forms. So, for example, a Spanish inflection script could work for all regular verbs, and look at the infinitive ending -ar, -er, -ir to decide which endings to add. —CodeCat 22:03, 4 September 2012 (UTC)
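A hedged illustration of the ending-detection idea (the function is invented for this example and covers only one form; a real module would build the whole table of endings):

local function presentFirstSingular(infinitive)
    -- split a regular infinitive into stem + theme vowel: habl|ar, com|er, viv|ir
    local stem = infinitive:match("^(.*)[aei]r$")
    if not stem then
        return nil  -- not a regular -ar/-er/-ir infinitive; the caller should flag this
    end
    return stem .. "o"  -- hablar → hablo, comer → como, vivir → vivo
end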
- This is true, but we probably don't want to write Lua functions that are too brilliant, because any on-wiki Lua logic will likely need to be duplicated off-wiki for various purposes (XML-dump analyses to make sure our data are consistent, bots to create form-of entries, and so on). Though the good news is, it'll probably be easier to read a Lua function and figure out all of its edge-cases than to do the same with a template. (Have you ever tried to replicate the logic of {{en-verb}} in an off-wiki Perl script? I have, and I can tell you, it's no piece of cake!) —RuakhTALK 01:59, 5 September 2012 (UTC)
- It is probably easier to make a bot based on a Lua script too, because it's a matter of translating one programming language to another. In fact, we could probably start off with MewBot's Python code and back-port it to Lua for the templates it supports. —CodeCat 09:04, 5 September 2012 (UTC)
- I could volunteer to attempt an automatic Hebrew verb conjugator that would automatically take irregular root letters into account. Do we know if we would be able to invoke scripts stored outside of the "Module:" namespace (e.g. userspace for testing)? --WikiTiki89 (talk) 09:07, 5 September 2012 (UTC)
- Re: first sentence: Please don't. There's some sort of mass delusion among Hebrew-speakers that all verbs are perfectly regular if you just account for all the weak/hollow/guttural root letters, but I assure you that it's not the case. (And if it were the case, I hope we would have created a template that did it by now: templates, after all, are perfectly capable of this sort of selection-and-construction logic. Lua will make it more convenient, and I think we can put a lot of helpful functionality in Lua. But I don't think that Lua will make anything possible, on the Hebrew-conjugation-template front, that wasn't already possible.)
Re: second sentence: I doubt it, but I don't actually know. You can play around on test2wiki: and see if you can get that to work.
—RuakhTALK 12:38, 5 September 2012 (UTC)
Inserting language-section fragments inside links
I've created test2wiki:Module:LanguageifyLinks and test2wiki:Template:LanguageifyLinks. Now something like {{LanguageifyLinks|[[foo]] [[bar|baz]] [[bip#French|bip]]|French}} becomes [[foo#French|foo]] [[bar#French|baz]] [[bip#French|bip]]. Something like this could address one serious annoyance I have: I hate taking advantage of {{term}} and {{onym}}'s support for letting you supply your own links (e.g. {{term||[[yes]]/[[no]]|lang=en}}), because then you have to specify the anchor yourself. It's annoying that we've coupled our templates-that-provide-language-aware-styling so closely with our templates-that-provide-language-aware-linking, but until now there hasn't been a better way. With Lua, however, the templates-that-provide-language-aware-styling can pick apart their contents, including any existing links, and insert these fragments. (And OMG, it was so easy! First thing I've ever written in Lua, and it took all of ten minutes. Though, of course, it may well turn out to have some problems that still need fixing. I didn't test it very thoroughly.) —RuakhTALK 03:28, 5 September 2012 (UTC)
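A rough approximation of the same idea, for readers curious what such a module involves (this is a sketch, not the actual test2wiki code):

local function languageify(text, lang)
    -- piped links without an existing anchor: [[bar|baz]] → [[bar#French|baz]]
    text = text:gsub("%[%[([^%[%]|#]+)|", "[[%1#" .. lang .. "|")
    -- bare links: [[foo]] → [[foo#French|foo]]
    text = text:gsub("%[%[([^%[%]|#]+)%]%]", "[[%1#" .. lang .. "|%1]]")
    return text
end

Links that already carry an anchor, such as [[bip#French|bip]], fall through both patterns untouched.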
- That is certainly easy, yes. What consequences do you think this will have for templates like {{l}} and {{term}} the way we use them now? —CodeCat 09:07, 5 September 2012 (UTC)
- Re: first sentence: Dude, word of advice. When someone accomplishes something, they probably don't want you to tell them that what they did was easy — even if they themselves say it was. ;-) Re: {{l}} and {{term}}: Well, this will involve policy decisions as much as technical decisions; but one immediate consequence is that if we modify (for example) headword-line templates to use a module like this, we can eliminate many current instances of {{l}} (e.g., things like {{en-verb|head={{l|en|pass}} {{l|en|by}}}} can be changed to {{en-verb|head=[[pass]] [[by]]}}). —RuakhTALK 13:57, 5 September 2012 (UTC)
Automatic romanizations
No more "Category:XYZ terms lacking transliteration". Should be fairly easy for some languages and more difficult in others. --WikiTiki89 (talk) 08:50, 5 September 2012 (UTC)
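A toy sketch of the easy end of this, with an invented and very incomplete mapping table (real transliteration schemes need context-sensitive rules, which is exactly why some languages are harder than others):

local translit = {
    ["д"] = "d",
    ["о"] = "o",
    ["м"] = "m",
    -- ... and so on for the rest of the alphabet
}

local function romanizeRu(word)
    -- match each multi-byte UTF-8 sequence and look it up; unknown characters pass through
    return (word:gsub("[\194-\244][\128-\191]*", function(c)
        return translit[c] or c
    end))
end

-- romanizeRu("дом") --> "dom"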
Automatic sort keys
While having customisable sorting is still the only proper solution, we can use scripts to generate the right sort keys for each language. For this, we can add another 'property' of language codes: the function to use to generate sort keys. By default they would map one-to-one, but they could be changed so that accents are stripped, for example. All of our templates (or modules) would then be rewritten to use this, and then we would not need sort= parameters any more. —CodeCat 12:54, 6 September 2012 (UTC)
- Instead of "by default they would map one-to-one", I think by default there would be no function, and we'd fall back on DEFAULTSORT. Also, we'd still need to support the possibility of explicitly-specified sort-keys, since there are various languages where such an automated approach would not always work, or would not work at all. —RuakhTALK 13:26, 6 September 2012 (UTC)
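A sketch of how the per-language sort-key 'property' and the DEFAULTSORT fallback could fit together (the language code, the accent list and the function names are all illustrative):

local sortkeyFns = {}

sortkeyFns["fr"] = function(word)
    -- strip accents that should not affect sorting (incomplete list, for illustration)
    for from, to in pairs({ ["é"] = "e", ["è"] = "e", ["ê"] = "e", ["ç"] = "c" }) do
        word = word:gsub(from, to)
    end
    return word
end

local function makeSortKey(langCode, word)
    local fn = sortkeyFns[langCode]
    if fn then
        return fn(word)
    end
    return nil  -- no function for this language: fall back to DEFAULTSORT or an explicit sort=
end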
When?
Do we know when Wiktionary is getting Scribunto? --WikiTiki89 (talk) 08:51, 5 September 2012 (UTC)
- According to Uncle G, Lua is getting deployed here today. --Yair rand (talk) 08:54, 5 September 2012 (UTC)
- I'm sorry for the misunderstanding. Wiktionary did not get Scribunto deployed today, and there is no timetable, right now, for Scribunto deployment to wikis other than mediawiki.org and test2.wikipedia.org. To get pretty immediate alerts about decisions and discussion regarding when Scribunto will be deployed to English Wiktionary, add yourself to the cc list for this issue or join the low-traffic tech ambassadors list; tech ambassadors, I hope, will also relay this sort of information to your community. Thanks. Sumanah (talk) 02:01, 6 September 2012 (UTC)
One module for each language?
I don't know much of how Lua's modules work, but does it sound reasonable to start off having one module per language? If so, what should it be called? By the language code (Module:en) or name (Module:English)? Also, can modules be categorised? —CodeCat 09:08, 5 September 2012 (UTC)
- Having large modules would make them difficult to edit. I think that each independent functionality should be in its own module, for example separate modules for {{en-verb}} and {{en-noun}}, but the same module for {{fr-conj-er}}, {{fr-conj-ir}}, and {{fr-conj-re}}. --WikiTiki89 (talk) 10:59, 5 September 2012 (UTC)
- Ok, but in that case, can modules be nested somehow, like subtemplates? I certainly think it would be nice for the sake of clarity to have a module named en/verb, English/verb or similar. I don't know if Lua supports that though. —CodeCat 14:27, 5 September 2012 (UTC)
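Assuming subpage-style module titles are allowed (an assumption, not something confirmed in this discussion), the nesting could look like an ordinary require:

-- hypothetical Module:fr/conj
local conj = {}

function conj.er(stem)
    -- return the -er conjugation for this stem (details omitted in this sketch)
    return stem .. "er"
end

return conj

-- and, from some other module:
-- local frconj = require("Module:fr/conj")
-- frconj.er("parl")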
Postprocessing?
Can we use Lua to post-process entire pages? If so, we could use it to do a lot of the things we currently use JavaScript for, so that we no longer need to depend on users having JS turned on. —CodeCat 09:11, 5 September 2012 (UTC)
- I think the answer to your first question is "no", but even if it were "yes", I think your second sentence would be overly optimistic. Of the things that we use JavaScript for, most genuinely do require client-side scripting, because they involve dynamic interaction with an already-downloaded page. (Tabbed languages and collapsing tables, for example, both involve the user being able to show and hide different parts of the page.) There are some things that we use JavaScript for that don't theoretically require client-side scripting, if we had a sufficiently scriptable server, but even with Scribunto, we still don't have a sufficiently scriptable server, so it's moot. (Auto-redirection from [[THE]] to the could be handled by an HTTP redirect instead of a JS redirect; the translations-adder could be a genuine form that connects to a back-end that computes and performs the edit; but I'm reasonably confident that neither of these is achievable by a Scribunto module.) I actually can't think of anything we do in JavaScript that could be moved to Scribunto, but if you have an idea, then — I'm listening. —RuakhTALK 12:16, 5 September 2012 (UTC)
- Would we be able to use it to solve the nikkud order problem? --WikiTiki89 (talk) 12:28, 5 September 2012 (UTC)
- Good question. That's a great idea, but so far as I can tell, no: it seems that, when the return-value of a Lua function is UTF-8-decoded and returned to regular MediaWiki space, the UTF-8-decoding step includes the same normalization that occurs when you save a page. —RuakhTALK 15:49, 5 September 2012 (UTC)
- Actually, on second thought, strike that: I haven't tested this idea, but we probably can, by rewriting return-values to use numeric character references instead of the characters themselves; for example, changing א to &#1488; (or, more to the point, changing a dagesh to &#1468;). Though it might require some thought, to figure out exactly the right place to put that logic. We probably don't want some other Hebrew template to have to deal with the output of a template that performs such a transformation. —RuakhTALK 17:10, 10 September 2012 (UTC)
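A minimal sketch of the numeric-character-reference idea for a single problem character (the function name is invented; a full solution would cover every mark that normalization reorders):

local p = {}

function p.protectDagesh(s)
    -- U+05BC (dagesh) is the two UTF-8 bytes 0xD6 0xBC; emit it as &#1468; so that
    -- MediaWiki's normalization cannot reorder it relative to other marks
    return (s:gsub("\214\188", "&#1468;"))
end

return p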
- To Ruakh: I was thinking of the custom link we have on the GP currently. Another possibility would be to implement at least the page-generating part of TabbedLanguages on the server, and use JS only to switch between the tabs. But you are probably right, this can't be done. —CodeCat 14:30, 5 September 2012 (UTC)
Maintainability and listing usage
I just realised that one of the biggest tools we have when it comes to templates is being able to list exactly where a template is used. This allows us to easily fix obsolete usage or judge when a template is safe to be deleted. Is this possible with Lua scripts? I certainly hope so. Without it, we would have no way of knowing if a function or module is used once it is written, and so we would have no way of knowing if removing or renaming a function will break things. We'd end up with a lot of cruft code. —CodeCat 14:36, 5 September 2012 (UTC)
- Whatlinkshere and "Templates used on this page" both seem to work with Lua modules. --Yair rand (talk) 15:04, 5 September 2012 (UTC)
- But they work with modules as a whole, rather than with individual functions? If that's the case, then we would probably want to keep our modules small. Say, corresponding to one template or a set of closely related templates. —CodeCat 15:49, 5 September 2012 (UTC)
- FWIW, I expect that there won't be a need for template metaprogramming in the templates that use Lua modules, so simple XML dump analysis should be able to see what modules and functions are called from where. (By contrast with, say, {{he/script}}, which is called indirectly by templates that combine a parameter with /script, or {{Hebr}}, which is called indirectly by templates that use {{he/script}} to choose a template-name. With Lua, we won't need that sort of thing.) —RuakhTALK 15:55, 5 September 2012 (UTC)
Unicode
Update: I've now tested the Scribunto/Lua Unicode support. The bad news — not even news, this is something we've always known — the bad part is that it doesn't understand Unicode. The good news is that although Unicode presents limitations, it doesn't seem to present any problems. Strings are UTF-8-encoded on the way in, and UTF-8-decoded on the way out, and we can even include non-ASCII characters in literal strings in our Lua module (which Lua, of course, will see as UTF-8-encoded strings). However, we will have to be very cautious not to perform an operation on bytes as though it operated on characters; for example, we have to be careful not to grab half a character, or insert something in the middle of a character, and pattern-matching is probably almost completely useless when it comes to languages that use non-ASCII letters. —RuakhTALK 15:26, 5 September 2012 (UTC)
- Are there any unicode-aware string indexing/substring functions at all? It would definitely be a good thing to have a function that can, say, remove the last two characters from a string. —CodeCat 15:48, 5 September 2012 (UTC)
- No, none. (At least, I'm pretty sure there aren't.) You have to operate directly on the bytes. For example, to remove the last two UTF-8-encoded characters from a string s, you'd write something like s = s:gsub("[^\128-\191][\128-\191]*[^\128-\191][\128-\191]*$", "") (disclaimer: not tested). (Naturally that's "character" in the Unicode sense, not in the real-world sense. Removing the last two grapheme clusters would be decidedly nontrivial.) —RuakhTALK 16:04, 5 September 2012 (UTC)
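Building on that pattern, a small byte-level helper for the general case might look like this (same caveats: Unicode characters rather than grapheme clusters, and malformed input is not handled):

-- remove the last n UTF-8-encoded characters from s
local function stripLastChars(s, n)
    for i = 1, n do
        s = s:gsub("[^\128-\191][\128-\191]*$", "")
    end
    return s
end

-- stripLastChars("amuïr", 2) --> "amu"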
- I've added some information about Unicode support to the page. We probably need to write a set of basic Unicode support functions ourselves, such as indexing per character. —CodeCat 16:43, 5 September 2012 (UTC)
- Sorry, but what you've written is not really correct. It's true that the Wiktionary database is encoded in UTF-8, but it could just as easily be encoded in UTF-16 without having any effect on anything else. At its core, MediaWiki operates on strings of Unicode characters, and encodes those characters into byte-strings only as needed. I'll see if I can rewrite that section to be more accurate. —RuakhTALK 17:18, 5 September 2012 (UTC)
- Ok that's true, but all the web pages are also in UTF-8, so presumably for the sake of speed the database uses it too. And I suppose Lua supports only byte strings, not double-byte strings? —CodeCat 17:59, 5 September 2012 (UTC)
- Re: "all the web pages are also in UTF-8, so presumably for the sake of speed the database uses it too": No, absolutely not. The database driver converts UTF-8-encoded text in the database into PHP Unicode strings, and the Web-server converts PHP Unicode strings back into UTF-8. There is absolutely no connection between these two things; you can think of it as a bizarre coincidence that they both happen to involve UTF-8. (O.K., so obviously it's not really bizarre. The reasons that UTF-8 is used in the one are more or less the same reasons that UTF-8 is used in the other, so it's not truly surprising that it happens to be used in both. But there is no software connection between these two facts.) Re: "I suppose Lua supports only byte strings, not double-byte strings?": Right. If it supported double-byte strings, there wouldn't really be a problem. I mean, JavaScript only supports double-byte strings — characters outside the BMP are represented using surrogate pairs — and we've never suffered for it. —RuakhTALK 18:21, 5 September 2012 (UTC)
- Oh ok I think I understand now, thank you. What was wrong with the string.sub example though? —CodeCat 18:26, 5 September 2012 (UTC)
- It's not wrong, but I think it's bad advice. Consider the case of a template for French regular -ir verbs. If you just strip off the last two bytes, you'll generate garbage for amuïr, and you'll never know it; I think it's better to specifically strip off ir. And even without the Unicode issues, you'll have problems for maudire, which is a regular -ir verb aside from its infinitive and past participle. (O.K., so I admit: with either approach, you need to check to see if the string matches your expectation, and add a cleanup category if not. When the goal is amuïs, amuïris is not much improvement over amu�is. But I think it's easier to make a mistake, and to lose track of what you need to check for, when you think in terms of "I want to remove two characters" than when you think in terms of "I want to remove ir".) —RuakhTALK 18:52, 5 September 2012 (UTC)
- But isn't that exactly what my code did? It stripped off the final string.len(ending) bytes. So if ending = "ir" it strips two bytes, if ending = "ïr" it strips off three. —CodeCat 18:55, 5 September 2012 (UTC)
- Oh wait, I think I see it. I wrote that code with the assumption that you already know the ending is on the word, and you want to get rid of it. My intention was to show that it's better to write string.sub(word, -string.len(ending)) than string.sub(word, -2) because that will (counterintuitively if you think in characters) not work for "ïr". —CodeCat 19:01, 5 September 2012 (UTC)
- Yes, exactly. I agree that removing string.len(ending) bytes is better than removing 2 bytes, but I think it's best to try to just completely avoid thinking in terms of lengths and indices. In a non–Unicode-aware environment, they exist only to deceive and confuse. :-P —RuakhTALK 19:18, 5 September 2012 (UTC)
- But how would you propose removing an ending then (either knowing or not whether it's present)? —CodeCat 19:47, 5 September 2012 (UTC)
- Using gsub. (See the example in the last bullet-point of good news.) In the case of a French -ir verb module, I might write something like if(s:match("ir$")) s = s:gsub("ir$", "") else return "'''Error:''' This ''-ir'' verb does not end in ''-ir''! [[Category:French terms needing attention]]", or perhaps s, count = s:gsub("ir$", "") if(count < 1) return "'''Error:''' This ''-ir'' verb does not end in ''-ir''! [[Category:French terms needing attention]]" (plus obligatory newlines), until I realized about amuïr and maudire. —RuakhTALK 20:00, 5 September 2012 (UTC)
- Ok, I understand using gsub to actually test for the ending, but it could probably also be removed using the code I wrote instead, couldn't it? I imagine that might be faster because the function wouldn't need to parse the pattern (whether it's clearer is another matter). And does Lua not have a 'find substring' function (like .indexOf in JS), only pattern matching? —CodeCat 20:11, 5 September 2012 (UTC)
- As I wrote above, your code isn't wrong, I just think it's bad advice. It won't be faster, no. (I mean, yes, I imagine that will probably be a somewhat more efficient operation, but negligibly so, given the context.) Re: "a 'find substring' function": the Lua analogue of JS s1.indexOf(s2) is s1:find(s2, 1, true) (or string.find(s1, s2, 1, true)), but I'm not sure what you're getting at. :-/ For a full list of Lua's string-manipulation functions, by the way, see §5.4 of its reference manual. —RuakhTALK 20:28, 5 September 2012 (UTC)
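For what it's worth, a runnable rendering of the gsub-with-count approach might look like this (the names are illustrative, and the ending is assumed to contain no pattern magic characters):

local function stripEnding(word, ending)
    local stem, count = word:gsub(ending .. "$", "")
    if count < 1 then
        return nil  -- the expected ending is not there; the caller can add a cleanup category
    end
    return stem
end

-- stripEnding("parler", "er") --> "parl"
-- stripEnding("amuïr", "ir")  --> nil (caught, rather than silently mangled)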
Further update: Although Lua itself doesn't have any Unicode or UTF-8 support, mw:Extension:Scribunto/API specification#ustring API mentions that Scribunto offers some UTF-8 string-manipulation functions. I haven't had a chance to try them out, so I don't know if they're even implemented yet, but if and when they are, they'll probably be very useful. —RuakhTALK 17:05, 10 September 2012 (UTC)
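If and when those functions are available, they would make the byte-pattern workarounds above unnecessary, e.g. (assuming the documented names are what actually ships):

local s = "amuïr"
local stem = mw.ustring.sub(s, 1, -3)   -- "amu": drops the last two characters, not bytes
local len  = mw.ustring.len(s)          -- 5: counts characters, not bytes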
An example please
Could someone generate an example module / template that could replace one of our existing templates please (just to give the rest of us an idea of how difficult it will be to program). Ideally one that has some, but not too much, logic in it - maybe en-adv or something similar. SemperBlotto (talk) 13:37, 6 September 2012 (UTC)
- I've created a template test2wiki:Template:EnWiktEnAdv, backed by the module test2wiki:Module:EnWiktEnAdv. A few notes:
- Various cases are demonstrated at test2wiki:EnWiktEnAdv.
- I attempted to modify as little actual behavior as possible, so this preserves some odd behaviors from {{en-adv}} that we would probably change if we were doing this "for real". (For example, {{en-adv|er|more|more}} will happily include two copies of "more ___". And for some reason, {{categorize}} is written in such a way that Template:en-adv is listed in Category:English adverbs. Under "e", no less. And all the glossary-links are to #comparable, even though that only makes sense for one of them.)
- The only thing I knowingly changed is — the logic of {{isValidPageName}} is really hard to reverse-engineer, and it's not like it's a perfect test anyway (it thinks that the empty string is valid, but that a single hyphen is not; it thinks that leading spaces are O.K., but trailing spaces are not; and so on), so I just dropped it in favor of a more straightforward test.
- I couldn't figure out the Scribunto+Lua equivalents of {{NAMESPACE}} and {{PAGENAME}} and {{SUBPAGENAME}} — I think they're just not implemented yet — so for now, test2wiki:Template:EnWiktEnAdv passes them to test2wiki:Module:EnWiktEnAdv as "configuration". Obviously that's not how we'll do it "for real".
- —RuakhTALK 19:29, 7 September 2012 (UTC)
- Thanks. It's not simple is it? I wonder how much more efficient it is. By the way, Uncle G helped me create something simpler in the test wiki - and changed my PAGENAME to config.pagename (where config was created from pframe.args - see the Module:It on the test wiki). SemperBlotto (talk) 20:47, 7 September 2012 (UTC)
- I think it's quite simple, if you're used to reading programming-language code. It's not as terse as wikitext, but that's a good thing. Re: "changed my PAGENAME to config.pagename": Yes, that's the same thing I did. (By the way, you mean frame.args, not pframe.args. It's an important difference.) —RuakhTALK 21:02, 7 September 2012 (UTC)
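For readers following along, the difference being pointed at can be sketched like this (a simplified illustration, not the actual EnWiktEnAdv code):

local p = {}

function p.main(frame)
    local invokeArgs   = frame.args               -- arguments written in the {{#invoke:}} itself
    local templateArgs = frame:getParent().args   -- arguments passed to the wrapping template
    return templateArgs[1] or invokeArgs[1] or ""
end

return p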
- It seems exceedingly odd that all the work is being done by test2wiki:module:EnWiktEnAdv but that test2wiki:template:EnWiktEnAdv is necessary anyway. Can't entries include {{#invoke:EnWiktEnAdv|main|ier|iest}} rather than {{EnWiktEnAdv|ier|iest}} and remove the need for the middleman template? Wouldn't that reduce processing time? Is it undesirable for some reason? —msh210℠ (talk) 19:07, 11 September 2012 (UTC)
- ...and if the counterargument is that adding main is hard on editors, I contend that it's no harder than the rest of the syntax they add (like {{ ), especially if main is a fairly constant name for functions used in many of our modules. —msh210℠ (talk) 19:09, 11 September 2012 (UTC)
- In this case, I think the counterargument is that the use of Scribunto is an implementation detail, and we don't want to edit every English adverb entry to invoke a Scribunto module instead of calling {{en-adv}}. But in cases where a single template is used by many other templates, rather than by entries, we might well want to completely delete the template after Luacization. —RuakhTALK 19:28, 11 September 2012 (UTC)
Now in use
Scribunto is here today.
All Italian conjugation templates call the single "Module:Itconj". A few spelling mistakes got through the testing process, probably because all the conjugated forms gave red links in the displayed table (on the test Wiki) and spelling mistakes didn't stand out. All now fixed as far as I can tell. SemperBlotto (talk) 09:11, 19 February 2013 (UTC)
- Damn. After all these months waiting, I forgot that the test Wiki uses first-letter capitalisation. I intended to name the module "itconj" on our wiki - I'll fix it later. SemperBlotto (talk) 09:27, 19 February 2013 (UTC)
- I think it-verb would be a better name, or at least it-conj. —CodeCat 14:45, 19 February 2013 (UTC)
Efficiency
Thanks for that. I checked Italian conjugation tables (Lua) against the virtually identical French ones (non-Lua). Italian 0.55 secs, French 0.9 secs. SemperBlotto (talk) 10:06, 19 February 2013 (UTC) (text moved from content page)
- You can compare the old non-Lua version of it-conj with the current Lua one. I did a few runs and got the following results:
- Non-lua: 1.072, 0.970, 0.789, 0.994, 0.714, 1.088, 0.927; average 0.936s
- Lua: 0.748, 0.782, 1.332, 0.775, 0.721, 0.882, 0.723; average 0.852s
- Interestingly, non-Lua got the lowest time (0.714) and Lua got the highest time (1.332). --Njardarlogar (talk) 13:25, 19 February 2013 (UTC)
- You could improve the module by replacing the successive concatenations (conj = conj .. xx) by an append to a table (table.insert(tab, "xx")), concatenated at the end (table.concat(tab)). The reason is that every time a concatenation is made like conj = conj .. xx, the conj variable is recreated.
- Also, for the sake of cleanliness (and in order to avoid unintended conflicts), I suggest that all variables for forms should be put in a table like (form['pres1s']). Checking the conditions would only necessitate a loop, instead of writing everything by hand: prem1s = p.over(prem1s,args["prem1s"]); -> loop on form[xx] = p.over(form[xx], args["xx"]). Dakdada (talk) 14:21, 29 April 2013 (UTC)
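A rough sketch combining both suggestions (the key names come from the snippet above; the args handling stands in for the module's existing p.over logic, and the table markup is just a placeholder):

local function buildTable(args)
    local forms = {}
    for _, key in ipairs({ "pres1s", "pres2s", "pres3s" }) do  -- one loop instead of one line per form
        forms[key] = args[key] or ""                           -- stand-in for the p.over() logic
    end

    local parts = {}                                           -- collect pieces, concatenate once at the end
    table.insert(parts, '{| class="inflection-table"')
    table.insert(parts, "| " .. forms["pres1s"])
    table.insert(parts, "|}")
    return table.concat(parts, "\n")
end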
mw.log()
How exactly does this output anything? It never seems to do anything for me - nothing appears in the debug console. --Njardarlogar (talk) 16:24, 26 February 2013 (UTC)
- I don't know - I couldn't get it to do anything either. SemperBlotto (talk) 16:27, 26 February 2013 (UTC)
- It's working fine for me. Running something that contains mw.log in the debug console displays the text right above the input box. --Yair rand (talk) 17:37, 26 February 2013 (UTC)
- Ah, maybe it was the "debug console" that I couldn't find. SemperBlotto (talk) 17:40, 26 February 2013 (UTC)
- I was trying to use it when previewing pages from the module page. I need to be able to pass arguments to the module if the debugging function is to be of much use. Is there any way to pass arguments that can be accessed via frame:getParent() while debugging; or will I have to rewrite the script in order to debug it if I am accessing the arguments that way? That would seem really inconvenient. --Njardarlogar (talk) 18:16, 26 February 2013 (UTC)
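One possible workaround, sketched here without having been tested in that console: split the module into a function that takes a plain argument table plus a thin wrapper that reads the frame, so the inner function can be fed arguments directly while debugging.

local p = {}

function p.show(args)                      -- all the real logic lives here
    return "first argument: " .. (args[1] or "(none)")
end

function p.main(frame)                     -- thin wrapper used by {{#invoke:}}
    return p.show(frame:getParent().args)
end

return p

-- in the debug console:  mw.log(p.show({ "test" }))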
Checking for existence of an element of an array (list)
Not sure if it's the right terminology, but there's an array (or list) of forms elements (declared as "local forms = {}").
Is it correct to check for the existence of an element like this:
if forms["past_n2"] ~= "" then SOME CODE end
--Anatoli (обсудить/вклад) 02:19, 29 April 2013 (UTC)
I'm trying to add (as a test for now) this:
| ]=] .. forms["past_n"] .. [=[
to this
| ]=] .. forms["past_n"] if forms["past_n2"] ~= "" then ", " .. forms["past_n2"] end .. [=[
but I get error messages. Even a test with a string (with or without quotes) didn't work:
| ]=] .. forms["past_n"] if forms["past_n2"] ~= "" then "TEST" end .. [=[
--Anatoli (обсудить/вклад) 02:29, 29 April 2013 (UTC)
- An undefined value is always nil. Dakdada (talk) 11:53, 29 April 2013 (UTC)
- Thank you. --Anatoli (обсудить/вклад) 13:33, 29 April 2013 (UTC)
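Spelling that out with a corrected version of the snippet above: an undefined element is nil rather than the empty string, and an if statement cannot sit in the middle of a concatenation, so the optional form has to be built first (the function name here is invented and the wiki markup simplified):

local function appendPastNeuter(text, forms)
    local extra = ""
    if forms["past_n2"] ~= nil and forms["past_n2"] ~= "" then
        extra = ", " .. forms["past_n2"]
    end
    return text .. "| " .. forms["past_n"] .. extra .. "\n"
end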
We need your feedback to improve Lua functions
Hello,
If you’re regularly using Lua modules, creating and improving some of them, we need your feedback!
The Wikidata development team would like to provide more Lua functions, in order to improve the experience of people who write Lua scripts to reuse Wikidata's data on the Wikimedia projects. Our goals are to help harmonize the existing modules across the Wikimedia projects, to make coding in Lua easier for the communities, and to improve the performance of the modules.
We would like to know more about your habits, your needs, and what could help you. We have a few questions for you on this page.
Thanks a lot for your help, Lea Lacroix (WMDE) (talk) 08:50, 27 March 2018 (UTC)