Module talk:palindromes

From Wiktionary, the free dictionary

match efficiency

@CodeCat Thanks for the fix. I would have assumed that match, when given an index, has short-circuit logic that returns as soon as it finds a character. Then again, I suppose it must calculate the Unicode offset every time. Lua can be very dumb sometimes. —JohnC5 18:54, 27 August 2016 (UTC)

gmatch avoids that problem; iterating over a string that way is O(n). A hybrid method may be the fastest: use gmatch to add all the characters individually to a list, which is O(n), then do the same n/2 comparisons as before, but use list indexing to find the characters. Extending a list by one element each time may slow it down a little, but if the list is pre-created with n "nil"s then it would probably never reallocate. I really doubt putting that much work into optimising it is worth it, though. I like to treat it more like Python: write simple, sensible code first, be clever only if you have to. —CodeCat 18:57, 27 August 2016 (UTC)
Fair enough. My only concern is that adding this mildly intense operation to all headwords will only exacerbate Lua memory and time overflows like those that happen at the entry water. I'd like to eke out as much efficiency as possible for these high-use templates. —JohnC5 19:29, 27 August 2016 (UTC)
I implemented the hybrid version as I proposed. What do you think? —CodeCat 19:35, 27 August 2016 (UTC)
Good enough for me! —JohnC5 20:02, 27 August 2016 (UTC)
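
For readers following the thread, the hybrid approach described above can be sketched roughly as follows. This is an illustration in Python, not the Lua code in the module; the function name `is_palindrome_utf8` is hypothetical, and the input is taken as UTF-8 bytes only to mirror the situation in Lua, where finding the i-th character of a string requires scanning from the start.

```python
def is_palindrome_utf8(data: bytes) -> bool:
    """Check whether UTF-8 encoded text reads the same in both directions."""
    # One O(n) pass: decode every character exactly once into a list.
    # This plays the role of collecting gmatch results into a table.
    chars = list(data.decode("utf-8"))
    n = len(chars)
    # n/2 comparisons, each an O(1) list lookup instead of a rescan
    # of the byte string to locate the i-th character.
    for i in range(n // 2):
        if chars[i] != chars[n - 1 - i]:
            return False
    return True
```

For example, `is_palindrome_utf8("été".encode("utf-8"))` compares three characters rather than five bytes. The total work is O(n) for the decoding pass plus O(n) for the comparisons, whereas locating each character by index in the raw string would make the comparison loop quadratic.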

Verification

@CodeCat, @JohnC5 These entries are categorized as palindromes and are not currently recognized as such by the module. DTLHS (talk) 21:54, 27 August 2016 (UTC)

@DTLHS: Many of these are what we would consider repeated-character entries, and a few fail only because we have not accounted for different diacritics. Some questions that remain are Czech pochop and Hungarian csúcs, szusz, where whoever made the entry considered the ch, cs, and sz to be single characters respectively. I do not know enough of these languages to make a judgment about the validity. Some of the Turkish ones seem to be false. —JohnC5 01:04, 28 August 2016 (UTC)
@Panda10, @Dan Polansky, what do you think of these? DTLHS (talk) 01:07, 28 August 2016 (UTC)
The Hungarian digraphs cs, dz, gy, ly, ny, sz, ty, zs and the trigraph dzs are each considered one letter and should be read the same, so csúcs and szusz are palindromes. See this list of Hungarian palindromes in Wikiquote: [1]. The words kerék and kérek are also palindromes, but their meaning is not identical when read backwards. --Panda10 (talk) 01:21, 28 August 2016 (UTC)
In Czech, ch is considered to be one letter for alphabetical sorting purposes, placed after h in the alphabetical order. --Dan Polansky (talk) 07:40, 28 August 2016 (UTC)
@Dan Polansky: Thanks for the tip. At present I'm having the module replace ch with χ, which I hope is not used anywhere in Czech orthography. I suppose we could use a series of very obscure characters for substitution, which would almost guarantee that we never accidentally find a false palindrome. What do you think? —JohnC5 15:25, 28 August 2016 (UTC)
Couldn't we just use something like the null character (U+0000)? It's guaranteed to never be an entry title. DTLHS (talk) 23:30, 29 August 2016 (UTC)
Telugu uses combining characters that might be difficult to account for: for example, విరివి. DTLHS (talk) 01:12, 28 August 2016 (UTC)
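
A rough sketch of the substitution idea discussed above, again in Python rather than the module's Lua. Everything named here is an assumption made for illustration: the `MULTIGRAPHS` table is not the contents of Module:palindromes/data, and Private Use Area code points stand in for the "very obscure characters" (the thread itself mentions χ and U+0000 as candidates). The same pass also attaches combining marks to the preceding base character, which is one possible way to handle cases like the Telugu example.

```python
import unicodedata

# Hypothetical per-language multigraph tables (illustrative only).
# Longer sequences come first so that "dzs" is matched before "dz" or "zs".
MULTIGRAPHS = {
    "cs": ["ch"],
    "hu": ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"],
}

PUA_START = 0xE000  # first code point of the Unicode Private Use Area


def to_units(word, lang):
    """Split a word into the units that a reader would treat as letters."""
    word = word.lower()
    # Replace each multigraph with its own placeholder character.
    for offset, graph in enumerate(MULTIGRAPHS.get(lang, [])):
        word = word.replace(graph, chr(PUA_START + offset))
    units = []
    for ch in word:
        # Attach combining marks (general category M*) to the previous unit.
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def is_palindrome(word, lang):
    """Compare the sequence of units with its reverse."""
    units = to_units(word, lang)
    return units == units[::-1]
```

With these tables, `is_palindrome("pochop", "cs")` and `is_palindrome("csúcs", "hu")` treat ch and cs as single units. Each multigraph gets a distinct placeholder, which a single shared character such as U+0000 would not provide once a language has more than one multigraph. A blind string replacement cannot tell a true digraph from two adjacent letters that merely look like one (for instance across a compound boundary), so this is a sketch of the idea, not a complete solution.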

On the issue of w:Kaibun (also here), I would ask for additional guidance. Should we allow romaji palindromes, and if so, should we strip macrons? —JohnC5 01:32, 28 August 2016 (UTC)

I think that the hiragana(+katakana) reading that is passed to the headword should be the only thing examined. IMO romaji shouldn't count at all. —suzukaze (tc) 01:35, 28 August 2016 (UTC)
@suzukaze-c: Could you provide the unvoiced-voiced equivalency as described in the second link (if it is correct)? Also, do we just have to ignore the entire Latn block, then? —JohnC5 01:45, 28 August 2016 (UTC)
1. Module:ja/data > data.tenconv has what you are looking for. 2. It doesn't seem like Japanese palindromes involve Latn at all. (pinging @Eirikr, Atitarev, TAKASUGI Shinji as other Japanese editors, just in case) —suzukaze (tc) 02:37, 28 August 2016 (UTC)
@suzukaze-c: I've made the changes, but now we are missing entries like DVD (DVD) and SUS (SUS), which seem theoretically not to be romaji. Should I take into account what script the template uses? —JohnC5 03:47, 28 August 2016 (UTC)
@JohnC5: I suppose that would work. —suzukaze (tc) 03:53, 28 August 2016 (UTC)
@suzukaze-c: I think I got it working correctly. —JohnC5 04:15, 28 August 2016 (UTC)
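
One way to picture the voiced/unvoiced equivalence for kana readings is the Python sketch below. It does not use data.tenconv from Module:ja/data; it swaps in Unicode canonical decomposition (NFD) to separate the dakuten and handakuten from their base kana, and whether that yields exactly the same mapping as the module's table is an assumption. The names `normalize_kana` and `is_kaibun` are hypothetical.

```python
import unicodedata

# Combining dakuten (U+3099) and handakuten (U+309A), as produced by NFD.
VOICING_MARKS = {"\u3099", "\u309a"}


def normalize_kana(reading):
    """Drop voicing marks and fold katakana to hiragana."""
    out = []
    for ch in unicodedata.normalize("NFD", reading):
        if ch in VOICING_MARKS:
            continue  # treat voiced and unvoiced kana as equivalent
        code = ord(ch)
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana.
        if 0x30A1 <= code <= 0x30F6:
            ch = chr(code - 0x60)
        out.append(ch)
    return "".join(out)


def is_kaibun(reading):
    """Check a kana reading for palindromicity after normalization."""
    kana = normalize_kana(reading)
    return kana == kana[::-1]
```

For example, `normalize_kana("ダ")` gives `"た"`, so a reading that differs from its reverse only in voicing or in the hiragana/katakana choice still compares as equal. Latin-script input passes through unchanged, so deciding whether entries written in Latn should count at all would need a separate script check, as discussed above.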

@Vahagn Petrosyan, are there any considerations for Old or Modern Armenian? —JohnC5 19:02, 29 August 2016 (UTC)

The existence of Category:Mandarin palindromes seems really wrong to me. —suzukaze (tc) 03:32, 30 August 2016 (UTC)

Fixed, I think; the category can be deleted if it becomes empty. DTLHS (talk) 03:35, 30 August 2016 (UTC)
@JohnC5: In Modern Armenian, Old Armenian and Middle Armenian ու should be considered a single character. In Modern Armenian եւ should be considered a single character. There are no other rules. --Vahag (talk) 09:04, 30 August 2016 (UTC)
@Vahagn Petrosyan: I think I have implemented it correctly. —JohnC5 14:40, 30 August 2016 (UTC)
Cool. --Vahag (talk) 05:42, 31 August 2016 (UTC)

Two-character "palindromes"

Is Category:Chinese palindromes supposed to have two-character entries like 媽媽 in it? This seems to defy the module's own comments, "-- Ignore terms that consist of just one character repeated" and "-- This also excludes terms consisting of fewer than 3 characters". (@JohnC5.) —suzukaze (tc) 06:55, 25 October 2016 (UTC)

It's because Module:palindromes/data explicitly says that repeated characters are palindromes in Chinese. —CodeCat 14:14, 25 October 2016 (UTC)
Removed. These belong to Category:Chinese reduplications, not palindromes. Wyang (talk) 21:20, 25 October 2016 (UTC)
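
The two exclusion rules quoted above can be written as a short filter. The following Python sketch is an illustration under the assumption that the term has already been split into characters (or larger units); `is_palindrome_candidate` is a hypothetical name, not a function in the module.

```python
def is_palindrome_candidate(units):
    """Apply the two exclusion rules, then the palindrome test itself."""
    units = list(units)
    # Rule: exclude terms consisting of fewer than 3 characters.
    if len(units) < 3:
        return False
    # Rule: ignore terms that consist of just one character repeated.
    if len(set(units)) == 1:
        return False
    return units == units[::-1]
```

Under these rules `is_palindrome_candidate("媽媽")` is rejected by the length check, and a longer run of one repeated character is rejected by the second check, so such entries would be left to a reduplication category instead.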

local missing

Please add local on line 13. Thanks. Dpleibovitz (talk) 13:20, 29 October 2023 (UTC)