Module talk:palindromes

match efficiency

@CodeCat Thanks for the fix. I would have assumed that match, when given an index, would short-circuit and return as soon as it found a character. Then again, I suppose it must recalculate the Unicode offset every time. Lua can be very dumb sometimes. —JohnC5 18:54, 27 August 2016 (UTC)

gmatch avoids that problem: iterating over a string that way is O(n). A hybrid method may be the fastest. Use gmatch to add all the characters individually to a list, which is O(n), then do the same n/2 comparisons as before, but use list indexing to find the characters. Extending a list one element at a time may slow it down a little, but if the list is pre-created with n nils, it would probably never reallocate. I really doubt that putting that much work into optimising it is worth it, though. I like to treat it more like Python: write simple, sensible code first, and be clever only if you have to. —CodeCat 18:57, 27 August 2016 (UTC)
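
A minimal Python sketch of the two approaches discussed here, assuming (as suggested above) that an indexed lookup into a UTF-8 string has to re-scan it from the first byte on every call. Python's own str indexing is already O(1), so the helper nth_char_by_scan is a hypothetical stand-in for that per-call offset computation, not a Scribunto function. The hybrid version collects the characters once, in the role of the gmatch pass, and then compares mirrored positions by list index.

```python
def nth_char_by_scan(data: bytes, index: int) -> str:
    """Return the index-th (0-based) code point of the UTF-8 bytes `data`.

    Hypothetical stand-in for an indexed lookup that has to walk the string
    from the first byte on every call, i.e. O(n) per lookup.
    """
    count = -1
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx continue a code point; any other byte starts one.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == index:
                end = pos + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError(index)


def is_palindrome_naive(term: str) -> bool:
    """O(n^2) overall: each of the n/2 comparisons re-scans the string twice."""
    data = term.encode("utf-8")
    n = len(term)
    for i in range(n // 2):
        if nth_char_by_scan(data, i) != nth_char_by_scan(data, n - 1 - i):
            return False
    return True


def is_palindrome_hybrid(term: str) -> bool:
    """O(n) overall: one pass collects the characters, then n/2 O(1) lookups."""
    chars = list(term)  # plays the role of filling a table from gmatch
    n = len(chars)
    for i in range(n // 2):
        if chars[i] != chars[n - 1 - i]:
            return False
    return True
```

Both functions give the same answers; only the cost differs. The pre-allocation trick mentioned above is Lua-specific and has no direct counterpart in this sketch, since list() builds the whole list in one call.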

Fair enough. My only concern is that adding this mildly intensive operation to all headwords will only exacerbate the Lua memory and time overflows, like those that happen on the entry water. I'd like to eke out as much efficiency as possible for these high-use templates. —JohnC5 19:29, 27 August 2016 (UTC)

I implemented the hybrid version as I proposed. What do you think? —CodeCat 19:35, 27 August 2016 (UTC)

Good enough for me! —JohnC5 20:02, 27 August 2016 (UTC)

Verification

@CodeCat, @JohnC5 These entries are categorized as palindromes but are not currently recognized as such by the module. DTLHS (talk) 21:54, 27 August 2016 (UTC)

@DTLHS: Many of these are what we would consider repeated-character entries, and a few are cases where we have not accounted for different diacritics. The open questions are Czech pochop and Hungarian csúcs and szusz, where whoever created the entry considered ch, cs, and sz, respectively, to be single characters. I do not know enough about these languages to judge whether that is valid. Some of the Turkish ones seem to be false. —JohnC5 01:04, 28 August 2016 (UTC)

@Panda10, @Dan Polansky, what do you think of these? DTLHS (talk) 01:07, 28 August 2016 (UTC)

The Hungarian digraphs cs, dz, gy, ly, ny, sz, ty, zs and the trigraph dzs are each considered one letter and should be read as such, so csúcs and szusz are palindromes. See this list of Hungarian palindromes on Wikiquote: [1]. The words kerék and kérek are also palindromes, but their meaning is not identical when read backwards. --Panda10 (talk) 01:21, 28 August 2016 (UTC)
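
One way to honour these letters is to split a word into letters before comparing, trying the longest multigraph first at each position so that dzs wins over dz. A minimal Python sketch, assuming greedy left-to-right matching is good enough; the function names are hypothetical and this is not how the module itself is written.

```python
# Multigraphs Panda10 lists as single Hungarian letters.
HUNGARIAN_MULTIGRAPHS = ["cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs", "dzs"]


def tokenize(term: str, multigraphs: list[str]) -> list[str]:
    """Split `term` into letters, preferring the longest multigraph at each position."""
    ordered = sorted(multigraphs, key=len, reverse=True)  # so "dzs" is tried before "dz"
    letters: list[str] = []
    i = 0
    while i < len(term):
        for graph in ordered:
            if term.startswith(graph, i):
                letters.append(graph)
                i += len(graph)
                break
        else:  # no multigraph starts here: take a single code point
            letters.append(term[i])
            i += 1
    return letters


def is_letter_palindrome(term: str, multigraphs: list[str]) -> bool:
    letters = tokenize(term.lower(), multigraphs)
    return letters == letters[::-1]
```

With this, csúcs becomes cs, ú, cs and szusz becomes sz, u, sz, both palindromes. Greedy matching can mis-split compounds where the letters belong to different morphemes: házszám (ház + szám) comes out as h, á, zs, z, á, m rather than h, á, z, sz, á, m.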

In Czech, ch is considered a single letter for alphabetical sorting purposes, placed after h in the alphabetical order. --Dan Polansky (talk) 07:40, 28 August 2016 (UTC)

@Dan Polansky: Thanks for the tip. At present I have the module replace ch with χ, which I hope is not used anywhere in Czech orthography. I suppose we could substitute a series of very obscure characters, which would all but guarantee that we never accidentally find a false palindrome. What do you think? —JohnC5 15:25, 28 August 2016 (UTC)

Couldn't we just use something like the null character (U+0000)? It's guaranteed to never be an entry title. DTLHS (talk) 23:30, 29 August 2016 (UTC)
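
The substitution idea can be sketched in a few lines of Python: give each multigraph its own placeholder code point, starting at U+0000 as suggested here, then compare the result with its reverse. This assumes, as stated above, that such control characters never occur in entry titles; the function names are hypothetical.

```python
def substitute_multigraphs(term: str, multigraphs: list[str]) -> str:
    """Replace each multigraph with its own single placeholder code point.

    Placeholders are U+0000, U+0001, ... so that different multigraphs never
    compare equal; this assumes such control characters never occur in titles.
    """
    ordered = sorted(multigraphs, key=len, reverse=True)  # longer multigraphs first
    for index, graph in enumerate(ordered):
        term = term.replace(graph, chr(index))
    return term


def is_palindrome_substituted(term: str, multigraphs: list[str]) -> bool:
    key = substitute_multigraphs(term.lower(), multigraphs)
    return key == key[::-1]
```

For Czech, substitute_multigraphs("pochop", ["ch"]) gives "po\x00op", which reads the same both ways. Sequential str.replace is simpler than tokenizing, but for a language whose multigraphs can overlap, a left-to-right tokenizer is easier to reason about.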

Telugu uses combining characters that might be difficult to account for, for example విరివి. DTLHS (talk) 01:12, 28 August 2016 (UTC)
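
A rough Python sketch of one way to handle this: attach every combining mark (Unicode categories Mn, Mc, Me) to the preceding base character and compare the resulting clusters instead of raw code points. For విరివి (వ + ి, ర + ి, వ + ి) that yields three clusters that read the same both ways, whereas reversing the code points would put the vowel signs in front. This is an approximation, not full grapheme segmentation. Consonants joined by the virama (్), which is itself a combining mark, would still be split into separate clusters. The third-party regex module's \X pattern is one option for proper extended grapheme clusters.

```python
import unicodedata

COMBINING = {"Mn", "Mc", "Me"}  # nonspacing, spacing, and enclosing marks


def clusters(term: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    result: list[str] = []
    for ch in term:
        if result and unicodedata.category(ch) in COMBINING:
            result[-1] += ch  # attach the mark to the preceding cluster
        else:
            result.append(ch)
    return result


def is_cluster_palindrome(term: str) -> bool:
    seq = clusters(term)
    return seq == seq[::-1]
```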

On the issue of w:Kaibun (also here), I would like additional guidance. Should we allow romaji palindromes, and if so, should we strip macrons? —JohnC5 01:32, 28 August 2016 (UTC)

I think that the hiragana (+katakana) reading passed to the headword template should be the only thing examined. IMO romaji shouldn't count at all. —suzukaze (t・c) 01:35, 28 August 2016 (UTC)

@suzukaze-c: Could you provide the unvoiced/voiced equivalences described in the second link (if it is correct)? Also, should we just ignore the entire Latn block, then? —JohnC5 01:45, 28 August 2016 (UTC)

1. Module:ja/data > data.tenconv has what you are looking for. 2. It doesn't seem like Japanese palindromes involve Latn at all. (pinging @Eirikr, Atitarev, TAKASUGI Shinji as other Japanese editors, just in case) —suzukaze (t・c) 02:37, 28 August 2016 (UTC)
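
The module uses the table in data.tenconv. As a swapped-in alternative, the same voiced/unvoiced folding can be sketched in Python with Unicode NFD decomposition: katakana are first shifted onto hiragana by their fixed code-point offset, then each voiced kana splits into its base kana plus a combining dakuten or handakuten, which is dropped. This is a sketch under those assumptions, not the module's method. It ignores other conventions, such as small kana or the long vowel mark.

```python
import unicodedata

DAKUTEN = "\u3099"     # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
HANDAKUTEN = "\u309a"  # COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK


def kana_key(reading: str) -> list[str]:
    """Fold katakana into hiragana, then drop voicing marks, e.g. ガ -> が -> か."""
    folded = []
    for ch in reading:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:  # katakana ァ..ヶ sit 0x60 above hiragana ぁ..ゖ
            ch = chr(code - 0x60)
        folded.append(ch)
    # NFD splits a voiced kana into its base kana plus a combining mark.
    decomposed = unicodedata.normalize("NFD", "".join(folded))
    return [ch for ch in decomposed if ch not in (DAKUTEN, HANDAKUTEN)]


def is_kaibun(reading: str) -> bool:
    key = kana_key(reading)
    return key == key[::-1]
```

For example, ダンスがすんだ reduces to たんすかすんた, which reads the same in both directions.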

@suzukaze-c: I've made the changes, but now we are missing entries like DVD (DVD) and SUS (SUS), which arguably are not romaji. Should I take into account which script the template uses? —JohnC5 03:47, 28 August 2016 (UTC)

@JohnC5: I suppose that would work. —suzukaze (t・c) 03:53, 28 August 2016 (UTC)

@suzukaze-c: I think I have it working correctly now. —JohnC5 04:15, 28 August 2016 (UTC)

@Vahagn Petrosyan, are there any special considerations for Old or Modern Armenian? —JohnC5 19:02, 29 August 2016 (UTC)

The existence of Category:Mandarin palindromes seems really wrong to me. —suzukaze (t・c) 03:32, 30 August 2016 (UTC)

Fixed, I think. The category can be deleted if it becomes empty. DTLHS (talk) 03:35, 30 August 2016 (UTC)

@JohnC5: In Modern Armenian, Old Armenian and Middle Armenian, ու should be considered a single character. In Modern Armenian, եւ should also be considered a single character. There are no other rules. --Vahag (talk) 09:04, 30 August 2016 (UTC)
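
These rules fit naturally in a per-language table. Below is a Python sketch with a hypothetical table keyed by language code (hy for Armenian, xcl for Old Armenian, axm for Middle Armenian). The table's layout is an assumption, not the actual structure of Module:palindromes/data. Letters are extracted with a single regular expression whose alternation lists the multigraphs before a one-character fallback.

```python
import re

# Hypothetical per-language multigraph table encoding the rules above.
MULTIGRAPHS = {
    "hy": ["\u0578\u0582", "\u0565\u0582"],  # Modern Armenian: ու, եւ
    "xcl": ["\u0578\u0582"],                 # Old Armenian: ու
    "axm": ["\u0578\u0582"],                 # Middle Armenian: ու
}


def letters(term: str, lang: str) -> list[str]:
    graphs = sorted(MULTIGRAPHS.get(lang, []), key=len, reverse=True)
    # re tries alternatives left to right, so multigraphs come before the
    # single-character fallback "." (DOTALL lets "." match any code point).
    pattern = "|".join([re.escape(g) for g in graphs] + ["."])
    return re.findall(pattern, term, flags=re.DOTALL)


def is_palindrome_for(term: str, lang: str) -> bool:
    seq = letters(term.lower(), lang)
    return seq == seq[::-1]
```

A language with no entry falls back to plain code points, so եւ counts as one letter only for hy.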

@Vahagn Petrosyan: I think I have implemented it correctly. —JohnC5 14:40, 30 August 2016 (UTC)

Cool. --Vahag (talk) 05:42, 31 August 2016 (UTC)

Two-character "palindromes"

Is Category:Chinese palindromes supposed to contain two-character entries like 媽媽? This seems to contradict the module's comments "-- Ignore terms that consist of just one character repeated" and "-- This also excludes terms consisting of fewer than 3 characters". (@JohnC5.) —suzukaze (t・c) 06:55, 25 October 2016 (UTC)
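
The two exclusion rules quoted here amount to a pre-filter that runs before any palindrome comparison. A minimal Python sketch of that filter follows. The function names are hypothetical and it does not reproduce the module's Lua code. Under these rules, 媽媽 is rejected on length alone, before repetition is even considered.

```python
def worth_checking(term: str) -> bool:
    """Apply the two exclusion rules quoted above before any palindrome test."""
    chars = list(term)
    if len(chars) < 3:        # "excludes terms consisting of fewer than 3 characters"
        return False
    if len(set(chars)) == 1:  # "Ignore terms that consist of just one character repeated"
        return False
    return True


def is_palindrome_entry(term: str) -> bool:
    return worth_checking(term) and term == term[::-1]
```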

It's because Module:palindromes/data explicitly says that repeated characters count as palindromes in Chinese. —CodeCat 14:14, 25 October 2016 (UTC)

Removed. These belong in Category:Chinese reduplications, not palindromes. Wyang (talk) 21:20, 25 October 2016 (UTC)

local missing

Please add local on line 13. Thanks. Dpleibovitz (talk) 13:20, 29 October 2023 (UTC)