User talk:Jberkel

From Wiktionary, the free dictionary

Catalan pronunciations

Hi, just a note to be careful when adding Catalan pronunciations. For example, you added a pronunciation with ê to esquetx, which is wrong (it should be é) and unlikely in any case, since ê generally occurs only in inherited words and some old borrowings, and esquetx is a recent borrowing from English. I have documented the pronunciation sources in the documentation of {{ca-IPA}}; in particular, only trust the DCVB for Balearic pronunciations, and don't trust cawikt at all. Benwing2 (talk) 02:34, 28 January 2024 (UTC)

@Benwing2: Ok, I thought cawikt was fairly reliable. Btw, thanks for your great work on the Catalan corner! Jberkel 10:42, 28 January 2024 (UTC)

Statistics

Hi Jberkel, are you planning to do another update of the statistics? Your last one already dates back to 1 July. Yes, I know it takes a lot of time and computing power, but I think we'd all just like to know again. :) Steinbach (talk) 17:18, 22 February 2024 (UTC)

@Steinbach Hi, I'd like to do it regularly, but there are still data problems with the HTML dumps: phab:T305407. The last reasonably complete data is from last July. The WMF folks are working on it, but somehow it's taking forever; I keep asking them about it :( Jberkel 17:42, 22 February 2024 (UTC)
@Steinbach There are fresh stats… Jberkel 00:53, 5 June 2024 (UTC)

HTML Dump

Hi, I saw your posts complaining about the lack of HTML dumps as I had the same issue. I ended up creating my own HTML dump using the API to rapidly download millions of entries. I used the 20240220 XML dump as a base so that the two dumps would include exactly the same revisions. Note that the same wikitext can produce different HTML code at different points in time, so I can't guarantee that the page looks exactly as it did at the time of the XML dump.

  • Pages included: non-redirects in namespaces 0 (main) and 118 (reconstruction)
  • Number of lines: 7,952,575
  • Time generated: February 20, 2024, 7:49:52 PM to February 22, 2024, 1:16:18 AM (EST)
  • Uncompressed size: 112,213,194,308 bytes
  • Compressed size: 5,482,140,342 bytes

Would you be interested in the code or the dump itself?

Ioaxxere (talk) 20:05, 22 February 2024 (UTC)

@Ioaxxere Lol, I'm close to starting a project myself, given the glacial progress on the WMF side. Yes, I'm interested. How did you get the HTML, and how long does it take? Is it the Parsoid-rendered version, which is what the HTML dumps use? If you want, we can join forces and run it as a community project. Jberkel 09:44, 23 February 2024 (UTC)

The script works by grabbing HTML data using a revision ID. For example: https://en.wiktionary.org/w/api.php?action=parse&oldid=65853771&format=json. I'm not sure which parser is used, but it seems to correspond to "view page source" in my browser. Here is the code:

Then I verified the output with this code:

Which produced:

These correspond with pages in the XML dump that have recently been deleted.

I don't have the time/resources to generate these on a regular basis, but you're welcome to adapt this code for your purposes!

Ioaxxere (talk) 19:56, 23 February 2024 (UTC)
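
Ioaxxere's verification code and its output aren't reproduced on this page. For illustration only, here is a minimal Python sketch of that kind of consistency check, assuming (hypothetically) that the HTML dump is stored as JSON lines with a "revid" field on each line and that the XML dump's revision IDs have already been extracted. The helper names `revids_from_jsonl` and `missing_revisions` are made up for this sketch. Revision IDs that appear in the XML dump but not in the HTML output are the ones to investigate, e.g. pages deleted after the XML dump was taken.

```python
import io
import json
from typing import Iterable, TextIO


def revids_from_jsonl(stream: TextIO) -> set[int]:
    """Collect revision IDs from a JSON-lines HTML dump.

    Hypothetical format: each non-empty line is a JSON object that
    carries the revision ID under a "revid" key.
    """
    revids = set()
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            revids.add(int(json.loads(line)["revid"]))
    return revids


def missing_revisions(xml_revids: Iterable[int], html_revids: Iterable[int]) -> list[int]:
    """Revision IDs listed in the XML dump but absent from the HTML dump.

    Sorted so that the report is stable and easy to diff between runs.
    """
    return sorted(set(xml_revids) - set(html_revids))


if __name__ == "__main__":
    # Toy data standing in for the real dumps.
    html_dump = io.StringIO('{"revid": 101}\n{"revid": 103}\n')
    print(missing_revisions([101, 102, 103], revids_from_jsonl(html_dump)))  # [102]
```

Using sets keeps the comparison linear in the number of revisions, which matters at the scale of roughly eight million lines.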

Oh god, I just realized that adding &parsoid=true to the API query gives *far* better data. Time to rerun... Ioaxxere (talk) 20:09, 23 February 2024 (UTC)
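
The script itself isn't reproduced on this page. As a minimal sketch of the approach described above (rendering a single revision through action=parse with its oldid, optionally adding parsoid=true), here is a standard-library Python version. The helper names, the placeholder User-Agent string, and the choice of prop=text and formatversion=2 (which makes parse.text a plain string) are assumptions of this sketch, not details of Ioaxxere's code.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"
# Wikimedia asks API clients to send a descriptive User-Agent; this value is a placeholder.
USER_AGENT = "html-dump-sketch/0.1 (contact: example@example.org)"


def build_parse_url(oldid: int, parsoid: bool = False) -> str:
    """Build an action=parse URL that renders one specific revision."""
    params = {
        "action": "parse",
        "oldid": str(oldid),
        "prop": "text",          # only the rendered HTML is needed
        "format": "json",
        "formatversion": "2",    # parse.text comes back as a plain string
    }
    if parsoid:
        # MediaWiki treats a boolean parameter as true whenever it is present,
        # so the flag is added only when wanted and never sent as "false".
        params["parsoid"] = "true"
    return API_URL + "?" + urllib.parse.urlencode(params)


def html_from_response(payload: dict) -> str:
    """Extract the rendered HTML from a formatversion=2 parse response."""
    if "error" in payload:
        # e.g. the revision no longer exists because the page was deleted
        raise LookupError(payload["error"].get("code", "unknown error"))
    return payload["parse"]["text"]


def fetch_revision_html(oldid: int, parsoid: bool = True) -> str:
    """Fetch the HTML for a single revision (performs a network request)."""
    request = urllib.request.Request(
        build_parse_url(oldid, parsoid), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return html_from_response(json.load(response))
```

For example, `build_parse_url(65853771, parsoid=True)` produces the same query as the example URL in the post above, plus prop=text, formatversion=2 and parsoid=true. A full run over millions of revisions would also need concurrency, retries and rate limiting, which this sketch leaves out.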
Cool, thanks! We could run it on WMF infrastructure. Great to see that 50 lines of Python yield better results than the WMF's buzzword soup of Kafka, DAGs and what have you… How long does it take to do a full run? Jberkel 15:20, 26 February 2024 (UTC)
nm, you already had it in your post: almost 2 days… :) Jberkel 15:57, 26 February 2024 (UTC)
Even if the WMF someday manages to produce useful dumps again, we'll still need wiki-specific namespaces such as Reconstruction, so it'll be useful to have some way of generating them ourselves. Jberkel 15:58, 26 February 2024 (UTC)

ScribuntoUnit vs. UnitTests

I just discovered there are two unit testing frameworks here, Module:UnitTests used by everyone but you, and Module:ScribuntoUnit used by you. The former is older than the latter, so I'm not sure why you imported the latter from Wikipedia, but I think we should consolidate. Can you think about converting your unit tests to use Module:UnitTests? Benwing2 (talk) 20:34, 10 March 2024 (UTC)

Hi, just wondering if you got my msg. Can you at least clarify why you imported and started using Module:ScribuntoUnit in preference to our own module? BTW I just discovered a third unit test framework, Module:QFQ/UnitTests, used only on Module:mnw-translit. Benwing2 (talk) 07:43, 14 March 2024 (UTC)
Hi @Benwing2, sorry, I had a short Wiktionary hiatus. It's been a long time (~10 years), but I think when I first looked at Module:UnitTests it was a spaghetti mess and didn't have the features I wanted. That's probably no longer the case, and I agree it's better to standardize on one framework. Jberkel 09:27, 15 March 2024 (UTC)

catalogue raisonné

Wwoww, Jberkel, you're fast. Wanted to cite the same Guardian passage here, and it was already there ... MistaPPPP (talk) 12:55, 19 March 2024 (UTC)

Apologies

I need to apologise to you as well for my simple edit to my archaic paragraph about certain 'etymologies that discredit Wiktionary', which ended up completely disrupting the edit section, including yours. There really should be a mechanism in place to stop this from happening, since any innocent editor could easily make a similar mistake, and if it weren't detected quickly, as both Surjection and I did, it could cause linguistic mayhem! Regards, Andrew Andrew H. Gray 11:40, 29 March 2024 (UTC)

On ass...

What Doyle said was about this:

https://en.m.wiktionary.org/wiki/arse#English

Here, ass is another way of spelling arse (as in dumb). Lunatone3000 (talk) 22:24, 4 April 2024 (UTC)

The reputation system

You mentioned this in a beer parlour comment about "the reputation system, for good or ill".

The reputation system is for ill.

There are editors like me whose behavior is scrutinized. And people are willing to make inaccurate claims about how many or few productive edits I've made.

Then there are other editors who have almost no ability at all to get along with other editors or admit wrongdoing. But, because they're perceived as being essential to the project, it's unacceptable to question their opinions or behavior. Purplebackpack89 13:46, 5 June 2024 (UTC)

I'd say there's a mix of different people finding problems with your edits: editors who had already mentally "blacklisted" you (Equinox, putting you in the "moron" box), WF (creating RFDs "for the lulz" to create havoc), and more level-headed/diplomatic editors who see real CFI/process-related issues. As -sche pointed out, because there are so many different editors involved, it's difficult to conclude that *all* of them are here to harass you. And because this has been going on for years, patience/good will/faith is running low… Jberkel 14:59, 5 June 2024 (UTC)
"Because there are so many different editors involved" makes it feel like I'm being harassed regardless of why they are doing it. Perhaps unwittingly, Equinox's name-calling and WF/Denazz's trolling made it harder for somebody like Benwing to legitimately address my edits. Knightwho is somewhere in between. While he may also legitimately want to clean up the project, he has a long and well-documented history of being confrontational. The other problem is that Benwing and Knight could have noticed that I felt put upon at the moment and waited, say, a couple of weeks until things had died down. There wasn't anything they were doing that had to be addressed immediately. They didn't do that. Purplebackpack89 16:37, 5 June 2024 (UTC)

Wanted

User:Jberkel/lists/wanted hasn't bin updated4a while. Can we get it bac, pls? Denazz (talk) 22:28, 5 June 2024 (UTC)

now iz bac. zorry for ze inconviniance caused. Jberkel 09:24, 6 June 2024 (UTC)