Wiktionary:Thesaurus/Improvements 1

From Wiktionary, the free dictionary
This is an archive of the page originally titled "Project - Improving WikiSaurus". Please do not extend this page.

Project Purpose

Build on, improve, and refine the ideas in the budding concept of WikiSaurus.

Project Status

Foundering

Project Players

All true Wiktionarians are invited to participate.

Project Page vs Project Talk page

This project page tries to represent/summarise the developing consensus view of the current state of the project. Please use the Talk page for discussions which have not yet reached some form of consensus.

Project Topics

Page Layout

What should the page look like?

Philosophy of Thesaurus

Roget's Thesaurus was absolutely stupendous for what it was: a paper classification system for words that mimicked the classification system for plants and animals. With the coming of the computer industry, though, it has become obsolete. In reading all about "head words" and "near synonyms" and other ways to manage an electronic thesaurus, one can see that a good solution has yet to be found. Most of the people tackling this are too close to the problem. They no doubt have English degrees, Literature awards and other such ephemera but are limited in their knowledge of computers.

Lord knows those are the kind of people who can really do a good job making a dictionary that goes into great depth and encompasses so much, but a Literature major can't understand where the average Joe Schmutz is coming from. Unless we plan on this being so esoteric as to be useless to any but the top 5% of the literati, we must come up with a simpler plan.

Electronic Thesaurus

We can visualize the function of a thesaurus by thinking of it as a web that links every definition to every other definition through shades of meaning. Using head words removes some of the linkages in this fine web. When you do that, it leaves holes that meaning can fall through. I challenge anyone to find one word in the English language that means exactly the same as another. If a nuance is not in someone's lexicon, that doesn't mean it isn't there. Ours is a huge and complex language. We shouldn't attempt to simplify it by removing bits of it.

On the other hand, why make work for ourselves by adding extra wordage to each page, or extra pages that will be difficult to get to? Such is the case with the main-page method of Wikisaurus. In order to cover all the areas you leave out, twice as many words must be defined. In order to have main categories and sub-categories, you end up unable to actually get the users to the category. Just as Wiktionary doesn't require Wikisaurus assistance until the user needs synonyms, so Wikisaurus shouldn't require Wiktionary unless an in-depth definition (or any of the other things it provides) is needed.

The other part of this equation is the number of synonyms (or, by my way of seeing it, near-synonyms all), antonyms and other bits and bobs we want to pack into one area. In reading over all these discussions I've heard everything from "duplicate the paper template exactly" (Roget's) to "Frag it! Let's dump it all into the Wiktionary and have done with it!" There were a lot of steps in between (thank God!). Somewhere along the way, though, I saw that something fairly basic was being assumed that shouldn't have been.

The erroneous assumption

It's the same assumption that is made by inexperienced pilots and by people who are not mathematicians or physicists. Most of us perceive things in a two-dimensional way when we can experience three dimensions (pilots) and envision more (physicists and mathematicians). The arguments were always about what should go on the page and why, but never about how multidimensional we should go. It was touched on when Roget's was brought up ("it's like the skin of an onion"). We're working with a computer here. There is no end to the complexity we can shove onto the computer to make it easy for everyone else. ...Well, OK, there is an end. We can bring a computer to its knees if we give it too much to do. I actually experienced that at one of the companies I worked for (back in the dark ages). The system we had took 8 hours to boot up.

Flight of Fancy

We are nowhere near there. I want to give back to the computer some of the complexity we've taken on ourselves. I want the computer to start using those multi-dimensions to make connections we don't have to make with our brains. Let's start by giving it back all the look-ups. We don't care if it has to go through fifteen hundred linkages to get a word defined. Let it. Just so we don't have to. Let's give it three categories: Synonym, Antonym, and Colloquial/Archaic/Slang. At first everything is a Synonym or Antonym. Over the course of time, more "click-thrus" will be done on some synonyms than on others. We keep track of click-thrus, and when a word drops below a certain threshold, we have the bot transfer it from Synonym into Colloquial/Archaic/Slang; when it rises back above the threshold, the bot moves it from Colloquial/Archaic/Slang back into Synonym. Antonyms will be a dead zone.
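
The click-thru rule above can be sketched roughly as follows. This is only an illustration in Python: the function name `recategorize`, the threshold of 10, the sample words and counts, and the reading of "dead zone" as "the bot never moves antonyms" are all assumptions, not anything that exists on Wiktionary.

```python
# Hypothetical sketch: names, threshold, and sample data are assumptions.
SYNONYM = "Synonym"
COLLOQUIAL = "Colloquial/Archaic/Slang"
ANTONYM = "Antonym"


def recategorize(entries, clicks, threshold=10):
    """Return a new word -> category mapping based on click-thru counts.

    entries: dict mapping word -> current category
    clicks:  dict mapping word -> click-thru count (missing words count as 0)
    """
    updated = {}
    for word, category in entries.items():
        if category == ANTONYM:
            # Assumed meaning of "dead zone": antonyms are never moved.
            updated[word] = category
        elif clicks.get(word, 0) < threshold:
            # Rarely clicked words sink into Colloquial/Archaic/Slang.
            updated[word] = COLLOQUIAL
        else:
            # Words at or above the threshold are (or become) synonyms.
            updated[word] = SYNONYM
    return updated


entries = {"big": SYNONYM, "large": SYNONYM, "brobdingnagian": SYNONYM,
           "hefty": COLLOQUIAL, "small": ANTONYM}
clicks = {"big": 120, "large": 95, "brobdingnagian": 2, "hefty": 40, "small": 1}
result = recategorize(entries, clicks)
```

With these made-up counts, "brobdingnagian" would move out of Synonym, "hefty" would move into it, and "small" would stay put as an Antonym.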

We could take it several dimensions further by having a bot go through and sort words by click-thru count; or assign color values to each span of click-thrus and have the bot change the color based on those values; or do both so that we get a rainbow. This way, people who don't want to deal with the complexity don't have to, while those who want every nuance mapped will get their wish.
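
As a rough illustration of the sorting and coloring idea, here is a small Python sketch. The band boundaries, the color names, the sample counts, and the helper names `color_for` and `rank_and_color` are all made up for the example.

```python
# Hypothetical sketch: band boundaries and colour names are assumptions.
# Each band is (lower bound of click-thrus, colour), highest band first.
COLOR_BANDS = [(100, "red"), (50, "orange"), (10, "green"), (0, "blue")]


def color_for(count):
    """Return the colour of the first band whose lower bound the count reaches."""
    for lower_bound, color in COLOR_BANDS:
        if count >= lower_bound:
            return color
    # Negative counts should not occur; fall back to the lowest band.
    return COLOR_BANDS[-1][1]


def rank_and_color(clicks):
    """Return (word, count, colour) tuples, most-clicked word first."""
    # Sort by descending count, breaking ties alphabetically.
    ordered = sorted(clicks.items(), key=lambda item: (-item[1], item[0]))
    return [(word, count, color_for(count)) for word, count in ordered]


ranked = rank_and_color(
    {"large": 95, "big": 120, "brobdingnagian": 2, "hefty": 40}
)
```

Doing both steps at once gives the "rainbow": the list runs from the most-clicked words in one color down to the least-clicked in another.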

The Page

Every word should have its own Wikisaurus page. If it's in the Wiktionary, it should be in the Wikisaurus. Our choice is between putting every synonym down twice or having every word have a page. A given page won't have just three or four words; in some instances it can have as many as fifty. That many linkages per page can actually slow the computer down more effectively than having fifty more pages.

Every Wikisaurus page should have all the words, regardless of part of speech, on that single page. Otherwise, how does a person find a Wikisaurus page with the right part of speech on it? Would they have to go through as many as four different screens to find the right part of speech before looking up their word? Nobody wants to have to do that! By putting everything on the same page, with a clear TOC to find the right spot, we save the user time and frustration.

Summary

There you have it in one hundred pages or less! We should not have the same structure as wiktionary on our wikisaurus pages and here you can see why this alternate structure is a sane and simple solution.

Notification System

How do we keep from stepping on each other's toes?

Above in these topics you will see a place you can put your name if you want to be associated with working on Wikisaurus. Our choices for notification are:

  • We could email each of the people on the list to let them know when we're going to do a major revision of a specific page.
    • Pro: we'll be able to see what is getting done without too much hassle.
    • Con: we'll have a rather distasteful job each and every time we work on a page for more than an hour.
  • We could do something like a checkout of pages, where we put our names next to the page we want to work on.
    • Pro: once it's set up, logging pages in and out will be a snap!
    • Con: it's a huge job to set up.
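
To give a feel for how small the core of a checkout scheme could be, here is a minimal in-memory sketch in Python. The class name `PageCheckout`, its methods, and the page title used are hypothetical; a real version would have to live on a wiki page or in a bot, which is where the set-up work comes in.

```python
# Hypothetical sketch: class, method names, and page title are assumptions.
class PageCheckout:
    """Tracks which editor has put their name next to which page."""

    def __init__(self):
        self._holders = {}  # page title -> editor name

    def check_out(self, page, editor):
        """Claim a page; return False if someone else already holds it."""
        holder = self._holders.get(page)
        if holder is not None and holder != editor:
            return False
        self._holders[page] = editor
        return True

    def check_in(self, page, editor):
        """Release a page; only the current holder may release it."""
        if self._holders.get(page) != editor:
            return False
        del self._holders[page]
        return True

    def holder(self, page):
        """Return the editor holding the page, or None if it is free."""
        return self._holders.get(page)


registry = PageCheckout()
first = registry.check_out("Wikisaurus:big", "Amina")  # Amina claims the page
second = registry.check_out("Wikisaurus:big", "Bob")   # refused: already held
```

Here the second claim is refused until the first editor checks the page back in.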

That's my take on the notification system. Anyone have something more inventive? Amina (sack36) 06:06, 16 July 2008 (UTC)