Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2
Placenames with linguistic information 2
- Voting on: Amending Wiktionary:Criteria for inclusion as follows: Deleting the existing section:
Names of specific entities
This section regulates the inclusion and exclusion of names of specific entities, that is, names of individual people, names of geographic entities, names of mythological creatures, names of planets and stars, etc.
Many names of specific entities should be excluded while some should be included. There is no agreement on specific rules for the inclusion of names of specific entities.
and replacing it by:
Names of specific entities
This section regulates the inclusion and exclusion of names of specific entities, that is, names of individual people, names of geographic entities, names of mythological creatures, names of planets and stars, etc.
Many names of specific entities should be excluded while some should be included. With the exception of geographic entities (for which see the section "Place names"), there is no agreement on specific rules for the inclusion of names of specific entities.
Place names
Place names, that is, names of geographic entities, are subject to the criteria for inclusion specified in the section "General rule", extended with the following additional requirements. A place name entry should initially include at least two of the following:
- An etymology. This is insufficient as one of the two necessary items for a multiple-word place name, such as South Carolina.
- A pronunciation.
- Information about grammar, such as the gender and an inflection table.
- A translation that is not spelled identically with the English form. A place name that is in itself such a translation, like the French entry for Londres, also meets this requirement.
- An additional definition in the same language as something else besides a place name, for example as a surname.
Also, names of streets and other minor landmarks needed to create addresses, such as Madison Street or Elm Avenue are not included if they can be recognized as street names or the like from their wording. For example Strand can be included, since the name gives no indication of its being a street. A street name that has other definitions, such as Harley Street has, is also acceptable.
Only minimal information about the place in question should be given. If the name is shared by several places, some of the places bearing the name can have a dedicated sense line, while other ones can be covered under a summary sense line such as "Any of a number of cities in Anglophone countries". Entries like London, Ontario should not be made.
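The "at least two of the following" requirement in the proposed text can be read as a simple counting rule. The following is a minimal Python sketch of that reading, not part of the proposal itself. The `PlaceNameEntry` record, its field names, and the whitespace test for a multiple-word name are all assumptions made for illustration. The sketch does not model attestation under the "General rule" or the exclusion of recognizable street names.

```python
from dataclasses import dataclass


@dataclass
class PlaceNameEntry:
    """Hypothetical record of what a place name entry contains."""
    name: str
    has_etymology: bool = False
    has_pronunciation: bool = False
    has_grammar_info: bool = False          # e.g. gender or an inflection table
    has_distinct_translation: bool = False  # translation not spelled like the English form
    has_other_definition: bool = False      # e.g. the same word is also a surname


def count_qualifying_items(entry: PlaceNameEntry) -> int:
    """Count how many of the five listed items the entry supplies."""
    # Assumption: a name containing whitespace is a multiple-word name.
    is_multiword = len(entry.name.split()) > 1
    items = [
        # An etymology does not count toward the two for a multiple-word name.
        entry.has_etymology and not is_multiword,
        entry.has_pronunciation,
        entry.has_grammar_info,
        entry.has_distinct_translation,
        entry.has_other_definition,
    ]
    return sum(items)


def meets_additional_requirements(entry: PlaceNameEntry) -> bool:
    """True if the entry includes at least two qualifying items."""
    return count_qualifying_items(entry) >= 2
```

Under this reading, a multiple-word entry with only an etymology and a distinct translation has one qualifying item and falls short, while adding a pronunciation brings it to two.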
- Vote starts: 00:00, 30 June 2010 (UTC)
- Vote ends: 24:00, 13 August 2010 (originally 24:00, 30 July 2010 (UTC); edited by DAVilla 22:35, 31 July 2010 (UTC))
- Vote created: --Makaokalani 14:19, 14 May 2010 (UTC)
- Discussion:
Support
- Support --Makaokalani 11:30, 30 June 2010 (UTC)
- Support, although I do think that "There is no agreement on specific rules for the inclusion of names of specific entities, except that there is a specific regulation for geographic entities, specified in the section "Place names"." could be expressed far simpler as "With the exception of geographic entities (for which see the section "Place names"), there is no agreement on specific rules for the inclusion of names of specific entities." Thryduulf (talk) 12:51, 30 June 2010 (UTC)
- See, I had time to change it, since it's just you and me here yet.--Makaokalani 13:00, 30 June 2010 (UTC)
- I hadn't thought of that! This now has my full unqualified support. Thryduulf (talk) 13:18, 30 June 2010 (UTC)
- Support To the barricades, placename lovers. This time we must win. --Vahagn Petrosyan 13:56, 30 June 2010 (UTC)
- Support. After over four years and nearly 400kb of discussion, this issue had better finally come to a close. --Yair rand (talk) 14:01, 30 June 2010 (UTC)
- Support Bequw → τ 14:33, 30 June 2010 (UTC)
Support very weakly. I am not happy about the wording "Entries like London, Ontario should not be made". From discussion, it seems to me that what's meant is "Entries like London, Ontario, where a place name is followed (respectively, preceded) by another to specify which referent of the first (second) name is meant, but where the first (second) referent is also sometimes written without the second (first), should not be made", but it'd be nice if that were spelled out in the wording. More importantly, I'm really not happy that there is no exclusion for things like River des Peres, where the only real name is des Peres, like there is for Elm Street, and that's why my vote in support is very weak.—msh210℠ (talk) 16:58, 30 June 2010 (UTC)
- Actually London, Ontario was only meant to exclude names of the City, State or River, Country type. But all place names are subject to the CFI General rule. River des Peres looks like sum of parts to me. I removed the other examples because, as you can see on the talk page, every contributor has a different idea of what an idiomatic place name is. It's better discussed on a rfd page for a specific name than on a vote page. But Elm Street is not called Elm, so a separate rule for street names was needed.--Makaokalani 11:55, 1 July 2010 (UTC)
- Support. I don't understand this vote 100% but I think it's an improvement. --Anatoli 00:13, 1 July 2010 (UTC)
Support. - just so long as "A place name entry should initially include at least two of the following" does not mean "A place name entry must initially include at least two of the following:" SemperBlotto 21:26, 4 July 2010 (UTC)
- I think that that is what people are taking it to mean. I know I am. If it doesn't contain that, it'll get RFDed and deleted if not fixed, and if a bunch are created in a row that don't contain that minimal info, then they'll be speedily deleted, is my understanding.—msh210℠ (talk) 21:31, 4 July 2010 (UTC)
- Support with reservations. However overall this is much better than what we have now (no consensus) and I was just hoping it would pass without my vote. If that doesn't make sense well, it doesn't make sense to me either. Mglovesfun (talk) 19:18, 24 July 2010 (UTC)
- Support. --Thrissel 17:57, 27 July 2010 (UTC)
- Support. I am not particularly fond of including purely encyclopædic information here, but since I shun different approaches applied to the various kinds of geographical places (e. g. regionally famous names like The Mall are permitted, but what if I create площад Александър Невски, the main square in Sofia (also regionally famous)?) and because already extant entries about seemingly notorious places in several countries would hardly be deleted, then theoretically every village and palanka in my region must be treated on an equal footing together with said entries. The uſer hight Bogorm converſation 18:22, 31 July 2010 (UTC)
- Support Ivan Štambuk 18:24, 31 July 2010 (UTC)
- Hmm, the vote is already over. You were just a bit too late. -- Prince Kassad 18:32, 31 July 2010 (UTC)
- Please, do not strike unilaterally votes cast by users who are entitled to partake of the current vote. They are valid according to the current voting requirements. It is up to the person closing the vote to decide whether to disregard them. The uſer hight Bogorm converſation 18:38, 31 July 2010 (UTC)
- Where does it say that the vote period is actually completely irrelevant and you may vote whenever you want? Certainly not here, people regularly strike votes if you're early. -- Prince Kassad 18:48, 31 July 2010 (UTC)
- I agree with Bogorm. It's certainly in line to comment that a vote is late, and I think someone closing the vote could be justified in disregarding it; but striking a vote seems inappropriate, since that notation usually indicates that the voter has retracted his or her vote. —RuakhTALK 19:21, 31 July 2010 (UTC)
- Support EncycloPetey 19:44, 31 July 2010 (UTC) I'm voting a bit late by virtue of being mostly offline much of the past month and not keeping up with discussions as I'd like.
- Support Krun 15:54, 1 August 2010 (UTC)
- Support —Stephen 07:41, 2 August 2010 (UTC)
- Support —Internoob (Disc•Cont) 17:23, 2 August 2010 (UTC)
- Support. Bit restrictive, but an improvement I suppose. Ƿidsiþ 18:37, 2 August 2010 (UTC)
Support I don't understand this vote 100% but I think it's an improvement. --Anatoli 01:18, 3 August 2010 (UTC)
- Anatoli, you have already voted above in the cast vote no. 6 on 1 July 2010, so I have struck your second vote. --Dan Polansky 10:13, 3 August 2010 (UTC)
- My apologies, I voted in the first round and then removed my vote as I was late. Well, this is the 2nd round but I thought it's the same page. I didn't know what I was doing. --Anatoli 02:29, 4 August 2010 (UTC)
Oppose
Oppose AugPi 17:35, 30 June 2010 (UTC)
- Oppose. It seems to me that with a pronunciation and translation into my favorite language, the spot on the wall next to the translucent thumbtack on the bottom left corner of the Elvis poster above the dresser in my bedroom would pass if but for attestation, which is to say that this isn't so much a criterion for inclusion as it is a criterion for presentability. As much as I would commend the efforts of anyone who could provide the required information for the most unheard-of places, I'm not sure I would want to encourage them to do so without guidelines for actual selection. DAVilla 04:38, 2 July 2010 (UTC)
- The spot on your wall isn't a geographic entity, but it's not a worthless spot if it has an attested translation. All attested given names and surnames are included, so it's frustrating that place names are not. They are often etymologically related. Pirita is a Finnish given name and also a district of Tallinn. But I dare not make the Estonian entry because the place is so small.--Makaokalani 11:52, 2 July 2010 (UTC)
- Nothing in the text of the vote says that the translation has to be attested; and while logically it doesn't make sense for people to list non–CFI-meeting translations, in practice I've never seen that enforced. —RuakhTALK 14:13, 2 July 2010 (UTC)
- All information in this dictionary is subject to WT:CFI#Attestation. If somebody adds an erratic or nonexistent translation, it can be rfv'd. It's true I've never seen such a rfv. Regular contributors are more likely to replace it with the correct word, or to wipe out obvious nonsense. Place names refer to a specific place, or to a finite set of places, so translations are either right or wrong. There is no room for creativity. Sentences may be translated with imagination, and that's one of the reasons for a separate Phrasebook.--Makaokalani 11:22, 3 July 2010 (UTC)
- Attestation is a very low bar. Obviously mine wasn't meant to be an example itself, but since it's so difficult to work with hypotheticals maybe we should find an example of some obscure location to illustrate the point. I'm saying, and I'm hoping someone can back this up, that there is such a constructed term that is geographical and that is attested and which can be easily created under this rule with pronunciation and with grammar and without the need for so much as a single translation (or maybe with a translation and not one of the first two options). There aren't many people here who want every city and town to be created as dictionary entries, yet that's exactly what this would allow, if not even more. DAVilla 21:09, 4 July 2010 (UTC)
- Three cites ([1], [2], [3], all uses, not mentions) in Hebrew for פלאטבוש (“Flatbush” or “Midwood”). All place names in Hebrew are feminine, so there's your grammatical info, which is enough (though pronunciation is easy, too).—msh210℠ (talk) 21:16, 4 July 2010 (UTC)
- And the Hebrew name of probably every little wadi in Israel is attested, and they all are masculine.—msh210℠ (talk) 21:26, 4 July 2010 (UTC)
- Yes, those are a good illustration of my objection to this vote. I was also thinking of something even more indirect like Guatemala City sinkhole which is geographic, idiomatic in that there could be several sinkholes in the capital city with this referring to a specific one, and in a year's time may easily be attested. Or there may be better examples. DAVilla 01:33, 6 July 2010 (UTC)
- Guatemala City Sinkhole would merit a discussion for several reasons, but assuming it were included, together with the name of a wadi in Israel, why is it harmful? Very rare surnames and given names are included, too. None of the present contributors specialize in rare names. Gender and translations are linguistic data, regardless of the size of the place.--Makaokalani 10:33, 6 July 2010 (UTC)
- To clarify, since SemperBlotto changed his vote after this edit, and Vahagn may have misunderstood his reason: Guatemala City Sinkhole does not meet the CFI of the vote now. I was assuming what DAVilla suggested, that (1.) the sinkhole is not filled in, but becomes a permanent tourist attraction, and (2.) there are other permanent sinkholes in Guatemala City with specific SOP names like X Street Sinkhole and only this one is called Guatemala City Sinkhole in two languages. Even then you might argue that a sinkhole that has swallowed a house is not a geographic entity but a construction, like a man-made hole measuring 18 m across would be (its cause is not quite clear), or that the name refers to the incident, not to the place. On the other hand it might acquire a symbolic meaning, as in "Any place name discussion becomes a Guatemala City Sinkhole".
- Multiple word place names should be explained separately in a Wiktionary:About place names. Personally, I would delete Lake Ontario and River Thames in favor of Ontario and Thames, but I removed examples from this vote in the fear that they might contradict the CFI General rule and make the vote invalid.--Makaokalani 13:53, 7 July 2010 (UTC)
- First of all, what does it matter if the sinkhole is filled in? Once attested always attested, assuming the citations span one year, but that could happen even after the hole is filled in. Nothing here says that it has to be a currently existing geographic location, nor should it, unless you're seriously considering excluding modern names for ancient sites? Precedent is that terms do not die, they are only marked “historic” or such. I'm not even sure what it means for a location to be “permanent”.
- Also, what does it matter if it becomes a tourist attraction? Nothing here says that it must be of any notoriety, much less specify any way to measure such importance, only that it be attested.
- Why would it have to be called “Guatemala City Sinkhole” in two languages? So what if a place doesn't have the same name in any other language, even as a literal translation? The wording states that it must have different names in at least another, nowhere mentioning anything about having the same name in two languages.
- How are you inferring, or where in the new language is it written, that there must be other such geographic locations within Guatemala City, or anywhere in the world, much less attested?
- What does it matter that the hole is man-made? Are cities geographic entities or merely constructions? Your explicit inclusion of certain street names easily counters this point. Furthermore, there is a loophole in that the text only applies to such “minor landmarks needed to create addresses”. Since Guatemala City Sinkhole is not used to create an address, the proviso that it should not be possible to identify the location from the name does not apply.
- It seems that you are trying to judge the inclusion of such a term on your own subjective opinions about what is worthy of inclusion, but that is not what objective criteria are meant to accomplish. If you can come up with a test that expresses your instinctive reasoning as to what constitutes a valid versus an invalid entry, then please present that. In the meantime we will vote on the wording in the existing vote. DAVilla 02:34, 25 July 2010 (UTC)
- Oppose SemperBlotto 10:40, 6 July 2010 (UTC) I think that the names of ALL places that actually exist may be included here.
- A proposal allowing ALL places has no chances of passing. That's why this proposal was made, to appease placenameophobes. Now your vote plays into their [placenameophobes’] hands. --Vahagn Petrosyan 10:58, 6 July 2010 (UTC)
- If that's how you feel then you should probably support this vote, as it's not that far off from what you describe. DAVilla 02:34, 25 July 2010 (UTC)
- Oppose The "abstain" votes suggest that this really does not have the sort of consensus that would make me O.K. with it. (In particular, Mglovesfun explicitly states that it can be amended later, but I'm really not sure that's true. This vote opens the floodgates to all place-names, so we need to decide beforehand whether we're O.K. with that.) —RuakhTALK 14:12, 7 July 2010 (UTC)
- Oppose. A place name may potentially have all of the requested, but the initial creator might not know of them, particularly if that initial creator is a newbie. bd2412 T 15:30, 10 July 2010 (UTC)
- Yes, but he can find them out. The request for data is an answer to the argument, "We don't need place names because they are all explained in the Wikipedia already." All place name proposals seem to be opposed both for being too strict and too lenient.--Makaokalani 12:45, 15 July 2010 (UTC)
- Oppose -- Prince Kassad 19:14, 24 July 2010 (UTC)
- What is it about this proposal that makes you oppose it? In case this vote ends in defeat for the proposal and we resume the status quo of no consensus, an explanation of why you oppose will be very helpful in drafting a future proposal that stands a chance of approval. Thryduulf (talk) 01:22, 25 July 2010 (UTC)
- I have no obligation to specifically disclose my reasons for opposing. -- Prince Kassad 09:08, 25 July 2010 (UTC)
- Indeed not, it's just helpful if people understand why so they can address the reasons, or if you've spotted something others haven't then they might change their views to match yours. See the recent vote about adding language statements to transliterations for an example of the second. Thryduulf (talk) 09:56, 25 July 2010 (UTC)
- I changed my mind, actually. However my reason is the same I stated before already - it allows every single Polish village to pass on the basis that they had German names when they were on German territory, yet at the same time it causes larger, maybe even capital cities such as Port Moresby to fail the criteria and get deleted. -- Prince Kassad 13:08, 1 August 2010 (UTC)
- I added etymology and as there already were translations in non-Roman scripts, two conditions are met and Port Moresby fails the criteria no longer. I daresay any capital city can be easily treated like this if necessary. --Thrissel 16:35, 1 August 2010 (UTC)
- Etymology does not count as one of the two things needed. See the criteria being voted on, above.—msh210℠ (talk) 18:15, 2 August 2010 (UTC)
- Hum ho, you're right, overlooked that. Funny rule. All the same I presume Port Moresby has a pronunciation, although I'm too lazy to look for it. --Thrissel 19:15, 3 August 2010 (UTC)
- and as there already were translations in non-Roman scripts - imho this should not count at all. Otherwise, every village in the world would be eligible, since you can transcribe all of them in Cyrillic/Arabic/any other script. That's a huge loophole right there. -- Prince Kassad 19:59, 3 August 2010 (UTC)
- That's where we differ. What you see as a loophole would for me make an entry worth even without a second criterion. --Thrissel 20:07, 5 August 2010 (UTC)
Abstain
Abstain. Would the current Dutch entry for Lissabon meet this proposed amendment to CFI? If it does then I would switch to Support. If it doesn't, then I would Oppose. AugPi 17:17, 30 June 2010 (UTC)
- It would not. It'd need some etymological, grammatical, or pronunciation info added. That said, I'm not too happy about this either, as (as I explain above) I think it lets in too much. You seem to think it lets in too little. So I'm not moving your vote to the "Oppose" section, in the hope that you'll leave it here as a compromise. :-) —msh210℠ (talk) 17:25, 30 June 2010 (UTC)
- It does have implicit grammatical information: the word "Lissabon" is used without article, so there is no need to specify gender, and it has no declension either. AugPi 17:28, 30 June 2010 (UTC)
- Eh, I would hope (and support) that that wouldn't count.—msh210℠ (talk) 17:35, 30 June 2010 (UTC)
- Then the entire article Lissabon would have to be deleted: is that so? AugPi 17:30, 30 June 2010 (UTC)
- Well, if someone puts any section of it up for deletion, someone else can just add no declined forms to the inflection line (for Dutch, and other info, as appropriate, for another language), and it'd be kept. No one (I don't think) is saying these'll be speedily deleted.—msh210℠ (talk) 17:35, 30 June 2010 (UTC)
- That said, I'd think they can be speedily deleted if added en masse (not meeting the requirements of the vote). I mean, if someone is going to add a million town names that don't meet the criteria, we can nuke them, right?—msh210℠ (talk) 13:13, 2 July 2010 (UTC)
- I agree.--Makaokalani 11:30, 3 July 2010 (UTC)
- And the article Lissabon in sister Wiktionaries would have to be deleted as well, if the other Wiktionaries used the same proposed CFI. AugPi 17:33, 30 June 2010 (UTC)
- The CFI used on the English Wiktionary affects only the English Wiktionary. If other Wiktionaries choose to use the same or similar criteria, that is entirely up to them and has no bearing on which criteria we choose to use. Whether a word meets the CFI of some, all, or none of the non-English Wiktionaries has no bearing on whether it meets our CFI, just as the presence or absence of a word on the English Wiktionary does not indicate that it does or does not meet the CFI of any given non-English Wiktionary. Thryduulf (talk) 23:12, 30 June 2010 (UTC)
- How would you say "beautiful Lisbon" in Dutch? Would you need to know the gender? - Judging from the discussions, nobody is planning to rfd old entries of the Lissabon type. They can be completed in good time when editors come across them. The rules mostly affect new entries.--Makaokalani 11:55, 1 July 2010 (UTC)
- That said, we probably make cleanup categories by language for existing entries. --Bequw → τ 14:38, 1 July 2010 (UTC)
Abstain. I find little here that I like, and much that I dislike; but if it does have consensus, then that's nice, at least. It'll be the first time we've had anything like that on this subject. —RuakhTALK 16:42, 1 July 2010 (UTC)
- Abstain. @Makaokalani: you would say "het mooie Lissabon," so you're right: it would be better with the gender specified (and then I guess it would meet the proposed CFI). AugPi 04:54, 2 July 2010 (UTC)
- Abstain. It lets in too much.—msh210℠ (talk) 00:39, 6 July 2010 (UTC)
Abstain. Not because it allows too much but because there are ambiguities. However if this passes, I imagine there's room to amend it. So basically it's looking pretty good. Also, surely you can pronounce anything, and anything can have an etymology of sorts. Mglovesfun (talk) 10:41, 6 July 2010 (UTC)
- Abstain Dan Polansky 13:47, 29 July 2010 (UTC): I have supported in the previous round of this vote, but I have now considerable doubt about the wording and all-inclusiveness. The Geographic Names Information System (GNIS) database of the United States has around 2,200,000 entries[geonames.usgs.gov]; have a look and enter "New York". It is unclear that the sort of multi-word entries as found in this database really get excluded, as the proposed wording does not equip the classes of information "pronunciation" and "information about grammar" with a non-compositionality treatment, unlike etymology ("# An etymology. This is insufficient as one of the two necessary items for a multiple-word place name, such as South Carolina."). Thus, pronunciation and grammar information such as inflection are able to justify the inclusion of almost any attestable multi-word geographic name in inflected and gender-equipped languages. While the requirement of attestability has some filtering power, its filtering power remains unclear.
The wording I have been playing with is the following: "A geographic name should be included iff it is attestable and it is non-compositional with respect to at least one class of lexical information other than the meaning of the name. A term is non-compositional with respect to a class of information (such as pronunciation or inflection) iff that class of information recorded on the term cannot be derived from that class of information recorded on component words of the term. Lexical classes of information considered for this regulation include etymology, pronunciation, gender, inflection, and translation, but not definition."
But even with this wording, it is unclear how many entries would get included. It would be really advisable to have an estimate of the number of geographic entries that are likely to get included by the proposed criteria. --Dan Polansky 13:47, 29 July 2010 (UTC)
- While the requirement of attestability has some filtering power, its filtering power remains unclear. - the requirements are not supposed to filter any place name. There is absolutely no reason why any of those 2M US place names should not be added. The requirements are there to force users to add lexicographically relevant data, and not merely increase the number of entries with arguably useless entries, solely for the sake of the entry count. --Ivan Štambuk 18:31, 31 July 2010 (UTC)
- Then why should those requirements go in Wiktionary:Criteria for inclusion? —RuakhTALK 23:18, 31 July 2010 (UTC)
- Strictly speaking these requirements guide not the inclusivity of the terms, which is a function with binary output of idiomaticity + attestability + usefulness, but the threshold of comprehensiveness of the entries themselves, simply in order to prevent certain undesirable actions. But, since the output of both of these is the same (i.e. the decision whether the entry will get deleted or kept), it's reasonable to cover them on the same policy page. --Ivan Štambuk 12:26, 1 August 2010 (UTC)
- What you seem to be saying is that you consider it okay for Wiktionary to have ten times as many entries for geographic names as the entries for terms that are not proper nouns (common nouns, adjectives, verbs, etc.). I am simply not sure that I like this prospect. --Dan Polansky 07:25, 1 August 2010 (UTC)
- Not ten times, but likely hundreds or perhaps thousands of times as many entries. There are at least 100 million species on this planet (possibly over a billion if you count the extinct ones), and some day these two categories of place and species names will likely dominate both Wiktionary and Wikipedia in the overall entry/article count. Your concerns over the marginalization of "proper" words are irrational and misguided. At this moment more than 9 out of 10 entries on Wiktionary are junk entries, inflected forms generated by bots simply because MediaWiki has no lemmatization capabilities. Place names are words just as any other, they are used and useful and there is no reason to exclude them. --Ivan Štambuk 12:26, 1 August 2010 (UTC)
- There is nothing necessary about Wiktionary containing all geographic names and all species names. This is a choice to be made by Wiktionary editors, a choice about the scoping of Wiktionary. That you see no disadvantage to including all geographic names and all species names in Wiktionary suggests you have not considered the cons seriously, and are only pushing the pros. If you did the accounting, and said that the pros outweigh the cons, that would be more convincing, to me anyway. In the absence of any cost-benefit analysis (or advantage-disadvantage analysis, phrased in a language that does not use the finance metaphor), I naturally feel uncertain about the best course of action. --Dan Polansky 19:57, 1 August 2010 (UTC)
- Could it be that, like me, Ivan does not see any cons in "all words, in all languages" meaning "all words (including proper nouns), in all languages"? Thryduulf (talk) 09:21, 2 August 2010 (UTC)
- For starters, if I want to get ten random English words and get ten random geographic names and species names instead, that is an undesirable thing. Admittedly, this may get overcome with further refinement of the random-word function. Downstream consumers of Wiktionary data would need to set up more filtering if they have no interest in the overflood of geographic names and species names. A downloader of an XML dump has to download a many times (or hugely many times) bigger dump; operations on the complete dump get many times slower. None of these disadvantages is probably fatal to the proposal to include all geographic names and all species names in Wiktionary, but these disadvantages have not even been acknowledged to exist. Instead, I read that "there is absolutely no reason" to exclude some geographic names, an unconditional and strong claim. --Dan Polansky 13:01, 2 August 2010 (UTC)
- But Polansky, when you click on Special:RandomPage now, in 9 out of 10 cases you get some useless non-lemma junk. Do you mind that? You hypothesize that one day the number of place names would dwarf that of other "normal" entries, which in your opinion are more useful in nature, but this is exactly what this proposal is trying to prevent, by requiring that we get only quality entries for place names, ones that are not generated automatically. There is no way that such place names could come anywhere close to the number of inflected forms, which already contaminate the majority of the main namespace and number more than 1M. You also forget that we allow personal names and surnames now, which are much easier to generate and in fact can be completely automated, as you don't even need to write a definition line but can use one of the predefined templates. Your concerns for our "upstream" users are commendable, but it's not your problem, and it can be solved pretty easily by filtering the content of entries for specific categories and keywords in the definition lines. --Ivan Štambuk 18:06, 2 August 2010 (UTC)
- (unindent -1) I do mind a bit that the random page function gets overflooded with inflected forms. However, each inflected form is one click away from its lemma, so ten random inflected forms point to ten random lemmas. By contrast, ten random proper names do not point to ten random words that are not proper names. Again, this can be dealt with by adjusting the random page function. But it is a downside that should at least be acknowledged to exist, alongside the other downsides I have mentioned. It runs counter to the arrogant claim that "there is absolutely no reason" against the inclusion of proper names and that the concerns are "irrational and misguided".
- The speed with which a group of terms can be entered into Wiktionary seems rather irrelevant to me. The key question is how many terms in a given class will be eventually included by the considered criterion. The wording "must initially include" used instead of "must eventually be able to include" looks a bit like a red herring: the eventual impact on the inclusion of a term is the same.
- As one of the key editors of an important free dictionary resource available world-wide on the internet, I am naturally concerned with the interests of the broader community of users of the resource that we are creating. That is why I fail to see the relevance of the comment "it's not your problem". --Dan Polansky 09:36, 3 August 2010 (UTC)
- "But it [proper names in the random entry feature] is a downside that should at least be acknowledged to exist". Dan, I am not going to acknowledge that as a downside because it isn't one. If I want to view a random entry, I'm just as well served by a proper noun (especially one that contains etymology, pronunciation and translations) as I am by a common noun. Despite what Ruakh has said below, I really do not understand the whole "names are not words" thing (they have meanings, pronunciations, etymologies, translations and (in relevant languages) inflections, just as common nouns and adjectives, etc, do). I'm not going to argue against improvements to the random feature, far from it, but the random entry feature giving me a random entry is not a downside. Also, almost all geographic names includable as a result of this vote are going to be just one click away from another word, be it an etymon, a homophone, a word used in the definition, a translation, etc. Thryduulf (talk) 10:26, 3 August 2010 (UTC)
- (unindent -1) I am not as well served by a proper noun. I sense no need to expand my vocabulary by learning more proper names or more species names. The argument that geographic names are one click away from a non-proper-name word seems correct in its wording, but misses the point: the probability that I randomly hit a word in a lemma-only database is on par even though not equal to the probability that I randomly hit one of the word's inflected forms in a database that contains inflected forms. The impediment that inflected forms create for the random entry function is really minimal. --Dan Polansky 12:16, 3 August 2010 (UTC)
- (unindent) Dan, please can you explain why geographic names and species names are somehow a lesser class of word than, say, adjectives or proper nouns for other classes of things? This isn't a facetious request; I genuinely don't understand why you and others see them as less desirable. From my point of view there are three classes of words in every language:
1. Attested words, including those that are proscribed, those regarded as misspellings (but which are nevertheless commonly used) and those that are not in current use (obsolete, archaic, etc).
2. Nonce words, neologisms, and other insufficiently attested words.
3. Protologisms, dictionary-only words, reconstructed words and others that are unattested.
- In my view Wiktionary should include all words in class 1, regardless. What sort of word they are or what sort of concept they represent is irrelevant. Words in class 3 should not be in mainspace, but may be in an appendix, again regardless of what their meaning is. Words in class 2 I'm more flexible about, depending on many factors - their meaning not being one of them. Thryduulf (talk) 17:27, 2 August 2010 (UTC)
- Dunno what Dan's reasons are, but personally I don't consider place-names to be "words" at all. Ditto given names, surnames, and so on. That's not exactly because of their meaning — "Mom" is a word, and "Sue" is not, even they can both be used as proper nouns and can both have the same referents at times — but rather because of the linguistic property that one is a word and one is a name. (Actually, I suppose meaning does play a role, in that "Sue" simply has no meaning, it's just a name, whereas "Mom" is a word with meaning; but it's not the extent of it.) This isn't to say that we should only include words, never names, but it does mean that I view names as completely secondary to our goal of defining all words in all languages, and if they conflict with that goal, then they must go. —RuakhTALK 17:40, 2 August 2010 (UTC)
- But Sue has a meaning in actual use: it unambiguously denotes a female human in a specific context. The difference is that we are not interested (from a lexicographical perspective) in any property of the referent other than the fact that she is female. There is no distinction between "real" (non-name) definitions that generalize on every imaginable/relevant property of a term's usage, and those that generalize only on the basis of the referent's sex. With place names the distinction is even weaker: in a great number of cases there is only a single toponym with a particular name. In any case, that whole distinction between words and names that you introduce, and that I've never heard of before, is a trivial matter of semantics that has no bearing on whether we should include or exclude (place) names. They are useful, covered in paper dictionaries, and that should be the deciding criterion. --Ivan Štambuk 17:57, 2 August 2010 (UTC)
- "Sue" does not denote generically a human female; rather, there are certain human females who happen to be called that, but there are also some well-known human males called it. Anyway, I'm not introducing this distinction; it's well known and widely understood. (I challenge you to find a competent English speaker who, asked for a common English word that rhymes with "cruisin'", will provide "Susan".) What you call a trivial matter of semantics, I call a fundamental distinction — more fundamental even than the distinction between words and phrases. —RuakhTALK 19:01, 2 August 2010 (UTC)
- Well, your specific example, "I challenge you to find a competent English speaker who, asked for a common English word that rhymes with "cruisin'", will provide "Susan".", is not a good one, as in standard British English pronunciations at least the two do not rhyme (/ˈkɹuː.zɪn/, /ˈsuː.zən/). However, to answer the point you were trying to make, I would expect no shortage of people who, when asked for a common English word that rhymes with "rake", would provide "Jake", and similarly "candy" and "Mandy" (although I don't expect either to be everybody's first choice, it will be for some people). Thryduulf (talk)
- Many names (esp. personal names and surnames) can be used generically in plural form to denote the category of objects named as such, and in that way they are no different than "normal" words. Names do not need to denote a generic object. In the case of many place names (e.g. names of countries, rivers, seas, mountains, many settlements and unique geographical objects) they only have a single, established and unambiguous meaning. Dictionary definitions of words describe the distinctive set of properties of the referring objects/actions shared by all usages (i.e. "the meaning") of that word. In the case of e.g. personal names, the only common attribute is the sex of the referent. Names are by their semantics no different than pronouns or determiners, in that they always need a context to disambiguate to a particular object. The fact that when used "isolated", they mean nothing, is immaterial. Indeed, there are definitions of a word in linguistics, but all of them circle around stuff such as morphology, orthography, ability to refer to compound components etc., not on whether they name an object or a category of objects. The distinction is trivial and irrelevant, albeit amusing from a philosophical perspective. --Ivan Štambuk 20:57, 2 August 2010 (UTC)
- Re: plural: Yes, there is a common noun Sue meaning "a person named Sue", but that sort of usage is not what this vote is about. Regardless, you're welcome to see any distinction you like as "trivial and irrelevant", but if some editors see it as fundamental and relevant, then that makes it relevant to those editors. —RuakhTALK 22:04, 2 August 2010 (UTC)
- Responding to Thryduulf, 17:27, 2 August 2010:
- I do not use the term "lesser class of word", and do not quite understand it, but I will try to answer your question anyway.
- Geographic names and species names, when considered as whole classes of terms, are less inclusion-worthy than non-proper-name typographical words (such as "cat" but not "black hole") for their being dispensable in a comprehensive general language dictionary, and for their being extremely numerous. Their being dispensable follows from my experience with using general language dictionaries, and from the collective experience that has led to the lexicographical precedent found in existing established general language dictionaries. I am not saying that geographic names and species names must be excluded from Wiktionary; I am merely saying that their comprehensive inclusion has some major drawbacks. I reject the slogan "all words in all languages" in its literal reading, so the question of wordhood is less relevant to me. --Dan Polansky 12:19, 3 August 2010 (UTC)
- On another note, if you want to focus on wordhood, you may try to explain as an exercise what makes you think that "Little New York" is a word while "Albert Einstein", "Much Ado About Nothing", and "King Lear" are non-words, if that is what you actually think. Or, maybe, whether you think that "Albert Einstein", "Much Ado About Nothing", and "King Lear" are lesser words than "Little New York" and why. --Dan Polansky 12:28, 3 August 2010 (UTC)
- "Albert Einstein" is not a word, but two words "Albert" and "Einstein" (a name can be a word, or composed of multiple words), both of which should be included. "Little New York" does not appear to be the name of a specific place (there is no en.wp article with that title, and I'm not otherwise familiar with it) and so would seem to be an adjective little and a proper noun New York. "Much Ado About Nothing" is the name of a play composed of the words "much", "ado", "about", and "nothing"; similarly "King Lear" is "king" and "Lear". Both examples are irrelevant to the question of the inclusion or otherwise of place names; even if they weren't, they would be no more includable than other collocations of words like "Spot Goes On Holiday", "So Long And Thanks For All The Fish" or "Ode To A Small Lump Of Green Putty I Found In My Armpit One Midsummer Morning".
- Regarding geographic names being dispensable in a comprehensive general language dictionary: that is only one of Wiktionary's functions, and geographic names are frequently included in both etymological and translating dictionaries (which are another two of Wiktionary's functions). Likewise, the fact that my comprehensive general language dictionary doesn't include translations and a detailed etymology of water, for example, is no reason to exclude these features from Wiktionary's entry. Thryduulf (talk) 13:10, 3 August 2010 (UTC)
- I assumed that "Little New York" was a geographic name, but it seems that I have goofed. So, can you answer my questions modified by replacing "Little New York" with "New York"? That is, for the purpose of "all words in all languages", is it that "New York" is a word while "King Lear" (play) is not a word, and why? Or if both are words, is "King Lear" (play) a lesser word than "New York", and why? What about "San Francisco"? What about "Grand Canyon"? Again, is "Much Ado About Nothing" (play) a word? I know that the name of the play is composed of words; is the name itself a word? If the name itself is not a word, why is it not a word while "Grand Canyon" is a word? I point to the fact that none of the mentioned names are semantic sum-of-parts; no proper name is a semantic sum-of-parts, so each proper name is in some sense a standalone lexical unit. --Dan Polansky 16:05, 3 August 2010 (UTC)
- A general note, related to the point of overflood: If we decide that we are okay with Wiktionary being overflooded with geographic names and species names, then a much simpler criterion with a very similar effect is the following: "A geographic name should be included iff it is attestable." In an inflected language, each attestable geographic name can be equipped with pronunciation and inflection, two classes of lexical information, thereby satisfying the current, somewhat more complex criteria. I see little point in penalizing English geographic names through their inability to get inflection information. Anyway. --Dan Polansky 16:19, 3 August 2010 (UTC)
Decision
9-5 or 11-5 counting late votes. Either way it's no consensus. Sadly. Mglovesfun (talk) 19:49, 31 July 2010 (UTC)
- Sounds too close to close, and frankly I'm not sure why 2/3 isn't enough to pass. I'm extending it another two weeks. DAVilla 22:35, 31 July 2010 (UTC)
- Suits me (hey, I supported it). Mglovesfun (talk) 07:44, 2 August 2010 (UTC)
Looks like it passes 15-5-3. —Internoob (Disc•Cont) 20:49, 15 August 2010 (UTC)