August 21, 2008

The honor of your assistance is requested in a small matter of language
Posted by Abi Sutherland at 04:31 PM

Gentle reader,

In the course of her duties today, this blogger was obliged to consider the vast range of input to be expected from the ladies and gentlemen who do her company the honor of using its software. In particular, she was occupied with the task of addressing the tendency of some users to express an excess of emotion, or to seek to produce an improper effect upon the unsuspecting reader, with the strength of their language.

In order to curb these unfortunate tendencies, and forestall the employment of coarse and unsuitable language, she was enjoined to produce a list of particularly crude and unsavory terms whose use would be most strictly prohibited. Nor would variants of the selected expressions be permitted; the software produced at her place of employment is of a sufficiently sophisticated nature to encompass the derivation of gerunds from the raw verbal forms &c. There will even be some discussion in the forthcoming weeks regarding the inclusion of the recently popularized “leet” forms produced by the systematic substitution of numeric characters for the letters to which they most closely bear a resemblance.
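As a rough illustration of the "leet" handling under discussion, the sketch below maps a few digits and symbols back to the letters they resemble before a word is checked against a list. The mapping table and the name `deleet` are assumptions made for this example; nothing here is taken from the software described in this post.

```python
# Hypothetical mapping from look-alike characters to letters (an assumption,
# not the product's actual table). Some characters are ambiguous: "1" can
# stand for "i" or "l", so a real filter might need to try both readings.
LEET_MAP = str.maketrans({
    "0": "o",
    "1": "i",
    "3": "e",
    "4": "a",
    "5": "s",
    "7": "t",
    "@": "a",
    "$": "s",
})


def deleet(text):
    """Lower-case the text and replace look-alike characters with letters."""
    return text.lower().translate(LEET_MAP)


if __name__ == "__main__":
    for sample in ("h3ll", "P1SS", "catalog"):
        print(sample, "->", deleet(sample))
```

The normalized form is probably best used only for the check itself and never stored, since a legitimate tag containing digits (a year, say) would be mangled by the same mapping.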

Due to the popularity of her employer’s product, this blogger’s task was further complicated by the requirement to produce appropriate lists in both the American and British dialects of the English language. Furthermore, because even within the several nations who have adopted the product there exist variations in the level of local sensitivity, it was deemed appropriate to produce two lists per dialect. The “core” assemblages contain those of the gravest offense, which are liable to shock and horrify even the most liberal-minded and worldly of readers. The “additional” lists are provided to broaden the range of prohibited speech in order to protect any more delicate-minded communities which may choose to uphold a stricter standard of decency. The selection of the list to adopt is of course entirely within the purview of the customer.

However, this blogger is sadly hampered in the execution of her duties by her sweet and innocent nature. (She will now pause in tactful silence while the gentleman in the back row endures his coughing fit; no doubt he has caught a slight chill. She hopes that he will be better soon.) After due consideration, she has decided to be so bold as to place the product of her initial efforts before this discerning crowd, to ascertain if she has perhaps omitted any words which would be better included, or indeed added to her list some innocuous term which, upon further investigation, is found to be merely a variety of orchid.

Be warned, gentle reader, that the remainder of this post contains profanity of the strongest nature. Please do not peruse this entry further if you are at all prone to offense or shock. This blogger would be most distressed to learn that she had caused any upset to an unsuspecting reader who further pursued this matter in the expectation that it would lead to a list containing anything but the greatest of obscenities.

US Core: anal*, anus, ass*, asshole, bitch, clit, cock*, cocksucker, cunt, fag, fuck, kike, milf, nigger, penis, piss, shit, twat, whore

US Additional: boob, butt, dick*, dildo, dyke, erotic, fetish, gay, hell, horny, kink, labia, orgy, poo*, poop, porn, pussy*, sex, slut, spic, testicle, tit, vagina, vulva, wop

UK Core: anal*, anus, ass*, asshole, arse, bitch, clit, cocksucker, cunt, fag, fuck, milf, nigger, penis, shit, twat

UK Additional: bitch, bollock, bugger, crap, dildo, dyke, fanny, fetish, hell, kink, labia, orgy, paki*, piss, poo*, porn, slut, spic, testicle, vagina, vulva, wank

Readers should note that those terms indicated with an asterisk should, if possible, be prohibited only in their stated forms, in order that innocuous terms which may unfortunately contain them may still be permitted within the discourse of the community. This would, for instance, allow the use of assume and analysis, as well as the discussion of the works of Charles Dickens. The inhabitants of Scunthorpe may still find themselves unfortunately excluded from conversation.
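The asterisk convention can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, and simple prefix matching stands in for the real stemming engine, whose behavior is not described in detail here.

```python
def parse_blacklist(entries):
    """Split entries into exact-only terms (marked '*') and stemmable terms."""
    exact_only, stemmable = set(), set()
    for entry in entries:
        term = entry.strip().lower()
        if term.endswith("*"):
            exact_only.add(term[:-1])  # drop the marker, keep the bare term
        else:
            stemmable.add(term)
    return exact_only, stemmable


def is_blocked(word, exact_only, stemmable):
    """Return True if a single word should be rejected."""
    w = word.lower()
    if w in exact_only:
        return True
    # Assumption: a prefix test stands in for the stemming engine, so a
    # stemmable term also blocks longer words that begin with it.
    return any(w.startswith(term) for term in stemmable)


if __name__ == "__main__":
    exact_only, stemmable = parse_blacklist(
        ["anal*", "ass*", "dick*", "piss", "porn"]
    )
    for word in ("ass", "assume", "analysis", "Dickens", "pissing"):
        print(word, is_blocked(word, exact_only, stemmable))
```

Under these assumptions the asterisked terms reject only the bare word, which is what lets assume, analysis and Dickens through, while the unmarked terms still catch their derived forms.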

Gentle reader, thank you for employing the strength of mind and character to peruse this unsightly list. What suggestions, pray, can you offer to improve it, and therefore be of inestimable service to your faithful blogger?

(Those readers of a historic or scholarly bent are also invited to provide, if they have the inclination, similar lists for earlier ages. Indeed, considering the nature of the community that does us the honor to contribute here, this blogger would not be surprised to obtain a similar list of terms for societies not yet extant.)

Addendum: It is perhaps of interest to note the precise context in which this list will be applied and, more importantly, the degree to which it will not be used. It is intended for application in a module which permits the general public to enter additional information with regard to items in the catalog of a library. It is not the practice of this blogger’s software to place any limitation upon the searches that may be undertaken by the general public. Furthermore, there is no restriction upon the information that may be entered into the system using the official sources. Only the data arising from the interaction with the general public will be filtered, and that filtration will only control the release of that data into the universally visible catalog.

Comments on The honor of your assistance is requested in a small matter of language:
#1 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 04:40 PM:

There's this song I know whose chorus is:

Eat bite fuck suck gobble nibble chew,
Asshole hair-pie fingerfuck screw.

#2 ::: abi ::: (view all by) ::: August 21, 2008, 04:41 PM:

See, I knew that if I wanted help with my profanity, I should ask a sailor.

#3 ::: Dawno ::: (view all by) ::: August 21, 2008, 04:47 PM:

If there are issues with the word testicle, then perhaps scrotum and ball-sack might be considered? My son, who is 'In the Army Now', would definitely use those instead of testicle(s) although he probably wouldn't spell them correctly.

#4 ::: Tykewriter ::: (view all by) ::: August 21, 2008, 04:48 PM:

*Thatcher*


*except post mortem

#5 ::: Serge ::: (view all by) ::: August 21, 2008, 04:49 PM:

the tendency of some users to express an excess of emotion

This sounds like the kind of comment that shows up in my yearly review.

#6 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 04:50 PM:

Abi, if 'felching' isn't on your list, it's not half a list, if you know what I mean.

#7 ::: pedantic peasant ::: (view all by) ::: August 21, 2008, 04:52 PM:

Depending on your level of sensitivity:

sphincter; butt-hole; arse; douche (and compounds thereof); sperm; cum; -wad; kyke (alt. of kike); feces; mofo and mo-fo; homo; lezzie; shite; fuk; fukker; carpet-muncher; fudge-packer; a-hole; tosser; pansy.


Also, do you want to curtail IM speak with inappropriate additions, such as WTF, GTFO, STFU, and the ever-popular ROTFLMFAO?

#8 ::: Sarah S ::: (view all by) ::: August 21, 2008, 04:53 PM:

Jim @#1

S'funny.

I know one that goes:

Shit, shit, damn, damn.
Son of a bitch.
God damn.
Highty tighty
Christ Almighty.
Shit, shit, fuck.

Learned it at a Girl Scout camping trip, of course. Sung to the tune of "Hey Ho Nobody Home."

#9 ::: Gwen ::: (view all by) ::: August 21, 2008, 04:56 PM:

If you're including "hell", "sex", and "fetish" on there--"damn" (and "goddamn(ed)" and possibly "godsdamn(ed)") would probably fit too.

If I understand the asterisk properly: hell (as part of, frex, shell), butt (button), boob (booby, booby-trap), horny (thorny), kink (skink), and tit (interstitial), at the least, should also have one.

#10 ::: Tracey S. Rosenberg ::: (view all by) ::: August 21, 2008, 04:57 PM:

'Yid' seems borderline - and after reading The Yiddish Policemen's Union I nearly began using it myself - but I'll suggest it.

As for the UK additional, given that 'paki' is there I'd say 'chinky' could be added as well.

#11 ::: pedantic peasant ::: (view all by) ::: August 21, 2008, 04:57 PM:

And if you add 'felching' you probably want to add 'fisting' and 'fisted' as well.

If you can do pairs, curtailing 'bite me,' 'suck me,' and 'lick me' (and the same words with 'my') is probably also a good idea.

#12 ::: Serge ::: (view all by) ::: August 21, 2008, 05:03 PM:

asshole --> trou de cul

#13 ::: Liza ::: (view all by) ::: August 21, 2008, 05:04 PM:

Is "anal" really that bad? I suppose if you can't talk about vaginas, though, body parts are out. In which case you might want to add "breast" to the list ("bosom" as well?).

Also, you might want to allow variations on "shit," because some people can't seem to remember that "Shiite" has two i's.

The list reminded me of a scene from the dance class I took in high school (a lot of us took it to get out of phys ed, including a couple of guys). One day the teacher gently asked us to please try to refrain from saying "that sucks" in her presence, as to someone of her generation it meant a certain kind of sexual activity. Most of us blushed or snickered. One girl, however, looked puzzled, trying to figure out what it could mean, and finally blurted out "You mean, like, oral???" The rest of the semester everything in that class was "oral." Lose your book bag? That's really oral. Trip on the stairs? Man, that's oral. Etc.

#14 ::: Serge ::: (view all by) ::: August 21, 2008, 05:04 PM:

whore --> pute

#15 ::: Caroline ::: (view all by) ::: August 21, 2008, 05:05 PM:

You're going to have to come up with a list of compound words involving the ones you want to match only as stated, because people will think creatively. Append "licking," "eating" or "sniffing" to the nouns, for example.

Also, things like "carpetmuncher," "hobag," "slore" (slut combined with whore).

You may also want to add "turd" and "crap" if "poo" and "poop" are going to be considered off-limits -- I consider both of the first two stronger than the second two.

If "penis" is off-limits, shouldn't the full word "clitoris" be off-limits too?

And the full word "faggot" for the American list at least.

I can think of several words that can contextually be problematic, but that are used the same way in non-problematic contexts (like "screw" and "slit" and "nail"). Don't know how to handle those.

(Do you know that Making Light is the only place, online or off, where I still blush at this kind of language? I don't even know why. But I am sitting here turning pink, and I'm pretty sure it's the sex-related words -- the ugly slurs just make me mad, they don't embarrass me. I haven't blushed about sex since I was sixteen.)

#16 ::: abi ::: (view all by) ::: August 21, 2008, 05:06 PM:

Most of the text that I'm trying to filter will be one-word or multi-word tags. There is some space for connected prose (quick reviews), but very little.

We're going to have to tweak our stemming engine (ironically, based on the snowball algorithm) if we're going to optimize this stuff and reduce false positives. As things stand, if I block "fisting" I'll lose "fist" and all tags that use it. Still working on the balance there.
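To make the trade-off concrete, here is a minimal sketch of how stemming both sides of the comparison produces that false positive. The suffix stripper below is a toy written for this example, not the Snowball algorithm, and all names are hypothetical.

```python
# Toy suffix list, checked in order; a real stemmer has far more rules.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one common suffix, keeping at least three letters of stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def blocked_by_stem(tag, blacklist):
    """Reject a tag if any of its words shares a stem with a listed term."""
    blocked_stems = {toy_stem(term) for term in blacklist}
    return any(toy_stem(word) in blocked_stems for word in tag.split())


if __name__ == "__main__":
    blacklist = ["fisting"]
    for tag in ("iron fist", "first aid"):
        print(tag, blocked_by_stem(tag, blacklist))
```

Because the listed term and the tag are both reduced to the same stem, blocking the gerund also blocks the innocent noun. Marking a term as exact-only avoids that, at the cost of missing its other forms.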

#17 ::: John Stanning ::: (view all by) ::: August 21, 2008, 05:07 PM:

Are these words anathema in all circumstances, or are legitimate uses allowed?

bitch is a female dog
dyke is what keeps the water out (or in)
cock is the partner of a hen
pussy is my cat
poop is the high bit on the stern of a ship
kink is something that happens to a wire rope
tit is a small bird
et cetera, et cetera

#18 ::: John ::: (view all by) ::: August 21, 2008, 05:08 PM:

I'd love to pitch in, but I've been watching "Deadwood" of late, so my sensitivity level is rather high. Or low, depending on your fucking point of view.

#19 ::: Evelyn Browne ::: (view all by) ::: August 21, 2008, 05:09 PM:

12-- 'Bite my...' and so forth, would be overkill, though, unless there were a way to only catch those collocations in the imperative.

#20 ::: Tracey S. Rosenberg ::: (view all by) ::: August 21, 2008, 05:09 PM:

Oh, and I'd love to contribute to Naughty Words of the Past, but many authors won't play ball. cf. one of my favorite lines in Thackeray, from Vanity Fair, ch. 21:

“————!” burst out his father with a screaming oath

#21 ::: Charlie Stross ::: (view all by) ::: August 21, 2008, 05:09 PM:

So. Your company doesn't anticipate further business in Scunthorpe, Essex, Sussex, or Clitheroe; not to mention Cockfosters or Milford. Nor is it Janus-faced.

(Note that these locations are merely those that spring to mind; I don't have a grelpable list of place names or surnames, but it's astonishing how many "rude words" show up as substrings of nouns, especially in Yorkshire.)

Hmm. Some improvement warranted here, methinks ...

#22 ::: abi ::: (view all by) ::: August 21, 2008, 05:09 PM:

The way our stemming works, "clit" blocks "clitoris" and "fag" blocks "faggot". Thus the asterisked terms, as explained below the table.

Many useful suggestions to consider. And on the Internet, no one can see you blush. Remember that.

#23 ::: Serge ::: (view all by) ::: August 21, 2008, 05:11 PM:

abi @ 16... We're going to have to tweak our stemming engine

"Captain! The engine can't take this anymore!"
"Scotty! I need that (bleep!) power now!"

#24 ::: Doctor Science ::: (view all by) ::: August 21, 2008, 05:11 PM:

Coon

Spearchucker

Junglebunny

Towelhead

Jigaboo

Slut (possibly)

Hooker (possibly)

Dyke

Lezzie

Lesbo

Kike

Gyp

Jap

Dago

Spic

Chink

Haji

Camel jockey

Sand nigger

Ball busting

Castrating

Man hater

Man basher

Feminazi

From a post by ginmar, which happened to come up on my friends' list just after this post. [Warning: contents include justified bitterness which may be unpalatable for certain readers.]

Personally, I think that if adults are going to be using the software for general social communication, we must be permitted to use formal Latin terminology for body parts & things one does with them.

#25 ::: Evelyn Browne ::: (view all by) ::: August 21, 2008, 05:12 PM:

Clit should have an asterisk, at least if any linguists are ever going to use this software. (Clitic, enclitic, cliticization...)

#26 ::: Phil Palmer ::: (view all by) ::: August 21, 2008, 05:12 PM:

shirtlifter, turdburgler, brownhatter?

But please don't think me ginger (beer)
If I give you a zinger here;
You'll never stop a Cockney rhyme
Whose basis varies over time.

#27 ::: pedantic peasant ::: (view all by) ::: August 21, 2008, 05:13 PM:

Jim @ 1 and Sarah @ 8:

The two I once knew were:

(as a vulgar, tit for tat response to profanity): "Watch your fucking language, what the hell do you think this is a goddamn playground?"

and

"Twat you say? I cunt hear you. Never mind, give me a minute and I'll finger it out."


Abi:

Another pair: 'blow me,' and 'jerk off.' Also blowjob, tramp, slit, faggot, and afterbirth.

Possibly: Nazi, jackass, testosterone, estrogen, PMS, and menstrual.

#28 ::: L Lindsey ::: (view all by) ::: August 21, 2008, 05:13 PM:

You might want to add motherfucker

#29 ::: Tykewriter ::: (view all by) ::: August 21, 2008, 05:13 PM:

Not to mention Cockermouth. (A personal favourite.)

#30 ::: Gwen ::: (view all by) ::: August 21, 2008, 05:13 PM:

Also, according to Wikipedia: "Other ethnic slurs like coon, porch monkey, Alabama porch monkey, afrodite, sausage lips, tar baby, darkie (African-American), dottie (Indian/Pakistani)[citation needed], chink, gook (Asian), beaner, wetback, spic (Hispanic-American), guinea, wop, dago (Italian), honky, gringo, cracker (whites), heeb (Jewish), kraut (German -- used especially during World War II), sand nigger, raghead, towelhead, "rug merchant" (Sikh, or Arab in the US); and pejoratives like fattie, retard, and redneck or hillbilly aren't entirely profane at all times, but can be considered very offensive when used in the company of certain people, and not socially acceptable in polite settings or social situations."

Bastard and prick also made the December 2000 "Delete Expletives" paper, higher on the list than bollocks, arsehole, and paki.

#31 ::: Gwen ::: (view all by) ::: August 21, 2008, 05:14 PM:

Also, according to Wikipedia: "Other ethnic slurs like coon, porch monkey, Alabama porch monkey, afrodite, sausage lips, tar baby, darkie (African-American), dottie (Indian/Pakistani)[citation needed], chink, gook (Asian), beaner, wetback, spic (Hispanic-American), guinea, wop, dago (Italian), honky, gringo, cracker (whites), heeb (Jewish), kraut (German -- used especially during World War II), sand nigger, raghead, towelhead, "rug merchant" (Sikh, or Arab in the US); and pejoratives like fattie, retard, and redneck or hillbilly aren't entirely profane at all times, but can be considered very offensive when used in the company of certain people, and not socially acceptable in polite settings or social situations."

Bastard and prick also made the (British) December 2000 "Delete Expletives" paper, higher on the list than bollocks, arsehole, and paki.

#32 ::: Tykewriter ::: (view all by) ::: August 21, 2008, 05:16 PM:

And Penistone is down my neck of the woods.

#33 ::: John Stanning ::: (view all by) ::: August 21, 2008, 05:16 PM:

Oh yes, and are we forbidden to discuss
the country to the left of India
Winnie the Pooh
any kind of assessment or analysis
Shropshire Partners In Care (SPIC)
?

My advice: forget it. There are more words in heaven and earth than are dreamt of in your stemming engine.

#34 ::: abi ::: (view all by) ::: August 21, 2008, 05:20 PM:

Gwen,

In general, I think it'll be hard to block terms that are made up of innocent words in unfortunate combination (rug merchant, for instance).

But I am sadly out of date on ethnic slurs, so thanks for the list. I'll make good use of it.

#35 ::: Charlie Stross ::: (view all by) ::: August 21, 2008, 05:20 PM:

Also: would you be confused if I called you a fukcing cnut? Or just offended?

The mark one human eyeball comes with a built-in real-time spelling checker.

#36 ::: Tykewriter ::: (view all by) ::: August 21, 2008, 05:22 PM:

An episode of The Two Ronnies featured a list of banned words, including
Kn*ck*rs
Kn*ck*rs, and
Kn*ck*rs

#37 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 05:23 PM:

Ho. (As in "Your sister's a ho!")

Which will keep Santa from using one of his favorite expressions, alas.

#38 ::: abi ::: (view all by) ::: August 21, 2008, 05:24 PM:

John Stanning @33:
You'll note that I'd rather not stem certain words, precisely to allow discussions involving Pakistan and Winnie the Pooh. Thus the asterisks.

Although there will always be a proportion of people smart enough to think around a stemming engine and a blacklist, there will also be a larger set of vandals who can't think of anything more creatively profound than "fuck this damn shit".

The trick is to find a balance. This list is part of the quest for that balance.

#39 ::: Tykewriter ::: (view all by) ::: August 21, 2008, 05:24 PM:

Mr Stross: only if my name was Cnut.

#40 ::: dolloch ::: (view all by) ::: August 21, 2008, 05:25 PM:

Well, there's always (in no particular order):

pr0n variation of porn
smeg and smegma
spooge
rimjob
Jeezus
upskirt
gangbang
faggot
queerbait
raghead
gook
darkie
dago

variations on "tit" like "titty" in order to exempt words like "titmouse"

Contextually, "nip", "chink", and "mick", but that's hard to differentiate. Combinations like "rusty trombone", "dirty Sanchez", or "snail trail" would also be troublesome.

#41 ::: pedantic peasant ::: (view all by) ::: August 21, 2008, 05:26 PM:

Abi:

Not sure what your software function is here. Some of these, while inappropriate out of context might be essential if one is running a medical business, an animal breedery, or similar.

Also, as Caroline @ 15 said, people will get creative over time.

If this is something to clean and filter e-mail or 'net chatter, then this may work well enough to make someone stop and think when it gets bounced, but if someone is determined to be offensive, there are inordinate options available that won't be blocked, even before you add in creative thinkers:

toss my salad, tripod, eiffel tower, cum dumpster, ATM, etc.

#42 ::: Charlie Stross ::: (view all by) ::: August 21, 2008, 05:27 PM:

A ban on rude words doesn't preclude rudeness; it merely requires ingenuity to work around the ban. And once the flamers get the message, they'll gear up.

"I'm not saying you're a drooling, inbred cretin, but your family tree is a linked list, your mother is your sister, and you voted for Bush, twice."

#43 ::: ed g. ::: (view all by) ::: August 21, 2008, 05:28 PM:

Not only residents of Scunthorpe, but also those of Milford, will be excluded. Discussion of cooking will be difficult if one may not mention spices. Perhaps "milf*", "spic*", and "hell*" (I just noticed) would be more correct entries, or am I misunderstanding how the pattern-matching is done?

#44 ::: abi ::: (view all by) ::: August 21, 2008, 05:28 PM:

James @37:
Slightly off-topic, a common Dutch interjection is "hoor", meaning roughly, "you hear?" Unfortunately, it is pronounced "whore", which makes it startling on occasion.

#45 ::: abi ::: (view all by) ::: August 21, 2008, 05:32 PM:

pedantic peasant @41:
Library software, filtering user-entered tags, list names and short reviews in public and university libraries.

These lists are "seed" lists, just to get our customers started. It is expected that they will be configured over time.

#46 ::: Charity ::: (view all by) ::: August 21, 2008, 05:32 PM:

Do socio-economic terms count, or just racial ones? Thinking "chav" for the UK Additional list.

Also... Making Light is always thought-provoking, but I think the time is particularly ripe for some introspection when I can read a list like that and think "what will be left for people to talk about on that service?"

#47 ::: Kevin Marks ::: (view all by) ::: August 21, 2008, 05:33 PM:

You won't be able to repress British profanity, as it is endlessly creative and allusive. See Roger's Profanisaurus, Viz magazine, Urban dictionary, Carry On films &c.
That said, you should add 'wog' to the British list. You should also consider 'retard' as a noun.

#48 ::: ajay ::: (view all by) ::: August 21, 2008, 05:35 PM:

Not forgetting, of course, the Lithuanian Typewriter... and the many other euphemisms for verbs of a carnal nature such as shag, root, monster, bonk, knob, and (if you are the Duchess of Marlborough) pleasure ("His Grace returned from the wars today and pleasured me twice in his top-boots").

#49 ::: Kevin Marks ::: (view all by) ::: August 21, 2008, 05:37 PM:

An excellent social engineering hack that Christy Canida came up with was to have such a list, but to present the error message as "your comment didn't pass our spell-checker - the word 'whore' was not recognised"

#50 ::: pedantic peasant ::: (view all by) ::: August 21, 2008, 05:40 PM:

Oooohh!

'Retard' is a good one. See also moron, doofus, bastard, and idiot.

Also, (I forgot if this was mentioned) crap, and -crap [as in bullcrap, horsecrap, etc.]


Gee Abi,

47 comments in an hour. Not the fastest fill-up, but certainly up there. Why do you think this one is so popular? ;)

You come up with the best stuff for us ....


Thanks!

#51 ::: Paula Helm Murray ::: (view all by) ::: August 21, 2008, 05:40 PM:

Too bad peaceday/peaceguy chat went away with Katrina.

It had some sort of 'bad language' algorithm, and thus if a member's last name was Hitchcock it would appear as Hitchrooster, and if someone said 'cocksucker' it came out as 'roosterlollipop'.

I am not sure what it translated other things as because if it decided you were swearing, it would warn you once, then bounce you out of chat on the second usage of a bad word.

Once it kicked you out, you had to make up a new user id/password.

#52 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 05:40 PM:

Seriously, the answer they want is "human moderation." There is no software answer to the problem posed.

Retired sailors would probably make a good choice for who to hire as moderators.

#53 ::: John Stanning ::: (view all by) ::: August 21, 2008, 05:41 PM:

Didn't we just have a thread on a related topic? Oh yes - “No,” he said apartmently.

And by the way, any software worth its salt should be able to distinguish between the word cunt and the same string of letters embedded within the word Scunthorpe.
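The distinction drawn here, between a whole word and the same letters buried inside a longer one, can be sketched with a word-boundary regular expression. The function names are made up for the example, and the place name Essex (mentioned upthread) is used as the test case.

```python
import re


def contains_substring(text, term):
    """Naive check: the term appears anywhere, even inside another word."""
    return term.lower() in text.lower()


def contains_word(text, term):
    """Stricter check: the term appears as a whole word on its own."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    for text in ("Essex County Library", "sex education"):
        print(text, contains_substring(text, "sex"), contains_word(text, "sex"))
```

The naive version rejects the place name along with the word; the whole-word version rejects only the latter. The price is that whole-word matching no longer catches compounds or derived forms, which is presumably why a stemmer is in the picture at all.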

#54 ::: Fragano Ledgister ::: (view all by) ::: August 21, 2008, 05:42 PM:

Dese bomboclaat lists too raas exclusive.

#55 ::: Fragano Ledgister ::: (view all by) ::: August 21, 2008, 05:43 PM:

Abi: You missed 'wog', and 'nig-nog' on the UK list. They were such joys in my childhood.

#56 ::: Joe J ::: (view all by) ::: August 21, 2008, 05:45 PM:

I will offer the insult, "skank," as a possibility. Though, this may be confused with a dance of the same name. (I'm not sure if this is US specific.)

#57 ::: abi ::: (view all by) ::: August 21, 2008, 05:46 PM:

Fragano @55:

I'm weak on racial insults in general, and British ones even more than American ones. I blame living in Edinburgh, which is one of the whitest cities I have ever seen.

I'll add those in.

#58 ::: mjfgates ::: (view all by) ::: August 21, 2008, 05:47 PM:

You might want to add "nigga" to the US list; it's a fairly common spelling these days. Or maybe I've been playing too much "Grand Theft Auto: San Andreas" this week.

#59 ::: abi ::: (view all by) ::: August 21, 2008, 05:48 PM:

Jim @52:
I'm also involved in designing our moderation interface, including a tool to see what extant tags a given term would exclude. The blacklist is only a tool to assist the moderator, not a substitute for one entirely.
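A preview of that kind might look roughly like the sketch below. It is an illustration only: the function name is hypothetical, and a simple word-prefix test stands in for whatever matching the real engine performs.

```python
def tags_excluded_by(term, existing_tags):
    """List the existing tags that a proposed blacklist term would reject.

    Assumption: a tag is rejected when any of its words begins with the
    term; the real stemming engine may behave differently.
    """
    term = term.lower()
    return sorted(
        tag
        for tag in existing_tags
        if any(word.startswith(term) for word in tag.lower().split())
    )


if __name__ == "__main__":
    tags = ["Winnie the Pooh", "swimming pools", "gardening"]
    for tag in tags_excluded_by("poo", tags):
        print("would be excluded:", tag)
```

Showing a moderator the collateral damage before a term is added makes it easier to decide whether the term should be matched loosely or only in its exact form.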

#60 ::: abi ::: (view all by) ::: August 21, 2008, 05:50 PM:

pedantic peasant @50:
47 comments in an hour. Not the fastest fill-up, but certainly up there. Why do you think this one is so popular? ;)

I attribute it to the kind and helpful nature of the Making Light community, of course.

Thanks!
You're welcome. Having spent the entire day in a language exercise, I felt obliged to share.

#61 ::: Daniel Martin ::: (view all by) ::: August 21, 2008, 05:51 PM:

It seems from what has been leaked that the primary concern here is in an environment where you're filtering out offensive tags that people have applied to things with this software.

Given that, I am surprised to see "penis", "vulva", "vagina" and "testicle" on the list; the whole point of many of the rest of the other words on the list is that you don't use those real terms when being vulgar.

Asking as a matter of the policy you've been asked to help implement - should tagging a medical article "penile cancer" be disallowed? What about tagging an article "vaginal yeast infection"? The current list happens to allow one and exclude the other, unless your stemming engine is pretty smart.

I'm saddened that your engine would deem a tag of "gay" offensive, but I suppose that it is often used in a rather offensive manner as a tag. I might ask that you add "diaperhead" to the list, though maybe you should wait until the users of your company's product prove themselves that creative. Also "dego" and its variant "dago".

#62 ::: Fragano Ledgister ::: (view all by) ::: August 21, 2008, 05:51 PM:

I note that I was able to write a line that would have got me arrested and fined in several English-speaking Caribbean countries, without abi taking notice.

#63 ::: pedantic peasant ::: (view all by) ::: August 21, 2008, 05:51 PM:

Here are a few more:

fellatio, for unlawful carnal knowledge (speaking of creative thinkers, I actually got that as a response once: "Here's a buck, for unlawful carnal knowledge with yourself." [Apparently a higher-order thinking breed of troll]), masturbate, simian, cunnilingus (from college: "I heard you had a talented tongue and were a cunning linguist."), banging, balling, 'do me,' 'eat me,' pervert, feeb, snatch, tampon, skank and skanky, hooker, prostitute, roundheels, ATOGM, and jism.


[This list brought to you as further proof that teachers learn as much from their students as they teach -- whether they want to or not.]

#64 ::: Seth Gordon ::: (view all by) ::: August 21, 2008, 05:52 PM:

If I may quote from my alma mater's unofficial drinking song:

Penetration, fornication, copulation, fuck
Blow job, hand job, rim job, ream job, cunnilingus, suck
Eating beaver, dipping wick, taking it up the rear
These words don't mean a thing to me, 'cause I'm an engineer.

(to the tune of "Rambling Wreck from Georgia Tech")

#65 ::: Mary Dell ::: (view all by) ::: August 21, 2008, 05:52 PM:

You may encounter problems if people need to use proper names in the software. "Dick Cheney" may be a swear word but hopefully "Dick Van Dyke" is not.

#67 ::: Tlönista ::: (view all by) ::: August 21, 2008, 05:56 PM:

Those people?

#68 ::: pedantic peasant ::: (view all by) ::: August 21, 2008, 05:56 PM:

Abi:

You will update the list when all's been said and done, so we can all have access to "The Official Making Light Profanity Guide," right?

#69 ::: Serge ::: (view all by) ::: August 21, 2008, 05:59 PM:

Next, Mrs.Slocombe and her pussy...

#70 ::: abi ::: (view all by) ::: August 21, 2008, 05:59 PM:

pedantic peasant @68:
You will update the list when all's been said and done, so we can all have access to "The Official Making Light Profanity Guide," right?

If you like, yes. But it must be understood that no such list is perfect or comprehensive, even in its intended context.

#71 ::: Karen Williams ::: (view all by) ::: August 21, 2008, 06:00 PM:

"Anal," even though it's meant to be used just as is, could cut out compound medical terms. Of course, the only one I can think of right now is "anal probe," which may not make my case.

#72 ::: p mac ::: (view all by) ::: August 21, 2008, 06:01 PM:

Well, George Carlin had a little list. And sure enough, "motherfucker" doesn't appear on any of the lists.

#73 ::: Mark Wise ::: (view all by) ::: August 21, 2008, 06:02 PM:

bloody
sod
sodding
poof
pansy
fudgepacker

Isn't bloody more offensive than USians tend to think?

#74 ::: abi ::: (view all by) ::: August 21, 2008, 06:04 PM:

It sounds like I may want to move more of the technical medical terms to the second list, which contains terms we'll suggest to libraries in more (socially) conservative communities.

All of the terms here are configurable and removable, and new terms can be easily added. This is just a suggested starting point before each library optimizes its list to its community (and the standards of its moderators).

I just want it to be the best starting point possible.

#75 ::: abi ::: (view all by) ::: August 21, 2008, 06:04 PM:

"fuck" catches "motherfucker"

#76 ::: pedantic peasant ::: (view all by) ::: August 21, 2008, 06:06 PM:

Another great example of getting creative was the John Goodman Saturday Night Live referee sketch ten or twenty years ago, in which Goodman played the ref at a press conference with a group of fans who very politely called him the names they would use in the stands, phrased as press-type questions.

The one I recall best is Kevin Nealon's:
"I'd like to invite the ref to have sex with himself, because that's what he can do, as far as I'm concerned."

#77 ::: Brooks Moses ::: (view all by) ::: August 21, 2008, 06:10 PM:

Going the other direction from most of the comments here, I'd have to contest "kink" being on those lists. In my everyday usage, it's far more commonly associated with a defect in a hose (or other similar situation) than with sexual kinks. "Kinky" may be a different matter.

#78 ::: Josh Jasper ::: (view all by) ::: August 21, 2008, 06:20 PM:

"Santorum" (google for it)

#79 ::: Pablo Defendini ::: (view all by) ::: August 21, 2008, 06:23 PM:

@ Josh Jasper #78
Unfortunately, that particular Google bomb has died out. However, might I direct you to the Santorum.

#80 ::: Jörg Raddatz ::: (view all by) ::: August 21, 2008, 06:24 PM:

Alas, I cannot help you here, but allow me to tell the story of the day I wished to look up some details about the Achaemenid Empire on a public library's internet computer:
It was strangled by a clumsy filter that would not display any older texts (the kind commonly found for free online) that dealt with ancient Persia and contained the word "Aryan". Bah, I say!

#81 ::: Soon Lee ::: (view all by) ::: August 21, 2008, 06:24 PM:

Over at Asimov's, (where incidentally, the Moderation Gods have awakened from eons-long slumber & the atmosphere has been cleansed), the profanity filter was easily evaded by use of spaces or non-alphabetical characters, e.g. 'c oc ksu cker' or 'a/n/a/l'
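The evasion described above, and the cost of closing it, can be shown with a minimal sketch. It assumes a hypothetical one-word blocklist and two hypothetical helpers, `naive_hit` and `squashed_hit`; it is not the filter Asimov's actually ran.

```python
import re

# Hypothetical one-entry blocklist, for illustration only.
BLOCKED = {"anal"}

def squash(text: str) -> str:
    """Lowercase the text and drop every character that is not a letter."""
    return re.sub(r"[^a-z]", "", text.lower())

def naive_hit(text: str) -> bool:
    """Whole-word check on the raw text; inserted separators defeat it."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED for word in words)

def squashed_hit(text: str) -> bool:
    """Substring check on the squashed text; it catches 'a/n/a/l' but
    also fires on innocent letters that span a word boundary."""
    squashed = squash(text)
    return any(term in squashed for term in BLOCKED)

print(naive_hit("a/n/a/l"))        # separators hide the word
print(squashed_hit("a/n/a/l"))     # squashing exposes it
print(squashed_hit("Diana left"))  # but "dianaleft" now matches too
```

Squashing out the separators closes one hole and opens another: once word boundaries are gone, any two adjacent innocent words can collide with a listed term.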

#82 ::: sburnap ::: (view all by) ::: August 21, 2008, 06:29 PM:

Perhaps I should tell you the story of a large retail chain that started down this road and ended up really pissing off a Mr. Phuc Yu of Van Nuys, California.

#83 ::: James Cummings ::: (view all by) ::: August 21, 2008, 06:33 PM:

Under "UK additional" you could add "slag", "scrubber" and "slapper", which are all roughly synonymous with "slut". Also maybe "poofter" and "woofter", which are variant forms of "poof".

#84 ::: miriam beetle ::: (view all by) ::: August 21, 2008, 06:39 PM:

sburnap,

a large retail chain that started down this road and ended up really pissing off a Mr. Phuc Yu of Van Nuys, California.

yes, vietnamese-in-english does seem to get the worst of it. there was a lovely family-run soup joint here in the vancouver area, that even made it onto letterman once: pho bich nga.

my vietnamese-born friend gets really upset at me whenever i try to pronounce it.

#85 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 06:49 PM:

If my very dear friend Abi has not yet perused this site, I would think she had been remiss.

#86 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 06:51 PM:

This rather reminds me of the time USS PONCE made a port call in Portsmouth, England. And all the Ponce sailors went ashore on liberty wearing their Ponce ballcaps and their Ponce windbreakers. And all the Brit sailors sort of sidled away from them in waterfront bars, and asked, out of the sides of their mouths, of other US sailors present, "What's the matter with those lads?" or words to that effect.

#87 ::: Johan Larson ::: (view all by) ::: August 21, 2008, 07:09 PM:

I strongly advise you to reconsider the blanket prohibition on "penis" and "vagina". These are the formal, even clinical, terms for these body parts. Someone may quite legitimately want to discuss medical procedures, say, and should not have to resort to maiden-aunt terms like "manhood" or "flower" to do so.

#88 ::: Terry Karney ::: (view all by) ::: August 21, 2008, 07:10 PM:

Having recovered from my sudden chill...

I am amused that "motherfucker" isn't on the list, as it's a core word; much overused.

Cunny, titty/titties, browneye, trouser-snake, pud, jack-off

The inclusion of wop leads me to think spic, beaner, wetback, homo, girly-man etc. might be better excluded.

Other than that, I seem to be drawing a blank; life in barracks certainly had a lot of foul language, and in the field all the more, but it's not springing fresh to mind.

#89 ::: Clifton Royston ::: (view all by) ::: August 21, 2008, 07:18 PM:

Speaking as one who's worked on related problems, I think this approach will have a disastrous effect on your software's usability. Here's why:

Arsenal. arsenic. analytic. psychoanalysis. assent. Holy Mass. passivity. enclitics. Bruce Cockburn. cocker spaniels. Joe Cocker. cumin. documents. cucumbers. Scunthorpe. seashells. hellenic art. Milford, Connecticut. sextet. sextillion. Essex. spices. auspices. haruspices. perspicacity.

Need I continue? Need I explain further?

My suggestion instead:

Disable most if not all stemming at run-time. Instead, in your internal table, add common compounds of those words ("-hat", "-hole"), and then explicitly stem the specific swearwords you want to ban, into the specific derivations you want to be considered as being variations on those words. You could use a grammar of some sort to automatically generate all the derivatives via -s, -es, -er, -ers, -ing, -ery, etc.

One exception: it is probably safe to stem "f_ck" at match-time because it's such a powerful word in English, that the language avoids any compound that includes or sounds like it.

If you're not convinced, look into the history of problems antispam software has had with similar lists. Broad pattern-matching has usually proved to be utterly disastrous.

Try my proposed approach instead, and you're fairly likely to pre-empt most of the compounds people will try first to get around the profanity filter: "asshattery", etc.
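A minimal sketch of that proposal, under stated assumptions: the root "ass", the compound list, the suffix list, and the consonant-doubling rule are all illustrative choices, and the helper names (`derivations`, `build_table`, `flagged`) are hypothetical. The table is expanded once, ahead of time, and the run-time check is a plain whole-word lookup with no stemming.

```python
import re

# Suffixes taken from the list suggested above.
SUFFIXES = ["", "s", "es", "er", "ers", "ing", "ery"]
VOWELS = set("aeiou")

def derivations(base: str) -> set[str]:
    """Expand one base word into its explicitly allowed derived forms."""
    forms = set()
    for suffix in SUFFIXES:
        forms.add(base + suffix)
        # Assumed spelling rule: double a final consonant that follows a
        # vowel when the suffix starts with a vowel (hat -> hattery).
        if (suffix and suffix[0] in VOWELS and len(base) >= 2
                and base[-1] not in VOWELS and base[-2] in VOWELS):
            forms.add(base + base[-1] + suffix)
    return forms

def build_table(roots: list[str], compounds: list[str]) -> set[str]:
    """Precompute every banned form: each root, each root+compound, derived."""
    table = set()
    for root in roots:
        for base in [root] + [root + c for c in compounds]:
            table |= derivations(base)
    return table

TABLE = build_table(roots=["ass"], compounds=["hat", "hole"])

def flagged(text: str) -> list[str]:
    """Return the words in text that appear in the table as whole words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in TABLE]

print(flagged("Arsenal, assent, passivity, Holy Mass"))
print(flagged("such asshattery"))
```

Because nothing is matched as a substring, the innocent words from the list above pass untouched, while the compounds people are likely to try first are already in the table.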

If it becomes too much of a nuisance to figure out words that get past the filter, people will instead invent their own non-profane profanities to use in their place. (For instance, in the SA forums, people use "lovingly caress", as in "lovingly caress that!") At that point your filter has succeeded.

Oh my. I think I just triggered ML's own profanity filter. I didn't think there was one!

#90 ::: Iain Coleman ::: (view all by) ::: August 21, 2008, 07:18 PM:

The one that jumps out for me is "Paki", which should be in "UK Core", not "UK Additional". The official ranked list of offensive words in the context of British broadcasting is quoted here, with a link to the full report. It looks like a useful resource for your purposes. (I also second the recommendation of Roger's Profanisaurus.)

#91 ::: Marc Moskowitz ::: (view all by) ::: August 21, 2008, 07:21 PM:

Sarah S @8:
I learned a two-verse version at church youth retreats:*

Biff Biff Bam Bam
Son of a bitch, god damn
Hidy Tidy Christ Almighty
Rah Rah Fuck!

Rah once, rah twice,
Holy Jumping** Jesus Christ
Biff Bam God Damn
Rah Rah Shit!

*Yes, Unitarian Universalist ones.
**Sometimes "Humping".

#92 ::: John ::: (view all by) ::: August 21, 2008, 07:32 PM:

Someone was recently telling me of a Vietnamese eatery called Pho King.

#93 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 07:35 PM:

This reminds me of the search filters on US libraries some years ago, that prevented patrons from looking up "Superbowl XXX."

#94 ::: sherrold ::: (view all by) ::: August 21, 2008, 07:39 PM:

Sherman Alexie, writing for a Seattle alt-weekly recently, on the word motherfucker:
23. "Motherfucker" is, of course, the purest distillation of mama insults. Since single mothers are sadly common and sweetly revered in black culture, mama jokes are ironically hilarious. However, I've always wondered why the term "fatherfucker" is so rarely used as an insult. I think it's far more original, powerful, and disturbing than "motherfucker." I assume that "motherfucker" is an insult borne of misogyny, so wouldn't "fatherfucker" be a more egalitarian, homoerotic, and therefore more disturbing obscenity? Wouldn't we all be challenging the patriarchy if we adopted its use?

24. Kobe Bryant is one mean and gifted fatherfucker. Does that work for you?

I walked into an unknown bar on Pike st, heard this being read aloud (loudly), and knew I'd found a keeper.

#95 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 07:40 PM:

#89 Clifton: Oh my. I think I just triggered ML's own profanity filter. I didn't think there was one!

There isn't, per se, but there is a list of terms frequently used by spammers that triggers automatic moderation.


I wonder if a Bayesian filter might not be constructed, so that each library creates its own list of forbidden words, based on their local mores?

#96 ::: Terry Karney ::: (view all by) ::: August 21, 2008, 07:46 PM:

Caroline: I too find myself blushing, which is funny. I think it's because some secret part of me wonders what people I wish to think well of me will wonder when they see the oddball list of things I know. What secret wonders will they have of my interests.

We blush because we are embarrassed. We are embarrassed only when we fear shame.

We feel shame only from those whose opinion we value.

#97 ::: Jo Walton ::: (view all by) ::: August 21, 2008, 07:53 PM:

Well, a quick google informed me what "milf" meant. My goodness.

If you're going to have British offensive racist terms, consider "wog", a word which caused me to gasp out loud the first time I saw it written down.

#98 ::: Soon Lee ::: (view all by) ::: August 21, 2008, 07:56 PM:

Jo Walton #97:
So you're not a fan of Military SF?

*ducks*

#99 ::: Soon Lee ::: (view all by) ::: August 21, 2008, 07:57 PM:

Ack, that should be Military Fantasy.

Oh, for a time machine.

#100 ::: Tim Walters ::: (view all by) ::: August 21, 2008, 08:01 PM:

The inhabitants of Scunthorpe may still find themselves unfortunately excluded from conversation.

Q: Which three English football teams have obscenities in their names?

A: Arsenal, Scunthorpe, and Manchester Fucking United.

#101 ::: James D. Macdonald ::: (view all by) ::: August 21, 2008, 08:12 PM:

I don't recall seeing or hearing "milf" anywhere before the movie American Pie came out, and the term was so obscure in that movie that they felt the need to define it. (I wouldn't be surprised to discover that the screenwriter made it up.)

Now, of course, it's a favorite of spammers and on-line porn sites everywhere.

#102 ::: Zak ::: (view all by) ::: August 21, 2008, 08:13 PM:

Since no one's mentioned them, a few more delightful terms of art:

Choad, quim, cooze, swive and sard.

Gunsel/gonsel, catamite (and remember, cenobites can be catamites and catamites can be cenobites, but cenobites are not automatically catamites and vice-versa), bundli, bufu, larro, quean (naughty in multiple meanings).

Oh, this could take a while.

#103 ::: lorax ::: (view all by) ::: August 21, 2008, 08:14 PM:

Like Daniel, I'm dismayed to find "gay" on the list; I understand that "dyke" is still perhaps more widely used as a slur than a self-identification, and that there are strong generational issues, so I'm not going to object to that. (Though I do wonder what this would do to Alison Bechdel's Dykes to Watch Out For books, or reviews of her more recent Fun Home that mention her earlier work.) "Gay", however, is the generally used term, and this filtering would effectively force people back in the closet. Worse, the allegedly-neutral term -- "homosexual" -- is predominantly used in a negative context. You're inadvertently selecting for bigotry here.

#104 ::: Calton Bolick ::: (view all by) ::: August 21, 2008, 08:15 PM:

Speaking of the problem with Scunthorpe, Wikipedia's on it.

#105 ::: Zak ::: (view all by) ::: August 21, 2008, 08:15 PM:

Oh yes, and if you're to include Dutch, my name (Zak) is apparently unsavory.

I was very happy indeed to discover that.

#106 ::: Kevin Marks ::: (view all by) ::: August 21, 2008, 08:17 PM:

Jo #97 the interesting thing linguistically about MILF is the intentionality embedded in it. I suspect there's a madonna/whore split in MILF vs Cougar.

Now that Polish and Eastern European immigrants are the latest wave in the UK, is there a new racial epithet for them to watch for?

#107 ::: Terry Karney ::: (view all by) ::: August 21, 2008, 08:23 PM:

Ah... My juvenile brain was amused that the UU Youth were sometimes humping at summer camp.

#108 ::: Sylvia ::: (view all by) ::: August 21, 2008, 08:29 PM:

I'm still giggling at the Lord of the Rings Online attempt to filter their webchat:

The chat system has a rather bizarre filtering system which blocks the hell in hello, along with sm, tit and other such useful morphemes - so don’t you go trying to talk about [FILTERED]elly black[FILTERED]iths pe[FILTERED]ioning tailored s[FILTERED]ching - let alone buttresses, you little devil….

Worked it out? ROT13 translation just in case: fzryyl oynpxfzvguf crgvgvbavat gnvybevat fgvgpuvat
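The behaviour quoted above is what a bare substring replace would produce. Here is a minimal sketch, assuming the filter simply replaced the three fragments named in the quote wherever they occur; the function name `substring_filter` is hypothetical, and the real chat system's rules are not known. The last line decodes the ROT13 hint with Python's standard `codecs` module.

```python
import codecs
import re

# The three fragments named in the quote; the real list was surely longer.
FRAGMENTS = re.compile(r"hell|sm|tit", re.IGNORECASE)

def substring_filter(text: str) -> str:
    """Replace every occurrence of a listed fragment, even inside words."""
    return FRAGMENTS.sub("[FILTERED]", text)

print(substring_filter("smelly blacksmiths petitioning tailored stitching"))
print(substring_filter("hello"))
print(codecs.decode("fzryyl oynpxfzvguf crgvgvbavat gnvybevat fgvgpuvat", "rot13"))
```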


And then there was drunk banned in Age of Conan because runk is rude in Scandinavian... Drunkenness Leads to Masturbation.

Not that it saved the AoC rep who got horny with a customer. Oh well :)

#109 ::: shadowsong ::: (view all by) ::: August 21, 2008, 08:37 PM:

Queef.

Yulia Tymoshenko is a PMILF. Also, the thing the Dutch boy stuck his thumb in is a dike, not a dyke.

#110 ::: shadowsong ::: (view all by) ::: August 21, 2008, 08:42 PM:

(in which he stuck his thumb, blah, blah, yer mom ends sentences with prepositions.)

#111 ::: Evan Goer ::: (view all by) ::: August 21, 2008, 08:43 PM:

Comparing Abi's post to Jim's, it looks like profanity is roughly six times more popular than Moose Festivals.

That's actually a lower ratio than I would have expected.

#112 ::: Hank Roberts ::: (view all by) ::: August 21, 2008, 08:44 PM:

This is a task requiring specialist statistical analysis.

Can I still say that?

#113 ::: meteorplum ::: (view all by) ::: August 21, 2008, 08:48 PM:

Hmm. Been there. Done that. See my comment over at BoingBoing. It is literally the last word (with several typos, alas). And ditto Clifton @89.

This is not to say that Abi's firm should not undertake the effort, but that the unintended consequences might cause more fracas amongst customers (and amusement among employees). I should also like to note that if one were to add "chink" to these lists, one would deprive oneself of quoting from A Midsummer Night's Dream, and I am speaking both as a "Chink" and a fan of the Bard.

Now waiting for Cockney (ooh, ooh) Rhyming slang to get the proverbial chop.

#114 ::: Claude Muncey ::: (view all by) ::: August 21, 2008, 08:52 PM:

The best restaurant name relevant here, in my opinion, is King Dong in Berkeley. Even better, two storefronts down is a massage parlor.

(The address is 2429 Shattuck Ave -- Street View tells the story.)

#115 ::: Sarah ::: (view all by) ::: August 21, 2008, 09:00 PM:

Charlie @ 35:

Because I am a member of the Harry Potter generation, my instinct was not to reverse the "u" & "n", but to replace "c" with "k" and come up with Knut, which is the smallest monetary unit.

Human spellcheck: Almost as fallible as Microsoft Word.

#116 ::: Stephen Hope ::: (view all by) ::: August 21, 2008, 09:04 PM:

My brother worked at a place that had a similar list on their email filters. It banned words like rape, pink, & bush. It would reject the whole email, and not tell you which words specifically triggered the ban (I figured the above words by elimination). It had so many words in the list that at one point 10% of messages were being changed and sent multiple times, trying to figure out what had set off the filter this time.

Eventually one of his co-workers wrote a little script that would take a message, insert a . into each word somewhere (2 or more if the word was big enough) and then send it. It produced messages that looked really weird, but at least they got sent.
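A script like that takes only a few lines. The sketch below is a guess at the idea, not the co-worker's actual code: the function names and the eight-letter cut-off for "big enough" are assumptions.

```python
import re

def dot_word(word: str) -> str:
    """Insert one '.' into a word, or two if it has eight or more letters."""
    if len(word) < 2:
        return word  # nowhere to put a dot inside a one-letter word
    dots = 1 if len(word) < 8 else 2
    # Evenly spaced cut points, computed with integer arithmetic.
    cuts = [len(word) * k // (dots + 1) for k in range(1, dots + 1)]
    pieces = [word[a:b] for a, b in zip([0] + cuts, cuts + [len(word)])]
    return ".".join(pieces)

def dot_message(text: str) -> str:
    """Apply dot_word to every run of letters; leave everything else alone."""
    return re.sub(r"[A-Za-z]+", lambda m: dot_word(m.group()), text)

print(dot_message("pink bush"))
print(dot_message("messages"))
```

A whole-word filter never sees "pi.nk" as "pink", which is the same separator trick that comment 81 describes, used here in self-defence.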

#117 ::: Martin ::: (view all by) ::: August 21, 2008, 09:09 PM:

John @ 18:
"depending on your fucking point of view."
Not "depending on your point of fucking view"?

The latter feels much better to my ears.

#118 ::: Serge ::: (view all by) ::: August 21, 2008, 09:22 PM:

Stephen Hope @ 116... My brother worked at a place that had a similar list on their email filters. It banned words like rape, pink, & bush.

There goes Singin' In The Rain.

"Well, I can't make love to a bush!"

#119 ::: Allan Beatty ::: (view all by) ::: August 21, 2008, 09:29 PM:

I'd love to help, but conflict of interest prevents me. I've worked on such a list for my employer. I will note though that people named Michelle will not be happy with this. Oh, and Goshit is a real last name.

#120 ::: Tamago ::: (view all by) ::: August 21, 2008, 09:50 PM:

I was on a chat board with filtering so clumsily aggressive, I learned a few new swear words by figuring out what had been [expletive deleted] in the middle of other words and phrases.

My favorite, however, was in a discussion of timepieces: the word pocke[expletive deleted]ch.

#121 ::: Cygnet ::: (view all by) ::: August 21, 2008, 09:56 PM:

I may have mentioned this before, but I was once a fairly active participant in a forum where they instituted a filtering program that did an automatic string-replace, swapping the swear words out for less offensive terms.

"Cock" was swapped with "Thingy."

The problem?

It was a forum for poultry breeders.

Imagine my surprise when I wrote that I had some young marans cockeral for sale and in my post they were "thingyerals."
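The difference between what that forum did and what it presumably meant to do fits in a few lines. This is a minimal sketch with hypothetical function names; the forum's actual code is unknown.

```python
import re

def naive_swap(text: str) -> str:
    """Plain string replace: rewrites the term even inside longer words."""
    return text.replace("cock", "thingy")

def whole_word_swap(text: str) -> str:
    """Replace the term only when it stands alone as a word."""
    return re.sub(r"\bcock\b", "thingy", text)

post = "young marans cockeral for sale"
print(naive_swap(post))       # mangles the poultry term
print(whole_word_swap(post))  # leaves it alone
```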

#122 ::: Paula Helm Murray ::: (view all by) ::: August 21, 2008, 09:57 PM:

Terry, @#96, I'm with you. There is a tiny bit of my brain that is appalled that I KNOW all these words. Then again I'm a word/spelling freak.

#123 ::: xeger ::: (view all by) ::: August 21, 2008, 10:05 PM:

Hm. 'rent boy' and 'cottaging' would almost certainly be dubious in the uk, and confusing in the US.

#124 ::: Trey ::: (view all by) ::: August 21, 2008, 10:09 PM:

What? No bint? Nor wetback?

#125 ::: Ken Houghton ::: (view all by) ::: August 21, 2008, 10:17 PM:

Speaking as the other half of the Montreal emigres, I was going to complain that "milf" was on the list at all. Difficult acronym to use as an expletive, though it likely does qualify as a term of sexual harassment.

(Really, Jim, people only learned the acronym from American Pie?? Gads.)

#126 ::: Brenda Kalt ::: (view all by) ::: August 21, 2008, 10:19 PM:

Nothing of Yiddish derivation? You darn putzes--er, yutzes.

#127 ::: heresiarch ::: (view all by) ::: August 21, 2008, 10:28 PM:

Of all the words in the Additional list, I've got to say: "erotic"? Really? It doesn't seem like it would be a useful word to have access to when tagging pieces of literature? (The same argument works for "fetish," too, though perhaps a bit less convincingly.)

Also, what lorax @ 103 said: why include "gay"? If the software is customizable, then let the bigots plug it in themselves. No need to enable their bigotry.

#128 ::: Serge ::: (view all by) ::: August 21, 2008, 10:29 PM:

Apparently 'frak' isn't on the bad-word list on my employer's email system.

#129 ::: Lizzy L ::: (view all by) ::: August 21, 2008, 10:31 PM:

I haven't got the stamina to read this thread closely -- but if putz is out, surely shmuck should also be on the list.

#130 ::: Caroline ::: (view all by) ::: August 21, 2008, 10:33 PM:

Terry Karney @ 96, I think you have put your finger on it.

sherrold @ 94, I ran into someone a little while ago who said she preferred the word "gutterfucker." You know, I like that better for insulting purposes -- it's so much harsher. It retains most of the power that the original has lost.

#131 ::: Clifton Royston ::: (view all by) ::: August 21, 2008, 10:37 PM:

Add me to the chorus against predefining "gay" as an offensive word.

#132 ::: cathy ::: (view all by) ::: August 21, 2008, 10:53 PM:

I'm surprised it took 97 comments for someone to mention wog.

In terms of racial epithets, I would probably add hebe, yid, beaner, wetback, spearchucker, junglebunny, chink, gook and possibly spade.

Can't really help with curse words and body parts. I've got kind of a limited vocabulary there.

#133 ::: AzureLunatic ::: (view all by) ::: August 21, 2008, 10:55 PM:

As there's no blanket ban on strings containing "ass", I might suggest the modern "asshat", which is somewhat more wholesome than "asshole", but also more evocative.

I see "pr0n" has already been mentioned.

Some of Japan's more interesting exports include hentai, which may or may not be yaoi, although these may be too specific.

#134 ::: Lance Weber ::: (view all by) ::: August 21, 2008, 11:09 PM:

I'd encourage you to reconsider banning the words that are actually anatomically proper (vagina, labia, penis, etc) unless you're sure there is no appropriate context for their use.

This was an ongoing problem when I worked at a large healthcare company with a Mommy-Knows-Best web filter. The clinical research people were constantly getting shut down for trying to do research that used horribly indecent words like "breast".

#135 ::: hapax ::: (view all by) ::: August 21, 2008, 11:16 PM:

abi, as a librarian (although in the USA, so perhaps not a client for your company) I would almost certainly not recommend this product for purchase.

We already have enough problems with filters on our city-provided internet computers that it literally interferes with our work, and most staff bring in their personal laptops and piggyback on external hot spots just to answer simple reference questions -- just today I was blocked from looking up the weather in Edinburgh, for no obvious reason I could see, but surely some "stemmed" word triggered the filter.

We also had the very not amusing problem of personal names prevented from being entered into our patron database -- many Thai names end with -porn, for example, and "Van Dyke" is not an uncommon surname.

Obviously you can't tell your bosses "Nah, forget it." But if you let them know that at least one potential client swears that your product would make her shoot her computer, it may give them pause.

#136 ::: mcz ::: (view all by) ::: August 21, 2008, 11:17 PM:

A couple of UK ones:

minge
berk

#137 ::: Lance Weber ::: (view all by) ::: August 21, 2008, 11:24 PM:

I have a couple of follow-on thoughts.

First, there are a couple of web service based profanity filter providers that would be a much better investment of programming resources than re-inventing this wheel:

CDYNE Profanity Filter Web Service
WebPurify Profanity Filter Web Service

Second, if you are committed to building it yourself the BBC has a rather long filter list.

#138 ::: janetl ::: (view all by) ::: August 21, 2008, 11:29 PM:

You have my sympathy. It's no fun implementing a feature that you know will be annoying to well-intentioned users, and not much of a deterrent to the putzes. Oh, well. The road to [FILTERED] is paved with good intentions.

#139 ::: Ginger ::: (view all by) ::: August 21, 2008, 11:33 PM:

I've got no words to add, just this little vignette: I left this page up on the desktop, which my partner shut down -- and then she had to ask me "What the hell were you reading?!?"

I've had similar problems with net nanny software blocking me from accessing a vendor's website, probably because it referred to the mouse's sex or some such offensive topic. This is a difficult parameter to monitor, and the human brain is better than any program at parsing out the true underlying meaning versus non-obvious expletives.

#140 ::: JESR ::: (view all by) ::: August 21, 2008, 11:40 PM:

Wow. All this way, and it's still left for me to say: siwash, papoose, squaw, brave, and chief.

I can't see a list being generated which is at once non-racist itself and yet comprehensive in blocking racial slurs, nor one capable of both blocking common sexual invective and allowing medical and scientific searches.

#141 ::: hapax ::: (view all by) ::: August 21, 2008, 11:50 PM:

Oh, and I'll add that any library software that blocks access to Conrad's NIGGER OF THE NARCISSUS, not to mention Langston Hughes's poetry, deserves to be burned in bonfires whilst lovers of the English language dance around it.

Unfortunately, that includes the software that my library has installed...

#142 ::: C. Wingate ::: (view all by) ::: August 22, 2008, 12:09 AM:

One might observe that there's not much point in it all considering what the particle says about Carter's immediate successor.

Oh, why bother.

#143 ::: Adrian Smith ::: (view all by) ::: August 22, 2008, 12:15 AM:

A couple of UK ones:

minge

and "minger", oddly unrelated.

berk

Might as well stick in all the 836471926 synonyms for "fool" in that case.

"Orgy" would catch unexceptionable metaphorical usages like "an orgy of [...]".

And why is "fag" (cigarette) in the UK core?

"Pee" could conceivably go into UK additional, if people need to be protected from things like "crap". Also "frig", which often serves as a milder version of "fuck".

You could even stick "prude" in somewhere, as it's the epithet most likely to be directed at those deploying the software.

#144 ::: Xopher ::: (view all by) ::: August 22, 2008, 12:33 AM:

Thank you, everyone who has pointed out that 'gay' shouldn't be on the list. I agree. Better to be dissed than not to exist.

#145 ::: mcz ::: (view all by) ::: August 22, 2008, 12:55 AM:

I feel that berk belongs in the list because of its pedigree:


cunt --> Berkshire Hunt --> berk

#146 ::: Zeborah ::: (view all by) ::: August 22, 2008, 12:55 AM:

I'd really recommend going light on the stemming and leaving out terms that could possibly be used in legitimate searches. A lot of those words would be necessary for medical students in particular, and words like "erotic, fetish, kink" would be important for psychology, gender studies, and such.

My core list would be something really short and heavily asterisked, say:

arse* (NB: I'd allow people to search for donkeys. "Ass" shouldn't be core in British English anyway; saying someone's an ass is completely different from saying they're an arse.)
asshole
arsehole
clit*
cocksucker
cunt*
fuck
kike*
milf*
nigger
piss*
shit*
shite*
twat*

And if the libraries in question want to be prudes, let them add the words they don't like themselves.

#147 ::: JHomes ::: (view all by) ::: August 22, 2008, 12:57 AM:

@143

berk

Might as well stick in all the 836471926 synonyms for "fool" in that case.

Even the ones whose derivation doesn't come via "Berkeley Hunt"?

JHomes.

#148 ::: Adrian Smith ::: (view all by) ::: August 22, 2008, 01:15 AM:

cunt --> Berkshire Hunt --> berk

Oh yeah, I'd forgotten that - but it seems to have lost quite a lot of venom compared to the original.

Do we really want to start checking etymologies and stuff?

#149 ::: Paula Lieberman ::: (view all by) ::: August 22, 2008, 01:16 AM:

What, filter out Matsushita or veal spicata?! "Fanny" was not all that uncommon as a female first name.

Anus and anal are medical terms and also used when talking about maladies that pets can have... bitch is a female dog and used politely when referring to female dogs.

Cum isn't there, however, screening it would knock out a lot of Latin!

dickhead
pussylips
eating pussy
cocksucking
deep throating
ball busting

#150 ::: Dave Bell ::: (view all by) ::: August 22, 2008, 01:20 AM:

The victim's resort in the Scunthorpe case was to use the placename "Frodingham", which was one of the villages which combined to form the town.

"Frodingham"

Say it a few times and it starts to sound pretty dreadful.

#151 ::: Serge ::: (view all by) ::: August 22, 2008, 01:23 AM:

Paula Lieberman @ 149... bitch is a female dog and used politely when referring to female dogs

"There is a name for you, ladies, but it isn't used in high society... outside of a kennel."
- Joan Crawford in The Women.

#152 ::: abi ::: (view all by) ::: August 22, 2008, 01:43 AM:

For clarity, because I clearly did not explain this well enough (overly florid language actually blocks communication):

This list will not block searches. We do not block any terms in our searching. None of them.

It will not block cataloging. Data from trusted systems is not filtered. This is only aimed at content created by library patrons—something most libraries don't include at all.

It won't even block a member of the general public from entering a specific term for his or her private use.

What it will block is the indexing and display of information entered by a member of the general public. This is for a module that lets tags, lists and reviews entered by users be searched and displayed alongside regular catalog information. We want that to be clean.
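The scope described above might be sketched as follows. Everything here is an assumption made for illustration: the `Entry` type, the source labels, the placeholder blocklist, and the function names are hypothetical and are not taken from the actual product.

```python
import re
from dataclasses import dataclass

# Placeholder blocklist; each library would configure its own.
BLOCKLIST = {"hell"}

@dataclass
class Entry:
    text: str
    source: str  # "catalog" for trusted systems, "patron" for public input

def contains_blocked(text: str) -> bool:
    """Whole-word check against the configured blocklist."""
    return any(w in BLOCKLIST for w in re.findall(r"[a-z]+", text.lower()))

def visible_to_owner(entry: Entry) -> bool:
    """A patron's own tags, lists and reviews stay available to that patron."""
    return True

def publicly_indexed(entry: Entry) -> bool:
    """Only patron-created content is filtered before indexing and display."""
    if entry.source != "patron":
        return True
    return not contains_blocked(entry.text)

print(publicly_indexed(Entry("hell on earth", "catalog")))  # trusted data
print(publicly_indexed(Entry("hell on earth", "patron")))   # filtered
print(visible_to_owner(Entry("hell on earth", "patron")))   # private use
```

Searching never passes through `publicly_indexed` at all in this sketch, which matches the statement that no search terms are blocked.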

#153 ::: Dave Bell ::: (view all by) ::: August 22, 2008, 01:50 AM:

Should have googled.

Computer Underground Digest

You may recognise the name of the contributor. :)

There is a British clothing retailer with the name "French Connection UK". They use the acronym a lot, including their .com domain-name.

Since I am working on a wedding scene for a story, with deliberately bawdy speeches by the natives, I've become quite aware of all the ways in which ordinary English phrases can be used.

Old men do say, no farmer buys a field,
That hath not taken plough from other men.
That folding furrow; soft, compliant, earth,
Well-proven by the crop it lately bears,
Is worth its price, and yet 'tis not enough.
There are the gentle hills, the hidden vales,
And secret bosky corners, safe from view.
The spirit seeks out more than granary,
These lands, I find, are pleasing to my eye.

#154 ::: Dave Bell ::: (view all by) ::: August 22, 2008, 01:57 AM:

abi, given the way some people act, it is worth trying to limit what gets indexed and displayed from the data entered by the general public.

That still doesn't dodge the basic problems.

And, because the problems are difficult, I'm not sure that local configuration is going to help. J. Random Librarian isn't going to be coming to Making Light.

#155 ::: Earl Cooley III ::: (view all by) ::: August 22, 2008, 02:42 AM:

JESR #140: Wow. All this way, and it's still left for me to say: siwash, papoose, squaw, brave, and chief.

All of the times I have been called "chief" over the years (usually in the context of being either a subject matter expert or skilled and notably older than my peers) it has been a term of respect. Why do you think that word should be filtered? I don't think many Chief Petty Officers would agree with this either.

#156 ::: abi ::: (view all by) ::: August 22, 2008, 02:53 AM:

Dave @154 inter alia:

It is impossible to get a perfect solution. What I am trying to do is finesse this into the best possible solution.

Blacklists work pretty well in many contexts. I'd hate to run any blog commenting system without one. They require care in their implementation, which is what I'm working on now. They also require care in their maintenance, and we provide assistance to our customers in that regard as well.

The only way to solve the problem entirely is not to do this, and that's not going to happen. Our customers (the librarians) really like this piece of functionality, and I don't blame them.

Let's not make the best the enemy of the good.

Unless you have a completely different practical approach to suggest? Because that would be cool, too.

#157 ::: Tlönista ::: (view all by) ::: August 22, 2008, 03:02 AM:

For the same reason "boy" is a racial slur in some situations and perfectly innocuous in others.

#158 ::: Zeborah ::: (view all by) ::: August 22, 2008, 03:47 AM:

Abi, I get what you're working on - I wasn't clear myself because I was posting between missing one bus and not missing another bus. The point of letting users add tags to catalogue records is to supplement the frustratingly often-useless official subject headings; sharing them with other users is the point. Keeping them clean is an admirable goal and a lot of librarians would be on board with that; but likewise a lot of librarians would be quick to point out that doing so at the expense of usefulness is maybe not so admirable. Why should med students not be able to share tags on certain portions of the anatomy? Or English or film students not be able to share tags on erotica? Or psychology students not be able to share tags on fetishes? Or gender studies students not be able to share tags on sex?

After posting I also realised I confused the distinction between stemming and truncation. I'm not certain these are the official definitions, but what I mean by them is:

stemming: where the computer automatically adds/removes a set of inflectional suffixes eg -s, -es, -d, -ed, -ing.

truncation: where the computer finds any word starting with (or containing) a string.

Truncation is a lot more powerful than stemming, and it's this that I think you should go lightly on. It'd probably be reasonable to put stemming on all terms; but I'd restrict truncation to those terms which will not appear in useful and unobjectionable terms.
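
To make the distinction concrete, here is a minimal sketch in Python. The suffix list comes from the examples above; the function names and the mild stand-in term are made up for illustration, and none of this is meant to describe how abi's stemming engine actually works.

```python
import re

# Hypothetical suffix set, taken from the examples in the definition above.
SUFFIXES = ("s", "es", "d", "ed", "ing")


def stem_pattern(term):
    """Match the term as a whole word, optionally followed by one listed suffix."""
    suffixes = "|".join(SUFFIXES)
    return re.compile(
        r"\b" + re.escape(term) + "(?:" + suffixes + r")?\b", re.IGNORECASE
    )


def truncation_pattern(term):
    """Match any word that merely starts with the term (end truncation)."""
    return re.compile(r"\b" + re.escape(term) + r"\w*", re.IGNORECASE)


def blocked(tag, pattern):
    """Return True if the pattern finds a match anywhere in the tag."""
    return pattern.search(tag) is not None
```

With the stand-in term "ass", the stemming pattern blocks "asses" but lets "assessment tools" through, while the truncation pattern blocks both. Neither touches "classics", since that would need front truncation.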

And I'd keep the core list of terms, as I said, short. You've said it's a customisable list, so librarians can make it stricter if they need to, but why should it default to disadvantaging people studying certain topics? (I speak from an academic library point of view because I work in one; public library patrons have different needs and different ways in which filtering can cause problems.)

Personally I don't see why, if a word is already common in the library catalogue, users shouldn't be able to use that same word in their tags. To this end, I've just run your original core US list through the University of Canterbury catalogue. Numbers show a general keyword search with end truncation / a general keyword search without truncation / a title keyword search without truncation.

anal* 23021/7/0
anus 31/3/0
ass* 48134/49/23
asshole 3/1/1
bitch 43/31/10
clit 88/1/1
cock* 772/83/24
cocksucker 0/0/0
cunt 5/0/0
fag 367/9/4
fuck 19/12/1
kike 10/4/3 (all Japanese)
milf 465/0/0
nigger 29/24/16
penis 32/16/0
piss 69/4/0
shit 231/19/2
twat 0/0/0
whore 80/50/20
(Note that this is only end truncation as the UC library catalogue doesn't allow front truncation.)

At the very least, those with significantly higher numbers for the truncated keywords are probably ones you shouldn't truncate. I.e., other than those you've already marked with an asterisk:

anus* (beginning of a number of surnames)
clit* (linguistics)
cock* (cockade, cockroach, beginning of surnames)
cunt* (Cuntz algebras, misc)
fag* (beginning of surnames; foreign usages)
milf* (Milford Sound; other Milfords including a publisher)
penis* (various less common surnames)
piss* (Pissarro and other surnames)
shit* (Japanese titles and surnames)

#159 ::: Dave Bell ::: (view all by) ::: August 22, 2008, 04:20 AM:

abi, might it be better to default to a really short list, shorter than your current minimum, and use obvious words such as "cunt" and "cock" to show the risks? I mean, "ass"? That four-legged horse-like critter? (Yes, I know.)

And make it clear that the default is very much a minimum, needing adjustment to fit a particular library service's specific requirements.

Also, pretty obviously, an exception list, into which words such as "Scunthorpe" and "Shitake" can be put. (For the USA, placename lists by State. "Hell" is, I think, a US placename.)

It may be the exception-handling which makes or breaks this software.
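
A rough sketch of how an exception list might sit alongside substring matching, in Python. The one-word blacklist is a mild stand-in, the exception words are invented for the example, and the function name is hypothetical; a real deployment would load both lists from the library's own configuration.

```python
import re

BLACKLIST = {"hell"}                        # stand-in term only, not a real list
EXCEPTIONS = {"shell", "hello", "seychelles"}  # whole words allowed anyway


def is_blocked(tag):
    """Block a tag if any word contains a listed term and is not an exception."""
    for word in re.findall(r"[a-z']+", tag.lower()):
        if word in EXCEPTIONS:
            continue  # explicitly allowed, skip the substring check
        if any(term in word for term in BLACKLIST):
            return True
    return False
```

Checking exceptions word by word keeps "shell scripting" usable while "hellish" is still caught. The cost is that every innocent word has to be thought of in advance, which is the maintenance burden described above.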

User-configuration needs user-education. And this whole comment thread can be seen as driven partly by a misunderstanding of what the software is meant to do, combined with a fear of what it could do.

#160 ::: chris y ::: (view all by) ::: August 22, 2008, 05:01 AM:

I read a story about a woman called Dickson who was rejected by the registration software on a community site somewhere in the US. When she got snarky and tried Penisson, she went straight in. There's a moral there, and I think it's basically that these lists don't work. In about 1980, the company I worked for used a similar approach with their mainframe sales software (for passwords), and all that happened was that a lot of junior staff spent their breaks seeing what they could get away with.

Wherefore, I endorse everything Dave Bell says. Especially about user configuration.

#161 ::: abi ::: (view all by) ::: August 22, 2008, 05:25 AM:

Zeborah @158:
Oooh, data...shiny. I shall have to dig through this later on today.

You are correct, pretty much, in your definitions of stemming and truncation (though I would call the latter substringing). Our stemming engine could be said to be the thing that knows to "change the y to i and add es".

Our stemming engine, which we use to generate matches on user queries as well, is actually more sophisticated than we need for this purpose. We may have to dumb it down to tighten up the matches.

#162 ::: abi ::: (view all by) ::: August 22, 2008, 05:39 AM:

Dave @159:

Although I take your point, you have to remember that the number of librarians who want to spend a lot of time discussing whether "fuck" covers "motherfucker" or not is limited. Note how even some of our commentariat here, a hardened lot if ever I have seen one, have been blushing at their computers. Picture the face to face meeting, or even phone call, which includes lines like, "Next on the agenda is 'nigger'." I'm sure you can see that not everyone will be keen.

The result is that, if we don't offer a relatively complete base list, some libraries will adopt one from somewhere else. It won't be tailored to our stemming engine, and will likely be over-inclusive (rather than taking the risk of letting things through).

Indeed, I started with a library's list that included the word "adult". That took out all the tags about "young adult literature", for instance.

#163 ::: David DeLaney ::: (view all by) ::: August 22, 2008, 06:26 AM:

Question: Perhaps a slight change in the software's intent might be helpful, given what abi has said this filtering list would be used for? Instead of outright banning the input of the forbidden words into tags from the general public ... would it be more useful to have the software _set aside_ tags with these words/word-pieces in them, to be looked at by a "moderator" for discarding or approval? Perhaps with the offending portion of the tag marked for visibility? (This could also have a side-effect of allowing said moderators to keep track of which users, if any, were adding unwanted data tags repeatedly, and taking Steps towards getting them to clean up their language...)

A filter list's usefulness depends strongly on where it'll be used, and why; stuff that one has to keep repeatedly weeding out of a general chat channel, for example, might turn up much less often in tags like these. And vice versa.

--Dave

#164 ::: abi ::: (view all by) ::: August 22, 2008, 06:32 AM:

David @163:
An interesting thought. As I may have said already, we already have a thingie that allows a moderator to see what extant tags are blocked by what word (though it's not prettied up yet).

Having a report of what entered terms have been suppressed in the last [insert time period] would be an excellent idea, and linking them to a username would give moderators some idea who might be misbehaving.

If you were in Amsterdam, I'd give you the traditional reward for cleverness around our office, which is cake*.

-----
* No lie.

#165 ::: Cadbury Moose ::: (view all by) ::: August 22, 2008, 06:46 AM:

Automatic 'sanitizing' filters give inappropriate results, too:

"Grimsby and Svaginahorpe Rape Crisis"

"Parbreastion Magic"

Cadbury.

#166 ::: ajay ::: (view all by) ::: August 22, 2008, 06:56 AM:

Indeed, I started with a library's list that included the word "adult". That took out all the tags about "young adult literature", for instance.

"Viewers are warned that this programme deals with adult themes - specifically cynicism, exhaustion, compromise, the gradual slide towards irrelevant senility, and monthly mortgage payments".

Sex? Drugs? Partial nudity? Swearing? Violence? Those are teenage themes.

#167 ::: James D. Macdonald ::: (view all by) ::: August 22, 2008, 06:57 AM:

I still think that a Bayesian filter, which learns, would be the way to go.

#168 ::: Paul Duncanson ::: (view all by) ::: August 22, 2008, 07:21 AM:

Mark Wise @ 73: Isn't bloody more offensive than USians tend to think?
Depends on where you are, I guess. In some parts of Australia, it is practically punctu-bloody-ation.


Re: Gay. Given that just about every word that can be used to describe any sexual act, organ or other piece of equipment seems to offend someone, there must be people out there who are offended by all reference to sexuality. On that basis, an argument could be made for keeping "gay" on the list but only if "straight" is also included. And possibly anything prefixed with "bi".

#169 ::: MAL ::: (view all by) ::: August 22, 2008, 08:00 AM:

Just delurking to add a botanical reason for not stemming either clit (the genus Clitandra, for instance) or fag (the beeches, genus Fagus, and their plant family Fagaceae, for instance).

#170 ::: martyn44 ::: (view all by) ::: August 22, 2008, 08:16 AM:

I see that Dame Jacqueline Wilson's 'My Sister Jodie' is going to be amended in the next edition. Because one mother complained, the very deliberate use of 'twat' will be replaced by 'twit' - which may describe her publishers but doesn't mean quite the same and somewhat defuses the impact of a book that has already sold 150000 copies.

#171 ::: R. M. Koske ::: (view all by) ::: August 22, 2008, 08:48 AM:

#24, Doctor Science -

Thank you for putting spearchucker on your list. I thought it was roughly equivalent to redshirt until you added it and I looked it up. I don't know that I'd have used it (I favored redshirt as more familiar to me) but without this information, I could very well have done.

#172 ::: Dave Keck ::: (view all by) ::: August 22, 2008, 09:07 AM:

Look I'm being vaguely helpful! Here's a thingamajig from an Advertising Standards Authority that's practically a guidebook to tuning your offensiveness.

I'm bringing it with me next time I talk to the English (just in case).

#173 ::: Dave Keck ::: (view all by) ::: August 22, 2008, 09:09 AM:

(I might have said "aforementioned" thingamajig now that I look closely).

#174 ::: John Stanning ::: (view all by) ::: August 22, 2008, 09:20 AM:

I'd expect the Making Light community to know this, but I'll point out anyway that "Haji" (see #24) is a title of respect for a Muslim who has made the pilgrimage to Mecca. Yes, it's been used by a few morons as a term of abuse, but we owe it to civilisation to resist such a word being hijacked by those same morons. Putting it on a list of "banned words" not only invites contempt, it's letting the moronic fringe take over the language.

I mention "Haji" just as an example - there are others. Similarly, rejecting someone's name because it (or part of it) is on some foolish list is offensive and ignorant, and we should be saying so, loud and long.

We must resist this sort of idiocy, not surrender to it.

#175 ::: Rob Rusick ::: (view all by) ::: August 22, 2008, 09:27 AM:

My fair city has been reported as ranking second in the nation for looking up profanities on the internet. Someone comments we should strive to be first.

#176 ::: Ingvar M ::: (view all by) ::: August 22, 2008, 09:31 AM:

On the UK list:
toss (possibly, euphemistic term for male masturbation; also means "throw")
tosser (unless "toss" is non-stemming)

I see that "cock" is not on the UK list. This is good, as it would block (among other things) discussion of male fowl and Cockermouth (in Cumbria).

I suspect people wanting to tag books about Penistone (outside Sheffield) may be annoyed by the non-stemming nature of "penis", though.

#177 ::: Bruce E. Durocher II ::: (view all by) ::: August 22, 2008, 09:52 AM:

And the thought of what Twain (or, god forbid, Brann) could have said without this list even twitching is both frightening and awe-inspiring.

#178 ::: Vassilissa ::: (view all by) ::: August 22, 2008, 09:56 AM:

When I tried signing up for Second Life, my net-name was deemed unacceptable. It took me a very long time to work out why, since their error message implied that the name was already taken, not that there was something offensive about 'Vass'.

On the subject of songs:
Mum's out, Dad's out, let's talk rude
Pee, po, belly, bum, drawers.
Dance in the kitchen in the nude,
Pee, po, belly, bum, drawers,
Let's write rude words all down our street,
Stick out our tongues at the people we meet,
Let's have an intellectual treat,
Pee, po, belly, bum, drawers.
- Flanders and Swann

#179 ::: hapax ::: (view all by) ::: August 22, 2008, 10:27 AM:

abi, your explanation of the software's purpose does alleviate a lot of my concerns. I do think the idea of "flagging" rather than "blocking" offensive tags is an excellent one, though.

I must confess, I do hatehatehate Library 2.0 -- that's what the idea of incorporating user-generated content into library systems is called here, I don't know if it's an international term -- despite the fact that it is very much the hip and fashionable trend.

It's not concern about "naughty" tagging, or some sort of huffy notion that *of course* librarians will provide superior tags than the Great Unwashed. (For heaven's sake, this is the same crowd that took nearly a century to crawl from "aeroplane" to "aircraft" to "airplanes" in the subject headings!)

It's just that one of the great contributions, imo, of the library database is the very strictly controlled language. It is one of the few ways to remain sane in the onslaught of information overload; and if it limits flexibility in access, it also forces precision in querying. I think any reference librarian will attest that "not really thinking through what I'm asking and thus getting too much" is a far more common problem at the reference desk than "knowing exactly what I want and not finding any."

But that's not the fault of your list. And is hardly relevant to your query, although it is a definite pleasure to vent...

But I must respectfully note that anyone who thinks "the number of librarians who want to spend a lot of time discussing whether "fuck" covers "motherfucker" or not is limited" has definitely not spent as much time as I have on cataloging listservs.

#180 ::: Joel Polowin ::: (view all by) ::: August 22, 2008, 10:28 AM:

Felgercarb, nimnul, grut.

Swut, joojoofloop, turlingdrome. B*lg**m. Zark, zarking, etc.; possibly derived from 'Zarquon'..?

"Jesus H."

#181 ::: Joel Polowin ::: (view all by) ::: August 22, 2008, 10:30 AM:

Er, that should be "joojooflop".

"Don't write naughty words on walls if you can't spell..."

#182 ::: ajay ::: (view all by) ::: August 22, 2008, 10:39 AM:

Presumably the server will also be set to block occurrences of "By Grabthar's Hammer!"

#183 ::: Xopher ::: (view all by) ::: August 22, 2008, 10:48 AM:

Paula, 'fanny' is a name and a harmless term for 'buttocks' in the US. Its UK meaning is 180° from there, if you understand me! It's not AFAIK used as a name there.

abi, that makes a lot more sense now, but I'd still want to be able to create tags for Gay Studies, Gay Literature, Gays in the Military, Gay Marriage, Gay Liberation Movement, etc.

#184 ::: Serge ::: (view all by) ::: August 22, 2008, 11:00 AM:

"What a maroon!"
- Bugs Bunny.

#185 ::: John Mark Ockerbloom ::: (view all by) ::: August 22, 2008, 11:02 AM:

We use a tagging system at our library, which can be used to tag comment on items in our catalog as well as items out on the Net (lots of the resources we buy are electronic resources).

We don't filter for content, and haven't seen a need to thus far. Part of the reason is that we're a university library, where users are expected to have some maturity and not be shocked to death by strong language.

Part of it is probably also because every publicly viewable tag is signed, so you know who's responsible for using a particular tag or comment. This sort of social oversight might also work in public libraries in small enough communities. (It might not be as effective in larger communities that feel more anonymous.) You can also make your tagging private, either completely or post by post, but the default is public view.

Our tagging system is popular, but it isn't hugely high-volume either. People tag for both personal and public reasons. You don't want to put roadblocks for personal use, but public viewing doesn't necessarily have to be instant.

So if I felt the need to implement filtering, what I'd probably do is let all tag-posts be accepted instantly for private use, so users could see and use their own tags without any delay. Then I'd have an automated system to decide whether to put a tag-post out immediately for public viewing and indexing, or flag it for librarian review.

The decision could be based on both a keyword scan and an identity check. E.g. tags from new or anonymous users (if you allow anonymous users at all-- you may have spam problems if you do), or with highly controversial terms, could get flagged for review before going public. Users with a track record for responsible posts would have fewer, or no, triggers for review. Troublesome users might have all their posts flagged or made private by default.

Posts would either be approved or rejected for public view; trying to let them be seen but with 'sanitized' content would probably not be worth the trouble.
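
The routing decision described above could be sketched roughly as follows, in Python. Everything named here (the `User` fields, the `TRUSTED_AFTER` threshold, the placeholder flagged term) is an assumption made for illustration, not a description of any existing system. In every case the tag stays visible to its owner; only public visibility is decided.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"    # shown and indexed immediately
    REVIEW = "review"    # held for a librarian to approve or reject
    PRIVATE = "private"  # visible to the owner only


@dataclass
class User:
    name: str
    approved_posts: int = 0    # track record of accepted public tags
    anonymous: bool = False
    troublesome: bool = False  # set by a moderator


FLAGGED_TERMS = {"controversial"}  # placeholder for the keyword list
TRUSTED_AFTER = 5                  # assumed number of approved posts


def route(tag, user):
    """Decide public visibility from an identity check and a keyword scan."""
    if user.troublesome:
        return Visibility.PRIVATE
    if set(tag.lower().split()) & FLAGGED_TERMS:
        return Visibility.REVIEW
    if user.anonymous or user.approved_posts < TRUSTED_AFTER:
        return Visibility.REVIEW
    return Visibility.PUBLIC
```

In this version trusted users still trip the keyword scan; dropping that check for them would be a one-line change.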

#186 ::: abi ::: (view all by) ::: August 22, 2008, 11:06 AM:

hapax @179:
Yes, we're a Library 2.0-type place, which does mean we sit on a slightly different place on the "controlled vocabulary" vs "uncontrolled vocabulary" (taxonomy vs folksonomy, if you will) spectrum than you do. That doesn't mean we're replacing Library of Congress subject headings with user-generated tags or anything; user contributions are simply another source of search terms, with its own strengths and weaknesses. I'm fond of them, personally, because they tell me what users find notable about a given item. And surely that is useful for helping other users to find it.

What we're talking about here is only part of the means to the end that you and I clearly share: getting library patrons to the thing they want as easily as possible (even if they don't know what they want or how to ask for it).

This is not really the time or the place to get into relevancy algorithms, or the ways that our software can help a user refine and steer a vague query. But those are much more important than accepting user input and adding it to the searchable data in our system.

(Shorter me: Don't hate me because I'm 2.0!)

But I must respectfully note that anyone who thinks "the number of librarians who want to spend a lot of time discussing whether "fuck" covers "motherfucker" or not is limited" has definitely not spent as much time as I have on cataloging listservs.

Clearly not. Good thing or bad thing?

#187 ::: abi ::: (view all by) ::: August 22, 2008, 11:08 AM:

Xopher:

Your feedback, among others, means that "gay" is probably going completely off the list. Which is why I posted this here, so thank you.

#188 ::: abi ::: (view all by) ::: August 22, 2008, 11:14 AM:

John @185:
Your system and ours are not far apart.

A user can always see his/her own content, even if it is comprised exclusively of the blackest of the blacklisted terms. (Actually, this is true even if we've suppressed the entire user as well.)

This filter is to decide which content then gets imported into the catalog for public view and searching.

As suggested, suppressed tags will probably be available for review through periodic reports, but the solution would be to tweak the blacklist rather than allow individual ones. If that turns out to be a problem, we will look at it at a later time.

I suspect that our university customers will run with no blacklist, or a very minimal one. However, as I mentioned, we also sell to public libraries, which have very different community standards to meet.

#189 ::: Faren Miller ::: (view all by) ::: August 22, 2008, 11:21 AM:

All week long, in the comic "Non Sequitur", Danae (the "bad girl" daughter) has been discussing ways to swear without being caught. Most involved archaic expressions -- reminiscent of grumpy Victorians -- but in today's panel she has made up her own phrase, which she intends to register: "Holy Farglesnot!" Lucy the talking pony is not impressed.

(My link may not work for others, since it involves registration, but someone here can probably come up with a better one.)

#190 ::: Serge ::: (view all by) ::: August 22, 2008, 11:40 AM:

"Baldrick, I want you to go out and buy a turkey so large you'd think it's mother had been rogered by an omnibus."

#191 ::: hapax ::: (view all by) ::: August 22, 2008, 11:40 AM:

abi@186: "Shorter me: Don't hate me because I'm 2.0!"

Oh, I don't, I don't. What I hate is the indiscriminate, unintelligent adoption of 2.0 -- e.g., the incorporation of Amazon-style "rating systems" into library catalogs, which almost always end up binary, polarizing, and useless -- which is of course not where you're going.

Like almost everything else controversial, it's not a matter of good v evil, but of two conflicting goods -- I'd say in this case Ranganathan's Law #4 v #5, with a tasty frosting of #1 on both sides.

And with THAT incredibly librarian-geeky allusion, I'll fade back into the background, to continue to take notes on how to rilly rilly offend people on two different continents.

#192 ::: John Stanning ::: (view all by) ::: August 22, 2008, 12:03 PM:

Xopher, Fanny is and was rare as an official name in the UK, but very common as an informal name for a girl called Frances.
The First Aid Nursing Yeomanry - FANY - was the corps of British army nurses (female, of course) in the 1914-1918 war, and in the 1939-45 war they also drove trucks and even did spook work with SOE. Its survivors proudly call themselves 'Fannies'.

#193 ::: cgeye ::: (view all by) ::: August 22, 2008, 12:05 PM:

the broader term "golliwog" for "wog" should be added.

"Sissy" should be considered if it's in the context of homophobic or misogynist talk pointed at effeminate men.

#194 ::: lorax ::: (view all by) ::: August 22, 2008, 12:06 PM:

abi #187, thank you very much.

#195 ::: ajay ::: (view all by) ::: August 22, 2008, 12:24 PM:

192: survivors? The Fannies still exist! I know some of them...

http://www.fany.org.uk/

Mission: To provide response teams in support of the Civil and Military authorities within London during a major event, incident, or in planning and exercise roles.

They also (heaven knows why) are very keen on getting their gels through jump school...

#196 ::: John Mark Ockerbloom ::: (view all by) ::: August 22, 2008, 12:27 PM:

Oh, one other filter any tagging system will definitely need (and which I'm sure abi's group has already thought of, but others might not): You need to strip out, or very carefully control, anything that can be treated by a client as HTML code or other content that isn't visible, uninterpreted text. ("Very carefully control" would include using a tight whitelist of allowed characters and markup, not a blacklist of things you've thought of that might be risky.)

Cross-site scripting and other malware attacks are no fun for libraries and their patrons to deal with.
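
As a minimal sketch of the whitelist approach, in Python: accept a tag only if every character is on an allowed list, and escape it on output anyway. The character set and the 60-character limit are assumptions for the example, and `clean_tag` is a made-up name. A real whitelist would also need to admit accented and non-Latin letters.

```python
import html
import re

# Assumed whitelist: ASCII letters, digits, space and a little punctuation.
ALLOWED = re.compile(r"[A-Za-z0-9 '\-.,&]{1,60}")


def clean_tag(raw):
    """Return an HTML-safe tag, or None if it contains anything not whitelisted."""
    tag = " ".join(raw.split())     # collapse runs of whitespace
    if not ALLOWED.fullmatch(tag):
        return None                 # reject rather than try to repair
    return html.escape(tag)         # still escape before writing into a page
```

Rejecting outright, rather than stripping the bad characters, avoids having to reason about what the stripped remainder might turn into.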

#197 ::: abi ::: (view all by) ::: August 22, 2008, 12:40 PM:

John @196:

Yes, filtering for HTML code and other nasties was built in from the start.

Because malware is Bad. Do Not Want.

#198 ::: Serge ::: (view all by) ::: August 22, 2008, 12:43 PM:

Fanny Ardant... Be still, my heart.

#199 ::: Jen Birren ::: (view all by) ::: August 22, 2008, 12:58 PM:

UK additional: "batty boy"? TTBOMK it's always an insult, not a neutral term, but I could be wrong- anybody closer to Caribbean-derived slang know?

#200 ::: JESR ::: (view all by) ::: August 22, 2008, 01:03 PM:

Earl Cooley at 155, look at that word in the context of the other words I've listed. Any of them ever thrown at you in anger or contempt?

Which reminds me of a story one of my Uncles told, about almost throwing a punch the first time someone called him Chief, after his promotion to CPO in 1944. Which would have been embarrassing, since he was SP. Growing up Res adjacent apparently gave him sensitivities he had not previously realized.

#201 ::: Carol Witt ::: (view all by) ::: August 22, 2008, 01:13 PM:

Don't forget Fanny Price!

#203 ::: abi ::: (view all by) ::: August 22, 2008, 01:30 PM:

novalis @202:
Your quest may be hopeless.

Depends on my quest.

I'm not trying to prevent kids from being groomed by paedophiles with this blacklist. I just want Granny not to see the word "fuck" on the results listing when she searches for "Italian Cooking".

I can still think of a few ways to mildly engineer around our entire solution—as opposed to engineering around any particular blacklist term. But our impact assessment is that that will have a fairly small effect if any one else stumbles on them.

Neat article, though. Just not my quest.

#204 ::: Serge ::: (view all by) ::: August 22, 2008, 01:40 PM:

A few years ago, when my Hong Kong-born co-worker married an American-born Chinese named John, she'd refer to him as 'her John' to distinguish him from our team's own John. I explained to her what 'her John' could mean. I think some blushing ensued.

#205 ::: Xopher ::: (view all by) ::: August 22, 2008, 01:47 PM:

JESR 200: Well, I've never seen the word 'siwash', but there are two classes among the other words you list. 'Papoose' and 'squaw' are terms that refer exclusively to indigenous North Americans, and are in fact derived from American languages; they have no other use. 'Chief' and 'brave', on the other hand, are common English words, and are offensive only in the context of their application to the aforementioned people. Unless I missed it, abi's company isn't doing context checking; this is a list of words whose presence disallows a tag from being imported into the public tagbase.

I can't see any reason that 'papoose' and 'squaw' would need to be among the public tags; 'brave' as an adjective might well be useful, so you wouldn't want to exclude it, and 'chief' you definitely wouldn't want to exclude, even if you don't care about such things as "Chiefs of the Lakota, 1872-1877."

I think having a human eye look at the tag and decide whether it's offensive or not would be best for cases like this.

#206 ::: Fragano Ledgister ::: (view all by) ::: August 22, 2008, 01:53 PM:

Jen Birren #199: But note that while 'batty b(w)oy' and 'batty man' are pejorative, 'batty' by itself is Western Caribbean (both Jamaican and Belizean) baby talk for 'buttocks'.

#207 ::: trifles ::: (view all by) ::: August 22, 2008, 02:03 PM:

Moving gently in the direction of historical lovelies, herewith is a link to the:

DICTIONARY OF THE VULGAR TONGUE.

A
DICTIONARY
OF
BUCKISH SLANG, UNIVERSITY WIT,
AND
PICKPOCKET ELOQUENCE.


UNABRIDGED FROM THE ORIGINAL 1811 EDITION WITH A FOREWORD BY
ROBERT CROMIE

COMPILED ORIGINALLY BY CAPTAIN GROSE.

AND NOW CONSIDERABLY ALTERED AND ENLARGED, WITH THE MODERN
CHANGES AND IMPROVEMENTS, BY A MEMBER OF THE WHIP CLUB.

ASSISTED BY HELL-FIRE DICK, AND JAMES GORDON, ESQRS. OF CAMBRIDGE;
AND WILLIAM SOAMES, ESQ. OF THE HON. SOCIETY OF NEWMAN'S HOTEL.


Cheery lads. Terms include: abbess, to blow the grounsils, Carvel's ring, laced mutton, riding St. George, twiddle poop.

#208 ::: Constance ::: (view all by) ::: August 22, 2008, 02:22 PM:

Is twittybint allowed?

How do we refer to Fan*y Price and Fa*ny Burney?

Love, C.

#209 ::: joann ::: (view all by) ::: August 22, 2008, 02:23 PM:

abi #162: Picture the face to face meeting, or even phone call, which includes lines like, "Next on the agenda is 'nigger'." I'm sure you can see that not everyone will be keen.

Not everyone, but it might turn into an attraction for some people. Ever hear/read the Lenny Bruce routine about "He said blah blah blah"?

(My own take on the whole thing, after your more detailed explanation, is that you're more involved in graffiti removal than in censorship. This has its own issues, notably whether some graffiti are an artform, and in particular, which ones.)

#210 ::: JerolJ ::: (view all by) ::: August 22, 2008, 02:35 PM:

Mary @ 65. If I were God, Cheney's last name would be a swear word too.

#211 ::: JESR ::: (view all by) ::: August 22, 2008, 02:42 PM:

Xopher, siwash is Chinook jargon, from the French sauvage; my whole list is as location-specific as Fragano Ledgister's post at #54. "Brave" is very much equivalent to "Boy" in context; "Chief" is a word used at the end of a billy club or as the lock turns in a holding cell. All of my list are of equivalent offensiveness to spear-chucker on the mild end and the n word on the strong one.

And yes, exactly: all of this discussion seems to be coming down to the need for human moderation to sort out perfectly OK constructions using common English words from expressions of bigotry and hatred which have no place in a library's tagging system.

#212 ::: Serge ::: (view all by) ::: August 22, 2008, 02:51 PM:

"Aren't you too old to be called a boy?"
- Robert Ryan to Woody Strode in 1953's City Beneath the Sea.

#213 ::: Earl Cooley III ::: (view all by) ::: August 22, 2008, 03:08 PM:

JESR #200: Earl Cooley at 155, look at that word in the context of the other words I've listed. Any of them ever thrown at you in anger or contempt?

Nope. In any case, the members of my family to which they may have applied married whites too early for their descendants to be counted in the major censuses of the time.

The terms of discrimination most often applied to me either deal with my disabilities or my age; disability-related euphemisms are more offensive to me than pure insults, because of the element of perceived condescension they bring to the table.

#214 ::: Clifton Royston ::: (view all by) ::: August 22, 2008, 03:25 PM:

abi:

Just to clarify, I was kind of grokking the general usage scenario from the beginning. Two cases worth considering:

Actor: 14-year-old Joe Jones has just learned the word "anus". He has been hearing his father rant about the damn "Mes'cans" a lot lately.

Scenario: Joe Jones attempts to tag all Latino-themed teen books he can find in the catalog with "spic anus licker".

My concern is how your software will distinguish this case from another probable usage scenario, viz:

Actor: Maribel Smith loved the 'Cooking with Asian Spices' she just checked out from the library. Maribel has a low frustration threshold and gets angry with computers easily.

Scenario: Maribel is attempting to tag the cookbook she is returning with "spicy finger-licking delicious".


I think you'll have the best chance of distinguishing these cases and making the first fail and the latter succeed if you use only fairly strict grammatical stemming, not wild-card matching. It sounds like your stemmer falls somewhere between the two, which is why I was thinking you might want to either bypass it for this purpose or turn down its "sensitivity."

A further refinement would be to assign scores to each word, sublist of words, or list, associated with the perceived intensity of each word and trigger rejection over some threshold, moderation at another threshold.
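
Something like this, as a rough Python sketch. The words, scores and both thresholds are placeholders invented for the example; the point is only the shape of the two-threshold decision.

```python
# Placeholder scores; a real list would be tuned per library.
SCORES = {"darn": 1, "heck": 1, "bloody": 2, "hell": 3}

MODERATE_AT = 3  # assumed: at or above this, hold for a moderator
REJECT_AT = 5    # assumed: at or above this, reject outright


def classify(tag):
    """Sum per-word scores and map the total to accept / moderate / reject."""
    total = sum(SCORES.get(word, 0) for word in tag.lower().split())
    if total >= REJECT_AT:
        return "reject"
    if total >= MODERATE_AT:
        return "moderate"
    return "accept"
```

Under these numbers "bloody mary" is accepted, "hell" on its own goes to a moderator, and "bloody hell" is rejected.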

Parenthetical note:
Thinking about the more general problem, I realized there's a specialized implicit grammar for forming English profanity compounds. E.g. given a base cussword, you can usually append -head, -hole, -lick-, and various other suffixes to it and get a new improved cussword.

Try it, it's fun! Offend your neighbors, alarm your cats!

#215 ::: Jules ::: (view all by) ::: August 22, 2008, 03:41 PM:

@10: As for the UK additional, given that 'paki' is there I'd say 'chinky' could be added as well

'Chink' is more common. 'chinky' would usually be used as an adjective, or possibly in reference to a restaurant.

@46: Do socio-economic terms count, or just racial ones? Thinking "chav" for the UK Additional list

I would tend to think of chav as more cultural than socio-economic. While it's a culture that's strongly associated with a particular socio-economic background, there isn't a 1-1 mapping. Also, most members of the group self-identify as chavs. So, no, I wouldn't have felt that it's appropriate -- the word isn't really offensive because those it's aimed at don't think of it negatively.

@56: skank," [...] (I'm not sure if this is US specific.)

No, it's used in the UK as well.

@73: Isn't bloody more offensive than USians tend to think?

I'm not easily offended by language, but I don't think so. It's worth considering, but note it is not offensive when used by itself or with other non-offensive terms (e.g. "Bloody Mary"). Think of it as a modifier which multiplies the offensiveness of its subject by some factor. If the subject is not offensive, the result isn't offensive. If the subject is quite offensive, the result is very offensive.

@121: I may have mentioned this before, but I was once a fairly active participant in a forum where they instituted a filtering program that did an automatic string-replace, swapping the swear words out for less offensive terms.

"Cock" was swapped with "Thingy."

The problem?

It was a forum for poultry breeders.

I've seen some ill-advised use of this feature elsewhere. A friend of mine regularly uses a UK-based forum for rat keepers. The forum has been in trouble in the past for comments its users have made concerning one of the country's best known chain pet shops, Pets At Home. They therefore decided to use this feature to replace any reference to said pet shop, including a number of variations on the name, with the string "petshop". Generally the results are OK, stuff like "I bought a new rat from petshop today, but he seems to have a broken leg" or whatever.

One day my friend was reading a post in which the poster was generally complaining about some situation or another, unrelated to said pet shop. Then he got to a paragraph that simply read "petshop!". It took a while to figure out that the poster had originally written 'Pah!' as a general exclamation, and this was filtered as an abbreviation of the name.
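
A rough Python sketch of how that can happen, assuming the forum's filter matched whole words case-insensitively (the anecdote suggests this but does not say so). The replacement table and both function names are hypothetical.

```python
import re

# Hypothetical replacement table, modelled on the forum described above.
REPLACEMENTS = {"pets at home": "petshop", "pah": "petshop"}


def naive_filter(text: str) -> str:
    """Case-insensitive whole-word replacement: the behaviour that misfired."""
    for target, substitute in REPLACEMENTS.items():
        pattern = r"\b" + re.escape(target) + r"\b"
        text = re.sub(pattern, substitute, text, flags=re.IGNORECASE)
    return text


def stricter_filter(text: str) -> str:
    """Match the full name in any case, but the abbreviation only in capitals."""
    text = re.sub(r"\bpets at home\b", "petshop", text, flags=re.IGNORECASE)
    return re.sub(r"\bPAH\b", "petshop", text)
```

With these definitions, `naive_filter("Pah!")` yields `petshop!`, while `stricter_filter("Pah!")` leaves the exclamation alone. The stricter version still misses a lower-case "pah" used as the abbreviation, so it trades one kind of error for another.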

@132: In terms of racial epithets, I would probably add hebe, yid, beaner, wetback, spearchucker, junglebunny, chink, gook and possibly spade.

Hebe and spade are too easily confused with common words with useful meanings, IMO.

@183: 'fanny' is a name and a harmless term for 'buttocks' in the US. Its UK meaning is 180° from there, if you understand me! It's not AFAIK used as a name there.

It's not a popular name, but it is generally recognised as a name, and was popular as recently as the 1950s. Also, the use of the word as a euphemism is considered very old-fashioned now. I'd imagine the euphemism may die out, at which point the name may or may not become more popular again.


(And a totally offtopic note to Patrick/Teresa/anyone else who may work on the site stylesheets:

'EM EM { font-style: normal; }' is a useful rule to include.)

#216 ::: Colin Hinz ::: (view all by) ::: August 22, 2008, 03:41 PM:

JESR @ #211:

Where I grew up, that being Saskatchewan, "siwash" was just a word for a garment which seems to be called a "Cowichan sweater" everywhere else. (Yeah, Saskatchewan, the place where 'hoodies' are called 'bunnyhugs', and the slang word for 'chocolate milk' is even more obscure.)

Beware the perils of localisms!

#217 ::: Debbie ::: (view all by) ::: August 22, 2008, 03:51 PM:

Clifton Royston @214 -- how in the world do you happen to know my mother? (the fake name didn't fool me!)

#218 ::: Clifton Royston ::: (view all by) ::: August 22, 2008, 04:11 PM:

Now I'm really tempted to write a generalized swearword generator that could take a list like abi's column 1 and generate the various possible productions such as "fuckheads", "dickhattery", "milflicking", etc. However, I'm at work, and I can't figure this as even vaguely related.
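
For what it's worth, the core of such a generator is tiny. This is a minimal Python sketch with mild placeholder base words; the suffix list is adapted from the ones mentioned above and is not meant to be complete.

```python
from itertools import product

# Placeholder inputs: mild stand-ins for base words, plus a few suffixes
# adapted from the comments above. Neither list is complete.
BASES = ["dang", "blast"]
SUFFIXES = ["head", "hole", "licker"]
ENDINGS = ["", "s"]  # singular and a naive plural


def productions(bases, suffixes, endings):
    """Yield every base + suffix + ending combination, in input order."""
    for base, suffix, ending in product(bases, suffixes, endings):
        yield base + suffix + ending
```

Calling `list(productions(BASES, SUFFIXES, ENDINGS))` gives twelve strings, starting with `danghead`. A real grammar would need rules about which suffixes attach to which bases.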

#219 ::: Xopher ::: (view all by) ::: August 22, 2008, 04:32 PM:

Colin Hinz: OK, I've got to know: what is the Saskatchewanese slang for 'chocolate milk'?

Do you have any idea why a hoodie is called a bunnyhug? Being an Old Fart, I'm just getting used to the word 'hoodie'.

#220 ::: Colin Hinz ::: (view all by) ::: August 22, 2008, 05:34 PM:

@219:

Xopher, the Saskatchewanian slang for chocolate milk is "vico", from the name of a now-defunct supplier. And as for "bunnyhug", I know nothing of its origins. It was just a word that everyone knew.

#221 ::: eric ::: (view all by) ::: August 22, 2008, 06:18 PM:

Belgium?

#222 ::: Dave H ::: (view all by) ::: August 22, 2008, 06:23 PM:

I used to work with a guy whose language was fairly salty, and who had no social filter. The team were "bonding" in a pub, and he used the c-word. Our sole female team member objected, to which he responded "What about -" and proceeded to reel off over 30 variations. Most were fairly obvious, but one...

"Okay, I get all of those except ship-sinker."
"What sinks ships?"
"Loose... Oh."

Where a 19 year old got WWII references I declined to pursue.

#223 ::: Clifton Royston ::: (view all by) ::: August 22, 2008, 06:53 PM:

Dave H: jonescrusher (Frank Zappa)

#224 ::: Mary Aileen ::: (view all by) ::: August 22, 2008, 07:00 PM:

Dave H (222): I first read that as "bonding in a [hot] tub," which gives a very different mental image!

#225 ::: Serge ::: (view all by) ::: August 22, 2008, 07:13 PM:

Dave H @ 222... Where a 19 year old got WWII references

Maybe from watching all those 1940s movies on Turner Classic Movies?

#226 ::: Zeborah ::: (view all by) ::: August 22, 2008, 08:04 PM:

hapax@179:
It's called Library 2.0 everywhere, except where it's called Bibliothèque 2.0 and so forth.

I think any reference librarian will attest that "not really thinking through what I'm asking and thus getting too much" is a far more common problem at the reference desk than "knowing exactly what I want and not finding any."

On Google, absolutely; on databases, sometimes. In the library catalogue, quite the reverse. There was a study - Google Books vs BISON - which mentions that at one university library, 18% of searches in the library catalogue return 0 results. 18%! Granted many of these are likely mis-spellings (if I could choose one thing in the whole world to add to our catalogue it would be a Google-style spell-checker), but many are not.

The vast majority of queries I get at the reference desk are from people who couldn't find anything at all, and I have to help them translate their straight-forward query into official library jargon. Who mentioned Grandma searching for "Italian cooking"? She won't get nearly as many results as the catalogue actually holds, because the Library of Congress subject term is "cookery". If our catalogue ever gets user tagging, I'm going to see if there's a way I can run a search on an unuseful term and batch-tag the results with a useful term. (There probably won't be, alas, because that could be as dangerous in the wrong hands as useful in the right ones.)

#227 ::: Eric ::: (view all by) ::: August 22, 2008, 08:54 PM:

My fortitude in reading long comment threads being woefully lacking, I've only read about half way through this one, so my apologies if this has been mentioned already. That said, I'm confused by the fact that you have some racial slurs (nigger, kike, etc) in the core list, and others (spic, wop, paki, etc) in the additional list. Are the former deemed more offensive than the latter, or simply more common, or am I missing some other justification?

#228 ::: Dave H ::: (view all by) ::: August 22, 2008, 08:54 PM:

serge@225
Maybe from watching all those 1940s movies on Turner Classic Movies?

I prefer to think of them being handed down from father to son...

"when a mummy and a daddy love each other very much, he sticks his meatpole in her cocktrough."

#229 ::: Eric ::: (view all by) ::: August 22, 2008, 09:08 PM:

Oh, and on the topic of naughty rhymes, a favorite among my friends circa seventh grade was (to the tune of "Row, row, row your boat"):

Fuck, fuck, fuck a duck,
Screw a kangaroo,
Fingerbang an orangutan,
Support your local zoo!

Orangutan was, of course, often pronounced "orangutang", as apparently rhyming is more important than accuracy amongst seventh-graders.

#230 ::: Joel Polowin ::: (view all by) ::: August 22, 2008, 09:23 PM:

"Orangutan" and "orangutang" are both accepted spellings, according to my dictionary, and there are other variations involving splitting the word with a hyphen or a space. I find both versions in different Discworld books, describing the Librarian.

#231 ::: Rob Rusick ::: (view all by) ::: August 22, 2008, 09:24 PM:

Another localism: A Gannett spinoff (a free paper for the purpose of delivering the classifieds to the younger demographic) celebrated Mother's Day with a cover celebrating the moms.

#232 ::: miriam beetle ::: (view all by) ::: August 22, 2008, 09:41 PM:

dave h,

"when a mummy and a daddy love each other very much, he sticks his meatpole in her cocktrough."

huh. well that's a manifestation of the patriarchy for you, isn't it? that just made me realize how many rude words for the vulva/vagina refer to male anatomy, but i can't think of any rude words for penis which refer to how they (ahem) interact with vaginas.

#233 ::: Phil Palmer ::: (view all by) ::: August 22, 2008, 09:47 PM:

Dit dit dah dit
Dit dit dah
Dah dit dah dit
Dah dit dah.

#234 ::: Jon H ::: (view all by) ::: August 22, 2008, 10:05 PM:

Ah, profanity filters.

When I worked at Britannica.com, there was a story about an incident that occurred when they implemented theirs.

The code went live, and the team started testing it by searching for all kinds of things, like "horse fucker". The 'fucker' would be stripped out and the search results would be returned as if the search term was "horse".

Unfortunately there was a caching bug. The head of development was out on the west coast and did a search for 'horse'.

The page that returned said "You searched for horse fucker" etc etc. Whoops!
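
The story does not say what the bug actually was. One plausible shape, sketched in Python with a placeholder blocked word: the cache is keyed on the filtered query, but the page stored under that key was built from the raw one. All names here are hypothetical.

```python
BLOCKED = {"darn"}  # placeholder blocked term


def strip_blocked(query: str) -> str:
    """Drop blocked words from a search query."""
    return " ".join(w for w in query.split() if w.lower() not in BLOCKED)


class SearchPages:
    """Toy page cache keyed on the filtered query."""

    def __init__(self):
        self.cache = {}

    def buggy_page(self, raw_query: str) -> str:
        key = strip_blocked(raw_query)
        if key not in self.cache:
            # Bug: the cached page embeds the raw, unfiltered query.
            self.cache[key] = f"You searched for {raw_query}"
        return self.cache[key]

    def fixed_page(self, raw_query: str) -> str:
        key = strip_blocked(raw_query)
        if key not in self.cache:
            # Fix: render the page from the same filtered text used as the key.
            self.cache[key] = f"You searched for {key}"
        return self.cache[key]
```

In the buggy version, whoever searches for the clean term next is served the first searcher's page.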

#235 ::: Paula Lieberman ::: (view all by) ::: August 22, 2008, 10:14 PM:

No planet Uranus ?

#236 ::: Serge ::: (view all by) ::: August 22, 2008, 10:38 PM:

Dave H @ 228... "when a mummy and a daddy love each other very much, he sticks his meatpole in her cocktrough."

Coming up next, ectoplasmic shvanshtukers.

#237 ::: Serge ::: (view all by) ::: August 22, 2008, 10:40 PM:

What was the title of that Philip José Farmer story that began with the line "The mad Fokker struck again"?

#238 ::: James D. Macdonald ::: (view all by) ::: August 22, 2008, 10:51 PM:

"And that Fokker was flying a Messerschmitt!"

#239 ::: Paula Lieberman ::: (view all by) ::: August 22, 2008, 11:12 PM:

#238 James

And he jetted in it, ex/spending his energy.

#240 ::: Earl Cooley III ::: (view all by) ::: August 22, 2008, 11:49 PM:

Serge #237: "The mad Fokker struck again"?

That sounds like Greatheart Silver. I don't have a copy handy to check, though.

#241 ::: Michael Turyn ::: (view all by) ::: August 23, 2008, 01:46 AM:

Some anti-semitic works use "Khazar" as a pejorative alternative to "Jew"---pejorative because of the claim that "the Jews" stole Judaism from "the real Jews" (Anglo-Saxon and Celtic peoples).

Similarly, the term "talmudist-Mystery-Babylon"* is employed.

"Mud people" is also popular for non-African people of colour, sometimes extended to include "off-white" types like Greeks and Southern Italians (who, one must admit, eat garlic in bed)---Northern Italians are treated differently, probably because they're deemed to be citizens of the Salo Republic, and because the Northern League have done well with hate.

*What would "Mystery Babylon Theatre 3000" be like? I can't help but see a guy and his robots making fun of porn, but then again what _won't_ make _any_ of us imagine that?

#242 ::: John Stanning ::: (view all by) ::: August 23, 2008, 02:18 AM:

Joel #230 : whatever an English-language dictionary may say, in Malay (bahasa) the spelling is orang utan (orang man, utan or hutan jungle), pronounced with light emphasis on the first syllable of each word.

#243 ::: Kevin Hayden ::: (view all by) ::: August 23, 2008, 02:26 AM:

The compulsion to chime in with dirty word offerings is obviously enormous, as the number of comments attests to.

Being an old fart, I admit to finding nothing too offensive in any of the suggestions. I know of only two that cause me to shudder and hang my head after uttering: Bush and Cheney.

#244 ::: abi ::: (view all by) ::: August 23, 2008, 03:03 AM:

Zeborah @226:
The LOC vocabulary problem† is fun (what term do you use for gravestones?*). Mass tagging is one solution (watch your edge cases!), but it's easier—if you can—to use software that offers the user the alternatives automatically.

One can do a lot with a good thesaurus, and even translation programs can help (translate the term into another language, then back again, to get possible alternatives; test against the catalog to see which ones are reasonable). There are similar ways to fish out alternative spellings from catalog data and offer them to the reader, either as "did you mean?" or as a related search.

I could get all bright and shiny and excited about how we use the mental model of a conversation with the user to tease out exactly what they're looking for, but I'm not trying to turn this thread into a sales pitch. Contact me offline if you want a link to one of our installations to have a play.

-----
† Which is not to say that a controlled vocabulary doesn't have many strengths, of course
* Sepulchral Monuments, of course.

#245 ::: MLR ::: (view all by) ::: August 23, 2008, 03:13 AM:

I've yet to see a word filter that can't be gotten around with creative spacing (f uck, f u c k); nevertheless, I'll add "foad" to your list.
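
A filter can at least try to catch spacing tricks by re-joining neighbouring fragments. The following is a minimal Python sketch with a placeholder blocked word; `contains_spaced_out` and `max_pieces` are made-up names, not anything from a real filter.

```python
import re

BLOCKED = {"darn"}  # placeholder blocked term


def contains_spaced_out(text: str, blocked=BLOCKED, max_pieces: int = 6) -> bool:
    """True if up to max_pieces consecutive fragments join into a blocked word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for start in range(len(tokens)):
        joined = ""
        for piece in tokens[start:start + max_pieces]:
            joined += piece
            if joined in blocked:
                return True
    return False
```

The cost is new false positives: any two innocent neighbouring words that happen to concatenate into a blocked term will now be flagged.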

#246 ::: John Stanning ::: (view all by) ::: August 23, 2008, 04:56 AM:

If you have a mailbox that receives spam, take a look in wherever your spam filter puts its rejects, and observe the creativity of spammers trying to get around spam filters. The variations are almost infinite, using mis-spellings, accented letters, etc. For example, one in today's collection offers Cia|iis and V1aagra (it got filtered anyway, but for other characteristics of the message).

But though I think attempts to block 'bad words' are doomed to failure, somewhere up there there was a glimmer of something else. In someone's list, among all the anatomical words, there was the phrase "I was banned". Hmmm! How about collecting a few more phrases like that, and making a troll filter?

#247 ::: chris y ::: (view all by) ::: August 23, 2008, 05:30 AM:

FWIW, "bloody" is entirely inoffensive in Brit English. For example, my centrist, Anglican mother (b. 1925) used it regularly, and so did most of her friends. It would be out of place in an official statement, but colloquially it's about as strong as "blasted".

#248 ::: Serge ::: (view all by) ::: August 23, 2008, 05:39 AM:

Earl Cooley III @ 240... That's the story. I think I had come across it in one of Byron Preiss's Weird Heroes anthologies of the 1970s, but couldn't be sure. (Which isn't that surprising as I'm recovering from fighting the post-Denvention's nasty flu while I was also preparing a big project for our team. The project just ended, and passed with flying colors, and without any mad Fokker.)

#249 ::: Peter Erwin ::: (view all by) ::: August 23, 2008, 07:11 AM:

Not that you really need any more reasons to exclude "gay" from your lists, but there is also the possibility of people wanting to refer to old movies, books, etc., where the word was used in its prior sense (e.g., if someone wants to refer to Nietzsche's The Gay Science, or to the movie The Gay Divorcee).

#250 ::: Peter Erwin ::: (view all by) ::: August 23, 2008, 07:45 AM:

shadowsong @ 109:
Also, the thing the Dutch boy stuck his thumb in is a dike, not a dyke.

"Dyke" is listed in at least some dictionaries as a recognized alternate spelling for "dike" (i.e., earth embankments holding back water). I do have the vague impression that the "y" form may be more common in British use -- e.g., Offa's Dyke.

Just to confuse things further, "dike" is recognized as an alternate spelling for "dyke" (i.e., the usually offensive term for lesbians).

Oh, and "Dyke" is also an English surname. (And, Wikipedia informs me, the name of a small town in Virginia and of a hamlet in Lincolnshire.)

#251 ::: James D. Macdonald ::: (view all by) ::: August 23, 2008, 08:01 AM:

"Dykes" is also a slang term for "diagonal cutters," as in "Hand me those dykes."

#252 ::: Peter Erwin ::: (view all by) ::: August 23, 2008, 08:18 AM:

James @ 251:
And that term is also spelled both ways ("dykes" and "dikes")... as Mark Lieberman of Language Log found out.

(I was going to mention the "diagonal cutters" term in my previous comment, but decided it was too obscure -- clearly I was wrong!)

#253 ::: Paula Lieberman ::: (view all by) ::: August 23, 2008, 09:01 AM:

The Enola Homosexual dropping bombs?

#254 ::: James D. Macdonald ::: (view all by) ::: August 23, 2008, 09:02 AM:

Given the way electricians talk, it's more usually heard as "Hand me those fuckin' dykes."

#255 ::: Rob Hansen ::: (view all by) ::: August 23, 2008, 09:09 AM:

More than a little surprised to see 'fag' on the UK core list. It might be an offensive term for a gay person in the US but over here it's slang for a cigarette. It's also never used as a diminutive for 'faggot' which, in any case, is a large meatball usually eaten with peas and gravy in the UK. We're aware of its US usage but it just isn't used that way in the UK, so why is it on the list?

#256 ::: MattF ::: (view all by) ::: August 23, 2008, 09:42 AM:

So much for my travelogue to Fucking Austria

#257 ::: Faren Miller ::: (view all by) ::: August 23, 2008, 10:26 AM:

Jerol J (#210): Mary @ 65. If I were God, Cheney's last name would be a swear word too. In my family at least, "chaney" was slang for "toilet". (No idea why! Doubt it was a reference to Lon.)

#258 ::: Paul A. ::: (view all by) ::: August 23, 2008, 11:11 AM:

R. M. Koske @ #171:

The term you were probably thinking of is "spear carrier".

#259 ::: Kevin Reid ::: (view all by) ::: August 23, 2008, 11:45 AM:

#197: Yes, filtering for HTML code and other nasties was built in from the start.

*twitch*

You quite likely already know this, and it may be entirely irrelevant to the use case as punctuation is uninteresting, but this way of putting it disturbs me.

Filtering is the wrong way to look at this problem. It is better to declare “the input is ‘plain’ text”, and treat it as such; that is, the text string “1 < 2” is just that and the “<” has no significance. Being arbitrary plain text, when it comes time to put it in a HTML document, one must encode (or ‘escape’) it properly according to the syntax of HTML, i.e.

<p>Contributed mathematical fact: 1 &lt; 2</p>

but this happens at the display end, not user input.

Compared to filtering, this approach is more general (doesn't arbitrarily prohibit text that happens to be significant considered as HTML), more robust (if there's a bug, you fix the display routine, rather than cleaning what the filter let through out of your dataset), and likely easier to implement (every good web framework should have an "encode plain text for HTML" routine).

(On the other hand, filtering is the correct approach for e.g. blog comments, where you want to let the users enter some HTML.)
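
In Python, for instance, the standard library's `html.escape` is that "encode plain text for HTML" routine. A minimal sketch of escaping at display time; `render_comment` is a made-up name for illustration.

```python
from html import escape


def render_comment(plain_text: str) -> str:
    """Store the text as entered; escape it only when building the HTML page."""
    return "<p>Contributed mathematical fact: " + escape(plain_text) + "</p>"
```

`render_comment("1 < 2")` produces exactly the paragraph shown above, and the stored string itself is never altered.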

#260 ::: abi ::: (view all by) ::: August 23, 2008, 11:57 AM:

Yes, I was using the wrong term. I got tired and careless, and I'm not a developer.

#261 ::: Amy Rye ::: (view all by) ::: August 23, 2008, 01:35 PM:

A young acquaintance of mine appears on text messages as "carpetbanger." A female friend was scandalized until she learned it was a reference to his part-time job laying carpet.

For your listening pleasure, I recommend the late Gilda Radner's excellent "Let's Talk Dirty to the Animals."

(sorry, originally appended this to the wrong thread)

#262 ::: Jane Smith ::: (view all by) ::: August 23, 2008, 02:45 PM:

I'm sure that these have been included in shorter forms, but just in case:

rimming

wanker

(And once again I love Making Light for encouraging me to post such things in such a helpful way.)

#263 ::: Ursula L ::: (view all by) ::: August 23, 2008, 03:49 PM:

Would it be possible, in your software, to have the list adapt to the code assigned to the book via the Dewey Decimal or LOC catalog (whichever the library uses)?

That way, you could have a master list, but have exceptions based on the type of book - some words allowed in the medical section, or the history section, but not allowed in the children's books, perhaps. That would allow some automatic correction for context, and for public libraries, it would help them meet the community expectations for different sections, particularly children.
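
As a rough illustration of the idea, here is a Python sketch keyed on the Dewey main class (the hundreds digit). The terms and the exception table are placeholders; a real deployment would presumably need finer ranges and a parallel table for LOC.

```python
# Placeholder master list; the real lists are discussed in the post above.
MASTER_BLOCKLIST = {"darn", "heck", "blast"}

# Assumed mapping from Dewey main class (first digit) to terms allowed there.
ALLOWED_BY_CLASS = {
    "6": {"heck"},           # 600s: technology, which includes medicine
    "9": {"heck", "blast"},  # 900s: history and geography
}


def blocklist_for(call_number: str) -> set:
    """Master list minus whatever is allowed in this book's main class."""
    main_class = call_number.strip()[:1]
    return MASTER_BLOCKLIST - ALLOWED_BY_CLASS.get(main_class, set())
```

Books outside the listed classes simply get the full master list.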

#264 ::: Clifton Royston ::: (view all by) ::: August 23, 2008, 04:02 PM:

#251-2: Yeah, I almost put "dykes" in my list of problematic words, but figured it didn't quite belong with the others I was listing.

abi: I went and googled for "Snowball algorithm" and found the Porter algorithm and mini-language very interesting. It's not as aggressive as it had sounded from what you'd said earlier.

#265 ::: Peter Erwin ::: (view all by) ::: August 23, 2008, 04:14 PM:

Further to Allan Beatty's and Zeborah's comments on the problem of accidentally flagging real surnames (or corporate names) if you apply truncation to "shit", I'll note that there's an Indian surname variously spelled "Dikshit" and "Dixshit" (as well as the less-likely-to-be-flagged "Dixit" and "Diksit"). The current Chief Minister of New Delhi is a woman named Sheila Dikshit, for example.

#266 ::: A.J. Luxton ::: (view all by) ::: August 23, 2008, 08:51 PM:

Pleased to see that others have gotten to "gay" before me, and wanting to call extra attention to Gwen @ 9 and her list of words that really, really seem to need asterisks.

(Asterisks all 'round! 'S on me!)

This seems like the best possible blog for a query of this nature, too. Good going.

#267 ::: Rob Rusick ::: (view all by) ::: August 23, 2008, 09:15 PM:

Jane Smith @262: rimming → (Rimmer? — Red Dwarf ref?).

#268 ::: Adrian Smith ::: (view all by) ::: August 23, 2008, 10:34 PM:

Alas, no. Anilingus ref.

#269 ::: Terry Karney ::: (view all by) ::: August 24, 2008, 01:01 AM:

miriam beetle: #232 "heat-seeking moisture-missile" is probably as close as any I've heard actually comes, but I don't think I've heard it in anything other than a jokey/hokey way. I think your point stands.

#271 ::: Clifton Royston ::: (view all by) ::: August 24, 2008, 03:15 AM:

Another practical suggestion:

Locate and run a very large English vocabulary + surname list + placename list through the stemmer to generate the stemmed versions, and see what false positive matches you might get from each of your various proposed block list entries.
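
A minimal Python sketch of that test. `toy_stem` is a deliberately crude stand-in written for this example; it is not the Snowball or Porter algorithm, and the real stemmer should be substituted for it. The two blocklist entries in the usage note are ones this thread has already flagged as ambiguous.

```python
def toy_stem(word: str) -> str:
    """Very rough stand-in for a real stemmer: lower-case, strip one suffix."""
    word = word.lower()
    for suffix in ("ing", "ers", "er", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def false_positives(blocklist, vocabulary):
    """Map each blocked term to the vocabulary words whose stem matches it."""
    blocked_stems = {toy_stem(term): term for term in blocklist}
    hits = {}
    for word in vocabulary:
        term = blocked_stems.get(toy_stem(word))
        if term is not None:
            hits.setdefault(term, []).append(word)
    return hits
```

Run against a vocabulary containing "Dykes" (the surname, or the cutters) and "spades", `false_positives(["dyke", "spade"], ...)` reports both for a human to review.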

#272 ::: Rob Rusick ::: (view all by) ::: August 24, 2008, 07:13 AM:

Adrian Smith @268: My (I guess overly cryptic and poorly constructed) comment was to ask if (or suggest that) the name of the character Rimmer in Red Dwarf was derived from 'rimming'.

Thanks, though, for the definition. This thread is a real vocabulary builder.

#273 ::: Adrian Smith ::: (view all by) ::: August 24, 2008, 07:41 AM:

My (I guess overly cryptic and poorly constructed) comment was to ask if (or suggest that) the name of the character Rimmer in Red Dwarf was derived from 'rimming'.

I presume so, it's not the kind of thing the show's writers would have been unaware of considering the overall level of innuendo on display.

#274 ::: FrancisT ::: (view all by) ::: August 24, 2008, 08:36 AM:

Some more UK words.

Spunk (== Cum for our ex-colonial friends)

Pikey, Gyppo (== member of the 'traveling community')

Ultimately I think this could be destined for Epic Fail but the comment thread has been fascinating

Do not put Fuk on the list unless you specifically eliminate Fuku* since lots of Japanese names have Fuku in them (it means lucky/fortunate)

#275 ::: abi ::: (view all by) ::: August 24, 2008, 09:45 AM:

FrancisT @274 (inter alia):

Ultimately I think this could be destined for Epic Fail

Yeah, Orville, it'll never fly.

Still, it might be worth, before tearing my hair and scattering ashes atop my head, pointing out the consequences of a false positive.

If I tag a book with a term that triggers the blacklist filter (let's use "fuck" for this example), I don't get any feedback to that effect. I click "save" and the tag is saved. I can then look at my tag cloud and select all the books I've tagged "fuck" and browse through them. My entered content is always available to me, unaltered, unfiltered.

All that happens is that people will not see my book on their results list when they perform a catalog search for "fuck". If they look at the book, my tag will not be among those displayed. That is it. And if "fuck" later becomes a non-blacklisted term, then within a day everyone can search on "fuck" and find my tag.

In other words, anything you enter, you can use. The only impact of the blacklist is to suppress the entry from public visibility.
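
The rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, with a placeholder blacklist entry and made-up names (`Tag`, `visible_tags`); it is not the product's actual code.

```python
from dataclasses import dataclass

BLACKLIST = {"darn"}  # placeholder blacklisted term


@dataclass(frozen=True)
class Tag:
    owner: str
    book: str
    text: str


def visible_tags(tags, book, viewer):
    """Tags on book that viewer may see: their own always, others' unless blacklisted."""
    return [
        t for t in tags
        if t.book == book
        and (t.owner == viewer or t.text.lower() not in BLACKLIST)
    ]
```

Because nothing is deleted, taking a term off the blacklist makes the existing tags visible again without any data repair.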

I do think I can use the feedback (edge cases, additional words, suggestions of terms to remove) to construct a blacklist that will block a lot of offensive content without requiring a moderator to review every tag entered. I think I can tune it to give us a very low level of false positives as well.

But we are doing this for good and sensible reasons, it is better than any alternative I can think of for our particular circumstances, and the world will not end as a result.

(Yes, I am a little tired of the "cannae nae dae thats" from people on a site whose comment threads are right this minute protected by a blacklist-based approach. Sorry. I am human.)

#276 ::: Paula Lieberman ::: (view all by) ::: August 24, 2008, 10:06 AM:

Lipshitz conditions... Lipshitz is a Germanic language family name and there are mathematical constructs that one of them created....

From googling...

"Mathematical Methods for Scientists and Engineers: Linear and ... - Google Books Resultby Peter B. Kahn, Kahn - 1996 - Science - 496 pages
The mean value theorem says that if df/dx is uniformly bounded in D. f satisfies a Lipshitz condition in D. In this chapter, we limit our discussion to..."

#277 ::: xeger ::: (view all by) ::: August 24, 2008, 11:01 AM:

Peter Erwin@252 ... (I was going to mention the "diagonal cutters" term in my previous comment, but decided it was too obscure -- clearly I was wrong!)

Indeed :) I spent a goodly part of last night running around the house trying to find my dykes (and nippers).

#278 ::: debcha ::: (view all by) ::: August 24, 2008, 11:04 AM:

Sherrold, #94:...24. Kobe Bryant is one mean and gifted fatherfucker. Does that work for you?

I walked into an unknown bar on Pike st, heard this being read aloud (loudly), and knew I'd found a keeper.

I just moved to Seattle (for a year, on sabbatical) - what bar on Pike was this?

#279 ::: Serge ::: (view all by) ::: August 24, 2008, 11:11 AM:

abi @ 275... I for one find silly the idea that you'd participate in the creation of a tool of censorship.

#280 ::: John Stanning ::: (view all by) ::: August 24, 2008, 11:29 AM:

Abi #275 (Yes, I am a little tired of the "cannae nae dae thats" from people on a site whose comment threads are right this minute protected by a blacklist-based approach. Sorry. I am human.)

Whee. If our comments are protected by a blacklist that yet allows all the words that have been posted in this thread... what on earth (or off it) is on the blacklist?

#281 ::: Terry Karney ::: (view all by) ::: August 24, 2008, 11:35 AM:

Whee. If our comments are protected by a blacklist that yet allows all the words that have been posted in this thread... what on earth (or off it) is on the blacklist?

Muahhaahah.... you'll never know. :)

#282 ::: Serge ::: (view all by) ::: August 24, 2008, 12:03 PM:

John Stanning @ 280... what on earth (or off it) is on the blacklist?

"Anall nathrach, oorfas bethud, dorhiel dienvay"

#283 ::: Ursula L ::: (view all by) ::: August 24, 2008, 12:45 PM:

Is there any way you could incorporate some sort of spell checker into the tagging system? A problem with any user-generated tagging is often that some users misspell words, so that you wind up with tags that are intended to be the same, but where misspellings in the tag means that it doesn't show up when checked.

Something that, when tags are submitted, it checks, and then just prompts users to either accept tags that aren't in the spelling dictionary, or choose from suggestions (so it doesn't have to match, but so that they get suggestions if it doesn't, and can choose between their word and known words of similar spelling) would probably make the system much more useful, long term.

This would also catch people trying to misspell words to get around the program, without putting in a list of every possible misspelling.

On the extreme, you might have only words in the spelling dictionary as public tags, with a way for users to submit new words for approval to the dictionary, so things like names get added. Which would probably satisfy even the most paranoid public library.
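
A minimal Python sketch of the prompt-with-suggestions step, using `difflib.get_close_matches` from the standard library. The dictionary is a four-word placeholder and `check_tag` is a made-up name; the 0.8 cutoff is an arbitrary assumption.

```python
from difflib import get_close_matches

DICTIONARY = ["cookery", "cooking", "gardening", "history"]  # placeholder


def check_tag(tag, dictionary=DICTIONARY):
    """Return (known, suggestions) for a submitted tag."""
    lowered = tag.lower()
    if lowered in dictionary:
        return True, []
    # Offer up to three similar dictionary words for the user to choose from.
    return False, get_close_matches(lowered, dictionary, n=3, cutoff=0.8)
```

The caller decides what to do with an unknown tag: accept it anyway, prompt with the suggestions, or queue it for approval in the strictest setup.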

#284 ::: abi ::: (view all by) ::: August 24, 2008, 12:50 PM:

Ursula @283:
We tend to do the spelling variant work at the search end of things, scanning the database not only for the entered term but also for plausible misspellings. We have ways of offering those results, either as an alternative search or as the classic "Did you mean?" link when there are too few results for the search as typed.

Might be interesting to include the spell checking module in the tag processing as well, though. Hmmm.

#285 ::: James D. Macdonald ::: (view all by) ::: August 24, 2008, 02:01 PM:

#280 John Stanning: Whee. If our comments are protected by a blacklist that yet allows all the words that have been posted in this thread... what on earth (or off it) is on the blacklist?

Just for example, there's "batterylaptoppower.com".

All that happens if you use that string is that a moderator looks at your post before releasing it.

#286 ::: Kevin Marks ::: (view all by) ::: August 24, 2008, 03:09 PM:

Allowing user tagging is an excellent way to generate lists of related words (synonym is a misnomer for me - there are very few pure synonyms, as the old thesaurus game of "how quickly can you get to 'homosexual' from an arbitrary word" shows).
Different people use different levels of specificity for labelling things, depending on their expertise and the fineness of distinctions they are used to making. Eleanor Rosch's prototype theory expresses this well. See David Weinberger's "Everything is Miscellaneous" for a splendid exposition of this.

#287 ::: FrancisT ::: (view all by) ::: August 24, 2008, 04:21 PM:

Abi,

In re #275 and others where you explain the usage

It seems to me that the basic aim is to stop the prudes / thin skinned from being offended by not seeing tags that less prudish/thicker skinned folks have used.

This is good BUT I think you should also have various "opt-in" levels for people who are also less likely to be offended.

In addition to the obvious general, but subjective, "profanity/obscenity/blasphemy/racist/X-phobic" classification of potentially offensive tags I think an obvious addition that you don't seem to be doing is allow people who use a particular tag to see stuff that other people have tagged with the same tag. So if (for example) I list MILF as a tag for something then I'd like to see stuff tagged MILF by other users.

#288 ::: Bacchus ::: (view all by) ::: August 24, 2008, 04:55 PM:

Speaking as a semi-professional purveyor of porn, I take very mild issue with the inclusion of "MILF" (it's an acronym, never seen in small letters, and I'd say meaningless except in capitalized form) on the list.

That's because it's a highly-specialized term for a porn marketing category and (by subsequent pop-cultural adoption) the mothers featured therein. It's virtually impossible to use MILF as an epithet, as the word is utterly lacking in derogatory connotation. In its original sense, it was both plural (M for "mothers") and entirely, though lustfully, admiring. It has since been de-numbered, or perhaps singularized, in popular usage ("Did you see that MILF?", "Wow, check the MILFs at that table") but I have never once encountered its use in traditional cursing (you'll never hear "Dumb MILF!" or "Sorry MILF-licking dog!" or any constructions of that sort).

In fine, it's both more mild and less common than similar acronyms (see, e.g., the much more common and more derogatory military slang REMF, or the gaming term BFG, for big f-ing gun) that are not included in the core list. Although it's indisputably naughty, its presence on any core list of naughty words strikes me as anomalous.

#289 ::: Vicki ::: (view all by) ::: August 24, 2008, 05:48 PM:

James @ 285:

Might a person inquire as to whether the local blacklist contains anything that is neither an URL nor an email address?

#290 ::: James D. Macdonald ::: (view all by) ::: August 24, 2008, 06:01 PM:

Might a person inquire as to whether the local blacklist contains anything that is neither an URL nor an email address?

Indeed it does. For example, "[link=" is forbidden, as is "comment1". Not to mention the ever-popular "Of course, but what do you think about that?" (Google on the phrase to find out why.)

#291 ::: Adrian ::: (view all by) ::: August 24, 2008, 07:08 PM:

All these comments, and nobody has yet pointed out that "sex" is not profanity. I guess it's my turn. I've never heard anybody use it as an expletive. Several people have pointed out the false positives such as "Middlesex County," but many conversations will hit false positives with the word itself. Pregnant women speculate about the *bleep* of their babies before birth. Public health workers write about the political challenges of *bleep* ed, and how teen pregnancy rates change where nobody dares to talk about safer *bleep* in schools. You're talking about a library network, so how would you want people to index that book by Simone de Beauvoir? Would it need two entries, one for non-US systems, and US systems with lax obscenity settings, and _The Second *Bleep*_ in other parts of the US?

#292 ::: Terry Karney ::: (view all by) ::: August 24, 2008, 07:29 PM:

Adrian: I didn't think the list was about the profane, but about the offensive.

I don't think anyone here can deny there are those who (vocally, and with gusto) find references to sex offensive (how they feel about sex itself, I can't say).

So leaving the word as default shown in public searches of the type abi seems to be working on, is a recipe for complaint.

#293 ::: James D. Macdonald ::: (view all by) ::: August 24, 2008, 08:39 PM:

Bacchus #288: Although it's indisputably naughty, its presence on any core list of naughty words strikes me as anomalous.

But the list Abi's trying to come up with isn't one of spoken naughtiness, but rather of user-supplied tags for library materials.

As such, I can't think of many uses of the tag MILF on a work that isn't ... well, questionable.

This is leaving aside the question of whether user-supplied tags are a good idea, or whether a blacklist is a good idea. Abi says that both are going to happen, regardless of our philosophical positions on the wisdom or morality of either.

#294 ::: Xopher ::: (view all by) ::: August 24, 2008, 09:20 PM:

Jim 290: I googled. I don't get it. Seem to be a variety of different topics covered there, and many different websites.

Am I just slow?

#295 ::: Jim Macdonald ::: (view all by) ::: August 24, 2008, 10:59 PM:

Xopher -- did you go to those websites and look at the comments? That string is the entire comment (sometimes you'll find it five times in a row, each time from a different poster) with a link to a spam site from the poster's name.

It's all over the web in places with open comment threads where the moderators don't take much interest.

#296 ::: xeger ::: (view all by) ::: August 24, 2008, 11:01 PM:

Based on the comments here, I'd have to propose 'censorship' as a dirty word... ;)

#297 ::: Xopher ::: (view all by) ::: August 24, 2008, 11:04 PM:

Oh. Sorry, no I didn't. I just looked at the Google page. From there, it doesn't look so bad.

My stupid. Sorry.

#298 ::: Bacchus ::: (view all by) ::: August 24, 2008, 11:38 PM:

Jim Macdonald #293 -- I confess I did not fully apprehend the function of the list from abi's post. In truth I was keying on her "to express an excess of emotion, or to seek to produce an improper effect upon the unsuspecting reader" phrase. That sounded like an effort to list words (1) capable of use as exclamatory epithets or (2) notorious for their ability to shock or otherwise elicit strong reaction. Hence my argument -- which of course I put forth purely in a sense of fun -- that MILF qualifies on neither ground.

I do confess that I can't think of an appropriate reason to use the tag in the sort of user submission system abi is working on. There's certainly no *harm* to having it on the list. I guess I'm just missing how it's more "core list" worthy than any other acronym that incorporates the initial of a profane word.

#299 ::: Jakob ::: (view all by) ::: August 25, 2008, 08:46 AM:

In response to the list: profanity is the crutch of the inarticulate motherfucker.

On the other hand, carefully-applied profanity has an impact like nothing else.

#300 ::: Serge ::: (view all by) ::: August 25, 2008, 09:02 AM:

Meanwhile, on the poop deck...

#301 ::: R. M. Koske ::: (view all by) ::: August 25, 2008, 12:58 PM:

#258 - Paul A. -

Aha! Thank you.

#302 ::: cajunfj40 ::: (view all by) ::: August 25, 2008, 01:02 PM:

Maybe I have a slightly different devious slant to my mind, but I'm sitting here thinking of ways to break this system without hitting any of the proposed filter-words.

Things like tagging the entire list of books that were ever banned as "children's literature".

Tagging books by Ann Coulter or Rush Limbaugh as "liberal". (And the reverse, of course - tagging books by Al Franken as "conservative".)

Tagging books by Anais Nin as "french cooking".

The basic idea is the same as the one behind "Googlebombing", with the specific intent of "poisoning" other people's search results.

Think the "inappropriate" results sometimes found in Amazon.com and other site's "People who enjoyed this also liked X" function...

Or the old "My Tivo thinks I'm gay!" thing.

Maybe there's an entirely different part of the software that covers this, or perhaps the "user generated" search results aren't visible to the casual catalog searcher without additional button mashing.

I'm trying to point out ways of using the "user tag" system to find (via their angrily, if quietly, storming up to the Librarian's desk) those dear souls who may be Offended! to find books by that "horrid and vulgar" Al Franken suggested when they're searching for their "wholesome" Rush Limbaugh compendiums.

Not that I'd actually condone poisoning a Library database in this manner, but if I can think it, so can others with less restraint.

#303 ::: abi ::: (view all by) ::: August 25, 2008, 01:43 PM:

cajun @302:

That is the exploit that I first thought of. I'm not going to tell you about the discussion that ensued, I'm afraid. But I would point out that the base assumption is that there is no existing content in the library catalog that is unsuitable for viewing.

Bad data is not the same level of offense as bad words.

I've thought of a couple of others with other impacts as well, but I haven't found any options that lead to disastrous consequences.

Risk and impact assessment are part of my job, so this is something I've been musing about.

#304 ::: abi ::: (view all by) ::: August 25, 2008, 01:58 PM:

I've finished the first version of the blacklists, but I'm not in a position to reproduce it here right now. Sorry about that...

A couple of things I will say:

* "gay" is completely off all blacklists
* proper anatomical terms are all on the optional list, and not recommended for academic market
* a number of offensive words that constitute substrings of common ones have been omitted or made optional. If they become a problem, a librarian can add them
* more racial insults have been added. It's a sign of my privilege that I didn't know them; I am aware how lucky I am.
* words describing the content of books (eg "erotic" and "sex") have been downgraded in severity or removed
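
The substring point in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual software: the word lists are harmless placeholders, the names `CORE`, `ADDITIONAL`, `is_blocked` and `local_additions` are all hypothetical, and it ignores the gerund and "leet" handling mentioned in the original post. It matches whole words only, so a blocked word sitting inside a longer innocuous one is not caught unless a librarian adds the longer word locally.

```python
import re

# Placeholder lists (assumptions); the real lists are not reproduced in this thread.
CORE = {"darn"}
ADDITIONAL = {"heck"}


def blocked_terms(tag, blacklist):
    """Return the blacklisted words that appear as whole tokens in the tag."""
    tokens = re.findall(r"[a-z0-9]+", tag.lower())
    return sorted(set(tokens) & set(blacklist))


def is_blocked(tag, strict=False, local_additions=frozenset()):
    """Check a tag against the core list, plus the optional list if strict.

    local_additions stands in for words a librarian has added locally.
    """
    blacklist = CORE | set(local_additions)
    if strict:
        blacklist |= ADDITIONAL
    return bool(blocked_terms(tag, blacklist))
```

Under these assumptions a tag like "darning needles" passes, because "darn" only occurs as a substring, while "darn it" is blocked; "what the heck" is blocked only for a library that has opted into the stricter list.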

I tested the list against the population of tags already in our systems, which influenced my choices, particularly in assessing how common terms are in the data already entered. I'll be monitoring the situation as more tags accumulate, and advising librarians on how to proceed if innocuous tags are being blocked.

The list will evolve, in other words, both as a core list and in its instantiations in various customer libraries.

I know that the idea of such a list is either offensive or impractical in the view of some of the commenters here. But as long as there are trolls, this is a problem that has to be solved. This is one of a range of tools that, in my professional opinion, will address it in our system.

#305 ::: Terry Karney ::: (view all by) ::: August 25, 2008, 03:45 PM:

abi: For whatever assistance I was able to provide, you have my thanks.

#306 ::: abi ::: (view all by) ::: August 25, 2008, 04:42 PM:

Terry,

Boot's on the other foot. Thank you, and I hope that cough is better!

#307 ::: Serge ::: (view all by) ::: August 25, 2008, 04:52 PM:

abi @ 306... Boot's on the other foot

This reminds me of an article I once read in a newspaper's business section where someone was described as having both feet firmly on the ground but without his going overboard about it.

#308 ::: Terry Karney ::: (view all by) ::: August 25, 2008, 05:23 PM:

abi: Boot is on both feet. You got widespread, thoughtful feedback. We got to help design a cool thing.

Cough is better.

#309 ::: pedantic peasant ::: (view all by) ::: August 25, 2008, 05:25 PM:

Abi:

I've been away for a couple days, and haven't read the past 150 or so comments.

I've been hemming and hawing over this: I know you said draft one is done, and this is more likely used as a search term than extreme conversation, but you may wish to consider BDSM and D/s as candidates.

#311 ::: Erik Nelson ::: (view all by) ::: August 25, 2008, 08:50 PM:

misunderstood #309 momentarily:

"but you may wish to consider BDSM and D/s as candidates."

to mean, if you were running for public office, these are things you should consider doing.

#312 ::: lorax ::: (view all by) ::: August 25, 2008, 08:54 PM:

Bad data is not the same level of offense as bad words.

Am I the only one who had the immediate reaction of "no, it's worse?"

#313 ::: Serge ::: (view all by) ::: August 25, 2008, 09:13 PM:

Bad Data?
You mean his brother Lore?

#314 ::: Michael I ::: (view all by) ::: August 26, 2008, 07:52 AM:

Serge@313

No, Bad Data is actually the brother of Bad Horse. There's a bit of sibling rivalry going on with Bad Horse all famous and everything (what with being the leader of the Evil League of Evil) while Bad Data isn't as widely known. However, Bad Data takes comfort in the thought that, being more subtle, he can sometimes do even more damage...

#315 ::: Joel Polowin ::: (view all by) ::: August 26, 2008, 09:23 AM:

"but you may wish to consider BDSM and D/s as candidates."

What, instead of McCain? Or as his running mate?

#316 ::: cajunfj40 ::: (view all by) ::: August 26, 2008, 09:44 AM:

Abi @#303:
That is the exploit that I first thought of.

Good. I didn't think it very likely that anyone who posts here would miss that, but I'd rather be told my idea is redundant than let a potential "troll exploit hole" go.

I'm not going to tell you about the discussion that ensued, I'm afraid. But I would point out that the base assumption is that there is no existing content in the library catalog that is unsuitable for viewing.

I have to say thank you for that. IMHO, that "base assumption" should be the case for every library on the planet - and not on the basis of limiting what is in the library to begin with.

It makes me feel better when I know people who care are the ones working on these sorts of tools, where they are sometimes (sadly) needed.

#317 ::: Ursula L ::: (view all by) ::: August 26, 2008, 09:47 AM:

Thinking more on the idea of running a spell check on the creation of tags. This could be set up so that if someone creates a tag that is not in the spelling dictionary, the tag gets flagged to the librarian. The librarian could then review the tags, and either add the tag to the spelling dictionary, correct the spelling if the patron has misspelled the word, or delete/set to "this patron only" if the patron has misspelled a forbidden word to get around the blacklist.

Also, if a patron does a string of unacceptable tags, is there a way to set all of the tags this person creates to "this patron only"? That way, if someone was trying to do the googlebomb style hack, or mislabel things for the fun of it, they could be stopped without reviewing every single tag. Let them use the tags for their own purposes, but not have the fun of the "let's mess up the system for everyone" game.

And is there a way for other patrons to flag a tag for librarian review, if they come across something they consider a problem? Given the large proportion of patrons to librarians, odds are that patrons will stumble across some problems before librarians do, and a painless way for them to report, and the librarian to receive the report, would keep things running more smoothly. (I'm thinking of the same effect as people here reporting when they find spam by putting "sees spam in thread X" in their name in a comment. Lets the mods be in more places at once.)

Is there a way to set it so words in the official catalog description/tags are exempt from the list, just for that entry? So if a book is about a person or place with a name with "f#ck" as part of it, the name could be a tag just for that entry, but you could still use "f#ck" as a blocked word when part of a larger word, in other entries.
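
The spell-check idea in the first paragraph of this comment might look something like the sketch below. It is only an illustration under stated assumptions: `triage_tag`, the three outcome labels, and the tiny dictionary and blacklist are all made up for the example, and a real system would need a much larger word list.

```python
def triage_tag(tag, dictionary, blacklist):
    """Sort a new tag into 'blocked', 'accepted', or 'review'.

    'review' means the tag contains a word the spelling dictionary
    does not know, so a librarian should look at it.
    """
    words = tag.lower().split()
    if any(word in blacklist for word in words):
        return "blocked"
    if all(word in dictionary for word in words):
        return "accepted"
    return "review"


def approve(tag, dictionary):
    """Librarian accepts a reviewed tag: its words join the dictionary."""
    dictionary.update(tag.lower().split())
```

With a dictionary containing "french" and "cooking", the tag "French cooking" is accepted, while "frensh cooking" lands in the review queue, where the librarian can correct it, approve it, or restrict it to the patron who created it.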

#318 ::: Serge ::: (view all by) ::: August 26, 2008, 09:49 AM:

Michael I @ 314... Does Mister Ed know about Bad Horse?

("Wilburrrrrr!!!")

#319 ::: abi ::: (view all by) ::: August 26, 2008, 11:33 AM:

Ursula @317:
Interesting set of ideas.

1. That's not something we can do at present. Really, you're talking a very large whitelist rather than a very small blacklist. Although it looks good, and would stop some spelling problems that this lets through, I think that it would introduce a lot of unforeseen problems. It would also make a heckuva lot of work for a librarian.

2. Yes, we can suppress a user and all his works. And that's precisely what it does: keep their content private to them, exclude it from the common pool of data. This is the precise reason we did this.

3. We do not yet have a "flag this" function, but it's on our list of possible future features. At the moment, our tag population is quite small (even if our customers seed their database with the LibraryThing tags, which is an option). But as it grows, we will need the ability to crowdsource moderation as well as tagging.

4. That would be a complex and time-consuming piece of functionality. Considering that one of the things that we want is tags that do not match the catalog data, so that we're broadening the range of searchable terms for an item, this might not be enough use for the time and effort it would require.
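
Point 2 above (suppressing a user so their tags stay private to them) can be illustrated with a short sketch. This is not the product's code: the `Tag` record, `visible_tags`, and the set of suppressed user names are all hypothetical, chosen only to show the visibility rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    item: str   # catalog item the tag is attached to
    text: str   # the tag itself
    owner: str  # user who created the tag


def visible_tags(tags, viewer, suppressed):
    """Return the tags a given viewer should see.

    Tags from suppressed users drop out of the common pool,
    but each suppressed user still sees their own tags.
    """
    return [
        tag for tag in tags
        if tag.owner not in suppressed or tag.owner == viewer
    ]
```

The design choice here is that suppression filters at read time rather than deleting anything, so the suppressed user's own view is unchanged and a librarian could reverse the decision later.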

But all very good and interesting ideas. Would you like virtual strawberry tart, or virtual chocolate cake?

#320 ::: Sandy B. ::: (view all by) ::: August 26, 2008, 12:12 PM:

171,258: I was briefly mystified... then went "Spear catcher?"

#321 ::: Xopher ::: (view all by) ::: August 26, 2008, 12:56 PM:

abi 319: 2. Yes, we can suppress a user and all his works.

Is it called the Abrenuntio button?

#322 ::: John Stanning ::: (view all by) ::: August 26, 2008, 01:11 PM:

excommunicatio ad evitanda scandala

#323 ::: abi ::: (view all by) ::: August 26, 2008, 02:43 PM:

Xopher @321:
Is it called the Abrenuntio button?

Sadly, my Dutch studies have gotten in the way of my long-term goal of producing a Latin version of the software.

We have Dutch, French, English and Spanish versions, but I contend that we're not truly international till we have Latin. My bosses concur, in the kind of weak-nodding and backing out of the room way that I know signals their wholehearted admiration of my brilliance.

#324 ::: Individ-ewe-al ::: (view all by) ::: August 27, 2008, 11:27 AM:

abi, you've convinced me that this particular blacklist is going to be a force for good, even if my immediate reaction is that blacklists just cause bugs. (Anyone remember when LiveJournal tried to block searches for users interested in depression?)

Berk is half my surname. This led to some annoying incidents in primary school, but I think it's more annoying when I can't enter my own name into a system because it's considered offensive! So I have much sympathy for all the people with Porn or Fuk or Wang in their names. But for abi's application I think blacklisting it wouldn't be a disaster; as long as the list didn't ban Berkshire itself (or Berkeley, for that matter), the only time you'd really need it for a tag would be in searching for a particular theorem in medical statistics, which is pretty obscure.

#325 ::: Ken Brown ::: (view all by) ::: August 28, 2008, 10:38 AM:

The well-known writer Anthony Trollope was the son of the almost-as-well-known writer Fanny Trollope.

I think both words meant what they mean now. Some people just have silly names. Either way "fanny" isn't really unusable in Britain, that's just a joke we tell Americans.

"Shit" and "piss" and "bugger" aren't that strong either.

#326 ::: Xopher ::: (view all by) ::: August 28, 2008, 12:49 PM:

Ken 325: That's why a group of prostitutes is called an essay of trollops.

#327 ::: Tom Reynolds ::: (view all by) ::: August 28, 2008, 08:29 PM:

Hmmm, where I come from cunt is not only a form of punctuation, but also a term of endearment.

Methinks the UK lot are somewhat less worried about such language than our colonial cousins...

#328 ::: takuan ::: (view all by) ::: February 20, 2009, 05:20 AM:

A question of local custom: how do you kill spammers around these parts?

#329 ::: David Goldfarb ::: (view all by) ::: February 20, 2009, 05:30 AM:

As above: post a comment with something like "sees spam" added to it (that way, in the list of recent comments it shows up as "Paul Duncanson sees spam on The honor...") -- ideally, make the comment such that if the spam is deleted it'll be clear that you're not referring to the post immediately above. One of the moderators will come along in a little while and zap the offending comment.

(I tend to be a night owl, in an earlier time zone than most of the regulars here, so I don't get to participate in the threads while they're seeing lots of activity; but I do sometimes see spam before anyone else does.)

#330 ::: abi ::: (view all by) ::: February 20, 2009, 05:36 AM:

Moderators then come wandering by, look at the last 1000 comment list for unusual activity, do a word search on "spam", and tidy up. (This is why we like people to report the spam in their usernames, as Paul Duncanson has done.)

Sometimes we delete, sometimes we publish the IP address. Depends on mood, really.
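
The moderation sweep described above (scan the most recent comments, search for "spam" in the poster names) could be sketched like this. It is an illustration only: the comment records are plain dictionaries with made-up `name` and `thread` keys, and `find_spam_reports` is a hypothetical helper, not anything from the actual blog software.

```python
def find_spam_reports(recent_comments, limit=1000):
    """Return comments among the latest `limit` whose poster name mentions spam.

    recent_comments is assumed to be ordered oldest to newest.
    """
    latest = recent_comments[-limit:]
    return [c for c in latest if "spam" in c["name"].lower()]
```

A moderator would then visit each reported thread and tidy up by hand; the sketch only finds the reports, it does not decide what counts as spam.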

#331 ::: Serge ::: (view all by) ::: February 20, 2009, 06:14 AM:

David Goldfarb @ 331... "I tend to be a night owl."

As for myself, the hours I keep (and the minutes I go minus, thus the sleep deficit), combined with the timezone differences, have had some of the people here wonder if I ever sleep.

#332 ::: Terry Karney ::: (view all by) ::: February 20, 2009, 09:23 AM:

I suspect my odd times of comment probably have the same effect.

#333 ::: Elliott Mason ::: (view all by) ::: September 06, 2009, 12:56 PM:

ajay @166 said:"Viewers are warned that this programme deals with adult themes - specifically cynicism, exhaustion, compromise, the gradual slide towards irrelevant senility, and monthly mortgage payments". / Sex? Drugs? Partial nudity? Swearing? Violence? Those are teenage themes.

Along those lines, see also this thoughtful essay on trends in Summer Blockbuster Movies, with special attention paid to the recent Star Trek reboot film.

#334 ::: Elliott Mason ::: (view all by) ::: September 06, 2009, 01:30 PM:

Another entry in the 'naughty playground rhymes' category ... this one was what we called a 'handplay' in the early/mid-Eighties in Chicago, because it went with a handslap game/pattern/thing, like Pattycake but more complicated.

The first verse and some of the chorus are identical to a pop song from long before my time; I presume the rest has accreted to it over the years. It also includes, I'm sure, a lot of eggcorns. In the James Bond verse, we often, as Catholic schoolchildren, automatically said BLEEP instead of the 'swears' in it, which I find hilarious in retrospect.

"Rockin' Robin"; specimen verses.

She's a rockin' in the kitchen, all night long,
Huffin' and a puffin' and a singin' that song.
All the little birds on Jaybird* Street
Love to hear the robins going tweet tweet tweet,
Rockin' Robin.

I went downstairs to get my gun
Made a mistake and stabbed my son.
Went upstairs to get my knife
Made a mistake and shot my wife,

I went to the kitchen to get a stick of butter
Saw James Bond sittin' in the gutter.
Got a piece of glass, shoved it up his ass,
Never saw a motherfucker run so fast!

Mama's in the kitchen, burnin' that rice;
Daddy's on the corner, shootin' them dice.
Brother's in jail, just raisin' his hair‡
Sister's on the corner sellin' fruit cocktail!

* sometimes Damen, a local arterial route
This verse had hand-gestures to go with each action; they were to be performed amidst the rest of the rhythmic claps and slaps, without breaking pace.
‡ Probably originally 'raising hell,' I would guess. But the 'raising hair' gesture was fun.

Dire legal notice
Making Light copyright 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 by Patrick & Teresa Nielsen Hayden. All rights reserved.