April 10, 2012

The Morning’s Best Comment Sp4m
Posted by Jim Macdonald at 09:31 AM

Hot from the moderation queue:

Ive to say, I dont know if its the clashing colours or the poor grammar, but this weblog is hideous! I mean, I dont wish to sound like a know-it-all or anything, but could you have possibly put just a little bit far more effort into this subject. Its truly interesting, but you dont represent it nicely at all, man.

Note the iron law of the Internet: Any grammar-flame (even a computer-generated one) must have at least one grammatical error.

What tripped up this robot and dumped its spew into the moderation queue? Using a common contraction with no apostrophe. (We get anything from dozens to hundreds a day with that marker.)
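The post doesn't show the rule that fired, so the following is only a guess at its general shape, written in Python rather than the PERL used for the real filters. The word list and the names `MISSING_APOSTROPHE` and `looks_like_spam` are assumptions made for illustration.

```python
import re

# Hypothetical word list: contractions that stop being ordinary English words
# once the apostrophe is dropped. Words such as "its", "well", "cant" and
# "wont" are left out on purpose, because they are legitimate on their own.
MISSING_APOSTROPHE = re.compile(
    r"\b(ive|dont|doesnt|didnt|isnt|arent|wasnt|youre|thats)\b",
    re.IGNORECASE,
)

def looks_like_spam(comment: str) -> bool:
    """Return True if the comment contains an apostrophe-less contraction."""
    return MISSING_APOSTROPHE.search(comment) is not None

spam = "Ive to say, I dont know if its the clashing colours or the poor grammar"
ham = "I've to say, I don't know if it's the colours"
print(looks_like_spam(spam))
print(looks_like_spam(ham))
```

A rule like this would presumably send a comment to moderation rather than delete it outright, since real people drop apostrophes too.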

Oh, and the hideous (nevertheless truly interesting) post with the terrible grammar? Jon Singer’s turkey algorithm

SEO delenda est!

Comments on The Morning's Best Comment Sp4m:
#1 ::: David Goldfarb ::: (view all by) ::: April 10, 2012, 10:10 AM:

That one actually brings up a rule of English grammar that is interesting to me and somewhat little-discussed, that I've seen: namely, that the various forms of "have" can only be contracted when they are auxiliary forms -- you can say "I have to say" but not "I've to say". (This is complicated a bit by the existence of the phrasal verb "have got", which I gather is somewhat of an Americanism.)

#2 ::: Megpie71 ::: (view all by) ::: April 10, 2012, 10:40 AM:

If the text quoted there is a direct cut and paste, they have not used an apostrophe anywhere. Which probably gives a knowledgeable computer geek at least a passing guess at which languages their spambot may have been coded up in (I shall put a couple of dollars on poorly-written C, thanks).

Now I am wondering whether the next generation of spambots is going to be using a more formal style of English, in order to avoid the use of possessives and contractions. It would be fun to watch.

#3 ::: Fragano Ledgister ::: (view all by) ::: April 10, 2012, 10:43 AM:

Jim, you do realise that the Gnomes would characterise George Bernard Shaw as a constant spammer?

#4 ::: John Mark Ockerbloom ::: (view all by) ::: April 10, 2012, 10:53 AM:

The book submission form for my free online books directory not infrequently gets hit by stray comment spammers. (It doesn't do any good for them; no submissions are visible at all till I approve them, and spam either gets auto-deleted or manually nuked before it ever gets anywhere near my live catalog.)

Sometimes the spammers' tests misfire, and I see a little bit of the code underlying the spam. Sometimes there's an obvious variable placeholder in the submission. Recently I saw a comment attempt with alternatives in braces, saying something like "This was a {fascinating|interesting} post! I'll recommend it to my {friends|colleagues|parole officer}!" It looks like it's a lame attempt to resist identifying spam by spotting repeats of the exact same phrases, but I suspect it doesn't actually work very well.
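For the curious, here is a minimal Python sketch of how a brace-and-bar template like that might be expanded. The spammers' actual tooling is unknown; the function name `spin` and the choice of Python are assumptions made only to illustrate the idea.

```python
import random
import re

# Innermost {a|b|c} group: braces with no nested braces inside.
GROUP = re.compile(r"\{([^{}]*)\}")

def spin(template, rng):
    """Replace each {a|b|c} group with one randomly chosen alternative."""
    def choose(match):
        return rng.choice(match.group(1).split("|"))
    # Repeat until no groups remain, so nested groups resolve inside-out.
    while GROUP.search(template):
        template = GROUP.sub(choose, template)
    return template

template = ("This was a {fascinating|interesting} post! "
            "I'll recommend it to my {friends|colleagues|parole officer}!")
print(spin(template, random.Random()))
```

When the expansion step is skipped or fails, the raw template goes out with its braces intact, which would explain the misfires described above.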

#5 ::: Kip W ::: (view all by) ::: April 10, 2012, 11:01 AM:

Tonstant Spammer aint tawkin.

#6 ::: Jim Macdonald ::: (view all by) ::: April 10, 2012, 11:10 AM:

Megpie71 #2: I think that the lack of apostrophes (and/or lack of quotemarks) provides some very good clues as to what they're using for delimiters.

Fragano Ledgister #3: If George Bernard Shaw were a frequent commenter here, I'd have to find a new filter.

John Mark Ockerbloom #4:

I think we have just a bit of the mis-firing mad-lib in the quoted example:

just a little bit far more effort

Just a little bit or far more? Make up your mind, would you?

I too see the mis-fired mad-lib spam with the hints of what lies beneath. My own filters also use multiple-choice matches. E.g.:

/a (useful|informative|helpful|educational|benificial|beneficial) (and|along with) (funny|interesting|amusing) (publication|write.?up|essay|post|article|submitting|submission|script)/i

The language is PERL, and it's trivially easy to create such a filter using Notepad and Google. The spammers themselves provide all the material I need (particularly when we get twenty or thirty nearly-identical posts within a couple of minutes, all from different apparent IP numbers, all with different e-mail addresses and supposed posters' names, but all advertising the same site).
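To see a multiple-choice match of this kind in action, here is a small sketch that runs the same pattern through Python's `re` module instead of PERL (the pattern syntax is the same for this expression). The sample comments are made up for the demonstration and are not real entries from the spam queue.

```python
import re

# The filter quoted above, with the /.../i wrapper expressed as re.IGNORECASE.
FLATTERY = re.compile(
    r"a (useful|informative|helpful|educational|benificial|beneficial) "
    r"(and|along with) (funny|interesting|amusing) "
    r"(publication|write.?up|essay|post|article|submitting|submission|script)",
    re.IGNORECASE,
)

# Invented samples: two flattery-style comments and one ordinary remark.
samples = [
    "What a helpful and interesting write-up, thanks!",
    "Such a benificial along with amusing submitting.",
    "I disagree with the second paragraph of this essay.",
]
for text in samples:
    print(bool(FLATTERY.search(text)), text)
```

Listing both "benificial" and "beneficial" means the misspelled variant gets caught along with the correct one.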

#7 ::: James ::: (view all by) ::: April 10, 2012, 11:37 AM:

More likely perl than C, since single quotes aren't string delimiters in C. Lazy perl, because you can just escape the apostrophes with backslashes.

#8 ::: Serge Broom ::: (view all by) ::: April 10, 2012, 11:42 AM:

Fragano @ 3...

GEORGE BERNARD SHAW: Right. Your Majesty is like a dose of clap.
THE PRINCE OF WALES: What?!?
GEORGE BERNARD SHAW: Before you arrive is pleasure, but after is a pain in the dong.

#9 ::: Christopher Wright ::: (view all by) ::: April 10, 2012, 11:50 AM:

For a while I was starting to suspect that the spambots were actually using some kind of data-sorting algorithm to match their messages with sites carrying content that made it easier for the spam to hide. In other words, a "how to" website would get more "thanks information useful, will bookmark" posts than one about the color choices used in the upholstery of 1964 convertibles.

Now that this particular piece of spam has been directed at a site run in part by a freaking editor for Tor Books, I'm going to quietly shelve that theory for a while...

#10 ::: abi ::: (view all by) ::: April 10, 2012, 01:28 PM:

Among the spam that makes any kind of effort to look like ordinary comment traffic, we tend to run about 60% flattery (like John Mark Ockerbloom's formula above) and 30% pretend-relevant comments (often purporting to address a particular comment number or commenter).

The remaining 10%, including the specimen that started this off, seem to have spent too much time in the Tubes with pick-up artist web traffic and think The Neg is a really good idea. And I do know moderators who are slower to delete negative comments because it makes them feel thin-skinned.

Not here, though.

#11 ::: Daniel Martin ::: (view all by) ::: April 10, 2012, 01:43 PM:

It's probably in violation of their terms of use, but I couldn't help but post a copy of that to the contact form on

w w w . p o s t - c o m m e n t s . c o m

#12 ::: Xopher HalfTongue ::: (view all by) ::: April 10, 2012, 03:40 PM:

David 1: I'm not sure that same rule exists in UK English. "I've a dozen roses for you" is a sentence of a type I've seen in UK English, and 'have' is definitely being used substantively there.

It's definitely a rule in my dialect, though. The difference between "I'm gonna vote" and "I'm going to vote" is that the former can only mean "I fully intend to vote," while the latter can mean either that or "I'm on my way to the polling place now."

I mention this because it was the sentence I uttered that first made me notice the rule. The crossing guard at my corner saw me crossing the street in a different direction than usual and asked what was up. (Yes, I am a creature of such consistent habits that it quite startles people when I deviate...seriously, we were on a "good morning, nice day" basis for months before this, so it's not quite as strange as it sounds.)

#13 ::: Fragano Ledgister ::: (view all by) ::: April 10, 2012, 06:18 PM:

Serge #8:

GBS would have correctly addressed the Prince of Wales (either of the two in his lifetime) as "Your Royal Highness".

#14 ::: Fragano Ledgister ::: (view all by) ::: April 10, 2012, 06:19 PM:

Jim Macdonald #6: When do we get the time machine?

#15 ::: Q. Pheevr ::: (view all by) ::: April 10, 2012, 09:21 PM:

So spammers are wont
to omit apostrophes
in writing their cant?

#16 ::: Laura Gillian ::: (view all by) ::: April 10, 2012, 09:23 PM:

David #1, Xopher HalfTongue #12:

In my dialect, I only contract the auxiliary form of 'have' as well. I have heard and read the pattern Xopher pointed out many times, but I don't use it myself.

I wonder if the distinction to be made in this example is that in "I have to say...", 'have' belongs to the phrasal verb 'have to', which acts like a single unit and makes the 'have' unavailable to combine with the 'I'?

#17 ::: Mary Aileen ::: (view all by) ::: April 10, 2012, 10:13 PM:

Laura Gillian (16): My brain is making a connection between your 'have to' and David Goldfarb's 'have got', in terms of grammatical usage and (lack of) contraction, but I can't quite tease it out.

#18 ::: Laura Gillian ::: (view all by) ::: April 10, 2012, 10:50 PM:

Mary Aileen #17:

For me, 'have got' is different, in that it sounds perfectly natural to say, "I've got $5" or "He's got a point" but totally wrong to say, "I've to wash the dishes". I can't think of any examples in which 'have to' can be contracted like that, can you? I think there's a pattern there somewhere, but I can't quite grasp it either.

#19 ::: Terry Karney ::: (view all by) ::: April 10, 2012, 11:03 PM:

"I've too many things to do right now."

#20 ::: Terry Karney ::: (view all by) ::: April 10, 2012, 11:03 PM:

Or, "I've too much on my plate," etc.

#21 ::: Bruce E. Durocher II ::: (view all by) ::: April 10, 2012, 11:24 PM:

Jim Macdonald: If George Bernard Shaw were a frequent commenter here, I'd have to find a new filter.

If William McGonagall were a frequent commenter here, I cannot imagine the type of filter needed.

#22 ::: Xopher HalfTongue ::: (view all by) ::: April 10, 2012, 11:56 PM:

'Have to' meaning 'must' is not the same verb as either auxiliary 'have' or 'have' indicating possession. Homophonous, homographic, not the same.

#23 ::: thomas ::: (view all by) ::: April 10, 2012, 11:58 PM:

Oh, thou demon Spam, thou fell destroyer;
Thou curse of society, and its greatest annoyer.
What hast thou done to society, both lion and lamb?
I answer thou hast caused the most of ills, thou demon Spam.

#24 ::: Laura Gillian ::: (view all by) ::: April 11, 2012, 01:46 AM:

Terry Karney #19, 20:

I would classify both of those sentences under the type Xopher mentioned in his comment #12. 'Have' indicating possession.

Xopher HalfTongue #22:

Thank you for clearly saying in very few words what I was groping for. I knew there must be a better way to describe what I meant. And in my dialect, only the auxiliary kind of 'have' can be contracted.

#25 ::: David Harmon ::: (view all by) ::: April 11, 2012, 06:26 AM:

Terry Karney #19, 20: Also, you're abusing a homophone: "too" and "to" don't even match grammatically.

#26 ::: eric ::: (view all by) ::: April 12, 2012, 02:23 AM:

And cleaning out the spam queue here:

Absolutely written articles, thankyou for entropy


Entropy I can provide. In fact, I can hardly prevent it.

#27 ::: Allan Beatty ::: (view all by) ::: April 12, 2012, 09:02 PM:

For reasons unrelated to spam, I have to say that comma-separated value files are the work of the devil. Tab-delimited files are somewhat better except when emitted by Microsoft® Excel™.

#28 ::: Jim Macdonald ::: (view all by) ::: April 14, 2012, 09:25 AM:

From this morning's spam-kill, here we see someone trying to escape the apostrophes ... inexpertly.

I downloaded and started using Mint over the weekend. I absolutely love it. First, I no longer have to my banks\' website for my transaction and balance. Second, I\'m able to also link (and sync) all my student loans along with car loan onto the app! Third, I love the pie chart breakdown of my expenditures. Who knew I ate out so much?! Apparently, Mint does

#29 ::: Bill Stewart ::: (view all by) ::: April 14, 2012, 12:39 PM:

Like the gazebo, you can't escape the apostrophe. And who knew that one of the popular Linux distributions would handle your bank account for you!

#30 ::: Jim Macdonald ::: (view all by) ::: April 18, 2012, 12:51 PM:

Join us now as Yog Builds a Spam Filter.

This morning a spam got through. The link was borked, else it would have been caught anyway, but the text was a typical mad-lib style comment spam:

I'm extremely pleased to uncover this website. I wanted to thank you for your time, and for this excellent study - I definitely appreciated every single bit of it, and already have you book marked to check out new stuff.

How I proceed to create a filter:

First, take a sentence from the spam. I chose this one:

I definitely appreciated every single bit of it, and already have you book marked to check out new stuff.

Now, cut that in half, and Google on half of it. The reason for Googling on half is this: When you search, Google will show you the whole sentence. I chose this bit:

"and already have you book marked to check out new stuff."

Note the quote marks. They're important.

This got me over a hundred thousand hits.

I then went through the first two pages of the Google results, copying the first half of the sentence into Notepad. Here's what I got:

I definitely savored every part of it
I definitely loved every bit of it
I definitely enjoyed every little bit of it
I definitely loved every bit of it
I definitely enjoying every tiny amount of it
I absolutely enjoying every tiny amount of it
I absolutely loved every little bit of it
I definitely appreciated every part of it
I definitely savored every part of it
I certainly enjoyed every bit of it
I definitely appreciated every part of it
I definitely appreciated every bit of it
I without a doubt really enjoyed just about every bit of it
I definitely really liked every part of it
I definitely enjoyed every part of it
I definitely really liked every bit of it
I definitely really liked every bit of it
I definitely appreciated every part of it
I definitely really liked every bit of it
I certainly loved every bit of it
I definitely loved every little bit of it
I definitely enjoying every little bit of it


Okay, I'm sure you can see where I'm going with this.

I imported that block into WordPerfect (though I'm certain other word processors have similar functions, that's the one I'm familiar with), and sorted it by first word of each line.

Every one begins with "I", so I start to construct my filter with the word "I."

I put .I at the top of the page. I started the filter-in-progress with a period, so that when I sort, it'll get sorted to the top.

Now I use Search and Replace to remove all the leading Is.

This is what I now see:

.I

definitely loved every bit of it
definitely enjoyed every little bit of it
definitely loved every bit of it
definitely enjoying every tiny amount of it
absolutely enjoying every tiny amount of it
absolutely loved every little bit of it
definitely appreciated every part of it
definitely savored every part of it
certainly enjoyed every bit of it
definitely appreciated every part of it
definitely appreciated every bit of it
without a doubt really enjoyed just about every bit of it
definitely really liked every part of it
definitely enjoyed every part of it
definitely really liked every bit of it
definitely really liked every bit of it
definitely appreciated every part of it
definitely really liked every bit of it
certainly loved every bit of it
definitely loved every little bit of it
definitely enjoying every little bit of it
definitely savored every part of it

Sort on first word, and:


.I

absolutely loved every little bit of it
absolutely enjoying every tiny amount of it
certainly enjoyed every bit of it
certainly loved every bit of it
definitely loved every bit of it
definitely enjoyed every little bit of it
definitely appreciated every part of it
definitely savored every part of it
definitely loved every bit of it
definitely appreciated every part of it
definitely appreciated every bit of it
definitely enjoying every little bit of it
definitely really liked every part of it
definitely enjoyed every part of it
definitely really liked every bit of it
definitely really liked every bit of it
definitely appreciated every part of it
definitely really liked every bit of it
definitely enjoying every tiny amount of it
definitely loved every little bit of it
definitely savored every part of it
without a doubt really enjoyed just about every bit of it


Great! The way to make a mask in PERL where any one of several possible terms matches is to enclose 'em in parentheses, with the terms separated by vertical lines.

Thus, the first part of the filter now reads:

.I (absolutely|certainly|definitely|without a doubt)
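The sort-and-collect step can also be done mechanically. What follows is a rough Python sketch, not the word-processor method described here; the helper name `leading_alternation` is made up for illustration, and it only looks at the first word of each line, so a multi-word opener such as "without a doubt" would still need to be merged by hand.

```python
import re

def leading_alternation(lines):
    """Build an (a|b|c) group from the distinct first words of the lines."""
    first_words = sorted({line.split()[0] for line in lines if line.strip()})
    # re.escape guards against any regex metacharacters in the collected words.
    return "(" + "|".join(re.escape(word) for word in first_words) + ")"

# A few of the stripped lines from the block above.
lines = [
    "definitely loved every bit of it",
    "absolutely enjoying every tiny amount of it",
    "certainly enjoyed every bit of it",
    "definitely savored every part of it",
]
print(leading_alternation(lines))  # prints (absolutely|certainly|definitely)
```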

I remove those words from my block-o-text, and re-sort to get:

.I (absolutely|certainly|definitely|without a doubt)

appreciated every part of it
appreciated every bit of it
appreciated every part of it
appreciated every part of it
enjoyed every little bit of it
enjoyed every bit of it
enjoyed every part of it
enjoying every tiny amount of it
enjoying every tiny amount of it
enjoying every little bit of it
loved every bit of it
loved every bit of it
loved every little bit of it
loved every bit of it
loved every little bit of it
really liked every bit of it
really liked every bit of it
really liked every part of it
really enjoyed just about every bit of it
really liked every bit of it
savored every part of it
savored every part of it

As before, to get the next variable term:

.I (absolutely|certainly|definitely|without a doubt) (appreciated|enjoyed|enjoying|loved|really liked|really enjoyed just about|savored)

every part of it
every bit of it
every part of it
every part of it
every little bit of it
every bit of it
every part of it
every tiny amount of it
every tiny amount of it
every very little bit of it
every bit of it
every bit of it
every little bit of it
every bit of it
every little bit of it
every bit of it
every bit of it
every part of it
every bit of it
every bit of it
every part of it
every part of it

Once more with sorting, and we get:

.I (absolutely|certainly|definitely|without a doubt) (appreciated|enjoyed|enjoying|loved|really liked|really enjoyed just about|savored) (every|every little|every tiny|every very little)

amount of it
amount of it
bit of it
bit of it
bit of it
bit of it
bit of it
bit of it
bit of it
bit of it
bit of it
bit of it
bit of it
bit of it
bit of it
part of it
part of it
part of it
part of it
part of it
part of it
part of it

Similarly:

.I (absolutely|certainly|definitely|without a doubt) (appreciated|enjoyed|enjoying|loved|really liked|really enjoyed just about|savored) (every|every little|every tiny|every very little) (amount|bit|part) of it

Now to wrap it in forward slashes, add small-letter i so it'll ignore case, to get:

/I (absolutely|certainly|definitely|without a doubt) (appreciated|enjoyed|enjoying|loved|really liked|really enjoyed just about|savored) (every|every little|every tiny|every very little) (amount|bit|part) of it/i

And put it into the filters, and we're done.
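Before trusting a filter like this, it may be worth running it back over the lines it was built from. The sketch below does that in Python rather than PERL, with the first group written as "without a doubt", the phrase as it appears in the collected lines. The names `FILTER`, `collected` and `original` are just labels for this example.

```python
import re

# The finished filter, with the /.../i wrapper expressed as re.IGNORECASE.
FILTER = re.compile(
    r"I (absolutely|certainly|definitely|without a doubt) "
    r"(appreciated|enjoyed|enjoying|loved|really liked|really enjoyed just about|savored) "
    r"(every|every little|every tiny|every very little) "
    r"(amount|bit|part) of it",
    re.IGNORECASE,
)

# A handful of the variants collected from the search results.
collected = [
    "I definitely savored every part of it",
    "I absolutely enjoying every tiny amount of it",
    "I certainly loved every bit of it",
    "I definitely really liked every bit of it",
    "I without a doubt really enjoyed just about every bit of it",
]
# The sentence from the spam that got through this morning.
original = "I definitely appreciated every single bit of it"

for text in collected + [original]:
    print(bool(FILTER.search(text)), text)
```

Run this way, the collected variants all match, but the original sentence does not, because "every single" is not among the alternatives in the third group. Adding it there would close that gap.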

#31 ::: SamChevre ::: (view all by) ::: April 18, 2012, 12:58 PM:

Thank you for the demonstration.

May I note that there is a typo in the filter; the text block has "without A doubt" but the filter has "without AT doubt".

#32 ::: Jim Macdonald ::: (view all by) ::: April 18, 2012, 01:02 PM:

Thanks. Fixed.

#33 ::: Jim Macdonald ::: (view all by) ::: April 20, 2012, 10:38 AM:

We're still getting variations on the spam from the OP:

I have to say, I dont know if its the clashing colours or the unhealthy grammar, but this blog is hideous! I imply, I dont want to sound like a know-it-all or something, however may you will have possibly put slightly bit more effort into this subject. Its really interesting, however you dont characterize it properly in any respect, man. Anyway, in my language, there aren't much good source like this.

You know my methods:

/or the (bad|dangerous|poor|unhealthy) grammar, (but|however) this (blog|weblog) is hideous\!/i
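As a quick check, here is the same pattern exercised in Python (an assumption of convenience; the real filter runs in PERL) against the key phrase from the original spam and from this morning's variant. The third sample is an invented ordinary comment.

```python
import re

# The filter above, with the /.../i wrapper expressed as re.IGNORECASE.
GRAMMAR_FLAME = re.compile(
    r"or the (bad|dangerous|poor|unhealthy) grammar, "
    r"(but|however) this (blog|weblog) is hideous\!",
    re.IGNORECASE,
)

samples = [
    # From the spam in the original post.
    "I dont know if its the clashing colours or the poor grammar, but this weblog is hideous!",
    # From the variant quoted just above.
    "I dont know if its the clashing colours or the unhealthy grammar, but this blog is hideous!",
    # Invented non-spam remark.
    "The grammar on this blog is fine, and so are the colours.",
]
for text in samples:
    print(bool(GRAMMAR_FLAME.search(text)), text)
```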

#34 ::: Jim Macdonald ::: (view all by) ::: April 21, 2012, 03:49 PM:

This morning's best, advertising a scum-sucking vanity press:

Yeah, sorry. A lot of SPAM is getting through our filters at the moment, but the controls for mass deletion in the new version of WP are taking some getting used to. Make a long story short, a certain amount of good comments are getting flushed with the bad. Well get it worked out soon, though, promise.

It was caught by two different filters before it hit the thread.

#35 ::: Jim Macdonald ::: (view all by) ::: June 22, 2012, 01:21 PM:

Moments ago, and stopped by the gnomes before it posted. They did not offer it tea, scones, and jam.

Do you have a spam problem on this blog; I also use Blog Engine, and I was wondering about your situation; we have developed some excellent practices and we would like to exchange thoughts with others, please Email me if interested. Personally built a log house in the woods on a stream with a waterfall in North Alabama.

I'm proud of the gnomes.

#36 ::: Jim Macdonald ::: (view all by) ::: July 09, 2012, 07:31 PM:

We've been getting multiple copies of this one all day:


Wow you website really good. Value I to see the hope can with. Very often I come and see you through any new story is my greatest happiness ha ha a little humor. Thank you oh

The gnomes have been given bonus chocolate for stopping the flood.

#37 ::: Xopher HalfTongue ::: (view all by) ::: July 09, 2012, 07:44 PM:

"Value I to see the hope can with." Sorted by some non-obvious criterion. Clearly syntax is absent. Or to be be not.

#38 ::: Jim Macdonald ::: (view all by) ::: July 11, 2012, 08:38 AM:

An outstanding example of ... something ... here:

Premature Ejaculation may be the lack of ejaculatory control and it really is the most common of all sexual problems in men. Since it's natural, you can use it freely without any risk of adverse effects.

Since it's natural, and therefore has no risk of adverse effects, I'm sure men will be lining up to buy a premature ejaculation or two. They'll all want to have frequent premature ejaculations, I'm sure.

(What this particular spam was advertising, BTW, was "Work from home.")

#39 ::: Jim Macdonald ::: (view all by) ::: July 11, 2012, 10:40 AM:

"Darling, that box from 'Work From Home' you've been waiting for has arrived."

"At last!"

[SFX: Paper crumpling, tape ripping]

"What is it?"

"A natural premature ejaculation!"

"Oh goodie! Can we try it out right now?"

#40 ::: Xopher HalfTongue ::: (view all by) ::: July 11, 2012, 04:42 PM:

Reminds me that in my local grocery store...you know the signs that tell you what's in each aisle? One of them says "incontinence."

I avoid that aisle whenever possible, even though I'm sure I could avoid purchasing any incontinence.

#41 ::: Jeremy Leader ::: (view all by) ::: July 11, 2012, 08:59 PM:

Jim, you might be amused to know that the cool kids these days (for some very particular values of "cool") consider "PERL" incorrect usage. They say the language is named "Perl", and the interpreter (the command you invoke from the command line, or the thing you might install a new version of) is named "perl". https://www.socialtext.net/perl5/perl_vs_perl gives some aesthetic reasons for avoiding the all-caps spelling.

I've seen this distinction suggested as a way of sorting programmers' resumes, much like some people advocate using the level of spelling errors in cover letters as a quick filter when reviewing a slush pile.

By the iron law of the internet, I assume this comment will contain a capitalization error...

#42 ::: Jim Macdonald ::: (view all by) ::: July 11, 2012, 10:00 PM:

Y'know, the day I'm submitting my resume for a programming job the entire infrastructure is so scrod....

#43 ::: David Wald ::: (view all by) ::: July 11, 2012, 10:15 PM:

Jim Macdonald@42: Y'know, the day I'm submitting my resume for a programming job the entire infrastructure is so scrod....

Well, only if you were doing it to get the job, rather than for some kind of literary research.

#44 ::: Jeremy Leader ::: (view all by) ::: July 12, 2012, 02:50 AM:

Jim Macdonald @42: Hence my choice of "amused" rather than "vitally concerned"!

#45 ::: Jim Macdonald ::: (view all by) ::: July 16, 2012, 10:31 AM:

Today's top of the catch:

I'm noneffervescent learning from you, but I'm trying to achieve my goals. I perfectly copulate measure all that is posted on your place.Fastness the tips coming. I enjoyed it!

#46 ::: abi ::: (view all by) ::: July 16, 2012, 11:12 AM:

I kinda liked this pair:

I guess this is a true majuscule article call.Thanks Again. Really Majuscule.

and

Hunt overbold to metropolis much. Majuscule blog article.More thanks again. Zealous.

Zealous. Really Majuscule. Yup.

#47 ::: Jim Macdonald ::: (view all by) ::: July 17, 2012, 05:22 PM:

Today:

Convey you for making this website so elementary to conceive entropy. worthy object. Protection this one for afterwards.

#48 ::: Jim Macdonald ::: (view all by) ::: July 19, 2012, 08:18 AM:

Today's best comment spam:

Man the spam here will drive me crazy! Get rid of it!

Working on it, chum. Working on it.

#49 ::: Jim Macdonald ::: (view all by) ::: August 07, 2012, 11:52 AM:

Just arrived in the filters:

Nice post, I saw your site has hardly any spam comments, makes the site look a lot cleaner, what are you doing to stop the spam taking over, thanks :-)

What am I doing? Writing filters that catch your stuff before it ever posts, my friend.

#50 ::: Jim Macdonald ::: (view all by) ::: August 23, 2012, 03:07 PM:

Just a public notice: If you link to twitter.com, in the body of the text, as your URL, or anywhere else, your post will be held for review.

We get anything from scores to hundreds of links to twitter in the comments every day, and 99.99% of them are spam.
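A hold-for-review rule of that sort might look something like the Python sketch below. How the site actually implements it isn't stated, so the pattern, the function name `needs_review`, and the two-field layout (comment body plus the poster's URL) are all assumptions.

```python
import re

# Hypothetical rule: flag any mention of twitter.com, in any letter case.
TWITTER_LINK = re.compile(r"\btwitter\.com\b", re.IGNORECASE)

def needs_review(body: str, url: str = "") -> bool:
    """Return True if the comment body or the poster's URL links to twitter.com."""
    return any(TWITTER_LINK.search(field) for field in (body, url))

print(needs_review("See https://twitter.com/example for more"))
print(needs_review("Nice post", url="http://TWITTER.com/example"))
print(needs_review("I was twittering about this yesterday"))
```

Holding rather than deleting leaves room for the rare legitimate link to be released by a moderator.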

#51 ::: Jim Macdonald ::: (view all by) ::: September 09, 2013, 09:31 AM:

Today:

This is the first time I bought a replica handbag from e-shop, and it impressed me so much. Who says replicas are all fakes?

Well, I do. And the gnomes say that "cheap", "duplicate", and "knock-off" are all fakes too.

Dire legal notice
Making Light copyright 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 by Patrick & Teresa Nielsen Hayden. All rights reserved.