January 12, 2004

Another spam attack
Posted by Teresa at 08:03 PM * 134 comments

Making Light and Electrolite are under attack again (see previous round), and it’s aggressive—over 50 hits in the last hour, from 21 different IP addresses. The format’s the same for every post: a porn URL, followed by some bit of text, usually from a computer manual.

Anyone else getting hit? What’s the scoop?

Addendum, 8:17 p.m.: Kip Manley, of Long Story Short Pier, got hit by these last night. He’s posted the IP address blacklist he compiled during the attack. Patrick is adding it to our defenses right now. If you think there’s any chance you’re going to be targeted, you might want to do the same; Kip Manley took 400 hits.

One of MT Blacklist’s options is to make your blacklist public. Here’s Kip Manley’s. Here’s ours.

If you have MT Blacklist installed but don’t know how to use these lists: Copy the list to your clipboard. Go to the main MT Blacklist screen. Right under the title there’s a little row of gray boxes. Click the one that says Add. This will take you to another screen that has a large empty box labeled Import blacklist. Paste the entire list into the box. Or paste in some fraction of it, if you know what you’re doing and don’t want the whole thing; but just pasting in the entire list is easiest. Finally, click the button underneath the box that says Import entries. That should do it.

Don’t worry about pasting in duplicate entries. MT Blacklist will automatically strip out any duplicates.
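The import-and-deduplicate step can be pictured with a short Python sketch. This is not MT Blacklist's actual code; the function name `merge_blacklists`, the one-entry-per-line format, the case-insensitive comparison, and the sample entries are all assumptions made for illustration.

```python
def merge_blacklists(*lists):
    """Merge pasted blacklists, keeping the first occurrence of each entry."""
    seen = set()
    merged = []
    for text in lists:
        for line in text.splitlines():
            entry = line.strip()
            if not entry:
                continue  # skip blank lines
            key = entry.lower()  # compare case-insensitively (an assumption)
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged


# Hypothetical sample entries, not taken from any real blacklist.
ours = "spam-example-one.test\nspam-example-two.test\n"
kips = "Spam-Example-Two.test\nspam-example-three.test\n"
combined = merge_blacklists(ours, kips)
print(combined)
```

The point is only that pasting an overlapping list is harmless: repeated entries collapse to one.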

Addendum, 8:54 p.m.: A comment from Kip

Add it and add it now, is my advice. It takes forever (well, an hour, but it seemed like forever) to comb them out by hand—there were 50 or 60 URLs total, used over and over again.

I didn’t block the IP addresses—I think it’s the work of a lone [expletive inadequate], who’s munging IP addresses somehow. Instead, I blocked the URLs listed—all the usual fetishy suspects, badly spelled, lightyears away from a legitimate link.

My working theory (not that I know much about this stuff at all) is that somebody’s randomly generating names, email addies, IP numbers, lorem ipsum text, and URLs—most of the URLs don’t go anywhere, but are chaff, to delay and discourage you from cleaning it all off until Google has a chance to register the link to the one or two “real” sites buried in the onslaught. —So this will work until the next iteration of this spam bot. And then we’ll have a new list of fucked-up URLs we’ll have to add.

Just wait: the next wrinkle will be chaff that are legitimate URLs you like, culled from people’s blogrolls. —Though my heart is heavy at the idea of comments registration (to be available with MT 3.0), I’ll probably be leaping to upgrade and implement it.

Sigh.

What he said.

Welcome to Making Light's comments section. Moderator: Teresa Nielsen Hayden.

Comments on Another spam attack:

#1 ::: --kip ::: (view all by) ::: January 12, 2004, 08:47 PM:

Add it and add it now, is my advice. It takes forever (well, an hour, but it seemed like forever) to comb them out by hand--there were 50 or 60 URLs total, used over and over again.

I didn't block the IP addresses--I think it's the work of a lone [expletive inadequate], who's munging IP addresses somehow. Instead, I blocked the URLs listed--all the usual fetishy suspects, badly spelled, lightyears away from a legitimate link.

My working theory (not that I know much about this stuff at all) is that somebody's randomly generating names, email addies, IP numbers, lorem ipsum text, and URLs--most of the URLs don't go anywhere, but are chaff, to delay and discourage you from cleaning it all off until Google has a chance to register the link to the one or two "real" sites buried in the onslaught. --So this will work until the next iteration of this spam bot. And then we'll have a new list of fucked-up URLs we'll have to add.

Just wait: the next wrinkle will be chaff that are legitimate URLs you like, culled from people's blogrolls. --Though my heart is heavy at the idea of comments registration (to be available with MT 3.0), I'll probably be leaping to upgrade and implement it.

Sigh.

#2 ::: Jonah ::: (view all by) ::: January 12, 2004, 09:35 PM:

kip wrote:
>I didn't block the IP addresses--I think it's the work of a lone [expletive inadequate], who's munging IP addresses somehow.

I'm not an expert, but I do know a little bit about this stuff, and I don't think that's possible. To put it as simply as I can: in order to post here you must be able to send information to the server, and the server must be able to send information to you. In order for this to happen you (or your computer) must know the IP address of the server, and the server must know the IP address of your computer(1).

If the IP address you provide the server with is not the true IP address of your computer, any information the server tries to send to you will never actually reach you. (Instead it will be delivered to the computer at the IP address you provided, if such a machine exists.) This would make opening a TCP connection with the server impossible, and opening a TCP connection is a necessary prerequisite to doing things like viewing a web page or posting via a web form. (See http://www.grc.com/dos/drdos.htm for a nice graphic.)

(1)Ok, in practice this might actually be the address of your NAT router or proxy server, but the principle remains more or less the same (and if you don't know what a NAT router or a proxy server is, don't worry about it).
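The argument above can be turned into a toy Python model. It is a deliberate simplification, not a real TCP stack: the function name `handshake` and the `mailbox` dictionary are invented for illustration, the addresses come from the reserved documentation ranges, and the model assumes the server's sequence number cannot be guessed.

```python
import secrets


def handshake(real_ip, claimed_ip):
    """Return True if a client at real_ip can finish a TCP-style handshake
    while claiming claimed_ip as its source address."""
    # Step 1: the client sends SYN with claimed_ip as the source address.
    # Step 2: the server picks an unpredictable sequence number and sends
    # its SYN-ACK to the *claimed* address.
    server_seq = secrets.randbits(32)
    mailbox = {claimed_ip: server_seq}  # where the reply actually lands
    # Step 3: the client can only acknowledge the number if the reply
    # arrived at the address it really has.
    received = mailbox.get(real_ip)
    return received == server_seq


print(handshake("192.0.2.1", "192.0.2.1"))     # honest client
print(handshake("192.0.2.1", "198.51.100.7"))  # spoofed source address
```

In this model the honest client completes the handshake and the spoofing client never sees the number it would have to echo back.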

#3 ::: Rich McAllister ::: (view all by) ::: January 12, 2004, 09:48 PM:

While Jonah's right that it's really hard to fake IP addresses with TCP, there's apparently a whole little underground economy of people who break in to machines, set up various remote control/forwarding servers, and sell the IP addresses of the captured machines to spammers. The spammer then bounces all their junk through the unwitting victims. So in fact the IP addresses might be those of innocent dupes. Even if you figure they deserve to get blocked to encourage them to protect their machines better, blocking those IP addresses isn't going to help cut down the spam since the spammer will be using a whole new set of captured IPs tomorrow.

#4 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 12, 2004, 09:49 PM:

Well, whoever was doing this was certainly contriving to make dozens of comments, posted over a very brief period, appear to come from a wide range of IP addresses. Over a half-hour, before I stopped writing them down, these are the IP addresses I logged the spams as coming from:

61.11.26.134 (3x)
63.226.96.246 (3x)
64.14.144.85 (2x)
64.139.64.146
64.144.205.138
66.46.189.98
66.50.123.43
80.58.40.42 (3x)
80.58.46.235 (2x)
80.58.49.170 (2x)
131.178.0.213
194.154.176.242 (2x)
195.184.123.154 (5x)
199.104.82.18 (3x)
200.11.203.100 (3x)
200.48.69.116 (2x)
200.252.72.9
202.155.109.117 (6x)
203.160.178.149
203.169.250.28 (2x)
205.204.242.23 (3x)
205.207.198.1 (3x)
210.197.88.11
211.89.253.100 (4x)
212.145.130.176
213.208.67.82 (2x)
217.40.125.218 (3x)
218.247.228.188 (6x)

#5 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 12, 2004, 09:52 PM:

Ah, Rich McAllister "slipped in" (as we used to say on the Well). Yeah, that would likely explain it.

#6 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 12, 2004, 10:08 PM:

Maybe so, Rich, but I won't have to clean out this batch all over again. I understand MT Blacklist is slated to add the ability for users to post spammer IP addresses to a central registry. That will have its own problems, of course, but it'll greatly diminish the utility of comment-spamming. The trick is to make the practice more troublesome and expensive than it is rewarding.

#7 ::: Steve ::: (view all by) ::: January 12, 2004, 10:50 PM:

It's reasonably hard to fake an IP address at the network level, but it's generally easy to trick a web server. This is good in some contexts and bad in others.

#8 ::: Liz Lawley ::: (view all by) ::: January 12, 2004, 10:57 PM:

I got hit, too. Combed over a hundred out by hand. Feh. Worse than lice.

Thanks for sharing your blacklist. Hopefully that will keep things under control tonight on my site...

#9 ::: Erik V. Olson ::: (view all by) ::: January 12, 2004, 10:58 PM:

The rub. If Rich is correct, that this guy is using cracked machines to forward, then blocking by ip will be exactly as effective as the Maginot Line was.

A few tests, using PNH's handy list of IPs, will tell me more. However, a few quick lookups show me these machines are scattered all across the net. One appears to be a school district machine in Idaho (www.d59.k12.id.us = 199.104.82.18). One's in Japan (exp110rb.nsz.co.jp = 210.197.88.11). One, well, I guess Comms Resources won't be including Mail (mail.commsreources.com = 194.154.176.242).
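For anyone wanting to repeat this kind of lookup, here is a minimal Python sketch of a reverse DNS query. Erik doesn't say which tools he used, so this is only one possible approach; `reverse_lookup` and `fake_resolver` are names made up for the example, and the stand-in resolver with its hypothetical data exists so the snippet runs without touching the network.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the primary hostname registered for ip, or None if there is none."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def fake_resolver(ip):
    """Stand-in resolver so the example runs without network access."""
    table = {"192.0.2.1": "host.example"}  # hypothetical data
    if ip not in table:
        raise socket.herror("unknown host")
    return table[ip], [], [ip]


for address in ("192.0.2.1", "192.0.2.2"):
    print(address, "->", reverse_lookup(address, resolver=fake_resolver))
```

Dropping the `resolver` argument makes the function use the system resolver, which is what a real check against the logged addresses would need.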

More on this in a bit -- this sort of looking about takes a bit of time, and the machine is also running "make buildworld" for FreeBSD 5.2-RELEASE, so this might take a bit.

#10 ::: Scott ::: (view all by) ::: January 12, 2004, 11:00 PM:

Hmmm. I wonder if this is related:
from http://www.internetnews.com/ent-news/article.php/3297661

"Another day, another virus.

Unsuspecting Internet users were greeted Friday with an e-mail message purportedly from windowsupdate@microsoft.com to update their computers. The message has the subject line: Windows XP Service Pack 1 (Express) - Critical Update. Problem is, the message isn't from Microsoft and the patch is actually a back door Trojan."

This particular Trojan, called Xombe, downloads a file that launches DoS attacks. I heard about this today from my EDUCAUSE subscription, and when I saw you were getting hit again, it rang a bell.

#11 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 12, 2004, 11:03 PM:

Not related. Although the Windows trojan did occasion a warning from our corporate systems department this morning.

#12 ::: Scott ::: (view all by) ::: January 12, 2004, 11:08 PM:

Worth a shot. Maybe someone will recognize it and NOT download that particular nasty.

Good luck getting this one shut down.

#13 ::: Reimer Behrends ::: (view all by) ::: January 12, 2004, 11:11 PM:

While I don't have a weblog, I run a few wikis, and some of them got recently hit by spam attempts as well. And as with weblogs or guestbooks, it is hard to keep wikis from being abused by bots. Now, I have developed some techniques to keep them out, several of which are probably also used by weblogs.

Most importantly, however, I am formatting all external URIs so that they go through a redirector instead of referencing the target page directly. The URI for the redirector (such as http://www.example.com/redirect?uri=http://target.example.org/) is not accessible to Google and other search engines (blocked in /robots.txt). Thus, links to external sites (except where explicitly approved) do not do anything for the pagerank of the external sites, rendering googlespamming ineffective.

Naturally, that doesn't help a lot if only a few individual sites do it. But if such techniques were widely deployed, comment spamming would become essentially a pointless exercise. It would not prevent the spam, but there wouldn't be anything to be gained from it, either.

The downside is of course that URIs become longer, less readable and require an additional server access (unless you augment the A element with Javascript). There are probably also other problems that I haven't figured out yet. But at least I have the satisfaction that any spammer who gets through my defenses won't gain anything from it.
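A rough Python sketch of the link rewriting described above follows. It is not Reimer's code: the redirector prefix is copied from his example URL, while `OWN_HOST`, `wrap_external`, and the choice to percent-encode the target are assumptions of this sketch.

```python
from urllib.parse import quote, urlsplit

REDIRECTOR = "http://www.example.com/redirect?uri="  # prefix from the example above
OWN_HOST = "www.example.com"  # assumed hostname of the wiki itself


def wrap_external(url):
    """Route links to other hosts through the redirector; leave the rest alone."""
    host = urlsplit(url).hostname
    if host is None or host == OWN_HOST:
        return url  # relative or internal link
    # Percent-encode the target so it survives as a single query parameter.
    return REDIRECTOR + quote(url, safe="")


print(wrap_external("http://target.example.org/"))
print(wrap_external("/wiki/Home"))
```

The matching robots.txt rule would presumably be a `Disallow` line covering the redirector path (here `/redirect`), so that crawlers honoring it never follow the wrapped links.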

#14 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 12, 2004, 11:19 PM:

Reimer, any increase in difficulty is good.

I've been having a big upsurge in spam over the last several days. It's not related, yes no?

#15 ::: Jason ::: (view all by) ::: January 12, 2004, 11:29 PM:

Not being savvy with this stuff, I have to ask: what about those of us who don't have MT Blacklist? Or don't blog with MT at all? Does anyone know a way that we (I use greymatter, for the record, but can't speak for anyone else) might be able to take advantage of the lists that Kip and Patrick and Teresa have generously provided?

Many thanks, in advance.

#16 ::: Kevin J. Maroney ::: (view all by) ::: January 12, 2004, 11:32 PM:

I posted about widescale ip spoofing for spammers back in October on my Livejournal. It's probably worth reading if you don't know about it.

#17 ::: Erik V. Olson ::: (view all by) ::: January 12, 2004, 11:43 PM:

A bit passes.

It's clear that all of these machines are compromised. All are responding to web proxy queries on odd ports.

Appropriate abuse addresses have been notified.

The Solaris 8 box that has been up for over a year was impressive, but they didn't keep the patchlevels up -- esp. on Apache. Oops.

#18 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 12, 2004, 11:45 PM:

Erik, I have a sense that that would be perfectly fascinating at slightly greater length.

#19 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 12, 2004, 11:48 PM:

p.s., Erik, should I save you some of the cranberry-and-clementine marmalade?

#20 ::: Erik V. Olson ::: (view all by) ::: January 12, 2004, 11:51 PM:

Exhaustion is more so.
Thoughts hazy, ask again later.

But, in short: these machines are all running proxy servers in unusual places. They're being used as relays. A couple also have open SMTP relays, so the spammers are probably using them too.

The real message a sysadmin gets when he sees a machine that's still up after a year is "Did they really not find a flaw in the kernel? I doubt that."

#21 ::: Steve ::: (view all by) ::: January 13, 2004, 12:05 AM:

This latest attack seems to be a wave of crapflooders using a tool called "FloodMT". I didn't realize that this particular practice had migrated off of Slashdot. Their CVS page on Sourceforge seems to be down, or I'd take a look at it and see how it works, but Erik's diagnosis is right. A lot of techniques that would be dandy against comment spammers, who have the defined goal of increasing their PageRanks, are going to be useless against these sorts of attacks, which are just vandalism.

I strongly suspect that the next version of Movable Type will include options to enforce logons or some sort of Bayesian filtering. Possibly both.

#22 ::: Reimer Behrends ::: (view all by) ::: January 13, 2004, 12:10 AM:

Jason,

assuming that you have an appropriately configured Apache, the following lines in your .htaccess will block the given addresses from submitting any forms (that includes posting comments). They will still be able to read from your site.

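# Apply these rules only to POST requests (form submissions).
<Limit POST>
# Evaluate Allow rules first, then Deny; a matching Deny wins.
Order allow,deny
Allow from all
# One Deny line per blacklisted address.
Deny from 61.11.26.134
Deny from 63.226.96.246
Deny from ...
</Limit>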

Addresses should be numeric IP addresses. You can also block entire subnets. Consult the Apache documentation on how to do that. Access control can be even more fine-grained, blocking only access to certain URIs (cf. the Location, Directory and Files directives). This may be necessary if the spammers use GET instead of POST requests to insert their spam.
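Typing one Deny line per address gets tedious, so here is a small Python sketch that builds the block above from a list of addresses. The function name `build_limit_block` is made up for this example; the two addresses are the ones already shown in the block, and the sketch handles single addresses only, not subnets.

```python
import ipaddress


def build_limit_block(addresses):
    """Return an Apache <Limit POST> block denying each address once."""
    lines = ["<Limit POST>", "Order allow,deny", "Allow from all"]
    seen = set()
    for raw in addresses:
        # ip_address() raises ValueError on anything that is not a numeric IP.
        ip = ipaddress.ip_address(raw.strip())
        if ip not in seen:
            seen.add(ip)
            lines.append(f"Deny from {ip}")
    lines.append("</Limit>")
    return "\n".join(lines)


block = build_limit_block(["61.11.26.134", "63.226.96.246", "61.11.26.134"])
print(block)
```

Validating each entry first means a stray hostname or typo fails loudly instead of ending up in .htaccess.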

#23 ::: Andrew Willett ::: (view all by) ::: January 13, 2004, 12:10 AM:

Blacklist updated, although my comments channels are generally so quiet that an uptick in traffic of such magnitude would be perversely exciting. Almost. Thanks for providing such a valuable public service.

#24 ::: Paula Helm Murray ::: (view all by) ::: January 13, 2004, 12:11 AM:

Wistfully replying, with no hope, of course, "I'd love some o' that clementine and cranberry marmalade, Teresa," but it's not offered to me. (And I'm the only one in my house that would consume such....) Yikes. Sometimes I want to do more than LiveJournal, sometimes not. These times, not.
Paula

#25 ::: Yule Heibel ::: (view all by) ::: January 13, 2004, 12:13 AM:

Shelley Powers at Burningbird came out of vacation to post on this:
http://weblog.burningbird.net/fires/technology/mt_comment_help.htm
It's an entry called "MT Comments Help" with specific instructions/ suggestions for how you may be able to counter-attack this problem.... Hope this helps!

#26 ::: Paula Helm Murray ::: (view all by) ::: January 13, 2004, 12:14 AM:

If I have to register somewhere to make comments HERE, I'd gladly do it. There are several places where I'd just go, "Whatever, I don't need it," but here is not one of those places. We had so much irritating personal shit on Disturbing Auctions Daily that the owner had to take it down for an undetermined while (I'm going into withdrawal, where will I find tipsy martini glasses and angelic taxidermied mice? -- yeah, I'm weird....)

#27 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 12:26 AM:

Wow, I don't know Shelley Powers at Burningbird from a hole in the ground, but does that post ever reek of cranky attitude and old arguments of which I know not wot. I am, as ever, only an egg.

Unfortunately, our MT installation isn't configured to use MySQL. Increasingly, I gather that this was a mistake. I'd correct it, save that I'm not sure how to do it without rebuilding the world from a frightening level of scratch.

#28 ::: Yule Heibel ::: (view all by) ::: January 13, 2004, 12:34 AM:

Geez, Patrick, what are you channelling? "reek"? "cranky"? "old"?

forgive me for trespassing on your hallowed comments box...

#29 ::: pericat ::: (view all by) ::: January 13, 2004, 12:39 AM:

Yule, I'm thinking Patrick refers to Burningbird's writing style, not your comment.

#30 ::: Steve ::: (view all by) ::: January 13, 2004, 12:45 AM:

You saw this MT Berkeley DB-to-MySQL script, though, right? I cannot vouch for its frightening scratchitude or lack thereof. (I expect the biggest problem would be preserving post IDs, but if busting all your permalinks or writing a PHP/mod_rewrite workaround is acceptable, it doesn't look too bad.)

#31 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 01:14 AM:

Jeez, Yule, what pericat said. I appreciate all technical pointers, certainly yours. I was just bemused at the score-settling tone of Burningbird, and puzzled about what to make of it.

"Busting all of our permalinks"? Ack! No!

#32 ::: Yule Heibel ::: (view all by) ::: January 13, 2004, 02:16 AM:

Patrick, whom I also don't know from something in the ground: I am not a techie -- to me, all this stuff is mumblefoo. But I'm hardly "bemused" by how anyone could interpret Shelley's "tone" as "score-settling" or as being in any way in an emotional register. *I'm* writing right now in an emotional register, ok, that's clear, but it's because I'm pissed off that when a woman writes factually, she's *perceived* as reeking, old, and ...what else did you say? Oh, yeah: cranky! How original! I just don't get that. I thought that, if I had MT and had these problems, Shelley's post would be helpful, and I thought it was a serendipitous coincidence that I found it on her site (which has been quiet for a couple of weeks now) tonight, just as Teresa is posting about spam attacks. Sorry I bothered.

If Burningbird's moniker were Flying Fallus and her name were Sheldon instead of Shelley, I wonder if your adjectives would have been different...

And no, I'm not having a PMS attack or anything, but sometimes something just kinda snaps.

#33 ::: Scott Lynch ::: (view all by) ::: January 13, 2004, 04:04 AM:

Yule Heibel wrote:

but it's because I'm pissed off that when a woman writes factually, she's *perceived* as reeking, old, and ...what else did you say? Oh, yeah: cranky!

If this isn't the single most fatuous and ignorant thing anyone's tried to pin on the guy since I've been hanging around these blogs, it's a shoo-in for first runner up, and it's competing for a special technical achievement award, too.

Honestly, it's like accusing him of being a potted cactus or a time-traveling Chinese spice merchant, which is to say it's left "merely fucking ridiculous" far, far behind (though both of those arguments are actually more defensible than yours).

Maybe, just maybe, you should consider reading a bit deeper into the blog-thoughts of someone "whom [you] also don't know from something in the ground" before you start conflating a simple statement of opinion with deep-rooted misogynism?


#34 ::: plover ::: (view all by) ::: January 13, 2004, 04:34 AM:

Their CVS page on Sourceforge seems to be down, or I'd take a look at it and see how it works

The FloodMT script (or one version anyway, I suppose) is posted here.

I got the link off the blog pandagon, whose comments have been forced offline for over a week now.

#35 ::: bryan ::: (view all by) ::: January 13, 2004, 07:13 AM:

Well, I came in late, but are all those IP addresses the attackers used just the ones hitting this weblog?


The distinguishing patterns of the attacks were:

1. use of wide variety of IP addresses. Correct?
2. A large number of comments in a short period(this makes sense, if you're gonna automate an attack you probably want to be hitting harder than one caffeinated neocon on a rampage)
Question: How many attacks were logged per IP?
Were these attacks randomly distributed over MT posts? That is to say, comment number 1 is on an old post (if your blog has been running for a while it follows that older posts will be hit before more recent ones if the attack is truly random), followed by comment two on a new post, comment three somewhere in between. I seem to remember that in the spamming discussed in earlier posts, the comments began in older threads and gradually moved up.

okay, those questions for now.

#36 ::: Shelley ::: (view all by) ::: January 13, 2004, 08:28 AM:

I'm sorry that Yule and you all got into a comment quarrel here over the tone of my posting. I've written about a dozen notes on this problem, and yes, I guess I am getting tired of repeating myself and getting ignored.

Blacklisting is dangerous and won't solve the problem, as the script kiddies are showing with the new crapflooders, just recently pointed out. I'm sorry for the folks not using MySQL with MT, but if you are using MySQL, I really do suggest you consider shutting down comments on older posts. Might help a little, but a rewrite of the MT comment system is desperately needed now.

#37 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 08:50 AM:

Look, I'm sure Shelley Powers, who writes Burningbird, is a perfectly fine human being, and as I said I certainly don't disdain any technical help or pointers. However, examples of the "score-settling tone" I was referring to include:

Did mt-blacklist work? No. As I've said before, spammers have better habits than so-called legitimate developers, because they listen to their 'customers' and adapt accordingly. [...]
The spammers have gotten smarter. Eventually if you restrict their access enough, you'll shut down comments to everyone. The only true solution to this problem is better comment management in MT. However, if you feel as clever as the spammers, perhaps you need to attend a smart people conference, come up with nifty, neato, just gee wiz smart solutions (put into the public domain of course, with the cutest little cc brand). [...]
For all the mt-blacklist users, if you're using global lists and not checking that legitimate URLs have been inserted, then chances are you're opening your system up for a poison pill attack -- causing your system to filter common, legitimate URLs, and hence making the mt-blacklist less reliable. The technique is common in email spam, as outlined by Ken Coar. Something to think of next time you import several hundred entries, depending on technology when the spammers depend on their brains.
However, makes no nevermind to me what you do. I'm just passing through.
Now, I've been crankier than this on a thousand occasions; it doesn't make Shelley Powers a bad person any more than it does me. What I said was that I was bemused by it--it seems clear that Powers is engaged in an ongoing philosophical dispute over the design and implementation of anti-spam measures, and it's a little hard for a newcomer to sort out the issues through the sarcastic comments about "so-called legitimate developers" and "gee wiz smart solutions (put into the public domain of course, with the cutest little cc brand)." For all I know Powers is 100% correct on all issues and 100% justified in being impatient and cranky. Nonetheless, the response of the newcomer is still bemusement.

Yule Heibel says "I'm pissed off that when a woman writes factually, she's *perceived* as reeking, old, and...what else did you say? Oh, yeah: cranky!" First, I never called anybody "reeking" or "old", and I applied the adjective "cranky" only to Shelley Powers's post. Yanking the word "old" out of the context of my reference to "old arguments" in order to impute that I applied it to Shelley Powers's person is a rhetorical trick that would be disreputable if it weren't so pathetic.

However, Yule Heibel's root charge, that I wouldn't have remarked on Shelley Powers's post if she were a man, founders on the plain fact that I didn't know Shelley Powers was a woman until Yule Heibel said so. The name Shelley is not exactly gender-determinative; the one Shelley with whom I'm currently socially acquainted is definitely a man. In other words, while I have nothing against Shelley Powers, Yule Heibel is invited to go fly a kite.

#38 ::: Shelley ::: (view all by) ::: January 13, 2004, 08:54 AM:

P.S. To the gentleman who thought this might be a crapflooders attack: no, it wasn't. This attack was much more sophisticated than the rather primitive script-kiddie one shown on Slashdot. The mt-blacklist code should stop this one, though it may not be able to throttle the requests fast enough to avoid a temporary impact on the CPU.

Last night's attack was quite ingenious in working around mt-blacklist's capabilities, and wasn't necessarily meant to take down a machine, though enough MT users on one machine, all of whom have comments emailed, would have been enough (which does make this a DDoS attack). No, this was spammers trying for enough of a hit to get scraped by Googlebot before the cleanup was finished.


#39 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 09:00 AM:

The actual Shelley Powers posted while I was writing the above. Welcome. As I said, I don't think your tone was anything to be bent out of shape about; I just couldn't quite figure out all the issues. I've certainly written more crankily than you when I felt I was "repeating myself and being ignored."

I seem to recall seeing, somewhere, reference to tools to automate the shutting down of posts on older comment threads. Are these tools only available for MT installations that use MySQL? As for a revamping of MT's comment system being "desperately needed," it was my impression that such a thing is slated for the upcoming MT 3.0.

#40 ::: Shelley ::: (view all by) ::: January 13, 2004, 09:01 AM:

Uhm, Patrick, Yule, let's leave this be, shall we? Yule, you're an angel and I appreciate so much what you did, but if folks don't read me they won't be aware of the past writing I've done, and the flak I've received because I've been critical of MT's comment system and of using blacklisting technology because of the dangers of banning legitimate people.

Patrick, perhaps you might have tried making an assumption that there was a reason for whatever tone you disliked in the posting and focused instead on the fact that I was trying to help people out of a situation.

#41 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 09:03 AM:

To answer Bryan: Yes, that list of IP addresses represents those attached to a portion of the spam comments posted to Making Light and Electrolite last night, up to the point where I stopped bothering to note the IPs. The answer to your question "how many attacks were logged per IP" is in the list, where it says "2x" or "6x", etc. Yes, the attacks appear to have been pretty much randomly distributed across comment threads, new and old.

#42 ::: Shelley ::: (view all by) ::: January 13, 2004, 09:04 AM:

Cross comment posting, dangers thereof. Thanks Patrick, and yes, let's move on to helpful stuff.

Tools to do this, I betcha there is. I'm so used to working directly in MySQL. And I found some things:

http://www.rayners.org/2003/12/27/closing_comments_on_old_entries.php

There's others if you use the following Google search: "mt comments shutting down older".

#43 ::: bryan ::: (view all by) ::: January 13, 2004, 09:25 AM:

Closing comments on older threads isn't the solution. The comments can land on newer or older threads, so if older threads are closed, newer threads are bound to be hit more often (unless, dare one hope, the robot doing the posting leaves a site as soon as it tries to post and gets refused).

Here's an idea (don't know if it can be done in MT and it's temporary anyway)

If the comment form on an older post sat inside an element whose display was set to none, an automated tool would still find the form, post number, etc., and make a post.
(Note that this causes problems for older browsers; more work could be done to take those into consideration, though.)
Any IP number posting to a form whose display is set to none is a temporary bad IP number, to be shut out for a period of 1 day. If it does the same thing three times in a row, it is shut out permanently unless the person trying to post from that number contacts you and explains the problem.

I consider this to be a temporary solution because if I was writing one of these things I would want it to figure out what were the newest posts and hit those exclusively. That they are not doing this yet though, suggests that they might have problems doing it.
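The ban rules in this idea can be sketched in Python as follows. None of this is an MT feature: the class `HoneypotBans` and its method names are invented, and for simplicity the sketch counts total hits on the hidden form per address rather than checking that they happened strictly in a row.

```python
from datetime import datetime, timedelta

TEMP_BAN = timedelta(days=1)  # "closed down for a period of 1 day"
MAX_STRIKES = 3               # third hit makes the ban permanent


class HoneypotBans:
    """Track addresses that post to a comment form no human could see."""

    def __init__(self):
        self.strikes = {}       # ip -> number of hidden-form posts
        self.banned_until = {}  # ip -> datetime, or None for a permanent ban

    def record_hidden_post(self, ip, now):
        self.strikes[ip] = self.strikes.get(ip, 0) + 1
        if self.strikes[ip] >= MAX_STRIKES:
            self.banned_until[ip] = None  # permanent until lifted by hand
        else:
            self.banned_until[ip] = now + TEMP_BAN

    def is_banned(self, ip, now):
        if ip not in self.banned_until:
            return False
        until = self.banned_until[ip]
        return until is None or now < until

    def lift_ban(self, ip):
        """For the case where the person contacts you and explains."""
        self.strikes.pop(ip, None)
        self.banned_until.pop(ip, None)


bans = HoneypotBans()
start = datetime(2004, 1, 13, 9, 0)
bans.record_hidden_post("192.0.2.10", start)
print(bans.is_banned("192.0.2.10", start + timedelta(hours=1)))
print(bans.is_banned("192.0.2.10", start + timedelta(days=2)))
```

The address used here is from a reserved documentation range, not from the attack logs.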


How does it know where to post I wonder, is there a get done first which feeds info to the post process - what do you all think?

#44 ::: Feòrag ::: (view all by) ::: January 13, 2004, 09:33 AM:

For reference, here's my blacklist, which already has yours and Kip Manley's imported into it, as well as my own batch of scumbags.

#45 ::: adamsj ::: (view all by) ::: January 13, 2004, 09:43 AM:

Most all my hits are on older threads--I have taken to turning off comments after I get repeated hits on a moribund thread--a compromise that I don't like, but can live with.

I know Yule from elsewhere as a nice, helpful person, Patrick--don't let one post brown you off to her permanently.

Shelley, there was a suggestion on this weblog some time ago that (paraphrasing here) we need a way of marking some links so that Google doesn't use them in Page Ranking. I sent the suggestion on to the only person I know who knows people--what do you think?

#46 ::: Shelley ::: (view all by) ::: January 13, 2004, 09:43 AM:

Bryan, closing down comments on old posts will help, at least until MT 3.0 is released. The comment spammers aren't hitting the same post with 100 comments -- they're hitting individual posts with three each, lower than a lot of thresholds. Closing down older ones for now will throttle back the problem -- most of this last batch of comments was on posts over 10 days old.
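As a sketch of the selection step only, here is how one might pick which posts to close in Python. It does not touch MT's database or any plugin: `posts_to_close`, the dictionary layout, and the sample post IDs are hypothetical, and the 10-day default simply mirrors the figure mentioned above.

```python
from datetime import date, timedelta


def posts_to_close(posts, today, max_age_days=10):
    """Return the IDs of posts older than max_age_days.

    posts maps a post ID to the date it was published.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [post_id for post_id, posted_on in posts.items() if posted_on < cutoff]


# Hypothetical post IDs and dates.
sample_posts = {
    101: date(2003, 12, 1),
    102: date(2004, 1, 3),
    103: date(2004, 1, 12),
}
print(posts_to_close(sample_posts, today=date(2004, 1, 13)))
```

A post exactly 10 days old stays open here; only strictly older posts are selected.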

How they're finding posts is using Google to query on weblog entries based on the fact that comment forms have the same labels, i.e. URL, Name, and so on, in addition to the word 'blog' somewhere. Easy.

There's another comment spammer or spammers at work using recent updates in weblogs.com, but this person is putting comment spam in manually, changing what they write to fit your topic. At least they're not overloading the system.

#47 ::: Shelley ::: (view all by) ::: January 13, 2004, 09:51 AM:

Adam, it would be great if Google did use meta tags so that we could mark up links this way, but I don't know whether Google is considering this. This does put a burden on a very lightweight bot to get more sophisticated in its processing, which tends to defeat bot technology.

Plus, I like my good commenters to get the link buzz. I hate to have to turn off for all just to get the few spammers that hit (and they are few, but annoying).

Personally I like Sam Ruby's new approach -- all comments have to go to preview first, and the person has to then accept the preview and move on. This not only eliminates comment spam, but acts as a tiny cool down period for people writing nasty, nasty things. Really elegant solution.

( http://www.intertwingly.net/blog/1682.html )
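For anyone wondering what forced preview might look like in principle, here is a minimal sketch in Python. It is not Sam Ruby's code or anything from Movable Type; it only illustrates one assumed way to do it, where the preview step hands back a token tied to the comment text and the post step refuses anything that arrives without a matching token. `SECRET`, `preview`, and `post` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real one would be long and random.
SECRET = b"change-me"

def _sign(text: str) -> str:
    """Return an HMAC of the comment text, so a token only fits that text."""
    return hmac.new(SECRET, text.encode("utf-8"), hashlib.sha256).hexdigest()

def preview(text: str) -> dict:
    """Step 1: show the comment back to its author, along with a token."""
    return {"text": text, "token": _sign(text)}

def post(text: str, token: str = "") -> bool:
    """Step 2: accept the comment only if it carries the token from preview."""
    expected = _sign(text).encode("ascii")
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

A script that posts straight to the comment handler has no token and is refused. A script that fetches the preview first would still get through, so this raises the cost of automation rather than ruling it out.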

If MT 3.0 comes out soon, with really good comment management as well as decent comment throttling (protection against script kiddies) (and I'd like to vote on including a button to push that forces comments to go to preview mode if we wish), I think that's our solution. We can then drop the blacklists and the tweaky stuff and get back to writing. But we really are at this point dependent on that new release. Anything else is a stop gap.

#48 ::: Shelley ::: (view all by) ::: January 13, 2004, 10:00 AM:

You all have to remember something here -- most of the comment spammers would rather not be noticed, so they would prefer to write comments on older threads. Why? Because googlebot finds them regardless of the age of the posting, and if the comment is on an older thread, the weblogger may not be as inclined to delete it. So older posts are preferable, not newer ones.

However, last night's spam commenter has all the markings of an old friend we've tangled with before, who got really pissed at the actions of some webloggers who got fairly aggressive in pushing back at him. This recent episode was more in the line of thumbing his nose at us, saying "You can't stop me no matter what you do". Ouch. But we knew this was coming.

#49 ::: Jason ::: (view all by) ::: January 13, 2004, 10:32 AM:

It's a bit up-thread by now, but: thanks Reimer. I'll give that a try when I get a free moment.

#50 ::: Michelle ::: (view all by) ::: January 13, 2004, 10:46 AM:

Wow. For the first time I'm not completely unhappy that my copy of MT is busted and I'm writing my blog with straight HTML, no comments.

How do you force comments into Preview first, before posting? I'd like to do that anyway, but am not savvy enough with MT to figure it out myself (which explains why my blog has been busted for so long.)

Thanks.

And I hope that someone comes up with a good solution soon. I love reading here, and would hate to lose y'all.

#51 ::: travis ::: (view all by) ::: January 13, 2004, 11:14 AM:

Dave Shea (CSS Zen Garden/Mezzoblue) posted this (http://www.mezzoblue.com/archives/2004/01/12/mt_comment_s/index.php) yesterday. It may shed some light on what's going on with the sudden increase of comment spam.

#52 ::: Steve ::: (view all by) ::: January 13, 2004, 11:28 AM:

To the gentleman who thought this might be a crapflooders attack, no, it wasn't. This attack was much more sophisticated than the rather primitive script kiddies one shown in Slashdot. The mt-blacklist code should stop this one, though it may not be able to throttle the requests fast enough to not impact on the CPU, temporarily.

That was me, Shelley -- a friend reported that a tool called "MTFlood" or "FloodMT", I forget which, was used to target his Typepad blog last night. It doesn't appear to be precisely the same tool as in that Slashdot post; there is, of all things, a SourceForge project for it. If this wasn't the same group of people that hit the Nielsen Haydens (and actually it seems to be an entirely different group of people than the ones who went after Pandagon), it's a rather unpleasant coincidence.

I was looking at fooljay's MT-blacklist code, and it seems like something that could be adapted to include throttling capability fairly simply; the problem is that MT-blacklist overrides the base comment functions, so you'd either need to patch MT-blacklist directly or choose between the plugins.

And Shelley's right that the real solution isn't going to arrive until MT3, although Sam Ruby's solution seems like a good one.

#54 ::: sennoma ::: (view all by) ::: January 13, 2004, 11:50 AM:

A SourceForge project for a spam tool???? The apocalypse is upon us.

From here, a possible fix:
---------------
Add this to robots.txt
User-agent: *
Disallow: /banme.cgi

Then put this on every page:

By banning any robot not honoring your robots.txt, you should get rid of every spambot, leaving only the average cretin doing it manually (against whom I'm afraid you really can't do anything anyway). More interestingly, set the default file to post comments to ban people accessing it and block it from robots.txt as well. There are several ways to do this, but the important thing is: enforce your robots.txt: if they can't play nice, don't let them play at all.
Posted by: Effovex at January 12, 2004 10:22 PM
-----------------
Here, a possible nightmare.

I like the "force preview" idea; can I do that in MT? I couldn't see a way to do it from the "blog author" interface (bear in mind that I am a tech-moron).
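The robots.txt trap quoted above could be sketched roughly as follows in Python. This is an illustration of the idea only, not the code Effovex had in mind: `TRAP_PATH`, `banned`, and `handle_request` are hypothetical names, and a real setup would hook into the web server rather than a plain function call.

```python
# Path that robots.txt tells well-behaved robots to stay away from.
TRAP_PATH = "/banme.cgi"

# IP addresses that ignored robots.txt and requested the trap.
banned = set()

def handle_request(ip: str, path: str) -> int:
    """Return an HTTP-style status code for a request from `ip` to `path`."""
    if ip in banned:
        return 403          # already banned: refuse everything
    if path == TRAP_PATH:
        banned.add(ip)      # only a robot ignoring robots.txt lands here
        return 403
    return 200              # ordinary request, serve as usual
```

Since the ban is keyed on IP address, it would do little against a spammer who changes addresses between requests, which is what several people in this thread report seeing.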

#55 ::: Mike ::: (view all by) ::: January 13, 2004, 12:29 PM:

The people who attacked my site are from many of the same IPs Patrick posted earlier. I removed a link that came in from a Russian ISP around 5am, and an hour later the flood happened using many many of the same host urls.

#56 ::: Mike ::: (view all by) ::: January 13, 2004, 12:31 PM:

The spammers are finding the sites by searching for:

post a comment name: email address: url: remember personal info?

That's how the initial spam found me.

#57 ::: Kate Nepveu ::: (view all by) ::: January 13, 2004, 12:42 PM:

I like the force-preview idea, fwiw, and not only because my fingers often type words they think I ought to mean (because hey, I type them more often!), instead of what I actually mean . . .

#58 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 12:46 PM:

Regarding David Raynes's plugin for closing down comments on older threads, I installed it, set the permissions, and told it to shut down comment threads on Electrolite more than 45 days old.

Absolutely nothing happened. I'm wondering if there's a limit to the number of days you can specify, and I've posted an inquiry to that effect to Raynes's site.

#59 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 12:51 PM:

I like the force-preview idea too, but I haven't yet found a site that explains how to implement it in a way that I quite understand.

Just to bring newcomers up to speed, I'm not afraid of code, command lines, HTML or CSS, but I start getting a bit worried when confronted with great wodges of incomprehensible Javascript with minimal explanation for us laggards.

#60 ::: Tina ::: (view all by) ::: January 13, 2004, 01:09 PM:

I haven't read through the rest of the comments, because my mouth is still hanging too far open after reading this:

[...] because I'm pissed off that when a woman writes factually, she's *perceived* as reeking, old, and ...what else did you say? Oh, yeah: cranky! How original!

And all I can think of is the sheer amount of gall it must take to say that to Patrick, of all people. On Teresa's weblog, no less.

So, stick this in your pipe and smoke it: I'm female, and I thought the writer was cranky, needlessly sarcastic, and holier-than-thou. How's that grab ya? "I told you so, I told you so, and you're too stupid to understand why!" is the overall tone I take away from that piece, and I don't care if the person who wrote it is male, female, or a trisexual creature from the planet Antares, that's still annoying.

Implying someone is obviously just bashing a writer because she's female just because the person with the complaint is male is, in my book, pretty darn sexist, which would, I believe, qualify as irony in this instance.

(Now I will go back and read the rest, and probably find this has all been settled peacefully while I was away, with my luck, but... honestly!)

#61 ::: Tina ::: (view all by) ::: January 13, 2004, 01:12 PM:

Ahh, Shelley, you seem much more reasonable here than in the post being commented on. See what reading the whole thread can get me?

That's okay. I've grown accustomed to the taste of my own foot. :)

#62 ::: pericat ::: (view all by) ::: January 13, 2004, 01:12 PM:

I had the same result from the mt-close plugin. So I started closing them manually. A chore almost as much fun as sorting grains of sand by colour. While I was thus occupied, I received, deleted, blacklisted and banned two spam comments and noted MTBlacklist deflect a third, all apparently aimed at an entry of mine titled "Countering Blogspam" from last October.

<puzzlement>Anyone spamming my weblog is seriously hard up for page rank.</puzzlement>

#63 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 02:52 PM:

By the way, for what it's worth, I read all of the older Shelley Powers posts to which she provided links, along with more recent posts to her weblog Burningbird, and it was all very interesting stuff. Good photography, too.

#64 ::: Avram ::: (view all by) ::: January 13, 2004, 03:11 PM:

I'm trying to figure out what kind of stuff Teresa would ask him to pick up for her if Patrick was a time-traveling Chinese spice merchant.

#65 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 03:32 PM:

Some of that Southeast Asian olive oil, no doubt.

You know: Mekong Light.

#66 ::: Tina ::: (view all by) ::: January 13, 2004, 03:46 PM:

Patrick, you're not only a tool of the patriarchy, you're a punner? I take back any premature leaping to your defense I've ever done!

(That really was very groan-worthy.)

#67 ::: Reid ::: (view all by) ::: January 13, 2004, 03:47 PM:

Changing the labels on your forms helps, but what I've done is change what the 'bot/spammers are looking for: "mt-comments.cgi." I've renamed mine something tangential to its actual purpose, and then used the provided place in mt.cfg to point to the new name.

This doesn't stop the individual cretin pasting his spam into the comments one at a time, but it does seem to thwart these mass bombings.

#68 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 13, 2004, 04:45 PM:

Y'know, Yule, I was inclined to cut you a lot of slack because I figured you were some variety of generally benevolent Programming Guy (not a gender-specific term) who wasn't necessarily familiar with the idea that tone can be read separately from the (pick one) [main argument] [ostensible content] [obvious surface information]; and besides, you were posting late at night, after the H.G. had gone to bed.

I'd half-drafted a post in which I explained to you that Patrick, who in a practical and constantly reality-tested fashion reads texts for a living, was predictably distracted by an article in which the strong emotional charge evident in some passages has no obvious connection to the straightforward exposition that constitutes the majority of the piece. He'd had a very characteristic editorial reaction: Hmm, there's something else going on in that piece.

And I figured that since (as I thought) you weren't familiar with the idea of reading tone separately from content, you'd mistaken Patrick's mildly snarky but essentially innocuous remarks for a slur upon Shelley herself. She's obviously your friend, and a friend is never a bad thing to have. I thought that perhaps you'd become frustrated by your mistaken attempt to map Patrick's (supposedly critical) remarks onto Shelley's main argument, where of course they didn't fit; and that this frustration had given rise to your second salvo, in which you accused Patrick of malfeasances unsupported by the text (f.i., misogyny), and contradicted by the text (f.i., old in connection with anything except arguments).

But today, when I went to double check, I discovered/remembered (I'd known it, but had somehow misplaced the information) that you're Dr. Heibel, a resident of Vancouver, and have an advanced degree in Art History (Harvard '91). The advanced degree in a non-technical discipline and the three-hour time difference have done my exculpatory explanations no good at all. I'm in the market for new ones.

Meanwhile, over at Burningbird, there's Shelley morosely saying that this "shows the dangers of writing in anything other than the most non-emotive manner," which I think is a far more dispiriting reflection than the situation warrants. In fact she's an excellent writer (I've admired her writing for a long time now), and here she is in the midst of a lively ongoing conversation (and much appreciated for it); so perhaps, just perhaps, last night wasn't the sort of occasion you imagined it was?

#69 ::: Mitch Wagner ::: (view all by) ::: January 13, 2004, 04:51 PM:

The top of my wishlist for an upgraded comments system would be a checkbox at the bottom of the post-comments form that said, "E-mail me when someone posts to this thread." That way I wouldn't have to check threads manually to see if there's been an update.

Datapoint: My blog is on the same hosting provider as TNH/PNH, I haven't had any uptick in spam. I get a few spams per week.

#70 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 13, 2004, 05:00 PM:

I like the "force preview" idea too. In the meantime, if Mike's right about them targeting

post a comment name: email address: url: remember personal info?
changing the wording of those prompts is an easy way to make spammers' lives harder.

We don't have to make it impossible. We just have to make it more trouble than it's worth.

Oh, and Shelley? Thanks, in a weird way, for saying this wasn't just vandalism. I know it means we're dealing with a more sophisticated opponent, but I was disturbed by the idea that a mess of this magnitude was just some random dweebs amusing themselves.

#71 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 13, 2004, 05:11 PM:

Mitch, I have two mechanisms for that: an automatic scrolling list of the 200 most recent comments posted, and readers that spot comment spams almost as fast as they happen.

#72 ::: pericat ::: (view all by) ::: January 13, 2004, 05:12 PM:

jill/txt suggests adding this domain to blacklists: e--pics[dot]com

#73 ::: Mitch Wagner ::: (view all by) ::: January 13, 2004, 05:21 PM:

Teresa, the subscribe capabilities I'm looking for aren't something I want as a spam-blocking measure; I want it as a convenient way to keep track of discussions I'm involved with on other people's weblogs. Right now, in order to keep track of them, I have a complicated system of URLs that I check daily, with older discussions that appear to have died out moved to weekly checks to see if they revive.

The Ultimate BBS allows users to subscribe to individual discussion threads by e-mail, I'd like to see the same capability added to blogging systems, including MT.

#74 ::: Xopher ::: (view all by) ::: January 13, 2004, 06:31 PM:

Some of that Southeast Asian olive oil, no doubt.

You know: Mekong Light.

Patrick, I hate you. And love you, which in this context is exactly the same thing.

#75 ::: Jim Millen ::: (view all by) ::: January 13, 2004, 06:35 PM:

Just as something I haven't seen mentioned in these comments yet...

http://james.seng.cc/archives/000152.html

Short version: Using Bayesian filtering to cut out the spam. No experience with using it for comments, but if it is as good as my email filter (Keir.net's K9) then 97% or better accuracy can be achieved. Worth a look as blacklists have their flaws, as others have pointed out.
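To give a feel for what Bayesian filtering means here, this is a toy naive Bayes classifier in Python. It is a sketch only, not the code behind the linked post or K9; the class name `TinyBayes`, the two labels, and the whitespace tokenizing are all assumptions made for illustration, and it must be trained on at least one comment of each label before use.

```python
import math
from collections import Counter

class TinyBayes:
    """Toy naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label: str, text: str) -> None:
        """Record one example comment under 'spam' or 'ham'."""
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def _log_score(self, label: str, text: str) -> float:
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total = sum(self.words[label].values())
        # Prior: share of training comments carrying this label.
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((self.words[label][word] + 1) / (total + len(vocab)))
        return score

    def is_spam(self, text: str) -> bool:
        """True if the spam score beats the ham score for this text."""
        return self._log_score("spam", text) > self._log_score("ham", text)
```

The filter is only as good as its training examples, so comments that copy ordinary prose would likely be much harder to catch than ones full of the usual sales vocabulary.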

#76 ::: novalis ::: (view all by) ::: January 13, 2004, 07:10 PM:

How exactly is registration supposed to stop spam? Won't they just spam your registration form, register a zillion accounts, and use them? How can you possibly detect legitimate registrations?

Generally, successful spam (generic) prevention techniques rely on finding and exploiting a limited resource. For example, CAPTCHAs require sighted human time to solve. Hash cash schemes require computation time. SPF requires domain names which haven't been sued for spam (er, just read the proposal).
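The computation-time idea can be sketched in a few lines of Python. This is in the spirit of hash cash but is not the actual Hashcash stamp format: the commenter's machine must search for a number that makes a hash come out small, while the server checks the answer with a single hash. `mint`, `verify`, the challenge string, and the default difficulty are all assumptions for illustration.

```python
import hashlib
from itertools import count

def verify(challenge: str, nonce: int, bits: int = 12) -> bool:
    """Cheap check for the server: one hash, compared against a target."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    # A hash with `bits` leading zero bits is a number below 2**(256 - bits).
    return int.from_bytes(digest, "big") < (1 << (256 - bits))

def mint(challenge: str, bits: int = 12) -> int:
    """Costly search for the client: try nonces until one verifies."""
    for nonce in count():
        if verify(challenge, nonce, bits):
            return nonce
```

Each extra bit of difficulty roughly doubles the expected work for the poster and costs the server nothing, which is the "limited resource" being exploited.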

Hm, what about a passport-like single-sign-in backed by phone-based human-run human tests, where signing up requires calling a 900 number. It's self-funding. It puts a price on accounts, and thus a limiting factor. And individuals are willing to pay because their signing will work across a whole network of blogs. The price could be high ($15, say), with a partial rebate ($13 back, with $2 to cover the cost of running the thing) for posting at least m comments which a blog owner in the network (of course, these would have to be vetted) "commends" as human-generated and insightful to the network operator.

Of course, it won't work for email, because email is the killer app; because there's no network of email receivers (everyone gets email). But for comment spam, it's a thought.

Also, it's horribly unfair to the poor, but I'm not sure that something couldn't be done about that. I'm not sure what, but something.

Wow, is this the wrong place to post this bad idea.

#77 ::: David Moles ::: (view all by) ::: January 13, 2004, 07:34 PM:

It's probably easier just to have the registration form impose some "cost" -- in terms of effort, or time, or complexity -- that's negligible for a human being but difficult or impossible for an automated system.

One of the most common tricks is to display as a graphic some characters that are easy for a human to read but hard for a computer to figure out without sophisticated OCR, and require that you retype those characters in the registration form. Even email confirmation would probably cut down on false registrations, since the spammer would also have to continually generate new working email addresses.

#78 ::: Graydon ::: (view all by) ::: January 13, 2004, 07:41 PM:

Novalis -

The problem with all such schemes is that they either don't work or they hand total control of the universe of discourse to the signing authority for the authentication keys.

There are two POST events with a comment; the one that says 'I want to post a comment' and the one that says 'this is my comment, please stick it up on the weblog now'.

Enforcing a time difference commensurate with the amount of content in the post between those two events gets you something; enforcing non-simultaneity of commenting from a single IP gets you something, too. Check the email address given for existence, and, if the email address hasn't commented before, require email confirmation of the intent to post.

Cache the IP associated with the email address; if the next post with that email address isn't from that IP range, refuse it. If any of the URIs in the post aren't valid, junk the post. If any of the URIs in the post are on a blacklist, junk the post.

This is a shedload of overhead, but it doesn't require a single central authority or the idea that there's going to be a single elegant solution.
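Two of the checks described above, the time gap scaled to comment length and the link blacklist, might look something like this in Python. This is a sketch under stated assumptions, not a tested defense: `SECONDS_PER_CHAR` is an invented tuning value, the function names are hypothetical, and `spam.example` stands in for a real blacklist entry.

```python
from urllib.parse import urlparse

# Assumed tuning knob: seconds of typing time demanded per character.
SECONDS_PER_CHAR = 0.05

def too_fast(form_served_at: float, submitted_at: float, body: str) -> bool:
    """True if the comment arrived sooner than its length makes plausible."""
    return (submitted_at - form_served_at) < len(body) * SECONDS_PER_CHAR

def has_blacklisted_link(urls, blacklist) -> bool:
    """True if any linked host name is on the blacklist."""
    return any((urlparse(u).hostname or "") in blacklist for u in urls)

def accept(form_served_at, submitted_at, body, urls, blacklist) -> bool:
    """Accept a comment only if it passes both checks."""
    return not too_fast(form_served_at, submitted_at, body) and not has_blacklisted_link(urls, blacklist)
```

For example, `accept(0.0, 30.0, "x" * 200, ["http://ok.example/a"], {"spam.example"})` passes, while the same comment submitted one second after the form was served does not.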

#79 ::: Shelley ::: (view all by) ::: January 13, 2004, 08:18 PM:

I was hesitant about coming back into the comments and responding to the original 'tone' thing associated with my post, but I did want to agree with you, Teresa, that Yule is a very good friend. More than that, a brave woman who has seen me get kicked around a bit by the tech community, and hasn't liked it much. (Well, neither have I to be honest.) A good person.

But I think that the original topic of this post and all of our interests should return to center front, and tone and lack or abundance thereof slide back into the obscure corner in which it belongs.

Graydon, my favorite comment spammer, whose signature I tend to recognize now, actually worked around the block of delayed time between accessing the page and posting the comment. And with the recent spam attack, the spammer spoofed different IP addresses between posts to a comment thread, which overrode this hand coded block.

As for caching an IP address with an email, that didn't sound right. Not everyone has a broadband always on connection with the same IP address. In fact, IP addressing schemes are pretty dead as a solution at this time, too easy to manipulate.

And David, yes we've talked about graphic challenge systems, as novalis also discusses. Unfortunately, these are not workable for the visually impaired.

Another thing to keep in mind is maintaining a sense of perspective about all of this, something I think novalis is hinting at. I think.


#80 ::: Shelley ::: (view all by) ::: January 13, 2004, 08:37 PM:

I didn't see this elsewhere in all the fooflah, but I guess LiveJournal has a challenge image (type what's in the image -- CAPTCHA) that also has an audio component for audio only browsers used by the visually impaired. Now that is supremely cool -- right up there with Sam's forced preview (which still appeals for the 'cool down' aspect if no other tech reason). These two combined could almost be the perfect comment spam killer. I won't say is the perfect comment spam killer -- but close.

#81 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 13, 2004, 08:37 PM:

Okay, so I put in a graphic challenge system, and I put a note on it telling people who are visually impaired to drop me an e-mail and I'll register them for the site. It's damned inconvenient, but it's better than having no comments threads at all -- which is what I'm looking at if we can't find a way to keep this garbage out.

During the last wave, I got a look at a sort-of-weblog, some kind of corporate site, that had left its comment threads open but hadn't been cleaning out the spam. They had spam messages piled six or seven deep. It read like the late-night conversations of a window full of department store mannequins.

On another note, it seems to me that if the purpose of all this is to get googlejuice for the beneficiary site, it would be awfully helpful of Google to announce a policy of not rewarding comment spamming.

#82 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 13, 2004, 09:01 PM:

I figure comment spam is like graffiti. You can make it harder to put up graffiti, but you can't make it impossible -- ask any archaeologist. So you do your best to make it harder to post, knowing as you do so that some will go up anyway.

If you leave it there, more graffiti and more elaborate graffiti will be added thereto. But if you make a point of scrubbing it off or painting over it as soon as it appears, its incidence will decrease. If a few minutes' work with some spraycans means your tag will ornament your neighborhood for years to come, there's maybe some point in doing it. But if you know it's going to be gone in a few days, the costs and risks outweigh the rewards.

We may never come up with a perfect mechanism for deflecting comment spam, but a bunch of imperfect mechanisms could help keep it under control.

#83 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 13, 2004, 09:05 PM:

I'd still like to find someplace that explains how to actually implement the "forced preview" idea.

#84 ::: Steve ::: (view all by) ::: January 13, 2004, 09:25 PM:

The discussion has moved on a bit (I'm very dubious about using Bayesian spam tools in weblog comments, but some of these suggestions are really good), but I can report that my concern up the page about the ease of moving a Movable Type installation from Berkeley DB to MySQL was, as far as I can tell, completely misplaced. Good news for those of you considering it.

Shelley, I'm curious as to how Spammer X got around time limits. I understand if you don't want to trumpet the technique to the world, but would you be willing to email me?

#85 ::: Brooks Moses ::: (view all by) ::: January 13, 2004, 09:29 PM:

Graydon: Cache the IP associated with the email address; if the next post with that email address isn't from that IP range, refuse it. If any of the URIs in the post aren't valid, junk the post.

Both of these would have blocked legit comments from me recently. I post from office and at home; sometimes I post while on vacation too. On the second one -- well, yesterday I was posting something with a link to a site that just happened to be down for a few hours when I was posting.

I'm not convinced that having a central authentication system is necessarily a bad thing, if it's done well -- it certainly produces advantages on Livejournal, and I think it's likely partly responsible for the fact that, amongst my friends with similar blogs, the ones on Livejournal get more comments. On the other hand ... ah, now here's a neat trick: the problem, to a large extent, is that it places too much power in the hands of a single auth server. But, if we instead implement the authentication via public-key encrypted tokens, you only need the auth server once to get a token, and everything after that is out of the hands of the auth server altogether. (This makes it a bit hard to revoke tokens, though, but if they're not easy to get in duplicate, blacklisting might work again.)

Problem is, how do you keep someone from stealing tokens? Can we tell the difference -- not just in this scheme, but in general -- if someone decides to borrow names and email addresses from existing posts to post with? (With the tokens, you have to trust the blog owner's server with them, certainly, even though they wouldn't get publicly displayed.)

You could, I suppose, have some method whereby one's token is also a public key, and the comment submission contains something like the url of the blog or the IP of the submitting computer or the time of day or something encrypted such that that public key will decrypt it, so that a swiped token is essentially useless. This, however, requires extra hardware or software for each user to calculate the tokens, and this sort of infrastructure becomes quite unwieldy even in near-homeopathic doses.

So, I'm not sure if that's any closer to a solution or not. Ah, well; maybe someone will find it interesting.

On a completely different note, it sounds like from what I've heard from various sources that Google's whole pagerank algorithm is being broken (or at least badly bent) by weblogs in general, even if one only looks at legit posts and comments.... It will be interesting to see what they come up with as a response to it; my guess is that whatever it is, if it handles that problem it will also probably stop rewarding comment spamming as a sort of inherent side effect.

#86 ::: Cassandra P-S ::: (view all by) ::: January 13, 2004, 09:30 PM:

I wonder if they're using Google to generate the spam, somehow.

#87 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 13, 2004, 10:21 PM:

Ach, Shelley, I wasn't ignoring your post; it came in while I was composing my first post that followed it, and then I added a postscript without checking the accumulated thread. And I agree; a visual challenge plus optional audio, plus forced preview, could make it charmingly difficult to automate the commenting process.

Still thinking about the social surround. One of the reasons I'm willing to consider the combined use of multiple anti-spamming measures that individually will be only partially successful, or will only be effective for a limited time, is that they'll impede and collectively may even stave off the formation of a community of spam technicians and their clients and customers who once created will have a strong interest in the continuance of comment spam. It's like rioting and looting: you want to discourage it by any means possible, both to keep people who aren't currently rioting and looting from getting the idea, and to keep people who are already rioting and looting from getting good at it and starting to form networks.

#88 ::: Graydon ::: (view all by) ::: January 13, 2004, 10:40 PM:

Brooks -

Any central authentication authority is really, really vulnerable, and web-of-trust schemes -- while not technically difficult -- are more annoying than most people will tolerate and vulnerable to spoofing and fraud.

I wasn't thinking of authenticating a URI through trying to hit it, I was thinking of DNS for starters; if you try to hit the URI, you've created a DDoS tool.

Cassandra --

Of course they use Google to generate spam, just like telemarketers use the phone book.

There isn't any solution to that one, short of never having an index to anything.

#89 ::: Mitch Wagner ::: (view all by) ::: January 14, 2004, 01:34 AM:

This authentication-by-email scheme has promise. I envision something similar to challenge/response:

User "Bob" posts a message, including a valid e-mail address in the header. He receives an e-mail at that address, indicating he has to click a link to activate the message. When he clicks the link, the message is activated.

Next time he posts, if he gives the same name, same e-mail address and posts from a computer with the same IP address, the comment goes live immediately. If it's a different IP address, it's only a minor inconvenience: he gets another confirmation e-mail.

Left as a challenge for the student: How do we prevent spammers from automating responding to the e-mail challenges?
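The confirm-by-e-mail flow just described might be sketched like this in Python. It is a rough illustration only: actually sending the e-mail is left out, and `known`, `waiting`, `submit`, and `confirm` are hypothetical names rather than anything in MT.

```python
import secrets

known = set()    # (name, email, ip) triples that have confirmed before
waiting = {}     # confirmation token -> (name, email, ip, text)

def submit(name, email, ip, text):
    """Return None if the comment goes live at once, else a token to e-mail."""
    if (name, email, ip) in known:
        return None
    token = secrets.token_urlsafe(16)   # unguessable value for the link
    waiting[token] = (name, email, ip, text)
    return token

def confirm(token):
    """Called when the e-mailed link is clicked; returns the released text."""
    name, email, ip, text = waiting.pop(token)
    known.add((name, email, ip))
    return text
```

A first post from Bob returns a token, clicking the link releases the comment, and his next post from the same address goes live immediately. A post from a different IP address simply gets a fresh token.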

Brooks Moses: On a completely different note, it sounds like from what I've heard from various sources that Google's whole pagerank algorithm is being broken (or at least badly bent) by weblogs in general, even if one only looks at legit posts and comments....

In what way, broken? If lots of bloggers link to a page, and that page's pagerank is elevated, then isn't that a case of Google working as it should? Googlebombing doesn't seem to have any significant impact on real searches, it's a novelty, a game.

I have seen some pollution of Google search engine results recently, though, where I type in a search term and the top-ranked results are simply lame search engines or indexes festooned with ads (and now I can't think of any examples).

#90 ::: Mitch Wagner ::: (view all by) ::: January 14, 2004, 01:46 AM:

Oh, crap, I didn't know there was a test today. Now I wish I didn't get high just before class.

#91 ::: Mitch Wagner ::: (view all by) ::: January 14, 2004, 01:48 AM:

What I was going to say, before I was distracted by the opportunity to make a cheap joke, is that another problem with the challenge/response system I described is that e-mail is pretty unreliable. Challenges would get lost.

However, a challenge-response system would work if you simply left unapproved messages in a staging area, rather than deleting them outright. At intervals, the blogger could come by and review the messages in the staging area, releasing any legitimate messages for publication and approving the authors of those messages for future posts.
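The staging-area variant can be sketched in Python as a small moderation queue. This is an assumption-laden illustration, not a description of any existing MT feature: `StagingArea` and its method names are invented, and authors are identified by e-mail address alone.

```python
class StagingArea:
    """Hold comments from unknown authors until the blogger approves them."""

    def __init__(self):
        self.approved_authors = set()   # e-mail addresses already vetted
        self.pending = []               # (email, text) awaiting review
        self.published = []             # (email, text) live on the site

    def submit(self, email: str, text: str) -> str:
        """Publish at once for vetted authors, otherwise park the comment."""
        if email in self.approved_authors:
            self.published.append((email, text))
            return "published"
        self.pending.append((email, text))
        return "pending"

    def approve(self, index: int) -> None:
        """Release one pending comment and trust its author from now on."""
        email, text = self.pending.pop(index)
        self.approved_authors.add(email)
        self.published.append((email, text))
```

Nothing is lost if a challenge e-mail goes astray, since the comment waits in `pending` until someone looks at it. The cost is that the blogger has to review the queue by hand.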

#92 ::: Kevin J. Maroney ::: (view all by) ::: January 14, 2004, 02:15 AM:

Stacking a batch of weak anti-spam defenses won't stop professional criminals, but it will stop casual vandals. The professional criminals have nothing better to do with their time than automate ways to navigate around the weak defenses.

Better to have two very separate layers of defense: One really good tool to block them (like registration systems which require some sort of human interactivity to navigate) and a really good tool to get rid of the vandalism fast when it gets in anyway. It sounds right now like MT has neither, which is a shame. (I doubt there are any off-the-shelf journalling tools which are any better, though.)

#93 ::: J Greely ::: (view all by) ::: January 14, 2004, 03:53 AM:

As MT bloggers go, I'm very small fry, but there are a few subjects where I seem to get sorted to the top on Google (not to mention the one comment thread that is linked to on a thousand popular image pages), and I've had precisely zero comment spams.

Why not? Because I set up MT with mod_perl, so my comment submission script isn't named mt-comments.cgi. I know some people have had problems changing an existing blog to use a different URL, but this common element is what made MT such an easy target.

Any CGI script name that gives 1.8 million hits on Google (even now, months after the first big attacks) is too good to pass up, if you're evil and lazy.

-j

#94 ::: Erik V. Olson ::: (view all by) ::: January 14, 2004, 08:07 AM:

Or, evil and smart -- which is often a good thing. "What are all good Sysadmins? Lazy."

T - I must pass on the marmalade, though it sounds like the perfect topping to a pre-century carbloading.

Changing the CGI name is security through obscurity -- but so is parking your beater bike near, but not next to, the guy who's parked a $1200 bike and locked it with a cheap lock. Not that I'd ever do that.

#95 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 14, 2004, 09:25 AM:

Mitch, The Register--or maybe it's just someone at The Register--has been on a bit of a tear about weblogs distorting the Googleranking system. You can see examples here, here, and here. Here's Wired on the same subject.

Basically, Googleranking-type systems assumed a model of links from fixed sites. Then along came weblogs, which throw off links in all directions as part of the conversation. It gives us a lot of googlejuice. Type "electrolite" into Google, and four of the first ten results will be Patrick, which means he's beating out a popular REM song.

Huh. I just found out where you got the title of your weblog. That's weird.

The Register makes the whole googlewarping thing sound mildly apocalyptic, but I'm not sure that's justified. Nor do I think all weblog links and posts are trivial or ephemeral. For instance, that piece I did a while back on animal hoarding gets referenced by animal protection fixed sites.

Anyway, that's the story, insofar as I know it.

#96 ::: rea ::: (view all by) ::: January 14, 2004, 09:29 AM:

It's curious that two other sites I often visit--Pandagon and Brad De Long--have also been attacked through their comments in the last couple of days. Is there some connection among these attacks, or is this just the internet spamming season?

#97 ::: Faren Miller ::: (view all by) ::: January 14, 2004, 09:42 AM:

I've been involved in an online discussion group focused on a fairly obscure '60s group (Quicksilver Messenger Service) and its guitarist John Cipollina -- mostly the modern equivalent of concert tape exchanges, but sometimes interesting dialog -- but I've had to stop getting their daily digests because of all the porn spam posts. They had a brief exchange of emails about the problem, but nothing happened to fix it (not a very techie bunch, I guess). That group belongs to AOL, and my question is: Why don't the big online companies try to develop their own fixes? Do they want to see their discussion groups melt down? Last summer, AOL's "solution" to spam was apparently to block everything from my own server, Juno, which caused me no end of trouble when Locus was using AOL as a temporary mail drop. Can't they do better than that? (I know the US govt. certainly hasn't, with its useless "anti-spam" legislation. I still get all the Viagra ads.)

Pardon my techno-ignorant bewilderment, but this seems like a problem for more than bloggers.

#98 ::: Patrick Nielsen Hayden ::: (view all by) ::: January 14, 2004, 10:49 AM:

I knew without even looking that all three of Teresa's Register links were to stories by Andrew Orlowski, a journalist who seems to have a massive hate-on toward blogs and their eeeeeeeeeevil effect on Google search results. I can't pretend to understand the issue in depth, and FOR ALL I KNOW ORLOWSKI IS ON TO SOMETHING (phrase emphasized so we can easily look back to it when his nearest and dearest turn up ten minutes from now to yell at me for this), but my impression (note that word again) has been that he's a little quick to dismiss the entire blogosphere as containing nothing but froth and "noise" that interfere with the Serious Work of search-enginery.

#99 ::: Michelle ::: (view all by) ::: January 14, 2004, 11:23 AM:

Patrick (et al),

If you find the code for the "forced preview", will you please let the rest of us know?

Please?

#100 ::: sennoma ::: (view all by) ::: January 14, 2004, 11:27 AM:

It occurred to my spouse and me that you could just HTML-comment out the "post" button on the initial comments screen if you wanted to force a human user to preview (and I think I will do that), but if I understand the issue, that would have no effect on a bot going after the comments.cgi script. If that's correct, no amount of force-preview is going to work if the bot can sniff out the names of your CGI scripts, because one of them is going to have to post to the blog.

I think I will turn off comments on posts 30 (15?) days old, and maybe rename comments.cgi; if enough MT users do this, and use different alternative names for comments.cgi, the spammers' numbers game becomes significantly less rewarding.
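The age cutoff, at least, is simple to express. A rough sketch in Python, assuming the comment script knows each post's timestamp; the function name comments_open and the 30-day default are illustrative, not anything MT provides:

```python
from datetime import datetime, timedelta


def comments_open(posted_at, now, max_age_days=30):
    """Return True while the post is still young enough to accept comments.

    Hypothetical helper: a comment script would call this before accepting
    a submission and refuse anything aimed at an older post.
    """
    return now - posted_at <= timedelta(days=max_age_days)
```

The check has to live in the script that accepts the submission, not only in the page template, since a bot posting straight to the script never sees the page.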

I don't want to use any kind of log-in or email verification if I can avoid it, because I like casual comments. What I would like, though, is a CAPTCHA system which puts the text-graphic next to the comments box, so a poster only has to type a few extra keystrokes before hitting "post". That would not be too much trouble even for a lazy bastard like me. Does anyone know how I might go about acquiring such a thing?

#101 ::: Teresa Nielsen Hayden ::: (view all by) ::: January 14, 2004, 11:34 AM:

Rea, if you follow the links from the words "see previous round" in my original post, you'll get a good picture of the last round. These things come in waves. This latest one seems to have targeted fewer weblogs than the last one, though there've been more hits per weblog.

Faren, a lot of us wonder the same thing. How often do you see legitimate businesses advertising via spam?

#102 ::: Tina ::: (view all by) ::: January 14, 2004, 11:43 AM:

(Patrick wrote about Andrew Orlowski:)
...that he's a little quick to dismiss the entire blogosphere as containing nothing but froth and "noise" that interfere with the Serious Work of search-enginery.

Right, because there are absolutely no actual web pages that aren't, for the most part, froth and noise. Only blogs.