March 17, 2006

Google’s fighting comment spam
Posted by Teresa at 08:46 PM

This isn’t brand-new news, but I’m finding that a lot of people haven’t heard about it, and it’s useful. From GoogleBlog:

Preventing comment spam

If you’re a blogger (or a blog reader), you’re painfully familiar with people who try to raise their own websites’ search engine rankings by submitting linked blog comments like “Visit my discount pharmaceuticals site.” This is called comment spam, we don’t like it either, and we’ve been testing a new tag that blocks it. From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won’t get any credit when we rank websites in our search results. This isn’t a negative vote for the site where the comment was posted; it’s just a way to make sure that spammers get no benefit from abusing public areas like blog comments, trackbacks, and referrer lists.

We hope the web software community will quickly adopt this attribute and we’re pleased that a number of blog software makers have already signed on…

Follow the link if you want more technical info. Here and now, a simplified version: Google’s implemented an attribute you can tuck into the code you use to make a link. It looks like this. Here’s a standard link:
<a href="http://www.jesshutch.com/robotmain.html">Link.</a>
And here’s one with Google’s new tag:
<a href="http://www.jesshutch.com/robotmain.html" rel="nofollow">Link.</a>
Any link with the “nofollow” attribute in it won’t count toward the linked-to site’s popularity in the Google ratings, a.k.a. its Googlejuice. This does away with the chief benefit (to the spammers) of posting comment spam.

As I understand it, it’s possible to set things up so that links posted into comment threads, guestbooks, trackbacks, and referrer lists will have the “nofollow” attribute automatically added to their code. Commenters will still be able to post links there, but there’ll be no incentive to post commercial links en masse.
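As a rough illustration of that automatic rewriting, here is a minimal Python sketch, not the mechanism any particular blog package uses, that re-emits a comment's HTML with rel="nofollow" merged into every <a> tag. It assumes the comment HTML has already been sanitized elsewhere; the class and function names are hypothetical, and HTML comments and doctype declarations are simply dropped.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emits comment HTML, adding rel="nofollow" to every <a> tag (hypothetical sketch)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    @staticmethod
    def _render_attrs(attrs):
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")  # bare attribute such as "disabled"
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    @staticmethod
    def _with_nofollow(attrs):
        # Merge into an existing rel attribute instead of adding a second one.
        result, seen_rel = [], False
        for name, value in attrs:
            if name == "rel":
                seen_rel = True
                tokens = (value or "").split()
                if "nofollow" not in tokens:
                    tokens.append("nofollow")
                value = " ".join(tokens)
            result.append((name, value))
        if not seen_rel:
            result.append(("rel", "nofollow"))
        return result

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = self._with_nofollow(attrs)
        self.out.append(f"<{tag}{self._render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._render_attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Entities were decoded by the parser, so re-escape text on the way out.
        self.out.append(escape(data, quote=False))


def add_nofollow(comment_html):
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()  # flushes any buffered trailing text
    return "".join(parser.out)
```

Run on the standard link above, it produces the nofollow version shown above; a link that already carries rel="nofollow" comes through unchanged.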

Meanwhile, the attribute can also be used to avoid boosting the Google ratings (and thus the search traffic) of objectionable sites one may have occasion to link to. For instance, Warren Whitlock: he’s getting no Googlejuice from Making Light.

It’s a good thing.

Comments on Google's fighting comment spam:
#1 ::: dan ::: March 18, 2006, 01:20 AM:

Excellent news, Teresa; thank you!

#2 ::: Keir ::: March 18, 2006, 01:57 AM:

I understand that Slashdot has used this attribute for links posted to the comments for quite a while. Along with the moderation system, it seems to work quite well at keeping spam down.

Given that Slashcode is freely available, I'd imagine that it wouldn't be that hard to do it on your own blog/forum/whatever.

Of course, the Next Big Thing for spammers is going to be wikis. Especially Wikipedia-like wikis that aren't quite so large.

Highly placed on Google searches, often linked to, and easily editable. Wikimedia's wikis'll be OK, given the size of the user base, but smaller wikis'll get hit hard.

(The above isn't my idea; but I think that it is very likely.)

#3 ::: Christopher ::: March 18, 2006, 02:09 AM:

It happened to a Lovecraft wiki I spent some time working on. Very irritating.

#4 ::: Reinder ::: March 18, 2006, 04:15 AM:

It's happened, a lot, to comixpedia.org, which had to disallow editing for unregistered users (and a few months later got spammed by bots that registered themselves).

#5 ::: Cory Doctorow ::: March 18, 2006, 06:39 AM:

I don't like this solution, because it assumes that there's never any pagerank data to be gleaned from a message board on a blog post. I think Making Light is the best counter-example: if you folks are discussing subject X and one or more of the comments link to page Y, I think there's a good chance that page Y is relevant to subject X and it's silly for Google to throw it away.

#6 ::: rhandir ::: March 18, 2006, 07:11 AM:

Cory's right. Making Light is a high-value target both to spammers and to Google. If I want to find a non-obvious connection between two pieces of data, this is the place to come.

rel="nofollow" is a band-aid. In the long term, an anti-search engine is going to be more helpful to us. What I mean is that there is a certain percentage of sites I never want to go to (malware/phishing sites), and a larger percentage of sites that are just plain useless, such as splogs.

There is a kind of Mirror, Mirror universe of the web that no one would want to visit. (Perhaps where the usual characters wear goatees?)

I'm not talking about the usual filtering stuff. I'm thinking of something more elaborate than a blacklist or the usual Bayesian filtering. Exactly how you could distinguish anti-search behavior (visit once, never want to go to again) from low-interest but useful sites (I only needed something there once), I'm not sure.

*sigh* I had a better-formed thought on all of this that I was going to share. I'll try again later today.

-r.

#7 ::: Serge ::: March 18, 2006, 07:45 AM:

Always make sure your agonizer is fully charged, rhandir.

#8 ::: Kate Nepveu ::: March 18, 2006, 08:48 AM:

This is a good place to throw news of two new MT 3.2 plugins to:

HM Passphrase, which appears to ask the commenter a text question ("What's Brad's first name?") and require a text passphrase ("Brad") before posting the comment. It's a prove-you're-human test that sounds really good to me since it doesn't require images or sound.
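The general question-and-passphrase technique is simple enough to sketch. The Python below is a hypothetical illustration, not HM Passphrase's actual code; the question table and the function names are made up, and matching is assumed to ignore case and extra whitespace.

```python
import hmac

# Hypothetical question/answer pairs; the real plugin's configuration may differ.
CHALLENGES = {
    "What's Brad's first name?": "Brad",
}


def normalize(text):
    """Collapse whitespace and ignore case so ' brad ' matches 'Brad'."""
    return " ".join(text.split()).casefold()


def passphrase_ok(question, answer):
    expected = CHALLENGES.get(question)
    if expected is None:
        return False  # unknown question: reject rather than guess
    # compare_digest avoids leaking how much of the answer matched via timing.
    return hmac.compare_digest(
        normalize(answer).encode("utf-8"), normalize(expected).encode("utf-8")
    )
```

The appeal is the one Kate names: a spambot has to understand the question, and no image or audio CAPTCHA is involved.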

MT Hacks Blacklist 3.2: for people who want to use MT Blacklist *in addition* to the built-in Spam Lookup of MT 3.2.

I also don't like nofollow for the reasons Cory states.

#9 ::: James D. Macdonald ::: March 18, 2006, 09:05 AM:

Trackback spam and comment spam are all over the place. Even though we have extensive blacklists here, twelve of the last twenty comments posted to Making Light were spam -- for Texas Hold'em Poker, as it happens -- leading back to various pages on blogspot.com. Because blogspot can have good content I haven't just blacklisted all of blogspot, but ... it's tempting.

We've turned off showing trackbacks entirely. But because they're necessary for the "other posts by" functionality, they're still listed internally. We get around fifty trackback spams per day.

In my unending quest to find Things To Do About Comment Spam, I found this: http://www.linksleeve.org/

It's an interesting idea ... an identical link posted to some threshold level of different blogs would be automatically removed. But I have a couple of concerns. One is that half the links on that site are themselves broken. Another is that if there were some hard-hitting, fast moving news that suddenly struck bloggerdom (Bush Shoots Cheney, Rumsfeld, Self in Tense White House Standoff) with lots of folks linking to the news story all at once, would it be deleted?

And ... big objection ... the way it seems to work is the author of the software wants me to type in my email address, username, and password to a webform. To which all I can say is riiiiiiight.

So. Right now we have a lot of fiddly hand-work. And a Band-aid. If what I'm looking at is arterial bleeding, a Band-aid isn't much good, but if that's all I've got I'm going to use it.
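The threshold idea Jim describes can be sketched in a few lines of Python. This is a hypothetical illustration of the counting rule only, not linksleeve's code; the class name and the threshold value are assumptions.

```python
from collections import defaultdict


class LinkSpreadTracker:
    """Flags a URL once it has been posted to `threshold` distinct blogs (hypothetical sketch)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.blogs_by_url = defaultdict(set)  # URL -> set of blogs it appeared on

    def record(self, url, blog):
        """Record a post of `url` on `blog`; return True if the URL now looks like spam."""
        self.blogs_by_url[url].add(blog)
        return len(self.blogs_by_url[url]) >= self.threshold
```

Note that the counter cannot tell a spam run from the breaking-news case Jim worries about: a story linked from many blogs at once trips the same threshold.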

#10 ::: James D. Macdonald ::: March 18, 2006, 09:17 AM:

Between my last comment and this came yet another comment spam for Texas Hold'em, leading back to yet another blogspot.com page.

Some days I'm tempted to blacklist the entire *.info domain. The only reason I don't is that trilobites.info is a useful and interesting site. Think of it as the one just man in Sodom who's allowing all the rest to live.
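A blanket domain block with a short allowlist, as Jim describes, might look like the following Python sketch. The policy tables and function names are hypothetical; the allowlist is checked first so that trilobites.info and its subdomains survive the .info rule.

```python
from urllib.parse import urlsplit

# Hypothetical policy: block every .info host except an explicit allowlist.
BLOCKED_SUFFIXES = (".info",)
ALLOWED_DOMAINS = ("trilobites.info",)


def _matches(host, domain):
    """True if host is `domain` itself or any subdomain of it."""
    return host == domain or host.endswith("." + domain)


def link_blocked(url):
    host = (urlsplit(url).hostname or "").rstrip(".")  # .hostname is already lowercased
    if any(_matches(host, d) for d in ALLOWED_DOMAINS):
        return False  # allowlist wins over the blanket suffix rule
    return host.endswith(BLOCKED_SUFFIXES)
```

The catch is maintenance: every useful .info site has to be added to the allowlist by hand.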

We have Blacklist 3.2 running here. We'll have to see about Passphrase.

#11 ::: Serge ::: March 18, 2006, 10:12 AM:

For the longest time, my employer's email service blocked everything that came from the ***.videotron.ca domain. Why? Videotron is apparently used a lot by spammers. That meant being unable to write to my friend Nicole up in Quebec City. Unfortunately, when I used my alternate less-frequently-checked-upon Comcast address, HER filter flagged my email as spam.

America vs Canada... The World at War...

#12 ::: JohnD ::: March 18, 2006, 10:25 AM:

I'm no fan of comment spam, but I have to agree with Cory that rel="nofollow" is a less than ideal solution when applied unilaterally. Some blogging software now imposes it automatically, meaning that relevant links and meaningful commentary are discarded along with the trash. This discounts what bloggers and their commenters add to the 'Net.

I'm not comfortable with the fact that bloggers' opinions don't always count toward Google pageranking, particularly now that so much breaking political news spreads via the blogosphere.

#13 ::: TomB ::: March 18, 2006, 11:18 AM:

I think it's the search engine's job to explore the mirror, mirror universe, the dark side of the web. I don't even see what's so hard about it. Just follow the links. If they lead to malware or known shady businesses or scams, apply a large negative to the page rank.

#14 ::: James D. Macdonald ::: March 18, 2006, 11:59 AM:

Making Light occasionally links to known shady businesses and scams (see the "fraud" listed right on the top of the page as one of the subjects we explore).

Should Making Light have a large negative attached to its pagerank?

#15 ::: Hob ::: March 18, 2006, 12:00 PM:

JohnD: When you talk about "what bloggers and their commenters add to the Net", that little phrase equates two things that aren't the same for this purpose. Bloggers are the publishers/editors of the site; their words and links reflect what they want the site to be, and in most cases they only have that power on one site. Commenters are graffiti artists (or, in some cases, graffiti-stenciling robots) and they can post the same thing to dozens or hundreds of blogs. That's not to devalue their opinions - lots of us read blogs for the comments. But it makes sense for a search engine to prioritize unique content from authors who have some stake in their venue.

#16 ::: Teresa Nielsen Hayden ::: March 18, 2006, 12:26 PM:

Cory, I don't intend to code "nofollow" into links from my comment threads. I do think that might be appropriate for venues that get a few comments per thread at most, and I'd definitely recommend it for seldom-used or abandoned weblogs where the accumulated comment spam never gets cleared out.

#17 ::: Xopher (Christopher Hatton) ::: March 18, 2006, 12:56 PM:

sites that are just plain useless, such as splogs

[*]?

#18 ::: rhandir ::: March 18, 2006, 01:33 PM:

Two technical suggestions re: spam suppression.

1. Automatically rate all posts by known good users 1, and unknown users 0. 1's are displayed, 0's only have the first 26 characters displayed (using dynamic css tricks), and the html is neutered. Possibly a number of clicks on the 0's topic lines by known good users acts as a voting system; when a threshold is reached, the full comment is displayed. Known good users might be simply identified by a randomly assigned serial number in a cookie. (No userid info needed.)

Not ideal, but for this stuff, inconvenience / delaying tactics / increased complexity for the autospammer seems to be a better solution than endless blacklist maintenance. [See greymilter, greylisting in discussions on email spam elsewhere on the net.]

2. Use gpg encryption/decryption plus auto-disemvoweling:
Assign a gpg key to a poster (using a cookie?). Encrypt each post, using the stored key value when the post is written on the "write post" page.

On the server side, decrypt the written post using the other half of the key. Legit posts don't get their keys revoked and can be read as usual; illegit posts get their keys revoked and are disemvoweled after decryption.* Forged keys don't work. Revoke keys as needed using traditional anti-spam tricks.

The advantage? Not having to mess with a database of userlogins, but still being able to retroactively cancel a set of posts. Regular users would be able to post without more than one moderation review as long as their cookies lasted.

-r.

*or rot-13. Or give the client a universal decrypt key so you can read the "unfiltered" page at anytime. This is cryptographically weaker, but prevents any data from being lost.
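Suggestion 1 (known-good tokens, 26-character previews with neutered HTML, and promotion by votes from known-good users) can be sketched in Python as follows. The vote threshold of 3 is an assumption, since the suggestion leaves it open, and so is the choice to keep escaping a promoted comment's HTML rather than trusting it.

```python
import html
from dataclasses import dataclass, field

PREVIEW_CHARS = 26   # from the suggestion above
VOTE_THRESHOLD = 3   # hypothetical; the suggestion leaves this open


@dataclass
class Comment:
    author_token: str  # random serial number from the poster's cookie
    body: str
    votes: set = field(default_factory=set)  # tokens of known-good users who clicked


def vote(comment, voter_token, known_good):
    """Only clicks from known-good users count, and each user counts once."""
    if voter_token in known_good:
        comment.votes.add(voter_token)


def render(comment, known_good):
    if comment.author_token in known_good:
        return comment.body  # rated 1: shown as posted
    if len(comment.votes) >= VOTE_THRESHOLD:
        return html.escape(comment.body)  # promoted: full text, HTML still neutered
    # Rated 0: truncate first, then escape, so no entity is cut in half.
    return html.escape(comment.body[:PREVIEW_CHARS])
```

No user login is stored anywhere; the only state is the set of known-good tokens and the votes attached to each comment.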

#19 ::: TomB ::: March 18, 2006, 05:05 PM:

Should Making Light have a large negative attached to its pagerank?

Not for links with rel="nofollow". Or maybe even rel="scam".

#20 ::: Clifton Royston ::: March 18, 2006, 07:05 PM:

The problem with Google's answer as a solution is that it presumes that comment spammers will be just smart enough to perceive that this makes their efforts less useful and will simply give up posting to blogs that use nofollow. Based on the last 10 years of struggle with email spam, I'd say the more likely reaction will be to post 10 times as many comment spams to 10 times as many blogs and hope some of them are on sites that don't use it.

On one of the anti-spam mailing lists I'm on, we're just discussing comment spam and whether there's any relevance to email spam techniques. There is a lot of wheel re-invention going on in the anti-blog-spam world.

One of the problems with bringing email people into it is that there is a large chunk of people who simply don't get the blog idea and so don't see why this is a problem. "Why would you want to let random strangers post on your web site? Isn't that just inviting trouble?" Well, of course it's for much the same reason that one wants to accept mail from random strangers, in principle. But I digress.

At any rate, from the people who do grok both issues, two important techniques you can reuse from email spam prevention:

  • URI block lists: check all posted URIs (whether in text or as hyperlinks) with a DNS lookup against the more reliable URI blocklists, such as those at www.surbl.org (q.v.). If I recall correctly, you can also use the Spamhaus list as a URI blocklist.

  • If possible, block even web-server connections, and definitely block all post attempts, from IP addresses listed in the botnet- and trojaned-PC-oriented DNSBLs. In many cases, these same networks of PCs are being used to send email spam and post comment spam. The CBL blocklist would be a good one to look at using.

  • Bayesian software seems to work for some, too.

Perhaps you're already doing all of this; if so, good for you. Unfortunately I don't know enough about blog software configuration to help you with wiring this all in to your webserver.
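The first two techniques share one DNS trick: a blocklist answers a lookup of a specially built name with a 127.0.0.x address when the item is listed. Here is a minimal Python sketch under stated assumptions: the zone names are ones commonly used at the time and may since have changed (check each list's current documentation and usage terms), only IPv4 is handled, and the caller must already have reduced a URI's host to its registered domain, which this sketch does not do.

```python
import ipaddress
import socket
from typing import Callable, Optional

# Assumed zone names; verify against each list's current documentation.
URI_ZONE = "multi.surbl.org"
IP_ZONE = "cbl.abuseat.org"

Resolver = Callable[[str], Optional[str]]


def system_resolver(name):
    """Resolve an A record; any lookup failure (including NXDOMAIN) counts as 'not listed'."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def ip_query_name(ip, zone=IP_ZONE):
    """1.2.3.4 -> 4.3.2.1.<zone> (IPv4 only in this sketch)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def uri_query_name(domain, zone=URI_ZONE):
    """example.com -> example.com.<zone>; the caller supplies the registered domain."""
    return domain.lower().rstrip(".") + "." + zone


def is_listed(query_name, resolve: Resolver = system_resolver):
    """Listed names resolve to an address in 127.0.0.0/8."""
    answer = resolve(query_name)
    return answer is not None and answer.startswith("127.")
```

The resolver is a parameter so the logic can be exercised offline with a fake lookup table; a comment handler would call `is_listed(ip_query_name(poster_ip))` and `is_listed(uri_query_name(domain))` for each posted link.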

#21 ::: Clifton Royston ::: March 18, 2006, 08:24 PM:

Xopher: splogs = "spam blogs" - sites that are supposed to look to a search engine like blogs, but contain nothing but incestuous messes of links to each other, random text, and search keywords which they want someone to enter and stumble onto their site. Maddening, hateful stuff when you are actually looking for information. Depending on how well Google is doing in tweaking their algorithms, sometimes you'll wind up with several pages of these as the top results when you search for some popular keyword, but lately Google seems to have been doing pretty well at dumping them.

#22 ::: Clifton Royston ::: March 18, 2006, 08:27 PM:

BTW, Teresa: You currently are putting "nofollow" automatically into links posted in the comments. Take a look at the HTML source for this comment thread, for instance. I see nothing wrong with this, but your comment just upstream suggests that's not intentional.

#23 ::: James D. Macdonald ::: March 18, 2006, 08:36 PM:

rel="nofollow" is intentional, and has been in the comment threads here since about fifteen minutes after it became available as an MT plugin.

rel="nofollow" doesn't appear in the main posts except by deliberate design.

#24 ::: Karen Funk Blocher ::: March 18, 2006, 08:59 PM:

As a non-spamming Blogspot blogger, I hope you'll continue to resist the temptation to block that domain.

AOL Journals have had rel="nofollow" for many months, perhaps years. That hasn't prevented several massive comment spam attacks there.

Question: does nofollow affect Technorati functions?

#25 ::: Epacris ::: March 18, 2006, 09:52 PM:

As another blogspotter, I know that it is having problems with many spamblogs of assorted kinds, as well as having to institute comment spam blocking, but it's also better in quite a few ways than the other (free) blogging spots I've tried. Hence my main blog is there, as you can see from the link here, though there is a backup LJ.

I had been considering looking at a paid site, but another serious medical problem has arisen, so funds are going to be short again.

#26 ::: Vicki ::: March 18, 2006, 11:32 PM:

Also in .info, two transit systems whose sites I find useful: mta.info is the NYC-area transit system (including MTA's commuter rail and bridges and tunnels) and mtr.info is the Montreal metro and commuter rail.

#27 ::: Terry Karney ::: March 19, 2006, 05:35 AM:

I am of a mixed mind. There are a lot of things in the comments that I go back to. I find them using Google.

From a purely selfish standpoint, I never got around to bookmarking the description of interrogation and my venting on torture in Electrolite. I can still find them, but only by following secondary links.

It's not so much my ego (though having my name be a high ranking hit is nice) but that I like that piece of writing, and I want it to be read as often as torture is praised.

But, on the flip side, comment spam is nasty stuff, and killing it matters. I just hope a better band-aid comes along.

TK

#28 ::: winna ::: March 19, 2006, 10:22 AM:

I just installed Scode for MT 3.2 and I'm pretty happy about it, especially since TypeKey was keeping a lot of people from posting. Now they just enter an automatically generated code along with their comment et voila!
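The "automatically generated code" approach can be sketched in Python as below. This is not Scode's implementation (a real plugin would typically render the code as an image, which this sketch skips); the secret key, alphabet, five-character length, and 15-minute expiry are all hypothetical. The server signs the code and its issue time into a token for a hidden form field, so it needs no session storage to check the answer.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Tuple

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-site secret
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # skips look-alikes such as O/0 and I/1


def _sign(code, issued):
    message = f"{code}:{issued}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_code(length=5, now: Optional[float] = None) -> Tuple[str, str]:
    """Return (code to show the commenter, token for a hidden form field)."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    issued = int(time.time() if now is None else now)
    return code, f"{issued}:{_sign(code, issued)}"


def verify(typed, token, max_age=900, now: Optional[float] = None) -> bool:
    try:
        issued_text, signature = token.split(":")
        issued = int(issued_text)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    if current - issued > max_age:
        return False  # code expired
    return hmac.compare_digest(signature, _sign(typed.strip().upper(), issued))
```

One design caveat: a valid code and token can be replayed until they expire, so a production version would also remember which tokens have already been used.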

Some comment spam mysteriously gets through, but it's one or two a day, not hundreds.

#29 ::: Ceri ::: March 19, 2006, 11:36 AM:

Vicki: Did you mean stm.info for the Montreal transit site? Mtr.info redirects me to a .de site, but I use the stm.info site all the time (Tous Azimuts is your friend).

#30 ::: Kevin Marks ::: March 19, 2006, 04:43 PM:

Yes, Technorati pays attention to rel="nofollow" (Heck, I wrote its formal spec). In fact, there is a more general way to express your approval or disapproval called Vote Links which we supported before that.
At Technorati, rel="nofollow" is treated the same as rev="vote-against", i.e., the link is included in our searches for the linked-to URL, but it is not counted towards the Authority of the linked-to page (where Authority is our metric based on links from distinct blogs over the last 6 months, used for ranking and filtering).
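Kevin's description suggests a simple model: nofollow links stay in search results but drop out of Authority, which counts distinct linking blogs in a recent window. (That holds whether nofollow is read as vote-against or, per his later correction, vote-abstain.) The Python below is a hypothetical sketch of that model, not Technorati's code; the data class is made up and "6 months" is approximated as 182 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

WINDOW = timedelta(days=182)  # "the last 6 months", approximated


@dataclass(frozen=True)
class InboundLink:
    source_blog: str
    target_url: str
    seen_at: datetime
    nofollow: bool = False


def links_to(links: Iterable[InboundLink], target_url: str) -> List[InboundLink]:
    """Search results still include nofollow links."""
    return [link for link in links if link.target_url == target_url]


def authority(links: Iterable[InboundLink], target_url: str, now: datetime) -> int:
    """Distinct blogs linking to target_url within WINDOW, ignoring nofollow links."""
    cutoff = now - WINDOW
    return len({
        link.source_blog
        for link in links
        if link.target_url == target_url
        and link.seen_at >= cutoff
        and not link.nofollow
    })
```

Counting distinct blogs rather than raw links means one blog linking a page a hundred times still counts once.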

#31 ::: Eve ::: March 19, 2006, 04:56 PM:

Google’s fighting comment spam

Doo-dah, doo-dah!

#32 ::: Mike Kozlowski ::: March 19, 2006, 11:01 PM:

It's not an observation original with me, but it's worth noting that rel="nofollow" solves Google's problem (spam links making search results less useful), but not bloggers' problems (spam comments making comment sections less readable and/or time taken to clean up comment spam).

I've been using nofollow for a while now, and haven't noticed any decrease in the amount of comment spam I get.

#33 ::: Kevin Marks ::: March 20, 2006, 12:04 AM:

Oops, what I meant to say was rel="nofollow" is treated as rev="vote-abstain".

#34 ::: Skwid ::: March 20, 2006, 02:12 PM:

Jim, The Humblest Blog has been getting hammered by (presumably) the same blogspot spammers over the last few days. These arseholes are vicious. They're using an IP randomizer, so I can't block them that way. They cycle to a different blogspot "splog" every 6 to 8 posts or so, so blacklisting them only lasts a couple of hours at best. They put in only one link per comment, so I can't filter them that way. I don't want to block any comments related to poker and casinos, especially since I'm about to review Last Call, and I can envision several other circumstances in which those might be legit comment topics...

Whatever script they're using is persistent, and runs frequently, so much so that I'd say I've gotten 3 to 5 times my usual volume of comment spam in the past few days, and deleting them is a major pain in Greymatter (each comment takes 3 or 5 clicks to delete, depending on circumstances). If it keeps up through the end of the week, I'm going to have to blacklist blogspot just to stay sane, and apologize to my commenters up-front.

#35 ::: Christopher Davis ::: March 20, 2006, 08:23 PM:

I was getting hit by blogspot linkers, but the regexp -[a-z][a-z][a-z][a-z].blogspot.com caught 'em all. I hope it stays that way for a while.
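Here is a small Python sketch of that filter. The first pattern is exactly as posted; because its dots are unescaped they match any character, so the second, literal-dot variant is an assumption about intent. Either way, a legitimate hyphenated blog whose name ends in four letters would also be caught.

```python
import re

# The pattern exactly as posted: each unescaped "." matches any character.
AS_POSTED = re.compile(r"-[a-z][a-z][a-z][a-z].blogspot.com")
# A tighter variant (an assumption about intent) that requires literal dots.
LITERAL_DOTS = re.compile(r"-[a-z]{4}\.blogspot\.com")


def flagged(url, pattern=LITERAL_DOTS):
    """True if the URL matches the blogspot spam pattern."""
    return pattern.search(url.lower()) is not None
```

For example, both patterns flag poker-abcd.blogspot.com, but so do they flag a legitimate karen-blog.blogspot.com, which is the usual price of a pattern-based blacklist.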

#36 ::: nickk ::: February 05, 2007, 11:31 AM:

One thing that I see on your site is that it reveals the email addresses of comment posters. Do you think spam bots will be able to pick those email addresses up and spam the users?

#38 ::: Teresa Nielsen Hayden ::: February 06, 2007, 12:10 PM:

Might, might not. Never wrong to point them out.

Yo, Nickk: are you real?

#39 ::: Xopher ::: February 06, 2007, 12:22 PM:

It was the url "dubdubdub dot eight coupons dot calm" that made me think it.

#40 ::: James D. Macdonald ::: February 06, 2007, 02:03 PM:

People who worry about that munge their addresses, or don't give them.
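Address munging of the simplest kind can be sketched in Python as follows. The helper name is hypothetical, and it only defeats harvesters that look for a literal "@"; smarter harvesters can undo it.

```python
def munge(address):
    """Turn user@example.com into 'user at example dot com' (hypothetical helper)."""
    local, sep, domain = address.partition("@")
    if not sep:
        return address  # not an email address; leave it alone
    return f"{local} at {domain.replace('.', ' dot ')}"
```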

The URL is definitely spammish.

#41 ::: Spam deleted ::: June 06, 2008, 11:16 PM:

Spam from 89.113.78.6

#42 ::: P J Evans sees another ::: June 06, 2008, 11:44 PM:

I'd like some. Better yet, give it to the spambot.

#43 ::: Spam deleted ::: June 07, 2008, 04:35 PM:

Spam from 89.113.78.6

#44 ::: Serge sees valium spam ::: June 07, 2008, 04:40 PM:

What a bitter pill to swallow.

#45 ::: Spam deleted ::: June 08, 2008, 03:55 AM:

Spam from 89.113.78.6

#46 ::: Rob Rusick reports percocet spam ::: June 08, 2008, 06:54 AM:

Valium...

Hello?

Valium...

Who's there?

Percocet...

You can't fool me, you're the Land Shark!

#47 ::: Spam deleted ::: June 08, 2008, 12:21 PM:

Spam from 89.113.78.6

#48 ::: Spam deleted ::: June 08, 2008, 12:24 PM:

Spam from 89.113.78.6

#49 ::: Spam deleted ::: June 08, 2008, 12:41 PM:

Spam from 89.113.78.6

Dire legal notice
Making Light copyright 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 by Patrick & Teresa Nielsen Hayden. All rights reserved.