Varney opens the treasure-house of his knowledge….—Varney the Vampyre; or, The Feast of Blood
Would you like to look behind the scenes, to peer in upon the doings of the gnomes high in the glass-and-steel headquarters of Making Light? Come then, with me, to view their doings. Come to view An Hour In Spam.
The example that you see on the left (click it to see it at a larger, readable level) does not come from a particularly active hour. (The hours from three to six in the morning Eastern time can see many times this many spam attempts.) Let me go over what you’ll see.
On the left, the little square check box allows the gnomes to indicate which post or posts are to be operated upon. Choices include Publish, Unpublish, Delete, and Mark As Spam. Moving to the right, the little orange triangle marks these comments as Unpublished; Held For Moderation. (A green triangle marks a Published Comment, while a purple one marks Known Spam.) All that we are looking at here is the Moderation Queue.
Next, to the right, comes a block of text; the comment itself. Links aren’t shown here (nor are paragraph breaks, italics, blockquotes, and so on). Oftentimes the gnomes can tell just by inspection whether a comment is a spam or ham. They can check the box and do a group action.
Next line down on each comment, we see five columns. The first, on the left, is Edit. When the gnomes click there, they move to the editing screen (example to the right).
The second link is the Commenter. Sometimes it’s obvious that this is a spammer: Few people go around with names like Cheap Viagra No Prescription or Auto Scratch Remover. Sometimes, however, it’s a human-sounding name, like Caleb Hutchcraft or Marina Gordon. A click there brings up the Show All By screen.
The third column shows the name of the thread where the comment appeared. That link goes to the editing screen for each particular post.
The fourth column shows how long ago the comment was posted. That isn’t a clickable link.
The last column, on the far right, shows the IP address whence the comment was posted. This is a live link to a Show All posted from that IP address. This is less useful than it could be: Nearly all spam is posted from compromised addresses.
After we’ve checked the boards for posts that are labeled Spam by the commentariat, and those spammish posts are Unpublished, the next thing that the gnomes do is go to the moderation queue and start reading the comments one by one with the Edit link.
The gnomes have a pretty good memory for prose, and a decent eye for patterns. As each post is examined, they look for patterns in the e-mail addresses, in the commenters’ names, in the URLs of the websites being advertised, and in the text of the comment itself.
When a real comment from a real person appears, the gnomes instantly publish it, after checking to see which filter was triggered that moved the post to moderation. Sometimes, it’s a filter that they’re not going to remove (e.g. malformed URLs) because, even though those filters occasionally hold up a real person, they tend to stop dozens if not hundreds of spam posts every day. Other filters, the ones that stop spam less often, are removed.
The other posts — the ones where the filters didn’t stop the spam — the gnomes use to build new filters.
The way that works, the gnomes find what look like key phrases. They Google those phrases, to see if they mostly show up in spam comment posts (e.g. “center to heart”). Then they look at the phrases immediately before and after the key phrase.
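For the curious, here is roughly what "looking at the phrases immediately before and after" might look like if a machine did it instead of a gnome. This is only a sketch in Python: the function name `neighbours`, the `width` parameter, and the idea of counting word runs are all my own assumptions for illustration, not a description of the gnomes' actual tools.

```python
import re
from collections import Counter

def neighbours(comments, key_phrase, width=3):
    """Count the runs of words found just before and just after key_phrase.

    Hypothetical helper: `comments` is any iterable of comment texts,
    `width` is how many words to keep on each side.
    """
    before, after = Counter(), Counter()
    # Escape the phrase so it is matched literally, ignoring case.
    pattern = re.compile(re.escape(key_phrase), re.IGNORECASE)
    for text in comments:
        for match in pattern.finditer(text):
            left = text[:match.start()].split()[-width:]
            right = text[match.end():].split()[:width]
            if left:
                before[" ".join(left).lower()] += 1
            if right:
                after[" ".join(right).lower()] += 1
    return before, after
```

Word runs that turn up again and again on either side of a key phrase are candidates for the slots in a new filter.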
Spammers have gone to mad-libs style comments. I suspect that the word-and-phrase lists they use are either comma-delimited or single-quote delimited, from the bizarre ways in which commas and apostrophes are used in many spam comments. A comment with no space after a comma, or one with no apostrophe in a standard contraction, is, more than nine times out of ten, a spam comment.
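Those two tells, the comma with no space after it and the contraction with no apostrophe, are simple enough to check mechanically. What follows is a minimal Python sketch, not anything the gnomes actually run: the function name `punctuation_flags` and the short list of contractions are assumptions made up for the example.

```python
import re

# A comma followed directly by a letter, as in "post,thanks".
# Digits are ignored so that numbers like 1,000 do not count.
NO_SPACE_AFTER_COMMA = re.compile(r",(?=[A-Za-z])")

# A few standard contractions written without their apostrophes.
# The list is illustrative only and far from complete.
MISSING_APOSTROPHE = re.compile(
    r"\b(dont|doesnt|didnt|isnt|wasnt|youre|thats)\b", re.IGNORECASE
)

def punctuation_flags(comment: str) -> list[str]:
    """Return the names of the punctuation tells found in a comment."""
    flags = []
    if NO_SPACE_AFTER_COMMA.search(comment):
        flags.append("no space after comma")
    if MISSING_APOSTROPHE.search(comment):
        flags.append("contraction without apostrophe")
    return flags
```

A flag here is a hint for a reader, not a verdict; real people mistype too.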
Let me show you what a filter looks like:
/a (useful|informative|helpful|educational|benificial|beneficial|) (and|along with) (funny|interesting|amusing) (publication|write.?up|essay|post|article|submitting|submission|script)/i
One of the words in each set of parentheses, separated by vertical bars, is substituted into each slot. Thus, that filter will stop “A useful and funny publication” or “a helpful along with amusing article” or any other combination you can build out of that list. The “/” characters tell the filter where the phrase starts and stops; the small-letter i after the second slash means that both small letters and capital letters will trip the filter. And the .? mark means that any single character, or none at all, will match: write-up or write up or writeup will trip the filter.
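If you would like to poke at that filter yourself, here it is transcribed into Python. This is a sketch under assumptions: the blog's own software takes the slash-delimited form shown above, so the names `SPAM_FILTER` and `trips_filter` are mine, and the translation to Python's `re` module (slashes dropped, trailing i turned into `re.IGNORECASE`) is an illustration rather than the gnomes' setup.

```python
import re

# The filter from the post, word lists copied as written.
# Adjacent string literals are joined into one pattern.
SPAM_FILTER = re.compile(
    r"a (useful|informative|helpful|educational|benificial|beneficial|) "
    r"(and|along with) (funny|interesting|amusing) "
    r"(publication|write.?up|essay|post|article|submitting|submission|script)",
    re.IGNORECASE,  # the trailing "i": capitals and small letters both match
)

def trips_filter(comment: str) -> bool:
    """Return True if the phrase appears anywhere in the comment."""
    return SPAM_FILTER.search(comment) is not None
```

One detail worth noticing: as written, the first set of parentheses ends with a vertical bar and nothing after it, so that slot can also match no word at all.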
I regret that commenting on the sorts of things that the gnomes gnome is likely to get that comment gnomed. But … the gnomes will release those comments soon enough.
So ends our tour of one of the floors in the landmark glass-and-steel tower. Please stop by the gift shop on your way out.