Akismet Works For Me

I wanted to follow up on my previous post about dealing with blog spam.

I installed Akismet and it has worked flawlessly. It has caught dozens of blog spam posts so far. I have had no false positives and no false negatives yet (i.e. it has correctly identified every comment). The only downside is that for it to be really helpful you have to have 100% confidence in it. You don’t want to have to check the “Akismet Spam” list every day.

I guess the unfortunate thing about computers being so great at automated tasks is that a true test of whether a post came from a human or a computer is really hard to build. I had previously thought of things like JavaScript challenges to try to stop auto-spammers, but the more I think about it, the less sense that makes. In the end, what matters is not whether the comment was automated by a computer or entered by a human; what matters is whether or not the comment is spam.

So, give Akismet a try if you don’t want to deal with comment spam all the time. I think you’ll be impressed by how well it works.

What would Turing think of all this?

How Do You Deal With Blog Spam?

I assume that many of my readers are technical people and as such are likely to be bloggers themselves, so I post the question to you: How do you deal with blog spam?

I’m currently getting on the order of 30-40 blog spam posts a day. WordPress does a good job of catching them, and I can mark them as spam (so no one ever sees them), but as with email spam, I’d prefer not to have to deal with them at all. In the past I’ve tried reporting blog spammers to their hosts based on IP addresses, but that’s even more work and it seems to have little or no effect on the volume.

Do that many people need Viagra? I didn’t even know what Hoodia was until I looked it up. (Oh, and if you use any of these terms in replies, you’re going to get caught in the spam trap. So don’t.)

I’ve considered using a CAPTCHA but have not found a good plugin that will do it for me, and I generally find them annoying. Registration seems like too high a barrier to entry for casual posting. I’ve seen, but haven’t tried, Akismet, which is a service that checks your comments before they are posted. Anyone use Akismet? Does it work?

Are we stuck with blog spam? What other things can I do so that I don’t have to deal with these annoying posts? (Oh the irony, we’ll see if I get spam attempts before I get real comments.)

Thanks for the help…

Update:
In addition to CAPTCHA, I’ve heard of people doing things with JavaScript, on the theory that automated blog spammers don’t use tools that understand JavaScript. Some people have the actual form submit happen with JavaScript. Another option is to have JavaScript run some simple computation in the user’s browser and have the server check the result. That’s the idea behind Hashcash. I haven’t tried it, but it sounds like an interesting idea.
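To make the "compute something, then check the result" idea concrete, here is a minimal proof-of-work sketch in Python. It is only an illustration under my own assumptions: real Hashcash uses a specific stamp format, and a blog plugin would run the solving step as JavaScript in the browser. The function names, the challenge string and the difficulty value are all made up for this example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    # SHA-256 is an assumption of this sketch, chosen for convenience.
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: search for a nonce whose hash has enough leading zero bits."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty


if __name__ == "__main__":
    # Hypothetical per-form challenge issued by the server.
    nonce = solve("comment-form-42", difficulty=12)
    print(nonce, verify("comment-form-42", nonce, 12))
```

The point of the asymmetry is that solving takes many hash attempts while verifying takes one, so a spammer posting thousands of comments pays far more than a single commenter does.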

So, CAPTCHA, Hashcash and Akismet … and no more spam!

WordPress Code Formatting

I finally got tired of dealing with the reformatting that WordPress does in its attempt to be “user friendly”. In general it does the right thing, but when you deal with code snippets inside tags a lot it can quickly become a problem.

I wanted to accomplish two things:

  1. Have whitespace matter
  2. Not have WordPress add extra linebreaks or evaluate my >s as HTML

1. Is easily done with CSS


code {
  white-space: pre;
}

This CSS will render the text with its spaces and line breaks exactly as they appear in the source. This is just the right thing for code.

2. Getting WordPress to not muck with code blocks

This, on the other hand, requires coding a solution. Luckily someone has done the hard work for us. There is an existing plugin called Preserve Code Formatting that handles this. Basically it looks through the HTML source of a posting for <code> and <pre> blocks. When it finds those blocks it removes all of the extra WordPress formatting and handles escaping HTML entity characters.
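The escaping half of that job can be sketched in a few lines. This is not the plugin’s code (the plugin is PHP and does more than this); it is a rough Python illustration of the idea, and the function name preserve_code is my own invention.

```python
import html
import re

# Match an opening <code> or <pre> tag, its body, and the matching closing tag.
CODE_BLOCK = re.compile(
    r"(<(code|pre)\b[^>]*>)(.*?)(</\2>)",
    re.DOTALL | re.IGNORECASE,
)


def preserve_code(post_html: str) -> str:
    """Escape &, < and > inside <code>/<pre> blocks; leave everything else alone."""

    def escape_block(match: re.Match) -> str:
        open_tag, _name, body, close_tag = match.groups()
        # quote=False keeps quote characters readable inside code samples.
        return open_tag + html.escape(body, quote=False) + close_tag

    return CODE_BLOCK.sub(escape_block, post_html)


if __name__ == "__main__":
    print(preserve_code("<p>Example:</p><code>if (a < b && b > c) {}</code>"))
```

One caveat with a sketch this simple: running it twice on the same text would escape the entities a second time, so a real implementation has to guard against that.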

The other thing I was running into was that WordPress was “closing” things that looked like HTML. I ran into this when I was writing code snippets that contained generics syntax.

I tracked that down to a writing setting in WordPress. Under Options -> Writing there is a checkbox that says: “WordPress should correct invalidly nested XHTML automatically”. When this option is enabled, WordPress will erroneously see certain things as HTML markup and try to create closing tags.

With this option selected I would get:

List

addresses;

Instead of the correct output I would get when I unselected the option:

List

addresses;
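To see why a tag balancer trips over generics, here is a toy balancer in Python. It is not WordPress’s implementation, only a sketch of the general approach of tracking open tags and appending closers for any that are left over. The type name String in the usage line is a hypothetical stand-in, not the type from my original snippet.

```python
import re

# Anything shaped like <Name ...> or </Name> counts as a tag in this toy version.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def naive_balance(text: str) -> str:
    """Append closing tags for every tag-like token that was never closed."""
    open_tags = []
    for match in TAG.finditer(text):
        if match.group(0).endswith("/>"):
            continue  # self-closing tags such as <br /> need no closer
        closing, name = match.group(1), match.group(2)
        if not closing:
            open_tags.append(name)
        elif name in open_tags:
            # Pop back to (and including) the matching opener.
            while open_tags.pop() != name:
                pass
    return text + "".join(f"</{name}>" for name in reversed(open_tags))


if __name__ == "__main__":
    print(naive_balance("List<String> addresses;"))
```

Because a type parameter in angle brackets looks exactly like an unclosed tag, the balancer dutifully "fixes" it by adding a closing tag that was never wanted.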

With the plugin in place, a bit of CSS and one option turned off, I can now copy-and-paste code snippets into WordPress and not have to deal with formatting.

Next step…syntax highlighting.

Update:
The other thing I found is that in the functions-formatting.php file there is a function called ‘wpautop’. This function has a call to remove breaks from <pre> tags. So I copied the line and changed it to do the same thing to <code> tags.


$pee = preg_replace('!(<code.*?>)(.*?)</code>!ise', " stripslashes('$1') . stripslashes(clean_pre('$2')) . '</code>' ", $pee);