fíam

(rhymes with liam)

  • Preventing Spam

    May 13, 2008 at 03:00:48 CEST

    Lately I've been getting hammered with a lot of spam on this blog, so I decided to implement something to prevent it.

    As I've previously mentioned, I don't like Akismet because it's too simple. It only tells you whether it thinks a comment is spam, so the best you can do is skip writing that comment to the database. It would be nice if it returned a probability instead, so you could act accordingly. For example, consider the following policy:

    • If spam probability is 50% or below, accept the comment.
    • If it's between 50% and 80%, present some validation method to the user. It could be a CAPTCHA or something even simpler, like a message asking the user to resubmit the form within 30 seconds, since most spam bots wouldn't get that right.
    • If it's more than 80%, discard the comment.
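
    Akismet doesn't return a probability, so the input below is hypothetical (it would have to come from some other classifier), but the three-tier policy itself is just a couple of comparisons. A minimal sketch, where spam_action is an illustrative name:

```python
def spam_action(probability):
    """Map a spam probability in [0, 1] to one of the three actions above."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability <= 0.5:
        return "accept"     # 50% or below: accept the comment
    if probability <= 0.8:
        return "challenge"  # between 50% and 80%: CAPTCHA, timed resubmit, etc.
    return "discard"        # more than 80%: drop the comment


for p in (0.1, 0.5, 0.65, 0.8, 0.95):
    print(p, spam_action(p))
```

    Note that exactly 80% falls in the "challenge" band, since only values above it are discarded.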

    But Akismet can't do that, so I will never use it. My initial idea was to implement my own spam detection system, but developing ffloat.it keeps me busy enough that it's not something I can do for now. However, after reading Scott Lawton's suggestion and the page he mentioned, I found I could write something to prevent most of the spam in less than an hour.

    My approach uses two form classes, which your forms must subclass. And that's all you need! Your forms won't even look any different, since those two classes only add two hidden fields and the corresponding validation methods. The process is as follows:

    • When you create the form (empty or with data), you need to pass two extra arguments to it: the remote address of the client requesting the page, and an identifier. For example, in Blango I use the primary key of the entry.

    • The form encrypts the requester's IP, the identifier and the current time using a stream cipher with your settings.SECRET_KEY as the key, and puts the result in a hidden field.

    • The form adds a text field (author_bogus_name) with a maximum length of 0, no label, and its style set to display:none. Users won't see it, but spam bots will try to put something in it.

    • When the form is validated, the hidden field is decrypted and the address and identifier it contains are compared with those of the current request. If they match, a time check is performed: if the user took less than 5 seconds to post (wow, that's some fast typing, isn't it?) or more than an hour (which prevents bots from reusing the token later), the form won't validate.
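
    The token and honeypot steps above can be sketched without Django as follows. This swaps in a different technique from the one described: instead of encrypting with a PyCrypto stream cipher, it signs the payload with HMAC-SHA256 from Python's standard library, so the address, identifier and timestamp are readable (base64) but can't be altered without the key. SECRET_KEY, make_token and check_submission are hypothetical names, not the API of magicforms.py; in a Django project the key would come from settings.SECRET_KEY.

```python
import base64
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical stand-in for settings.SECRET_KEY
MIN_AGE = 5                # seconds: anything faster looks like a bot
MAX_AGE = 3600             # seconds: older tokens are rejected to stop reuse


def _sign(payload):
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_token(remote_addr, identifier, now=None):
    """Pack address, identifier and issue time into a signed hidden-field value."""
    issued = int(time.time() if now is None else now)
    payload = f"{remote_addr}|{identifier}|{issued}".encode()
    return base64.urlsafe_b64encode(payload).decode() + ":" + _sign(payload)


def check_submission(token, honeypot, remote_addr, identifier, now=None):
    """Return True only if the honeypot is empty and the token checks out."""
    # A bot that fills in the invisible author_bogus_name field gives itself away.
    if honeypot:
        return False
    try:
        encoded, sig = token.split(":", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
        addr, rest = payload.decode().split("|", 1)
        ident, issued = rest.rsplit("|", 1)
        issued = int(issued)
    except ValueError:  # also covers binascii.Error and UnicodeDecodeError
        return False
    # Reject forged or altered tokens (constant-time comparison).
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return False
    # The token must have been issued to this client, for this entry.
    if addr != remote_addr or ident != str(identifier):
        return False
    age = (time.time() if now is None else now) - issued
    return MIN_AGE <= age <= MAX_AGE


token = make_token("192.0.2.1", 42, now=1000)
print(check_submission(token, "", "192.0.2.1", 42, now=1010))
```

    In a form class, make_token's result would be the initial value of the hidden field, and check_submission would run during validation with the posted token and honeypot values.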

    I know this method is not perfect, since a spambot could be programmed to circumvent it. But the game consists of staying ahead of the spammers, and right now this technique will get you there.

    As for the code, it's currently committed to the Blango tree in the file magicforms.py, but for your convenience I've also made it available here. Let's see an example from Blango itself:

    Before:

    from django import forms

    class CommentForm(forms.ModelForm):
        ...

    # In the view:
    ...
    comment_form = CommentForm()
    if request.method == 'POST' and entry.allow_comments:
        comment_form = CommentForm(request.POST)

    After:

    from magicforms import MagicModelForm

    class CommentForm(MagicModelForm):
        ...

    # In the view: pass the client address and the entry id first.
    ...
    comment_form = CommentForm(request.META['REMOTE_ADDR'], entry.id)
    if request.method == 'POST' and entry.allow_comments:
        comment_form = CommentForm(request.META['REMOTE_ADDR'], entry.id, request.POST)

    Just remember to use MagicForm if your form inherits from forms.Form, and MagicModelForm if it inherits from forms.ModelForm. Note also that this code depends on PyCrypto (the python-crypto package in Debian and derivatives).
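
    The "After" example takes the two extra arguments first and passes everything else on, just like the plain form. One way for two classes such as MagicForm and MagicModelForm to share that behavior is a mixin placed in front of the Django base class. The sketch below is an assumption about the shape, not the contents of magicforms.py; MagicMixin and DemoMagicForm are hypothetical names, and FakeForm is a hypothetical stand-in for forms.Form so the example runs without Django.

```python
class FakeForm:
    """Hypothetical stand-in for django.forms.Form (only the bits used here)."""

    def __init__(self, data=None):
        self.is_bound = data is not None
        self.data = data if data is not None else {}


class MagicMixin:
    """Hypothetical shared logic: consume the two extra leading arguments."""

    def __init__(self, remote_addr, identifier, *args, **kwargs):
        self.remote_addr = remote_addr
        self.identifier = identifier
        # Everything else (e.g. request.POST) is passed on to the base form class.
        super().__init__(*args, **kwargs)


# With Django, the same mixin would be combined with forms.Form or forms.ModelForm.
class DemoMagicForm(MagicMixin, FakeForm):
    pass


empty = DemoMagicForm("192.0.2.1", 42)
posted = DemoMagicForm("192.0.2.1", 42, {"comment": "Nice post!"})
print(empty.is_bound, posted.is_bound)
```

    Listing the mixin first in the bases matters: its __init__ runs before the form's and strips off the two extra arguments before the form sees them.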