The Anti-Captcha Challenge

Recap: the problem with current Captcha solutions

The general purpose of Captchas is to prevent the automation of form submission: for example, to protect a guestbook from filling up with spam entries, or to prevent hundreds of bogus users from registering on a forum. Until recently, image-based Captchas have been a reasonable solution to this problem. However, with Optical Character Recognition (OCR) techniques getting better and better, Captchas too have to keep increasing in complexity. Just look at these gems and imagine yourself being color blind:

[Image: some everyday unreadable Captchas]

Ironically, it’s come to the point that computers are better at deciphering Captchas than humans are, simply because computers have infinite patience. To illustrate: evildoers trying to beat your Captcha are probably satisfied with a success ratio of 1/100, because in just a few hours of repetition this can add up to hundreds of successful passes. A typical human user, on the other hand, probably throws in the towel after three consecutive failed attempts, at which point they’ll most likely leave your website altogether.

Who can blame them? The average user doesn’t understand why they should enter a random string of letters in the first place. It’s not their problem and they do not care what it is for. For them it’s some sort of annoying puzzle that stands in the way of doing what they want to do. Not being able to pass it makes them feel inadequate and frustrated.

The Anti-Captcha challenge

The basic idea behind it is simple.

“Create a captcha solution which does not require any end-user interaction”

As a first attempt, I have concocted a working Anti-Captcha based on the reasoning that only browsers can interpret javascript well. That turns the question from “Has a human been involved?” into “Has a browser been involved at form submission?”. In general the two answers turn out to be the same (note: see the “Caveats” section below).

Check out the online demo.

How it works

In the head of the html document an external javascript file is included. This file is in fact a php file, which is designed to:

  1. Generate a random token
  2. Store a checksum of this token in a cookie
  3. Generate some obfuscated javascript code which (when interpreted) adds a hidden input-field to every form element on the webpage using the token as a value

After form-submission, the checksum of the post value should equal the checksum stored in the cookie. As a bonus, this technique should also provide adequate protection against XSRF.
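
To make the steps above concrete, here is a minimal sketch of what such a script could do. It is not the actual Anti-Captcha source: the function names are hypothetical, the generated javascript is left unobfuscated for readability, and random_bytes and hash_equals assume PHP 7+ rather than PHP4. Only the field name anti-captcha-token and the cookie name anti-captcha-crc are taken from the installation example.

```php
<?php
// Hypothetical sketch of the three steps; names are illustrative, not the real script.

/** Step 1: generate a random token (random_bytes requires PHP 7+). */
function ac_generate_token(): string
{
    return bin2hex(random_bytes(16));
}

/** Step 2: the checksum that would be stored in the cookie. */
function ac_checksum(string $token): string
{
    return sha1($token);
}

/** Step 3: javascript that adds a hidden input carrying the token to every form. */
function ac_build_script(string $token): string
{
    // json_encode yields a safely quoted javascript string literal
    $value = json_encode($token);

    return "window.addEventListener('load', function () {\n"
         . "  for (var i = 0; i < document.forms.length; i++) {\n"
         . "    var input = document.createElement('input');\n"
         . "    input.type = 'hidden';\n"
         . "    input.name = 'anti-captcha-token';\n"
         . "    input.value = $value;\n"
         . "    document.forms[i].appendChild(input);\n"
         . "  }\n"
         . "});\n";
}

/** After submission: the posted token must hash to the checksum from the cookie. */
function ac_verify(string $postedToken, string $cookieChecksum): bool
{
    return $postedToken !== '' && hash_equals($cookieChecksum, sha1($postedToken));
}

// In the script endpoint one would then do something like:
//   $token = ac_generate_token();
//   setcookie('anti-captcha-crc', ac_checksum($token), time() + 3600, '/');
//   header('Content-Type: text/javascript');
//   echo ac_build_script($token);
```

Note that addEventListener is not available in old browsers such as IE6, so a real implementation would need a fallback there.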

Installation

Download Anti-Captcha and put it in the head of your html document:

<head>
    <script src="anti-captcha-0.3.js.php" type="text/javascript"></script>
</head>

After form submission, match the input value with the sha1 checksum stored in the cookie:

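<?php
    // Read the posted token and the checksum cookie (either may be missing)
    $token    = isset($_POST['anti-captcha-token']) ? $_POST['anti-captcha-token'] : '';
    $checksum = isset($_COOKIE['anti-captcha-crc']) ? $_COOKIE['anti-captcha-crc'] : '';

    // Verify the token using the checksum stored in the cookie
    // (strict comparison avoids PHP's loose string-to-number juggling)
    if (is_string($token) && $token !== '' && sha1($token) === $checksum) {

        // Reset token (preventing form resubmission)
        setcookie('anti-captcha-crc', sha1(rand()), time() + 3600, '/');

        // Continue form validation
        die('Captcha accepted');

    } else {

        // No valid Anti-Captcha checksum received
        die('Error, please enable javascript and/or cookies');

    }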

Looking for the WordPress plugin? It is available from the WordPress plugin repository.

Requirements

The Anti-Captcha script is written to be PHP4+ compatible and should run on most hosting platforms. It has been tested and verified to work on most browsers, including the dreaded IE6. Note: the user does need to have javascript and cookies enabled for form submission to succeed.

Caveats

Obviously this technique isn’t perfect. At some point bots might gain the ability to interpret javascript, or simply read out the obfuscated code instead. At that time a different approach, with a similar concept, would be needed. It should also be possible to fool the Anti-Captcha with “automated mouse-clicking software”. However, this should be very hard to combine with botnets, which makes additional security layers (for example: limiting the number of form submissions per IP address) more feasible. Another major drawback is that form submission requires javascript, which is something you should ponder over yourself. Personally I feel this drawback is smaller than the disadvantages image-based Captchas bring, but this probably depends on the project at hand.
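
As an illustration of the per-IP limit mentioned above, a rough sketch could look like this. The function name, the default limit of 5 submissions per hour and the in-memory log are all assumptions for the example. In practice the log would have to live in a database or cache to survive between requests.

```php
<?php
// Hypothetical helper: allow at most $limit submissions per IP within $window seconds.
// $log maps an IP address to a list of submission timestamps.
function ac_allow_submission(array &$log, string $ip, int $now, int $limit = 5, int $window = 3600): bool
{
    // Keep only the timestamps that still fall inside the time window
    $recent = array();
    foreach (isset($log[$ip]) ? $log[$ip] : array() as $timestamp) {
        if ($timestamp > $now - $window) {
            $recent[] = $timestamp;
        }
    }

    if (count($recent) >= $limit) {
        $log[$ip] = $recent;
        return false; // too many submissions from this IP
    }

    // Record this submission and allow it
    $recent[] = $now;
    $log[$ip] = $recent;
    return true;
}
```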

Credits

Part of the obfuscation technique used is based upon Dean Edwards’ JavaScript Packer, which was ported to PHP by Nicolas Martin and made compatible with PHP4 by Mark Fabrizio Jr.

License

The Anti-Captcha is licensed under LGPL 2.1

Why wouldn’t you combine the two technologies? Show a captcha if someone doesn’t have javascript enabled (he’s probably used to being mistreated on the net as loads of websites only work with JS enabled), and use the above technique if JS is enabled.

BTW there are more captcha-like methods: I’ve seen a website where you’re being asked to click the biggest circle in an image. We can all do this, but a spambot needs to dig into language interpretation to know what the assignment is. Just a random thought.

Basically you’re using a nonce (http://en.wikipedia.org/wiki/Cryptographic_nonce), but you’re injecting it into your form through javascript instead of just slapping it into your html which is more common practice. I’m sure it works quite nicely, but for me personally requiring javascript for form submissions is a bit too much …

On forms that attract a lot of automated submissions I sometimes add an empty input field and hide it from the users through CSS. When the form is submitted I check on the server side whether the field is empty. If not, I reject the submission. It’s probably best to add a ‘please leave this field empty’ comment next to the input and give a descriptive error message when the form submission is rejected.

It sounds really stupid, but it’s quite effective!
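
The hidden-field trick described above could be checked on the server roughly like this. The field name 'website' and the function name are made up for illustration; any field name a bot is likely to fill in would do.

```php
<?php
// Form markup assumed by this sketch:
//   <p class="hp">Please leave this field empty:
//     <input type="text" name="website" value=""></p>
//   with CSS: .hp { display: none; }

// Hypothetical field name for the hidden input
const HONEYPOT_FIELD = 'website';

/** Returns true when the hidden field was left empty, as a human would leave it. */
function honeypot_passed(array $post): bool
{
    $value = isset($post[HONEYPOT_FIELD]) ? $post[HONEYPOT_FIELD] : '';

    // Anything other than an empty (or whitespace-only) string counts as filled in
    return is_string($value) && trim($value) === '';
}

// Usage: if (!honeypot_passed($_POST)) { /* reject with a descriptive error message */ }
```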

I like your idea! It’s simple but effective :)

I think it’s a good idea, these captchas drive me insane!

@Josse @4rn0 It should be possible to combine a regular Captcha with the Anti-Captcha (making it a whole lot more complex). Though I wonder: who actually doesn’t use javascript? Google obviously, but they don’t need form submissions. In fact, the entire website could still be built with unobtrusive javascript in mind.

Second, I agree that there are some excellent concepts out there to separate humans from bots. However, in my mind that’s not the real question. Captchas are meant to prevent the automation of form submission, not to validate the submitted data itself. I.e. a spammer can crack any captcha by hand (including the Anti-Captcha); this cannot be prevented. So I argue, let’s not bore the end-user with Captchas at all.

Comment spam is a tough nut to crack. You’re doing a very worthy project.

However, I’ve seen another Javascript based approach, WP Captcha Free,
http://wordpress.org/extend/plugins/wp-captcha-free/

How does it compare with your approach? Also on the plugin page, iDope the author discussed the comparison with WP-SPAM FREE, which is cookie based. http://wordpresssupplies.com/wordpress-plugins/captcha-free/

Well, none of these are perfect, only one step ahead of the bots. If all the techniques are combined, we’ll be a few steps ahead.

What do you think?

test with javascript enabled

if this shows, then it works

yes it works
i will use it now :D

Will Anti-Captcha work in a mobile browser??

@ibnux Good question, if the mobile browser supports javascript and cookies it should work. Try it out and let us know!

@Jiwei Wang By the looks of it, “WP Captcha-Free” is quite similar in idea and technique. Some differences: the Anti-Captcha plugin instantly labels rejected comments as spam instead of removing them, it also protects the login/register/lost-password forms of your blog, and because it uses obfuscated javascript it’s more difficult for spammers to automatically read out the javascript and insert the hash themselves. Finally, I find it strange that iDope uses a completely different (visible) captcha on his own blog. Why is that?

Let me answer that. I also wrote the Clickcha plugin I use there and it got the space since it’s the newer one.

BTW, do check out Clickcha ( http://clickcha.com ). It’s a new kind of Captcha.

One more thing, WP Captcha-Free doesn’t need obfuscated javascript as the hash is never stored in the script. It gets the hash from the server (using ajax) only when the comment is posted and it expires soon after.

Of course this assumes that bots cannot execute JS. If they can run JS, obfuscation is not going to make any difference.

@iDope thanks for the clarification :)

Just for the sake of argument, wouldn’t it then be theoretically possible for a spammer to simply request the hash from server (using curl or wget) and then insert it the same way you do? No javascript required.

PS. I do like the idea and simplicity of Clickcha. However since it also requires javascript, I prefer an invisible technique like WP Captcha-Free or my own Anti-Captcha.

You are right, that is a weakness (although it is slightly mitigated since the hash is unique for the IP, post, etc. and expires shortly after). The problem with anti-spam techniques that don’t require human input is that the moment the spammer starts specifically targeting the technique, you are toast. You make it harder for the spammers by constantly changing things around, but that’s about it.

Clickcha doesn’t really require Javascript. It’s just that its current implementation as the WordPress plugin uses JS for some optimization (loading Clickcha only when needed).
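
For readers wondering what a hash that is “unique for the IP, post, etc. and expires shortly after” might look like, here is one possible sketch. It is not WP Captcha-Free’s actual code: the HMAC construction, the function names, the server-side secret and the 300-second lifetime are all assumptions, and intdiv requires PHP 7+. As pointed out in this thread, a bot that simply requests the hash from the server first would still pass this check.

```php
<?php
// Hypothetical sketch, not WP Captcha-Free's actual code.

/** Hash for one time window; $secret is a server-side key the client never sees. */
function window_hash(string $secret, string $ip, int $postId, int $window): string
{
    return hash_hmac('sha256', $ip . '|' . $postId . '|' . $window, $secret);
}

/** Issue a hash bound to the IP and post, valid for roughly $ttl seconds. */
function issue_hash(string $secret, string $ip, int $postId, int $now, int $ttl = 300): string
{
    return window_hash($secret, $ip, $postId, intdiv($now, $ttl));
}

/** Accept the current and the previous window, so a hash lives between $ttl and 2 * $ttl seconds. */
function check_hash(string $hash, string $secret, string $ip, int $postId, int $now, int $ttl = 300): bool
{
    $current = intdiv($now, $ttl);
    foreach (array($current, $current - 1) as $window) {
        if (hash_equals(window_hash($secret, $ip, $postId, $window), $hash)) {
            return true;
        }
    }
    return false;
}
```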

How you actually salt the hash (IP, timestamp etc) doesn’t matter if you’re allowing the spammer to just ask for the end-result from the server. Solely for that reason Anti-Captcha outputs randomly generated and obfuscated javascript code to ensure that the bot actually HAS to interpret javascript. This makes it that much harder to crack, albeit not impossible.

You’re right though – as soon as they start to target your technique you’re practically toast. In my opinion neither Clickcha nor any other captcha technique is safe from this. Just look at how difficult image captchas have become.

With Anti-Captcha I’m trying to make a captcha solution which obviously works but is also user-friendly. You probably agree that all three of our solutions have that same intent. We do have the same goal – only with a slightly different approach.

The online demo complains with “Error, please enable javascript”. But I do have javascript enabled!

@Senthil Nathan What browser (+version) are you using? Your comment here also ended up as being spam. Are you sure javascript is enabled? Is it possible that you’re blocking cookies?

How about you generate a field and field label. The label says what to type in the field. For instance: Enter the number 1 in field. Script compares field value with what script put in label.

@David What you propose has been done numerous times before. In my view it’s not very user-friendly because you’re distracting your visitors from what they want to do (i.e. buy an item or contact you for more information). It therefore doesn’t comply with the basic idea behind this post: “Create a captcha solution which does not require any end-user interaction”. Furthermore, it’s also much easier to foil.

I tried it with Internet Explorer 8 (64 bit version) and it returned the “Error, please enable javascript” message.

Javascript is definitely on, so it seems like a bug.

@IE8user Is it possible that you blocked the cookie? In my IE8 it works. Maybe the error message needs adjustments.

just a note: Using NoScript on FF 3.5.6/Mac. When fili.nl is ‘blocked’ the comment fails but no message is displayed to enable javascript.

Good work though. Installed the wp plugin – some botnet has been hitting my blog with hundreds of spam messages since Christmas eve.

Mike

Best anti-captcha module is used on JDownloader software to get rid of captchas on rapidshare, megaupload, hotfile like file-hoster web sites.
It is working very well but I could not find anti-captcha’s home page yet, if I find it I am planning to use it in my software to skip captcha problems!

the problem is, you can write a spamming bot which is using your browser so…

I have just got a blog and use about 50 different plugins. Thank you very much for your plugin. It complete my website

nice job thanks

really good job.. nice man,…

Will your anti-captcha prevent them from mining my contact e-mail address?

@Susie Sorry no, for that you’ll need another plugin

Typical case of security through obscurity. This is not a solution.

@a Strictly speaking you’re right of course, it sure doesn’t “fix” the internet. It does however cut down commenting spam on a WP blog by about 99% :)

ok, so this is really easy to beat with a bot that knows javascript.

in the demo, the following line of javascript will get around this.

“document.forms[0].submit();”

and if the bot only needs to find all of the forms, and then test a submit, you have failed.
it might be possible to obscure which form it is by adding a bunch of blank forms, but then one of them would need to be visible to the user. so you could search for the visible input fields and then return the forms that have visible inputs.

unless i am missing something, this does not solve the problem. I do think this would stop bots that ignore JS and just use post/get injection to create accounts or comments.

@jdavid Like I said before to the other ‘the glass is half empty’ guy, I’m aware that this technique is not bullet-proof. In fact I state that myself under the heading ‘Caveats’. In my view the Anti-Captcha provides as much protection as a regular captcha – without the annoyance to your visitors. Most spambots are not equipped with a javascript parser, simply because before now there was no reason to. In practice, the Anti-Captcha prevents most automated post requests – you can’t argue with results!

Sorry, I didn’t understand how to include this in other projects. I downloaded it, installed both php-files on FTP and added the code into . What shall I do with 3. ? The code there is incomplete, i.e. when I include it in my form I receive a 500 error.

@D.C. It’s hard for me to say anything about anything without more specific information. Maybe some code or debug information? A 500 error could mean a syntax error – but that depends on your server config.

Admittedly, I have coded many bots for others to accomplish varying things from spamming, making money, winning contests, etc. Throughout which I have gotten quite adept at breaking securities. If you want security, I can give some tips as to what helps and what doesn’t.

1. No clear text. If it is not obfuscated uniquely on each form load, it is generally easy to break.

2. Enable no-cache + links expire after 1 use + randomize order of code in the body. This prevents the use of simple techniques for acquiring images to process such as loading from the cache, re-downloading, or simply copying by image ID. If you could randomize the surrounding text and placement on the page as well this would help, although that could make using the site very unfriendly to users. This tip obviously pertains mainly to image-based solutions. “Lyla captcha” is considered unbreakable by many coders because it follows the first two points in this bullet; however, it can be broken quite easily by anyone with knowledge of which dlls to call or controls to import.

3. Javascript isn’t safe at all… I have personally found it to be one of the most useful tools to crack securities.

4. Never transmit/receive clear text code for submits. If you can think of a way to apply regex or other parsing techniques to isolate any particular line of code, then it is very easy to identify and manipulate data. I have no idea how to fully accomplish this while still using a script that legitimate users’ browsers would be able to handle.

5. Detect Get/Post calls in navigation. To mask user-agent data in the .net framework (used very widely for bot development) by default the navigation is done as a Post call. Regularly the postdata will consist of a single-line byte array with a 0 value. This won’t stop any pro, but it will stop the very large majority of coders out there. Obviously, detecting the integrated browsers used with many common development packages is necessary as well. Note however, that with a bit of knowledge and practice on the cracker’s behalf you will have no trace of evidence that the browser they are using is in any way different than a regular user’s.

6. Embeds are harder to break than javascript/html/any exposed script. Using a Flash-based security makes more advanced knowledge of cracking necessary rather than the simple invokes/value setting someone can do in the first attempt at botting.

7. Use a system that involves image recognition outside of text/basic images. Microsoft’s Asirra is a great example. This type of system is easier on a legitimate user than on a computer, unlike any other method I have seen.

8. Accept that no system works. If a human can do it, with enough dedication to coding a computer can do it at least as well if not better. You very clearly have taken this to heart already though, as you seem to be doing what you can to prevent legitimate users from even knowing it is happening. If legitimate users are driven away by your security then you might as well just go offline.

Oh, I also wanted to note but forgot to… I am actually partly color blind. That was part of the reason I initially began learning how to break securities, as I have an incredibly difficult time reading many Captchas :P I just thought it was interesting that you mentioned that in your post.

em ?? anti captcha pro spam filter ?? :|

@SomeCoder First off, thanks for your comments – I do appreciate a thorough and thought-out reply. On many points you are quite right and I won’t even try to rebut them. One thing though: your entire focus is on security. I for one would never trust *any* captcha solution as part of a security measure and neither should anyone else. They are breakable and so is mine – this is not the point behind this exercise.

I feel the concept “Create a captcha solution which does not require any end-user interaction” is legitimate; however, maybe my “solution” is not. In that case I challenge you to take off your black hat and put on a white one – to create instead of destroy. It’s easy to bash something someone else has made, it’s harder to build something up.

Determining via javascript support whether it’s a bot or a human user is an interesting approach, but it has its problems. You are quite right that a lot of botfarms are as stupid as it gets, but there are plenty of plug-ins that work directly within your browser – and thus come with javascript support naturally.

Personally, I’ve used server-side measures rather successfully. You can safely assume that a human makes an HTTP request to the page first, before sending off data. In addition to that, humans need a certain period of time to read an article before making a comment, fill out a form before registering and so on. For instance, it’s safe to say that a user takes at least 10 seconds between requesting the form and sending it off. If the first request for the blank form is missing or the time between fetching it and sending it seems way too short, you can assume it’s a bot. Making that assumption, you can sort out A LOT of the spam without any human interaction. Not only that, but it’s cross-platform compatible and doesn’t rely on technology installed on the client – thus making it compatible with any browser and device you can think of.
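
A bare-bones version of this timing check might look like the sketch below. The function name and the way the request time is stored are assumptions (a session variable is one option), the 10-second threshold is the example figure from the comment above, and the nullable type hint requires PHP 7.1+.

```php
<?php
/**
 * Hypothetical check. $formServedAt is the time the blank form was requested
 * (e.g. kept in $_SESSION when the form is rendered), or null when no such
 * request was seen. $submittedAt is the time the form data arrived.
 */
function looks_like_human(?int $formServedAt, int $submittedAt, int $minSeconds = 10): bool
{
    if ($formServedAt === null) {
        return false; // the form was never requested before the submission
    }

    // Submissions that arrive faster than a human could fill in the form are rejected
    return ($submittedAt - $formServedAt) >= $minSeconds;
}

// Usage, assuming the form page stored time() in $_SESSION['form_served_at']:
//   $served = isset($_SESSION['form_served_at']) ? (int) $_SESSION['form_served_at'] : null;
//   if (!looks_like_human($served, time())) { /* treat as a bot */ }
```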

Just stumbled across your script and had a dumb question after looking through the script before I try and install it on my site. How do I integrate this into being able to send the form submission to an email? I have a submit script installed already and wasn’t sure exactly how to tie this in with it. Thanks for any help! Cheers.

Jason

This plugin does the job. I was looking for a solution to prevent bogus bot registrations, of which I was averaging a few per day, and I haven’t had a single case in two months now.
But, there are cases where it is blocking a legitimate login. If a user is currently logged in to my site and then opens a window and goes to the wp-login.php page, the anti-captcha will not permit the login. I must log out and then refresh the wp-login.php page (just hitting back does not work) before it will permit a login. Before installing anti-captcha, there was no problem logging in again when already logged in (e.g. I might want to log in as a different username).
This is a membership site (we use s2Member) and this anti-captcha behavior is sometimes causing a problem for new members. After they pay, their WP role is changed, so they need to log in again to assume the new role to access restricted content. However if they don’t log out of their previous session first (still open in another window/tab), anti-captcha will not allow them to log in.

@Rich Thanks for your comment. I’ve been planning to take the login procedure out of the anti-captcha. It was a bonus when I first developed it, but there have been more reports that in some scenarios it can work against legitimate users. I have yet to reproduce the bug and thus have trouble debugging it. As soon as I can find some time I’ll release a new version without the troublesome login-procedure checks. Please bear with me.

Fili, I can reproduce this at will. You can email me if you’d like help. But yes, eliminating the login check would also work. If you prevent bogus registrations in the first place, bogus logins would seem to be a non-issue anyway.

@Rich I’ve committed a new version of Anti-Captcha to the WP repository. Please update your local install and let me know if any more problems emerge.

Thanks much for the rapid solution. My reproducible scenario no longer produces the login problem. So far, so good.

Hi Fili, now I included the code at another position and got no 500 error. The only problem I have is that, after sending a message from the contact form, the following message appears although cookies and JS are enabled: Error, please enable javascript and/or cookies. Any idea why?

Hi Fili, now I got it. I included the code at the very end of the contact form php file, after all the other code in the file. Including your code in other places caused either the 500 error or the cookie error. So now everything is running correctly (I hope).
Thxs. for all.
Conny

Looks like there’s something wrong with the latest update. Now, any attempt to reset password is blocked by anti-captcha. URL: wp-login.php?action=lostpassword Help!

@Rich There is a new version out, this should fix your problems

Installed (Version 20110129) and it appears everything is now working as expected. Thanks!

very good.

I really appreciate your explanation of how this works. I recently had my website suspended because a Bot was taking up so much of the server capacity they had to shut me down to stop it. Fortunately the site does not generate any income or I would have been financially affected. I hope this plug in will help me and others with this Bot problem.

pretty successful system

thank you gooooddd.

Thanks for supplying this anti-captcha plugin for free. I’ve just installed it on my new blog and hope that it keeps the spammers out!

Keep up the good work!

Cliff

I would like to use this anti-captcha on my site and so grateful it is free. My site is new so it has not been hit by the spammers yet so I will load it before they find us!

Thanks so much. I’m installing this plugin on my blog…
It’s good security.

Most modern bots are indeed equipped with a javascript parser and run a real browser with a real cookie jar. Yes, simple scripts relying on curl or python’s urllib won’t work with this, but any decent bot isn’t coded this way anymore. This is because most modern sites use oodles of javascript and most simply won’t work when turned off. The bots can be written in watir/selenium (about any language), or phantomJS, or even using browser objects in microsoft .net suite of programming.

You’re not wrong. It’s about finding a balance between how hard it is for a spammer to spam and at the same time for a user to post something legit. From experience I’ve seen that most bots still fall within the Curl/Wget category. Maybe because it’s a lot easier to install a script than it is to run a headless browser on a zombie machine.
