Last month, I took a short break from my computer and went on a holiday. When I came back I was surprised to find that, while I was on the beach, Google had sent quite a few people looking for underground Korean adult movies to my web log. I don’t know what is so special about the Korean illegal film industry, but considering that they also eat dogs there, it must be something very interesting to watch. I guess that you can find anything on the Internet these days, but why were they looking for it on my web site? The answer to that question turned out to be another great example of why inputs should be sanitised, no matter how unimportant they seem.
I use WordPress for my blog, and so far I am relatively satisfied with it. As a very popular online software package, it does get attacked a lot and security updates are released every once in a while. My site was hacked last year and the bastards dumped a bunch of hidden porn site links in about twenty articles, which took me a few days to clear up. So I learned the hard way that when the admin console suggests an upgrade, I should take its advice. I also added a cron task to check for a few keywords in the database and alert me if someone starts advertising limb enlargement devices for free, and since then I have had no real problems. That is, until my site became a hot spot for East Asian smut aficionados overnight.
My first guess was that someone was simply spamming me with fake referrer headers, since there was absolutely no reason why my web site would actually appear in Google’s search results for adult movies, Korean or of any other geographic origin. Web sites use request referrer headers to identify where their visitors are coming from. When you click on a link, the browser sends the address of the page you came from to the linked web site, unless you have turned that off. It is not a 100% reliable mechanism to identify visitor sources, as some people turn that feature off and some browsers have bugs and send rubbish, but in general it works OK. With the recent surge in the number of blogs, a new kind of spamming started to take place online. Spammers send fake requests to web sites, putting the address of the web site they are advertising into the referrer header. The rationale behind it is, I guess, to make site owners click on the referrer link to see who is sending people to their web site.
I tried out the query on Google, just for fun, and was absolutely amazed to find that my site was third on the list. Sure enough, my search page was there. I simply had to click on that to see what would happen, and a few seconds later I was looking at a spam web site. My web logs showed a hit from Google again, but I was not looking at my site. Clicking on the “cached” link on Google led to the same outcome. I grabbed the page using wget, which does not run scripts and so definitely would not jump away, and there I found the words “korean underground adult movies”, but only after the “There are no results for…” phrase. More interestingly, after that there was an HTML image tag with “-1.com” as the source, and an onError event redirecting people to the spam web site. When the page loaded, the browser could not find -1.com to load the picture, and fired the onError event, sending the visitors from my web site to some place where they could probably watch something more to their liking. Not a bad trick at all!
God knows how they got Google to index my web page with both their keywords and the redirection tag as a search phrase, but they did. And it’s not only my blog; there are a few thousand other sites with the same problem. Search on Google for “onerror freeimagew” to see them. The results containing </title> in the site name will probably redirect you automatically to the spam site.
The problem was that my blog just dumped out whatever people put into the search form when it could not find any relevant posts. The input string was properly sanitised before it was sent to the database, and WordPress generally cleans hostile content out of all user-submitted comments, but it looks like they did not think of someone using the search form to hack the web site. In any case, I just changed the theme search.php file to print “Sorry, no posts matched your criteria” when there are no results, and that fixed the problem. A proper solution would be to strip out HTML tags from the search phrase, but I was too lazy to look for all the places where the phrase could be set.
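To illustrate the general idea of escaping a search phrase before printing it back, here is a minimal sketch in Python rather than in the PHP that WordPress actually uses. It relies on the standard library `html.escape` function; the `render_no_results` helper, the spam.example address and the exact shape of the hostile query are my own assumptions for the sake of the example, not the real payload or the real theme code.

```python
from html import escape


def render_no_results(search_phrase: str) -> str:
    """Build a 'no results' message with the phrase escaped for HTML.

    Hypothetical helper for illustration; not part of WordPress.
    """
    # escape() turns &, <, > and (with quote=True) both quote characters
    # into HTML entities, so the browser shows the phrase as text
    # instead of interpreting it as markup.
    safe_phrase = escape(search_phrase, quote=True)
    return f"<p>There are no results for {safe_phrase}</p>"


# Assumed hostile query, loosely modelled on the trick described above.
hostile = (
    "korean underground adult movies "
    "<img src=\"http://-1.com\" onerror=\"location.href='http://spam.example'\">"
)

print(render_no_results(hostile))
```

With the phrase escaped, the image tag arrives in the browser as harmless visible text, so there is no broken image to load and no onError event to fire. Escaping on output like this is a different choice from stripping tags on input: it keeps whatever the visitor typed, but only ever as text.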
In any case, this is one more example of how important it is to filter and sanitise everything put in by web site users, regardless of how safe it may seem, and never ever to print it back on the web site without checking for potential problems.
Image credits: Sonja Gjenero
I'm Gojko Adzic, author of Impact Mapping and Specification by Example. My latest book is Fifty Quick Ideas to Improve Your Tests.