Protecting your website from spam and misuse


If you own a reasonably popular website that offers features such as commenting, you are bound to see random spam comments appear everywhere. Other features that are often targeted include contact forms and member registration (if your site has some form of member system), where bot users sign up automatically. Spam bots love attacking these features, but this article will help you fight off the spam!

First of all, if you're clueless about what spam, leeching and other terms mean, let's start by explaining them.

Spam – This can come in the form of an email, a comment on a website, or even a message on instant messaging clients. It is basically an unwanted message that is doing one of the following:

  • Trying to scam you (e.g. Winning the Swiss Lottery, which by the way I have won about 100 times according to ‘SWISS LOTTO NETHERLANDS’ Company)
  • Advertising some form of product (e.g. Viagra is a popular one!)
  • Or just to generally annoy you and your spam filter, which eventually breaks down, cries and decides to let spam in rather than block it.


Nine times out of ten spam is sent by some form of bot.

Bot – A bot is a computer program or script which is programmed to send out spam to thousands of email addresses every day. It is all done automatically and controlled by the script that is running the bot. While computers make some day-to-day tasks really easy to achieve, they can also be programmed to be evil, and that's what spam bots are: evil. Not all bots are programmed for evil, however. Search engines, for example, use bots to crawl websites and update their search listings. They're cool, but spam bots aren't.

Leeching – Leeching is becoming more common these days. It is the act whereby someone takes multimedia that is hosted on another person's website and posts it on another website, but links to the file on the other person's webspace instead of uploading the content to their own. This raises the original owner's bandwidth usage.

Bandwidth – Every web host has a bandwidth limit, which is usually displayed on the front page of your web host's admin panel. Depending on your web host and the hosting package you chose, your limit will differ from everyone else's. What is nice is that your bandwidth usage is reset each month, so if you had used about half of your allowance by the end of the month, it would go back to 0. If you do go over your limit, however, your web host will usually suspend your account, at which point you can do one of the following:

  • Suck it up and wait till your bandwidth limit is reset
  • Buy additional bandwidth to raise your limit
  • Get a shotgun and go kill those annoying leechers that tipped your bandwidth limit over the edge (You didn’t read that here *wink* *wink*)


Okay, so great, I know what all of those fancy words you were using mean, but what can I do to protect my website from all of that? Glad you asked. Let's start with spam.

The good news about spam is that it's been around for a while now, so there are a lot of prevention methods for you to use. I'll list some of my personal favorites.

Akismet Spam Filter


Akismet is a very comprehensive plugin which is able to detect spam comments and trackbacks while letting genuine comments through to your articles. It isn't 100% accurate at catching spam, but at about 95% it's pretty damn near. It is widely used and is most popular with WordPress users, but it is also available on 20 other platforms, including popular content management systems such as Joomla and Drupal. So if you're running a content management system and don't have Akismet installed, you might want to head over to the development page, where all compatible modules and systems are listed, to see if you can give a big f**k you to the spam bots.

Go ahead and check out Akismet here

CAPTCHA Systems

Captcha example on Facebook

A CAPTCHA is that set of weird-looking letters you may be told to type into a text field when filling out a submission form online. While it may seem annoying, it's there to authenticate you as a human being rather than some bot. You'll generally see it on websites that have some form of members area with publicly accessible registration forms, e.g. social networking websites. Before CAPTCHA was around, bots were automatically filling in web forms and registering accounts that weren't created by a human, which caused a build-up of inactive accounts that simply sat there. This is why CAPTCHA was introduced to authenticate people filling out online registration forms. While the common CAPTCHA system asks you to enter letters displayed in a box, other methods such as adding up a sum or writing a word backwards can also be found. CAPTCHA wasn't an instant fix, though. It was found that certain programs could read the outline of the letters a CAPTCHA system sends out and have a script type them in. This is why CAPTCHA systems were revised to make it nearly impossible for a program to read the letters, and why other methods such as adding numbers together were used instead. You'll now find many websites using a CAPTCHA so strict that even a human finds it hard to read, but if spam is raising its game, so must websites.
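To make the "adding sums" idea a bit more concrete, here is a minimal sketch of how such a question could be generated and checked. The function names makeSumChallenge and checkSumAnswer are made up for this illustration and are not taken from any CAPTCHA library. It also assumes the expected answer is kept on the server (for example in the visitor's session) and never sent to the browser with the form.

```javascript
// Hypothetical helpers: a simple "what is a + b?" challenge.
// The random source is a parameter so the behaviour can be checked predictably.
function makeSumChallenge(random = Math.random) {
  const a = 1 + Math.floor(random() * 9); // 1..9
  const b = 1 + Math.floor(random() * 9); // 1..9
  return { question: `What is ${a} + ${b}?`, answer: a + b };
}

// Compare what the visitor typed with the stored answer.
function checkSumAnswer(challenge, submitted) {
  const text = String(submitted).trim();
  if (!/^\d+$/.test(text)) return false; // digits only, rejects "" and "two"
  return Number(text) === challenge.answer;
}

// Example: show the question in the form, keep challenge.answer server-side.
const challenge = makeSumChallenge();
console.log(challenge.question);
```

A question this simple will only slow down the laziest bots, which is the same arms race described above, so treat it as an illustration rather than real protection.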

reCAPTCHA (Free Service)

reCAPTCHA Example

The good news, though, is that CAPTCHA systems are generally the norm, and if you use a script or plugin that involves a form submission, nine times out of ten you'll find that spam prevention is already in place. If you use forms that have no CAPTCHA built in, it is pretty hard to integrate a system that was not designed for your script, but there is one service which will integrate with most forms: reCAPTCHA. It is very simple to use, requires minimal setup (literally only a few lines) and is also available as a plugin for popular platforms such as WordPress. But die-hard website coders, this one's for you too.
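For the coders, here is a rough sketch of what the server-side half of a reCAPTCHA check can look like. It assumes the current siteverify endpoint, which takes your secret key and the token from the submitted form and replies with JSON containing a success field. It also assumes a runtime with a global fetch (such as a recent Node.js). The function names buildVerifyRequest and isHuman are made up for this example, and the keys are placeholders.

```javascript
const VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

// Build the POST request for the verification call (hypothetical helper).
function buildVerifyRequest(secretKey, token, remoteIp) {
  const body = new URLSearchParams({ secret: secretKey, response: token });
  if (remoteIp) body.set("remoteip", remoteIp); // optional parameter
  return {
    url: VERIFY_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}

// Ask the service whether the submitted token is valid.
// fetchImpl can be swapped for a fake when trying this out offline.
async function isHuman(secretKey, token, remoteIp, fetchImpl = fetch) {
  if (!token) return false; // the form was submitted without solving the CAPTCHA
  const { url, options } = buildVerifyRequest(secretKey, token, remoteIp);
  const reply = await fetchImpl(url, options);
  const result = await reply.json();
  return result.success === true;
}
```

In a form handler you would call isHuman with your own secret key and the token field posted by the reCAPTCHA widget, and only process the submission when it resolves to true.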

Bandwidth Leechers

My arch enemy. Seriously. If you have a website that provides a resource for people, e.g. tutorials, videos etc., then you are bound to have your content leeched by someone. It happens to all of us, but it's annoying. Not only are they technically ripping content right off the wall, they are also making our bandwidth usage fly up like wallabies on opium (true story, search for it on the BBC). While it is mostly bots that copy the source code of a page that has some resource on it and post it somewhere else, real breathing people also do it manually, which raises the question: should they be allowed to breathe? Anyway, regardless of my personal views on bandwidth leechers, you're going to want to protect your content and make sure your bandwidth usage is stable. Here are some tips on keeping your traffic up while keeping bandwidth down.

Hotlinking Protection

If you have images on your website, especially if you use a lot of images in your content, you will probably want to look at preventing hotlinking. This is where someone takes an image from your website and sticks it somewhere else, but continues to link to the copy on your webspace. BASTARDS. Luckily, we can protect ourselves from hotlinking, and there are a couple of simple methods to use.

Blocking right click on your webpages:

A quick and simple method. Blocking the right-click function is often used to stop people from using the view source function in browsers, but you can also use these types of scripts to stop people from obtaining the full path of your images, video or other media featured on a page. There are many no-right-click scripts out there. Here's one that's made specifically for images, because images are generally the victim of hotlinking:

<script language="JavaScript1.2">

/*
Disable right click script II (on images)- By Dynamicdrive.com
For full source, Terms of service, and 100s DTHML scripts
Visit http://www.dynamicdrive.com
*/

var clickmessage="Right click disabled on images!"

function disableclick(e) {
if (document.all) {
if (event.button==2||event.button==3) {
if (event.srcElement.tagName=="IMG"){
alert(clickmessage);
return false;
}
}
}
else if (document.layers) {
if (e.which == 3) {
alert(clickmessage);
return false;
}
}
else if (document.getElementById){
if (e.which==3&&e.target.tagName=="IMG"){
alert(clickmessage)
return false
}
}
}

function associateimages(){
for(i=0;i<document.images.length;i++)
document.images[i].onmousedown=disableclick;
}

if (document.all)
document.onmousedown=disableclick
else if (document.getElementById)
document.onmouseup=disableclick
else if (document.layers)
associateimages()
</script>

If you're looking for just a general no-right-click script, this one is also quite useful:

<script language=JavaScript>
<!--

//Disable right click script III- By Renigade (renigade@mediaone.net)
//For full source code, visit http://www.dynamicdrive.com

var message="";
///////////////////////////////////
function clickIE() {if (document.all) {(message);return false;}}
function clickNS(e) {if
(document.layers||(document.getElementById&&!document.all)) {
if (e.which==2||e.which==3) {(message);return false;}}}
if (document.layers)
{document.captureEvents(Event.MOUSEDOWN);document.onmousedown=clickNS;}
else{document.onmouseup=clickNS;document.oncontextmenu=clickIE;}

document.oncontextmenu=new Function("return false")
// -->
</script>

Note: I did not make either of these scripts, therefore I can't help you if you have problems with either of them. They are both featured on Dynamic Drive and you should contact the author on their website.

Though these two scripts do their job well, there are workarounds: you can still access a webpage's source code by going to View > Page Source (or something similar). They mainly deter casual visitors. A bot that fetches the page source directly never triggers a mouse event, so don't rely on these scripts to stop automated ripping, and a determined human can get round them too.
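Both scripts above were written for much older browsers (they test for document.all and document.layers). As a rough sketch of the same idea for current browsers, the snippet below listens for the standard contextmenu event and cancels it when the target is an image. The function name blockImageContextMenu is made up for this example, and the same caveat applies: it only discourages casual visitors.

```javascript
// Cancel the context menu when it is opened on an <img> element.
// Returns true when the menu was blocked, false otherwise.
function blockImageContextMenu(event) {
  const target = event.target;
  if (target && target.tagName === "IMG") {
    event.preventDefault();
    return true;
  }
  return false;
}

// Only attach the listener when running in a browser page.
if (typeof document !== "undefined") {
  document.addEventListener("contextmenu", blockImageContextMenu);
}
```

Because the listener sits on the document, it also covers images that are added to the page later, without looping over document.images.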

Hotlinking Protection (Using .htaccess)

Another effective way of stopping hotlinking is to use the .htaccess file located within your website's public_html folder (if you don't have one, you must create it manually). In the .htaccess file you can add rules which will block any outside server from using your images, with the option of displaying a custom image when a website tries to hotlink them. I've had great fun using a custom image to display instead of the image that the person thought they were hotlinking. For example, I used a simple 600 x 300 image which simply had “THIS PERSON IS A HOTLINKING ARSE” in very large text. I actually found a website that had completely ripped a tutorial of mine, and seeing my custom image in place of the attempted hotlinked image was most amusing. It wasn't long before the site owner quietly removed the post.

Before I show the code, however, it is important to mention that if a .htaccess file already has data in it, it is wise to leave that data be and not move or delete the file. If you're running some form of content management system, you may find that your .htaccess file already contains rules relating to it, so changing or removing any existing code inside the file will almost definitely bring your site crashing down with a nice 500 Internal Server Error. So always be careful when modifying the .htaccess file, and back it up before you make changes if you're not sure about modifying it. Anyway, the .htaccess code to use is simply this:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?mysite\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /images/nohotlink.jpe [L]

This rule stops any other website from linking to or displaying images that are hosted on your website. But if you're having problems with specific websites, you can block just those instead:

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?myspace\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?blogspot\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?livejournal\.com/ [NC]
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /images/nohotlink.jpe [L]

If you want to have some fun like I did, you can set a custom image to replace attempted hotlinked images by editing this line:

RewriteRule .*\.(jpe?g|gif|bmp|png)$ /images/nohotlink.jpe [L]

Or if you just want to screw someone over big time, you can make them suffer the great 403 Forbidden error by simply changing the last line to:

RewriteRule .*\.(jpe?g|gif|bmp|png)$ - [F]

- Code Examples taken from AltLab.com
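If the rewrite conditions look cryptic, the following sketch (my own illustration, not from AltLab) walks through the decision the first rule set makes for each request. It is plain JavaScript that mimics the logic and is not something you put in the .htaccess file. The function name hotlinkAction is made up, and mysite.com is the same placeholder domain used in the rules.

```javascript
// Mirrors: RewriteRule .*\.(jpe?g|gif|bmp|png)$  (case-sensitive, no NC flag)
const IMAGE_PATH = /\.(jpe?g|gif|bmp|png)$/;
// Mirrors: RewriteCond %{HTTP_REFERER} !^http://(.+\.)?mysite\.com/ [NC]
const OWN_SITE = /^http:\/\/(.+\.)?mysite\.com\//i;

// Decide what happens to a request, given its Referer header and path.
function hotlinkAction(referer, path) {
  if (!IMAGE_PATH.test(path)) return "serve";   // the rule only touches images
  if (referer === "") return "serve";           // empty referer is let through
  if (OWN_SITE.test(referer)) return "serve";   // request came from your own pages
  return "replace";                             // everyone else gets nohotlink.jpe
}

console.log(hotlinkAction("http://www.mysite.com/post", "images/a.png")); // serve
console.log(hotlinkAction("http://evil.example/page", "images/a.png"));   // replace
console.log(hotlinkAction("http://evil.example/page", "images/nohotlink.jpe")); // serve
```

The last call shows why the replacement image uses the .jpe extension: it does not match the image pattern, so the rule never rewrites its own replacement. Note also that the condition only matches referers starting with http://, so if your own pages are served over https you would need to adjust the pattern.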

There are a couple of important notes to take in here, though. When using the .htaccess file in this manner there is a chance you can block legitimate traffic, as it is common for bots to be on the same IP subset (e.g. 82.105.11) as ordinary human visitors, so there is always this consideration to be made. Also, do not use .htaccess to redirect image hotlinks to an HTML page or to a server that isn't your own. Hotlinked images can only be replaced by other images, not with an HTML page.

Risky bandwidth leech protection: (Using .htaccess)

I say risky because this method involves blocking an actual IP address, so you can imagine that if you start blocking legitimate users you are going to hurt your traffic. But IP blocking is sometimes required when serious leeching is occurring. For example, a couple of months ago James' Blog was suddenly using 4GB of bandwidth a day, and it wasn't long before this got my account suspended twice (you can read about it here). It turned out that a certain IP address was reloading most of the .png images very frequently, which of course was increasing load and causing bandwidth to shoot up. This was overcome by a server technician identifying the IP address, after which I blocked it from any content, page etc. that was part of the james-blogs.com domain. Again I used some code in the .htaccess file. Here's how you can block IP addresses in your .htaccess file:

order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all

This code basically means allow all IP addresses to access the website but deny the ones that are listed. You simply write deny from followed by the IP address to block. You can also use domains and even subdomains as well as IP addresses, e.g.

deny from evilwebsite.com
deny from annoying.evilwebsite.com

When denying an IP address you don't have to use the full address. You can actually block an entire IP subset, for example:

deny from 192.168.
deny from 10.0.0.

Though it is important to understand that blocking an entire IP subset could mean you block legitimate users too. A leecher may share the subset with legitimate users, so it's best to pinpoint exactly who is leeching and block just that IP.
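To see how much wider a partial address reaches than a full one, here is a small sketch that mimics the matching. It is only an illustration in JavaScript and not how the web server is implemented. The function name isDenied is made up, and the sketch assumes a partial rule matches whole leading octets and ignores the domain name form of deny from.

```javascript
// Rules in the same shape as the deny lines: full or partial IPv4 addresses.
const DENY_RULES = ["192.168.44.201", "172.16.7.92", "10.0.0."];

// True when the visitor's address matches any rule, octet by octet.
function isDenied(ip, rules = DENY_RULES) {
  const ipOctets = ip.split(".");
  return rules.some((rule) => {
    const ruleOctets = rule.replace(/\.$/, "").split("."); // drop trailing dot
    return ruleOctets.every((octet, i) => octet === ipOctets[i]);
  });
}

console.log(isDenied("192.168.44.201")); // true, exact match
console.log(isDenied("192.168.44.20"));  // false, a different address
console.log(isDenied("10.0.0.55"));      // true, caught by the partial rule
```

The single partial rule 10.0.0. covers every address from 10.0.0.0 to 10.0.0.255, which is exactly why it can catch innocent visitors along with the leecher.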

Well, there are some methods of keeping your website spam free, your images not hotlinked and your bandwidth usage low. I hope this article was informative and interesting, and will help you in the future!


  • http://www.dalehay.com Dale

    Hey James,

    Cheers for that, I’ve been looking around mainly for a .htaccess leeching tutorial. One question though, can the “RewriteRule .*.(jpe?g|gif|bmp|png)” bit be used for MP3, WAV, etc. extensions too?

    • http://blog.jmwhite.co.uk James

      Hi There Dale,

      I don’t know if you can protect multimedia files other than images using simple .htaccess code. I’ve seen some people try it. The result was that it works, but none of the media can be viewed or downloaded, so it’s pointless. What you can try is looking around your web host’s admin panel for any hotlink protection features. I know cPanel has some. Failing that, you can contact your web host for additional help.

      Sorry I can’t be more of a help. I’ll keep looking for any successful way to leech protect other multimedia.
