Semalt.com – What it is and How to Stop it Skewing your Stats

May 1, 2014

What is Semalt.com?

I recently noticed referral traffic from a domain called “Semalt.com” in many of my Google Analytics accounts. After some research I discovered that the site appears to be some sort of rank tracker that gets its data by crawling websites. Unfortunately, the crawler is set up in such a way that it skews Google Analytics data: it inflates visit figures, increases bounce rates and so on. This is obviously bad.

Why is this happening?

Having never used or even heard of the service before, I figure they’re either attempting to crawl the entire internet (good luck) or someone’s added some of my sites in their own accounts as a competitor. Whatever the case, it’s really annoying to have skewed data. Typically this type of software works more intelligently and therefore doesn’t skew data – not the case with this one.

The Solution

There are a few ways to keep the domain from crawling your website or skewing your data. Here are the easiest to implement:

  1. Add an “Exclude” filter in Google Analytics.
  2. Block the “semalt.com” referrer in your .htaccess file.

Both are relatively simple solutions; however, if you haven’t had much experience with server configurations, it’s probably best to go the Google Analytics route.

Exclude semalt.com as a referrer in Google Analytics

  1. Go to Admin > Filters > + New Filter
  2. Name the filter “Exclude semalt.com”
  3. Select “Custom” as the filter type
  4. Select “Exclude”
  5. Select “Referral” as the filter field
  6. Enter “semalt.com” as the filter pattern
  7. Add the filter to the selected view and hit save.

Block semalt.com in your .htaccess file

Add the code below to the .htaccess file in the root directory of your website. If you’re not experienced in this area, I strongly suggest you use the Google Analytics method above.

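RewriteEngine on
# Uncomment the next line if your host needs it for mod_rewrite to work
# Options +FollowSymlinks
# Match any request whose referrer contains semalt.com (case-insensitive)
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
# Respond with 403 Forbidden
RewriteRule .* - [F]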

Unfortunately there is no way to “undo” the damage already done, but using the above methods is a sure-fire way to protect yourself from this dodgy crawler in the future.

24 comments

Ian May 2, 2014 at 8:55 am

Nice one. I’ve also just noticed a referral traffic source, “sh.st”, popping up as well – it’s a URL shortening service. Exclude it for cleaner analytics!

Matthew May 2, 2014 at 8:12 pm

Thanks Ian!

Simon May 2, 2014 at 3:10 pm

Also blocking link analysis user agents that are nothing but a drain on your resources is a good idea. Simple enough to do in htaccess with something like this:

# BEGIN

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^rogerbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^exabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^MJ12bot [OR]
RewriteCond %{HTTP_USER_AGENT} ^dotbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^AhrefsBot
RewriteRule .* - [F]

SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
SetEnvIfNoCase User-Agent .*exabot.* bad_bot
SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
SetEnvIfNoCase User-Agent .*sitebot.* bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot

# END
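
One caveat on the rewrite conditions above: the “^” anchor only matches when the user agent string starts with the bot name. Some crawlers, MJ12bot and AhrefsBot among them, typically send a string that begins with “Mozilla/5.0 (compatible; …”, so an anchored pattern can miss them. A minimal sketch of an unanchored, case-insensitive variant, using the same bot names and assuming mod_rewrite is enabled:

```apache
RewriteEngine On
# Match the bot name anywhere in the User-Agent header, ignoring case
RewriteCond %{HTTP_USER_AGENT} (rogerbot|exabot|MJ12bot|dotbot|gigabot|AhrefsBot) [NC]
# Refuse matching requests with 403 Forbidden
RewriteRule ^ - [F]
```

For these six names this does the same job as the SetEnvIfNoCase section, so only one of the two approaches is needed.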

Matthew May 2, 2014 at 8:12 pm

Thanks Simon.

Your example works for bots that send proper identifiers (such as those you’ve listed), and those bots don’t skew your Google Analytics. Unfortunately the Semalt bot identifies itself as “Google Chrome”, so unless you block all “Chrome” user agents (not recommended!) your suggested method won’t work 🙁

Conrad July 2, 2014 at 8:52 pm

Thanks for the helpful information. All our sites in Germany are polluted by semalt.

Matthew July 2, 2014 at 11:50 pm

Glad to be of help Conrad!

Nataliya July 25, 2014 at 8:12 pm

Let me tell you about Semalt.
Semalt bots harvest statistics for a web analytics service and cause no harm. I don’t think this can be an issue, since nobody complains about bots that belong to Google, Bing and other search engines. Semalt crawler bots have 100% bounce rate and don’t click on advertising banners (cpc, cpa, cpm systems) or extend links. All the visits are automatic and random.
If you want to exclude your site from Semalt database, please follow this link: http://semalt.com/project_crawler.php

Matthew July 26, 2014 at 6:55 am

Hi Nataliya,

My post outlines how the bot causes harm. Nobody complains about other bots because they do their thing without skewing data.

The solutions I’ve provided work fast and well. The link you provided just sends users to a page where they can submit a request for removal with no way of knowing if it truly has been removed from your “database”. Even the “Crawler Check” you provide doesn’t do anything. When I provided obscure URLs as a test it said they had successfully been blocked:

http://semalt.semalt.com/crawler.php?u=http://www.fbsafbaskr1421.net

On behalf of the digital marketing community – please set up your crawler correctly or leave our sites alone.

Ivan Rebrov February 26, 2015 at 2:17 am

This is BULLSHIT! Don’t interact with semalt.com CRIMINAL GANG! Don’t visit their website! They are CRIMINALS! Semalt.com wants your hits on their website, they are sending Trojans, spam and other hell. semalt.com is a CRIMINAL GANG!

Matthew March 27, 2015 at 11:55 am

They’re not the friendliest bunch. Block them, tell them to have a nice day and be done with it 🙂

David August 6, 2014 at 8:59 pm

Nice work Matt, great to see some Australian marketing folks on top of it 🙂

Matthew August 8, 2014 at 1:57 am

Thanks David! Surprised it’s still going on 🙁

Diego Elio Pettenò August 6, 2014 at 11:53 pm

For what it’s worth, I fixed this with ModSecurity, and also got my share of threats/pleas from them on Twitter 😉

Here’s my post, and the link to the ModSecurity ruleset that I used to fix this. It’s not *perfect* yet (because a referrer coming from a site that uses the full domain in its URL will also be filtered out), but does the job: https://blog.flameeyes.eu/2014/08/antibiotics-for-the-internet-or-why-blocking-semalt-crawlers
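
The same caveat applies to an unanchored .htaccess pattern such as semalt\.com, which matches the string anywhere in the referrer. A sketch of a stricter variant that matches only the host part of the referrer while still covering any subdomain, assuming mod_rewrite is enabled (check it against the referrers in your own logs before relying on it):

```apache
RewriteEngine On
# Match only when the referrer's host is semalt.com or a subdomain of it.
# A URL that merely mentions semalt.com in its path or query string,
# or a host such as notsemalt.com, is left alone.
RewriteCond %{HTTP_REFERER} ^https?://([^/?]+\.)?semalt\.com([/:?]|$) [NC]
# Refuse matching requests with 403 Forbidden
RewriteRule ^ - [F]
```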

Lariba August 8, 2014 at 5:18 am

I have tried blocking the IP through cPanel, but it is still crawling my site and driving up my bounce rate. I think I have to try yours.
Many, many thanks

Matthew August 13, 2014 at 2:59 am

Glad to be of help 🙂
Let us know how you go!

Paul Middleton August 9, 2014 at 5:24 pm

You might also want to filter:

1. youtube-downloader.savetubevideo.com
2. musicas.kambasoft.com

These do largely the same thing and I think may even be controlled by Semalt.
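
A sketch of how these two domains could be added to the .htaccess rule from the post, assuming mod_rewrite is available. The [OR] flag chains the conditions so that a match on any one of the referrers triggers the block:

```apache
RewriteEngine On
# Block requests referred by any of the three domains (case-insensitive)
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} savetubevideo\.com [NC,OR]
RewriteCond %{HTTP_REFERER} kambasoft\.com [NC]
# Refuse matching requests with 403 Forbidden
RewriteRule ^ - [F]
```

The last condition carries no [OR] flag, because it closes the chain.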

Matthew August 13, 2014 at 2:58 am

Thanks Paul!

I haven’t seen these ones show up yet – have they been affecting many of your sites?

Angry Webmaster August 17, 2014 at 1:07 am

From the way that log entries for savetubevideo.com and kambasoft.com have appeared, those two domain names are almost certainly from the same criminal enterprise known as Semalt. Those two domains and semalt.com are all registered at the same domain-name registrar, which is another strong clue in favor of Semalt being the culprit. Also, Semalt’s slimebag botnet operators are now using numbers as subdomain names in a further attempt at evading filters. It’s a damn shame that we can’t hire some goons to go to the Ukraine to break their knees.

Christophe August 18, 2014 at 7:16 am

Thanks for these tips, I’m currently polluted by this kind of crawl (semalt, youtube downloader…).
It will help me for filtering such stats…

Dayna August 27, 2014 at 11:07 pm

Thanks for this info.

Just going back to Ian who mentioned the url shortening site sh.st – is there a way of blocking a shortened url in the .htaccess? A series of sh.st urls have been set up going straight to our sample pack forms and massively skewing our form fills!

I’ve tried:

RewriteEngine on
RewriteCond %{HTTP_REFERER} sh\.st [NC]
RewriteRule .* - [F]

Without success… any ideas please?

Will September 14, 2014 at 3:08 am

Any way to filter / stop them on a hosted domain such as Homestead.com? That’s where we host our site, and the traffic report is mostly semalt.com

Ed September 18, 2014 at 2:15 pm

I used to have Semalt.com and saveyoutube.com showing up in my referrers at least once a day. I modified my .htaccess file and it seems to have completely stopped them. These are a couple of the lines I’m currently using:

SetEnvIfNoCase Referer semalt.com spammer=yes
SetEnvIfNoCase Referer savetubevideo.com spammer=yes

Order allow,deny
Allow from all
Deny from env=spammer
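
Note that Order, Allow and Deny are the Apache 2.2 access-control directives; on Apache 2.4 they only work through the mod_access_compat module. A sketch of the equivalent in the 2.4 Require syntax, assuming the same “spammer” variable as the lines above:

```apache
# Flag requests whose Referer header mentions either domain
SetEnvIfNoCase Referer semalt\.com spammer=yes
SetEnvIfNoCase Referer savetubevideo\.com spammer=yes

# Apache 2.4: allow everyone except requests flagged as spammer
<RequireAll>
    Require all granted
    Require not env spammer
</RequireAll>
```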

Keith February 10, 2015 at 2:06 pm

I’m wondering if the GA method actually stops them or just hides them from appearing in the reports.

Some of the smaller websites I manage get the majority of their traffic from these sorts of sites, and they are showing up in their SEO reports. I am unsure whether the GA method will simply hide the traffic or actually fix it.

I can get someone else to do the htaccess method for me, but if it’s just as effective to do it myself via GA, then I’ll do that.

Thanks

Matthew February 11, 2015 at 11:53 am

Hi Keith,

The GA method will only stop the data from being reported within Google Analytics – Semalt.com will still be able to crawl your website. If that’s a concern then I’d definitely get someone to set up the .htaccess for you and block them completely!

Good luck!
