Semalt: A Super Guide To Blocking Referrer Spam In Your Google Analytics

Nik Chaykovskiy, an expert from Semalt, says that referral spam is one of the most persistent problems webmasters face today. The situation has been getting worse over the years, which suggests that someone, somewhere, is making a lot of money from creating it.

Ghost and Referral Spam

Spam has now made its way into Google Analytics reports. Spammers look for vulnerabilities in the system so that their domains appear in a website's reports. They hope to spark enough curiosity that the webmaster visits their website to find out why it appears there. The problem is that they bring no real traffic: because they are bots, they often never reach the site at all. Instead, they exploit the JavaScript tracking code used by Google Analytics to send data that records a visit. As a result, they skew vital statistics such as bounce rate and other metrics used to analyze engagement. Blocking referral spam is essential for anyone who needs accurate data, especially if they rely on it to make marketing decisions.

Referral spam is hard to block because spammers move very fast, increasing both the rate of spam hits and the number of sources. Webmasters therefore need to step up their efforts to identify and blacklist these sources. The problem is particularly troublesome for new sites that do not yet receive much legitimate traffic: on such a site, spam hits can even outnumber real daily visits and distort the data further.

How Easy Is It?

A single page load is recorded as a single visit. Ghost spammers use the Google Analytics tracking code to send traffic data straight to the reports, forging a visit without ever loading the site. Loading a single page on a server somewhere may take only 0.001 seconds, yet spammers may already have forced over 100 of these forged visits onto the Google Analytics accounts of many other sites. Buying a single host is easy, and as long as spammers are confident of a return on investment, they can do a lot of damage with it.
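To show why such cheap hits add up, here is a back-of-the-envelope calculation. It takes the 0.001-second figure above as the cost of one forged hit and assumes, for illustration only, that a single host sends hits one after another with no other bottleneck; real throughput would depend on network latency and concurrency.

```python
# Back-of-the-envelope estimate of forged hits from one host.
# Assumption: each forged hit costs 0.001 s and hits are sent one at a time.
SECONDS_PER_HIT = 0.001


def hits_in(seconds: float, seconds_per_hit: float = SECONDS_PER_HIT) -> int:
    """Number of forged hits that fit into the given number of seconds."""
    # round() absorbs floating-point error, since 0.001 has no exact
    # binary representation.
    return round(seconds / seconds_per_hit)


per_second = hits_in(1)
per_hour = hits_in(60 * 60)
per_day = hits_in(24 * 60 * 60)

print(f"{per_second:,} per second")  # 1,000 per second
print(f"{per_hour:,} per hour")      # 3,600,000 per hour
print(f"{per_day:,} per day")        # 86,400,000 per day
```

Even if the real figure were a hundred times slower, one host could still flood many accounts every day.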

Solutions that Come Up Short

Some spam techniques are advanced enough that the usual blocking solutions do not work. One example is the mysterious online service called Darodar. The following methods failed to clear it from Google Analytics (GA):

  • The .htaccess file. It does not work because ghost spam never touches the site's server.
  • The referral exclusion list. It is not updated often enough to keep up with new spam sources.
  • Exclusion filters. This method falls short because it applies only to future data and cannot retroactively clean spam that is already in the reports.

The exclusion filter came closest to eliminating the Darodar referral spam. Its main limitation was that it did not rely on a constantly updated list of referral spammers.

The Missing Puzzle Piece

An effective solution for identifying and blocking referral and ghost spam should be kept up to date, draw on a broad database, and apply retroactively to past data. Here is one approach that meets all three requirements.

Step 1: Using Segments to Exclude Spam

Segments are the better choice because they do not alter data permanently. If one accidentally filters out real referrers with a filter, there is no way to get that data back. Segments, by contrast, can be applied retroactively to historical data, however old it is.
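The difference between the two can be modeled in a few lines. This is a simplified, hypothetical illustration, not how Google Analytics works internally: the "filter" drops matching hits before they are stored, while the "segment" builds a filtered view over everything already stored. The darodar.com pattern is only sample data.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str  # referral source, e.g. "google.com"


# Sample spam pattern: matches darodar.com and any of its subdomains.
SPAM_PATTERN = re.compile(r"(^|\.)darodar\.com$")


def ingest_with_filter(store: list[Hit], incoming: list[Hit], pattern: re.Pattern[str]) -> None:
    """Filter-like behaviour: matching hits are discarded before storage.
    Anything discarded here, including real referrers caught by a bad
    pattern, can never be recovered."""
    store.extend(hit for hit in incoming if not pattern.search(hit.source))


def segment(store: list[Hit], pattern: re.Pattern[str]) -> list[Hit]:
    """Segment-like behaviour: build a filtered view at query time.
    The stored data is left untouched, so old hits are covered too."""
    return [hit for hit in store if not pattern.search(hit.source)]


# Data collected before any filter existed already contains spam.
store = [Hit("google.com"), Hit("darodar.com")]

# A filter created later only affects hits that arrive afterwards.
ingest_with_filter(store, [Hit("forum.darodar.com"), Hit("bing.com")], SPAM_PATTERN)

print([h.source for h in store])
# ['google.com', 'darodar.com', 'bing.com']
print([h.source for h in segment(store, SPAM_PATTERN)])
# ['google.com', 'bing.com']
```

The old darodar.com hit survives the filter but not the segment, and the segment can be changed or removed at any time without losing anything.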

Step 2: Maintaining the Exclusion List

Slack is a tool webmasters can use to monitor referral sources. In this setup, it notifies the user about every new referral and prompts them to either whitelist or blacklist a suspicious referral source.
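To make the whitelist/blacklist prompt more concrete, here is a minimal sketch of the message a script might post to Slack using Block Kit, Slack's message layout format. It assumes a Slack app with an incoming webhook and interactivity enabled, so that button clicks are delivered to the app's request URL (handling those clicks is not shown). The action_id values are hypothetical, and this is not the tool's actual code.

```python
import json
import urllib.request


def build_referral_prompt(source: str, count: int) -> dict:
    """Build a Block Kit message asking whether to whitelist or blacklist a source."""
    summary = f"New referral source: {source} ({count} hits)"
    return {
        "text": summary,  # plain-text fallback shown in notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": f"*{summary}*"}},
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Whitelist"},
                        "style": "primary",
                        "action_id": "whitelist_referral",  # hypothetical ID
                        "value": source,
                    },
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Blacklist"},
                        "style": "danger",
                        "action_id": "blacklist_referral",  # hypothetical ID
                        "value": source,
                    },
                ],
            },
        ],
    }


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call would look like `post_to_slack(WEBHOOK_URL, build_referral_prompt("darodar.com", 42))`, where `WEBHOOK_URL` is a hypothetical name for your own webhook address. Keeping the payload builder separate from the network call makes it easy to test without contacting Slack.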

1. Slack receives all referrals.

2. A PHP script sorts the results by hit count and then loops through the final list so the webmaster can check whether any source looks familiar. Any source that does not look familiar moves on to step 3.

3. Each suspected spam source is forwarded to a Slack channel, which offers the user a choice between whitelisting and blacklisting it. Whichever option they choose leads to step 4.

4. The user is redirected to a page that confirms the selection.

5. Slack then stores and locks all identified spammers in the database.

6. The final clean list is displayed in regex format. Copy and paste it into Google Analytics.

Slack allows webmasters to update the exclusion list at least five times a day.
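The sorting step (step 2) and the regex output (step 6) can be sketched in a few lines. This is a minimal illustration written in Python rather than PHP, not the tool's actual code: it assumes referral sources arrive as a plain list of domain strings and that reviewed domains are kept in two sets. The 255-character default for `max_len` reflects the commonly cited length limit for Google Analytics filter fields; treat it as an assumption and check it against your own account.

```python
from __future__ import annotations

from collections import Counter


def triage(referrals: list[str], whitelist: set[str], blacklist: set[str]) -> list[tuple[str, int]]:
    """Count referral sources, sort them by hit count (highest first) and
    return only the ones that have not been reviewed yet."""
    counts = Counter(referrals)
    return [
        (source, count)
        for source, count in counts.most_common()
        if source not in whitelist and source not in blacklist
    ]


def build_exclusion_regexes(blacklist: set[str], max_len: int = 255) -> list[str]:
    """Join blacklisted domains into alternation patterns ("a|b|c"), starting
    a new pattern whenever the next domain would push it past max_len.
    A single domain longer than max_len still gets a pattern of its own."""
    patterns: list[str] = []
    current = ""
    for domain in sorted(blacklist):
        # Domain names contain only letters, digits, hyphens and dots,
        # so escaping the dots is enough to make them literal in a regex.
        escaped = domain.replace(".", r"\.")
        candidate = f"{current}|{escaped}" if current else escaped
        if current and len(candidate) > max_len:
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


# Hypothetical sample data.
referrals = ["google.com", "darodar.com", "darodar.com", "bing.com",
             "spam-bot.example", "darodar.com", "spam-bot.example"]
whitelist = {"google.com"}
blacklist: set[str] = set()

print(triage(referrals, whitelist, blacklist))
# [('darodar.com', 3), ('spam-bot.example', 2), ('bing.com', 1)]

# Suppose the webmaster blacklists the first two sources in Slack.
blacklist.update({"darodar.com", "spam-bot.example"})
print(build_exclusion_regexes(blacklist))
# ['darodar\\.com|spam-bot\\.example']
```

Each returned string could be pasted into a segment or filter condition that matches a regular expression. The patterns are unanchored, so `darodar\.com` also matches subdomains such as forum.darodar.com; anchor them if that is too broad.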

In Reality, Several Solutions Can Work

Although this method is proven, it works even better when the webmaster supplements it with other techniques to cover all bases. In addition to the solution above:

  • Check the box that tells Google Analytics to exclude known bots and spiders.
  • Apply an "include" hostname filter.
  • Use cookies.

The include filter mentioned above is sometimes effective, but it is not the best long-term solution because:

  • Hostname spoofing is not difficult, and analytics spammers are increasingly exploiting it.
  • If the filter is set up incorrectly, it can end up filtering out real referrers.
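Both weaknesses show up in a small sketch of how an include hostname filter decides which hits to keep. It is a hypothetical illustration: example.com stands in for the site's real domain, and the sample hits are invented. Python's `re.search` matches anywhere in the string, so the `^` and `$` anchors are what stop a lookalike such as example.com.evil.net from passing.

```python
import re

# Hypothetical pattern for the hostnames the site really runs on.
# Note that it forgets the shop. subdomain, a typical setup mistake.
VALID_HOSTNAMES = re.compile(r"^(www\.)?example\.com$")


def passes_include_filter(hostname: str) -> bool:
    """Keep a hit only if its hostname matches the allowed pattern."""
    return VALID_HOSTNAMES.search(hostname) is not None


hits = [
    ("www.example.com", "google.com"),        # real visit
    ("(not set)", "darodar.com"),             # ghost hit with no hostname
    ("example.com.evil.net", "spam.example"), # lookalike hostname
    ("shop.example.com", "newsletter"),       # real visit on a forgotten subdomain
    ("www.example.com", "spam-bot.example"),  # ghost hit with a spoofed hostname
]

for hostname, source in hits:
    verdict = "kept" if passes_include_filter(hostname) else "dropped"
    print(f"{hostname} ({source}): {verdict}")
```

The last two rows mirror the two points above: the misconfigured pattern drops a real visit, while the spoofed hostname slips through.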