Keeping Your Web Analytics Clean and Accurate: The Fight Against Bots

Written by: Dimitri König

Category: Technical

Last week, I stared at a traffic spike for one specific site in Axo Analytics: around 99% bots, 1% humans, all from one country. Sorting out that mess is why I'm obsessed with accurate data. In Axo, I use all the tricks of the trade to keep the data clean and accurate while respecting user privacy to the max.

Common Headaches in Web Analytics

Here's a non-exhaustive list of what can screw up your data.

Bots (AI, Scraping, Security Scanners, etc...)

All known bots (Googlebot, Bingbot, etc...) are filtered out by default, as are all known AI bots (ChatGPT, Claude, etc...).
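A user-agent filter of this kind can be sketched as follows. The pattern list is a tiny illustrative sample, not Axo's actual database, which would cover many more crawler and AI-bot signatures:

```typescript
// Minimal sketch of user-agent based bot filtering. The patterns
// below are a small example set; a production filter would use a
// maintained list of known crawler and AI-bot signatures.
const KNOWN_BOT_PATTERNS: RegExp[] = [
  /Googlebot/i,
  /bingbot/i,
  /GPTBot/i,    // OpenAI's crawler
  /ClaudeBot/i, // Anthropic's crawler
  /AhrefsBot/i,
];

function isKnownBot(userAgent: string): boolean {
  return KNOWN_BOT_PATTERNS.some((p) => p.test(userAgent));
}
```

This only catches bots that identify themselves honestly; everything else falls through to the behavioral checks described below.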

Then there are lesser-known bots, or even unknown ones. Some of them are easy to identify, some are not.

Testing tools (Lighthouse, Webdriver, etc...)

Some testing tools announce that they are bots, but some do not. Keeping up with the latest tools and how to identify them is an almost endless quest.
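Some of those self-identifying signals can be checked directly. A minimal sketch, assuming the client reports its user agent and the value of the standard `navigator.webdriver` flag (set by WebDriver-based tools like Selenium and Playwright, while Lighthouse identifies itself in the user agent):

```typescript
// Sketch of detecting testing tools from signals they expose
// themselves. Not exhaustive: tools that hide these signals need
// behavioral detection instead.
interface ClientSignals {
  userAgent: string;
  webdriver: boolean; // value of navigator.webdriver in the browser
}

function looksLikeTestingTool(s: ClientSignals): boolean {
  if (s.webdriver) return true; // WebDriver-automated browser
  return /Chrome-Lighthouse|HeadlessChrome|PhantomJS/i.test(s.userAgent);
}
```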

Pages opened in another tab but never seen

This one is easy: Axo Analytics does not track hidden pages. Only when a page becomes visible is it considered trackable.

Other web analytics tools do track hidden pages, but I just don't consider that useful. If the user never saw the page, imho it is not a valid pageview.

Malicious users trying to game the system

There are two kinds of malicious users: those who are really good at what they are doing and those who are not.

But even the not-so-good ones can often only be identified by their behavior and filtered out retrospectively. So it is important to have a good retrospective analysis system in place.
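Such a retrospective check can be sketched as a rule over per-session aggregates. The specific fields and thresholds here are illustrative assumptions, not Axo's actual rules:

```typescript
// Sketch of retrospective session filtering: sessions that were let
// through in real time can still be flagged later from behavioral
// aggregates. Thresholds are illustrative.
interface SessionStats {
  pageviews: number;
  interactions: number; // clicks, scrolls, key presses
  durationMs: number;
  distinctDays: number; // over how many days this visitor appeared
}

function isSuspiciousInHindsight(s: SessionStats): boolean {
  // Many pageviews with zero interaction in near-zero time is a
  // classic crawler signature.
  if (s.pageviews >= 10 && s.interactions === 0 && s.durationMs < 5_000) {
    return true;
  }
  // The same "visitor" returning day after day without ever
  // interacting is likely a scheduled scraper.
  if (s.distinctDays >= 7 && s.interactions === 0) return true;
  return false;
}
```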

Real users behind VPNs, Proxies, Tor, etc...

These users are hard to identify, especially if they are a mixture of real users and bots. Simply blocking them by their IP or ISP is not a good idea, as it can lead to a lot of false positives.

Here it depends on how bad the data damage can be for the specific use case. In some cases, it is better to have a bit of noise in the data than to block real users.

My approach is to err on the side of real users, even if that means some noise in the data. Noise can be filtered out retrospectively.
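One way to implement "err on the side of real users" is to accumulate evidence into a score and only drop sessions that cross a deliberately high threshold, instead of hard-blocking on any single signal. The weights below are assumptions for illustration:

```typescript
// Sketch of evidence-based filtering: no single weak signal (like
// "comes from a known proxy") is enough to drop a session on its own.
interface TrafficSignals {
  fromKnownProxy: boolean;
  knownBotUserAgent: boolean;
  noInteraction: boolean;
}

function shouldDrop(s: TrafficSignals, threshold = 100): boolean {
  let score = 0;
  if (s.fromKnownProxy) score += 40;     // suspicious, but not decisive
  if (s.knownBotUserAgent) score += 100; // decisive on its own
  if (s.noInteraction) score += 30;      // weak signal by itself
  return score >= threshold;
}
```

With these weights, a VPN user who never interacted scores 70 and is still kept; only combinations of stronger evidence cross the line.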

Ad Blockers

Ad blockers can block the tracking script and thus prevent tracking entirely. Google Analytics in particular is often blocked by ad blockers.

In this case there is not much that can be done, except to serve the tracking script and collect the data via a custom hostname, preferably on the same domain as the site being tracked.

The feature is already implemented, is being tested with beta users, and will be available in the next major release of Axo Analytics.
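The same-domain approach usually means a reverse proxy on the customer's site that maps first-party paths to the analytics backend, so filter lists keyed on third-party analytics hosts no longer match. A sketch of that path mapping, with a hypothetical upstream host and paths (not Axo's actual endpoints):

```typescript
// Sketch of first-party proxying for ad-blocker resilience. The
// upstream host and the /stats/* paths are made-up examples.
const UPSTREAM = "https://collect.example-analytics.com";

function rewriteToUpstream(firstPartyPath: string): string | null {
  if (firstPartyPath === "/stats/script.js") {
    return `${UPSTREAM}/script.js`; // serve the tracking script
  }
  if (firstPartyPath.startsWith("/stats/event")) {
    return UPSTREAM + firstPartyPath.replace("/stats", ""); // data collection
  }
  return null; // not an analytics path: serve the site as usual
}
```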

What to do with very short visits without any interaction? (Like a quick peek via a URL preview)

This one is hard to detect, since the data shows many similarities between bots and real users who just quickly peek at a page. But the volume of such quick peeks is usually quite low for real humans and very high for bots. So Axo Analytics uses a combination of techniques to identify and filter out such visits.
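The volume argument can be sketched as a per-source ratio check: a single sub-second visit is ambiguous, but a source where almost every visit is a quick peek is very likely automated. The thresholds are illustrative assumptions:

```typescript
// Sketch of judging a traffic source by its quick-peek ratio rather
// than judging individual visits.
interface SourceStats {
  totalVisits: number;
  quickPeeks: number; // visits under ~1s with no interaction
}

function peekRatioLooksAutomated(s: SourceStats): boolean {
  if (s.totalVisits < 50) return false; // too little data to judge
  return s.quickPeeks / s.totalVisits > 0.9;
}
```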

Valid traffic from otherwise problematic sources (like IP Ranges known for hosting bots)

This is a very tricky one. Simply blocking such traffic can lead to a lot of false positives, as there are real users behind such IPs as well. Some Web Analytics tools simply block such traffic, others don't do anything at all about it. Axo Analytics uses a combination of techniques to identify and filter out such traffic, but also tries to err on the side of real users, and not block too much traffic.

In certain cases, filters are applied automatically for countries, ISPs, and IP ranges known for a high volume of bots.

For example: a couple of days ago I saw a sudden surge of traffic from a certain country, with a very high bounce rate and a very low session duration. The pattern just didn't look like real users. For a short period I completely blocked traffic from that country to that specific site in Axo Analytics, but the traffic continues to this day. So: definitely bots.

In that case it's very hard to say how many real users are behind some of that traffic. So I would rather, painfully, block them all than have a lot of noise in the data.
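An automatic IP-range filter like the one described above boils down to a CIDR membership check. A minimal IPv4 sketch; the blocked range below is a reserved documentation range used purely for illustration, and real reputation data would come from maintained datacenter/ISP lists:

```typescript
// Sketch of filtering by IPv4 ranges known to host bots.
function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  // Build a netmask with the top `bits` bits set.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

const AUTO_BLOCKED_RANGES = ["203.0.113.0/24"]; // example-only range

function isAutoFiltered(ip: string): boolean {
  return AUTO_BLOCKED_RANGES.some((range) => inCidr(ip, range));
}
```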

Sudden surges of traffic due to marketing campaigns, social media, etc...

Most of the time these cases are easy to identify, as the traffic usually comes from known sources (like Facebook, Twitter, etc...) and the users' behavior is usually quite different from that of bots.

Axo Analytics won't charge you for sudden, temporary surges of traffic.

Solutions

Finding solutions that really work is hard, especially when you have to consider one big and very important constraint: privacy laws and concerns (GDPR, CCPA, etc...). Axo Analytics is designed to be privacy-first, so all solutions have to respect privacy laws and user privacy, and must not rely on invasive or unlawful tracking methods.

Here are some of the solutions implemented in Axo Analytics:

  • A combination of techniques to identify bots (User-Agent analysis, behavior analysis, IP and ISP reputation, etc...)
  • Tracking only valid pageviews (visible, interacted with, certain real-user verifications, etc...)
  • Firewall rules to block malicious traffic
  • Retrospective analysis to identify and filter out bots
  • Cloudflare for general bot mitigation and DDoS protection
  • Constantly updating and improving bot detection methods

Reach out

If you have any questions or suggestions, feel free to reach out to me. And if you find your data suspicious, let me know. I can help you analyze it and find out what is going on.

I'm also planning on adding features to let users

  • report suspicious traffic directly from the Axo Analytics dashboard
  • filter out suspicious traffic themselves

After all, you know your audience best.

In closing

Looking at data logs can sometimes have a strangely meditative quality. It reminds me of several scenes in the "Matrix" movies, where Neo and others watch the green code flowing down the screen to somehow make sense of what is going on in the Matrix. Only in this case it's not green code, but a lot of anonymized data points.

