Visitor identification using the robot detection component

Last updated Wednesday, November 30, 2016 in Sitecore Experience Platform for Administrator, Developer

The Sitecore robot detection component detects robots and other unwanted interactions from automated browsers. The component is enabled by default and consists of a pipeline processor, an event handler, a JavaScript file that detects human behavior, and several robot detection classes.

Every time a page is requested on your website, the following pipeline processor is activated:

  • Sitecore.Analytics.RobotDetection.Pipeline.InitializeTracker.Robots

The processor first checks whether robot detection is enabled by reading the Analytics.AutoDetectBots setting in the Sitecore.Analytics.Tracking.config file. To disable the component, change this setting to false.
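The enable check can be pictured as reading one string setting and comparing it to "true". The sketch below is illustrative only: it models the configuration as a JavaScript Map of setting names to string values, and treating a missing setting as enabled is an assumption based on the component being enabled by default, not documented Sitecore behavior.

```javascript
// Hypothetical model: settings read from Sitecore.Analytics.Tracking.config
// as a Map of setting name -> string value.
function isRobotDetectionEnabled(settings) {
  const raw = settings.get("Analytics.AutoDetectBots");
  // Assumption: a missing setting falls back to the shipped default (enabled).
  if (raw === undefined) return true;
  // Config values are strings, so compare case-insensitively after trimming.
  return raw.trim().toLowerCase() === "true";
}

// Usage:
// isRobotDetectionEnabled(new Map([["Analytics.AutoDetectBots", "false"]])); // false
```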

Contact classification

The ContactClassification class contains the classification constants and helper methods. The following helper methods take the contact classification as a parameter and return a Boolean value indicating whether the contact is a human, a robot, or an automatically detected robot.

  • IsHuman
  • IsRobot
  • IsAutoDetectedRobot
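A rough JavaScript analogue of these helpers follows. Only two values come from this article: 0 for a human and 925 for an automatically detected robot. The 900 to 999 robot range and the rule that anything outside it counts as human are assumptions made for illustration, so check the actual ContactClassification constants before relying on them.

```javascript
// Sketch of the ContactClassification helpers. Human (0) and
// AutoDetectedRobot (925) are taken from the article; the robot range is assumed.
const ContactClassification = {
  Human: 0,
  AutoDetectedRobot: 925,
  RobotMin: 900, // assumed lower bound of robot classifications
  RobotMax: 999, // assumed upper bound of robot classifications

  isRobot(classification) {
    return classification >= this.RobotMin && classification <= this.RobotMax;
  },
  // Assumption: any value outside the robot range is treated as human.
  isHuman(classification) {
    return !this.isRobot(classification);
  },
  isAutoDetectedRobot(classification) {
    return classification === this.AutoDetectedRobot;
  },
};
```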

Initial visitor classification

The SC_ANALYTICS_GLOBAL_COOKIE contains the IsClassificationGuessed field, which is set to true or false.

When a new visitor comes to the website, this field is set to false by default, because the visitor's classification has not yet been determined. At this stage, the visitor could be a human or a robot. When the visitor's classification has been determined, the field is set to true.
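The lifecycle of the IsClassificationGuessed field can be modeled as a small state change. In this hypothetical sketch, the cookie is a plain object and the function names are invented for illustration. It shows only the two states described here and the fact, stated later in this article, that detection no longer needs to run once the field is true.

```javascript
// Hypothetical model of the SC_ANALYTICS_GLOBAL_COOKIE classification field.
function createGlobalCookie() {
  // A new visitor could still be a human or a robot.
  return { isClassificationGuessed: false };
}

// Returns a new cookie object marking the classification as determined.
function markClassificationDetermined(cookie) {
  return { ...cookie, isClassificationGuessed: true };
}

// Robot detection only needs to run while the classification is undetermined.
function needsRobotDetection(cookie) {
  return !cookie.isClassificationGuessed;
}
```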

The visitor identification control

When a visitor views a page on your website, the VisitorIdentification control is rendered on the page. It first checks whether the VisitorIdentification.ascx control is present in the layouts/system folder. If the control is present, the content of the VisitorIdentification.ascx user control is rendered on the page, and it:

  • Saves the current UTC time to the VICurrentDateTime meta tag.
  • Adds a reference to layouts\system\VisitorIdentification.js so that the browser loads the visitor identification JavaScript file.
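The control's output can be approximated as two lines of markup. This sketch is not Sitecore's rendering code. The timestamp format (milliseconds since the Unix epoch), the leading "/" on the script path, and the controlPresent flag standing in for the file check are all assumptions.

```javascript
// Hypothetical approximation of the markup the VisitorIdentification control emits.
// controlPresent stands in for the check that layouts/system/VisitorIdentification.ascx exists.
function renderVisitorIdentification(controlPresent, nowUtcMs) {
  if (!controlPresent) return ""; // nothing is rendered without the user control
  return [
    // Assumed timestamp format: milliseconds since the Unix epoch (UTC).
    `<meta name="VICurrentDateTime" content="${nowUtcMs}" />`,
    '<script type="text/javascript" src="/layouts/system/VisitorIdentification.js"></script>',
  ].join("\n");
}

// Usage: renderVisitorIdentification(true, Date.now());
```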

JavaScript detection of human behavior

When a visitor views a page, the VisitorIdentification.js JavaScript file is loaded. Robots do not usually load CSS or JavaScript files.

The script subscribes to two events:

  • OnMouseMove event – triggered when a computer mouse is moved.
  • OnTouchStart event – triggered when the screen on a tablet or mobile phone is touched.

If the visitor moves the mouse or touches the screen of a tablet or mobile phone, the script requests the VisitorIdentificationCSS.aspx page. Rather than requesting the page directly, the script creates a URL to it and references it as a style sheet. A robot is unlikely to load this style sheet, whereas a human visitor's browser attempts to load it, so a request for this page indicates human behavior. The VisitorIdentificationCSS.aspx page generates an empty style sheet, and it also contains code that runs every time the page is requested.
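The client-side behavior described above can be sketched as a one-shot listener. This is not the contents of VisitorIdentification.js. The URL path and the function names are assumptions, and the event target is passed in so the logic can be exercised outside a browser. The browser helper adds a link element, which matches the article's point that the page is loaded as a style sheet rather than requested directly.

```javascript
// Assumed URL; the article names the page but not its full path.
const CSS_URL = "/layouts/system/VisitorIdentificationCSS.aspx";
const HUMAN_EVENTS = ["mousemove", "touchstart"];

// Calls loadStylesheet once, on the first mouse move or touch, then stops listening.
function watchForHumanInput(target, loadStylesheet) {
  let fired = false;
  function onInput() {
    if (fired) return;
    fired = true;
    HUMAN_EVENTS.forEach((name) => target.removeEventListener(name, onInput));
    loadStylesheet(CSS_URL);
  }
  HUMAN_EVENTS.forEach((name) => target.addEventListener(name, onInput));
}

// Browser helper: reference the page as a style sheet instead of fetching it.
function appendStylesheetLink(url) {
  const link = document.createElement("link");
  link.rel = "stylesheet";
  link.href = url;
  document.head.appendChild(link);
}

// In a page: watchForHumanInput(window, appendStylesheetLink);
```

Removing both listeners after the first event keeps the style sheet from being requested more than once per page view.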

When a human visitor's browser requests this page, its code makes the following changes:

  • The visitor classification code is set to 0, which classifies the visitor as human:
    Tracker.Current.Session.SetClassification(0, 0, true);
  • The IsClassificationGuessed value in the cookie is set to true. This means that the visitor has now been classified, so the robot detection logic no longer needs to run:
    cookie.IsClassificationGuessed = true;
  • The ASP.NET session timeout is reset to the default for human visitors (20 minutes).
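The three changes can be summarized in one handler. This hypothetical JavaScript model uses plain objects in place of the Sitecore tracker session, the global cookie, and the ASP.NET session; the real page is server-side .NET code. The classification value 0 and the 20-minute timeout come from the list above.

```javascript
const HUMAN = 0;
const DEFAULT_HUMAN_TIMEOUT_MINUTES = 20;

// Hypothetical stand-ins: trackerSession, cookie and aspNetSession are plain objects.
function handleVisitorIdentificationCss(trackerSession, cookie, aspNetSession) {
  // Mirrors Tracker.Current.Session.SetClassification(0, 0, true).
  trackerSession.classification = HUMAN;
  // The visitor is now classified, so detection does not need to run again.
  cookie.isClassificationGuessed = true;
  // Restore the default session timeout used for human visitors.
  aspNetSession.timeoutMinutes = DEFAULT_HUMAN_TIMEOUT_MINUTES;
  // The page itself returns an empty style sheet.
  return { contentType: "text/css", body: "" };
}
```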

Timeout setting comparison

The timeout setting comparison is a final check, carried out after the other robot detection measures have been performed. The script schedules a JavaScript function to run after 30 seconds (the default setting):

timeoutSleep(30000, placeCheckerRequest);

When the timer fires, the placeCheckerRequest function reads the UTC time from the VICurrentDateTime meta tag and requests the VIChecker.aspx page, passing the retrieved time in the tstamp parameter.
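A minimal sketch of the scheduled request follows. Here setTimeout stands in for the script's own timeoutSleep helper, the VIChecker.aspx path prefix is assumed, and the meta-tag reader and request sender are injected so that the URL construction can be checked in isolation. The meta tag is read when the timer fires, as described above.

```javascript
const CHECK_DELAY_MS = 30000; // default delay from the article
const CHECKER_PATH = "/layouts/system/VIChecker.aspx"; // assumed path

// Builds the checker URL carrying the rendered timestamp in tstamp.
function buildCheckerUrl(viCurrentDateTime) {
  return CHECKER_PATH + "?tstamp=" + encodeURIComponent(viCurrentDateTime);
}

// setTimeout is a stand-in for timeoutSleep; the meta tag is read when the timer fires.
function scheduleCheckerRequest(readMetaContent, sendRequest, delayMs = CHECK_DELAY_MS) {
  return setTimeout(() => sendRequest(buildCheckerUrl(readMetaContent())), delayMs);
}

// Browser usage (illustrative):
// scheduleCheckerRequest(
//   () => document.querySelector('meta[name="VICurrentDateTime"]').content,
//   (url) => fetch(url));
```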

The VIChecker.aspx page calculates the difference between the current UTC time and the time in the tstamp parameter. For a human visitor, this request is sent 30 seconds after the page loads. Robots can run the scheduled JavaScript sooner than that, so if the request arrives less than 30 seconds after the page was rendered, the contact is detected as a robot and the visitor classification is set back to robot:

Tracker.Current.Session.SetClassification(925, 925, true);
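The comparison on the server side reduces to a threshold test. In the sketch below, tstamp is assumed to be milliseconds since the Unix epoch, because the article does not describe its encoding. The function returns the classification to apply (925, as in the call above) or null to leave it unchanged. Ignoring a malformed tstamp is a design choice made for this illustration, not documented behavior.

```javascript
const AUTO_DETECTED_ROBOT = 925;
const MIN_HUMAN_DELAY_MS = 30000;

// Returns the classification to apply, or null to leave it unchanged.
function checkTimestamp(tstampParam, nowUtcMs, minDelayMs = MIN_HUMAN_DELAY_MS) {
  const renderedAt = Number(tstampParam); // assumed: epoch milliseconds
  if (!Number.isFinite(renderedAt)) return null; // illustrative: ignore malformed input
  const elapsed = nowUtcMs - renderedAt;
  // A request that arrives sooner than the scheduled delay suggests a robot.
  return elapsed < minDelayMs ? AUTO_DETECTED_ROBOT : null;
}
```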

The Media Request event handler

In earlier versions of the robot detection logic, a visitor who requested a media item for download was identified as human. The xDB robot detection component does not treat a media download as sufficient evidence that the visitor is human.

In the Sitecore.Analytics.Tracking.RobotDetection.config file, the following event handler implements this stricter behavior:

  • Sitecore.Analytics.RobotDetection.Media.MediaRequestEventHandler

When a media item is requested, this event handler processes the item's tracking field but does not change the visitor's classification to human.
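The handler's contract can be illustrated with a hypothetical JavaScript function. "Processing the tracking field" is modeled here as recording the item's tracking data on a session object. The object shapes and the function name are invented, and Sitecore's actual tracking logic is more involved. The important point is that the classification is never touched.

```javascript
// Hypothetical model of the media request handler's contract.
function onMediaRequest(session, mediaItem) {
  const tracking = mediaItem.tracking || "";
  if (tracking.trim() !== "") {
    // Illustrative stand-in for processing the tracking field.
    session.trackedMedia.push({ id: mediaItem.id, tracking });
  }
  // session.classification is deliberately left unchanged: a download is not proof of a human.
  return session;
}
```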

Changing a visitor's classification requires access to the session. In Sitecore, the custom media request session module (a C# class) enables a session only for requests to media items whose tracking field is not empty. Requests for media items with an empty tracking field do not need a session, which speeds up their processing.
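The session module's decision rule can be stated as a one-line predicate. The function name below is hypothetical, and treating a whitespace-only field as empty is an assumption. The sketch only captures the rule described above: a session is enabled when the media item's tracking field has content.

```javascript
// Hypothetical predicate for the media request session module's decision.
function mediaRequestNeedsSession(trackingFieldValue) {
  // Assumption: whitespace-only values count as empty.
  return typeof trackingFieldValue === "string" && trackingFieldValue.trim().length > 0;
}

// Usage: requests for untracked media skip session setup entirely.
// mediaRequestNeedsSession("");            // false
// mediaRequestNeedsSession("<tracking/>"); // true
```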