Index update strategies

Last updated Thursday, August 31, 2017 in Sitecore Experience Platform for Administrator, Developer

Index update strategies keep indexes up to date. You can configure each index with its own set of index update strategies. For performance reasons, do not attach more than three update strategies to a single index.

Sitecore provides a varied set of index update strategies, and you can extend this set with your own strategies. All the strategies that ship with Sitecore are defined under the following node in the Sitecore.ContentSearch configuration files:

sitecore/contentSearch/indexConfigurations/indexUpdateStrategies

For example, the Manual strategy is defined like this:

<manual type="Sitecore.ContentSearch.Maintenance.Strategies.ManualStrategy, Sitecore.ContentSearch" />
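Because every strategy is just a named node under this path, you can register an additional one from your own include file. The following is a minimal sketch, not a shipped configuration: the include file, the myCustomStrategy node name, and the MyCompany.Search.MyCustomStrategy type are hypothetical placeholders for a strategy class that you write yourself.

```xml
<!-- Hypothetical include file, for example App_Config/Include/Custom.IndexUpdateStrategies.config -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <indexUpdateStrategies>
          <!-- Hypothetical strategy node; indexes reference it by this node name -->
          <myCustomStrategy type="MyCompany.Search.MyCustomStrategy, MyCompany.Search" />
        </indexUpdateStrategies>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

An index would then reference the new node the same way it references the built-in strategies: <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/myCustomStrategy" />.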

Sitecore comes with the following strategies:

  • RebuildAfterFullPublish
  • OnPublishEndAsync
  • IntervalAsynchronous
  • Synchronous
  • RemoteRebuild
  • TimedIndexRefresh
  • Manual

RebuildAfterFullPublish strategy

This strategy is defined in the following way in the configuration file:

<rebuildAfterFullPublish type="Sitecore.ContentSearch.Maintenance.Strategies.RebuildAfterFullPublishStrategy, Sitecore.ContentSearch" />

During initialization, this strategy subscribes to the OnFullPublishEnd event. When the event is raised, the strategy triggers a full index rebuild.

In a distributed environment, the index rebuild is triggered on all remote servers where this strategy is configured. For this to work, you must enable the event queue.

In environments where a full publish runs regularly, updating the index incrementally after each full publish uses a lot of resources. Instead, this strategy triggers a full index rebuild when a full publish has completed.

When you attach this strategy to an index and it is initialized, you will see the following message in the CrawlingLog file:

Initializing RebuildAfterFullPublishStrategy for index '<index_name>'

When this strategy is triggered, you will see the following message in the CrawlingLog file:

RebuildAfterFullPublishStrategy triggered on index '<index_name>'

Attaching the RebuildAfterFullPublish strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...

Best practice

Do not combine this strategy with the Synchronous strategy. You can combine it with any of the other strategies.

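Because this strategy causes a full index rebuild, you should combine it with the SwitchOnRebuildLuceneIndex or the SwitchOnRebuildSolrSearchIndex index implementation. These implementations rebuild into a separate copy of the index and switch to it when the rebuild is complete, so the existing index remains available for searches while the rebuild runs.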
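As a sketch of that combination for the Lucene provider used in the other examples on this page, you swap the index type for SwitchOnRebuildLuceneIndex and keep the existing parameters. The sitecore_web_index id is illustrative, and the rest of the index definition (elided with ...) is assumed to stay as it is.

```xml
<index id="sitecore_web_index" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <!-- Full rebuild after a full publish; the switch-on-rebuild type keeps the previous index searchable until the rebuild finishes -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...
</index>
```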

When you combine this strategy with the OnPublishEndAsync strategy, register it as the first entry in the list so that it is triggered first:

<index id="sitecore_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...

OnPublishEndAsync strategy

This strategy is defined in the following way in the configuration file:

<onPublishEndAsync type="Sitecore.ContentSearch.Maintenance.Strategies.OnPublishEndAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">web</param>
   <CheckForThreshold>true</CheckForThreshold>
</onPublishEndAsync>

During initialization, this strategy subscribes to the OnPublishEnd event. When the event is raised, the strategy triggers an incremental index rebuild.

If you have separate CM and CD servers, this event reaches the CD servers through the EventQueue object, so you must enable the EventQueue object for this strategy to work in that kind of environment.

Note

The constructor of the OnPublishEndAsynchronousStrategy class takes an additional database parameter. This parameter defines the database in which the strategy looks up item changes.
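For example, if you publish to an additional publishing target, you could define a second instance of the strategy that reads item changes from that database. In this sketch, the onPublishEndAsyncWeb2 node name and the web2 database are hypothetical; web2 would have to be defined in the <databases /> section and meet the processing requirements for this strategy.

```xml
<!-- Hypothetical second definition for an extra publishing target database named web2 -->
<onPublishEndAsyncWeb2 type="Sitecore.ContentSearch.Maintenance.Strategies.OnPublishEndAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">web2</param>
   <CheckForThreshold>true</CheckForThreshold>
</onPublishEndAsyncWeb2>
```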

When you attach this strategy to an index and it is initialized, you will see the following message in the CrawlingLog file:

Initializing OnPublishEndAsynchronousStrategy for index '<index_name>'.

When this strategy is triggered, you will see the following message in the CrawlingLog file:

"<index_name> OnPublishEndAsynchronousStrategy executing."

Processing

This strategy uses the EventQueue object from the database it was initialized with:

<param desc="database">web</param>

This means that the strategy works only when all of the following conditions are met:

  • This database must be specified in the <databases /> section of the configuration file.
  • The EnableEventQueues setting must be true.
  • The EventQueue table within the preconfigured database must have entries that are dated later than the last update timestamp of the index.
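One way to make sure that the EnableEventQueues setting is true is an include patch like the following sketch. It assumes that the setting already exists in your base configuration, so the patch only overrides its value.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Override the value of the existing EnableEventQueues setting -->
      <setting name="EnableEventQueues">
        <patch:attribute name="value">true</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```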

The strategy forces a full index rebuild when the number of entries in the history table exceeds the value of the Indexing.FullRebuildItemCountThreshold setting. This prevents excessive processing of the event queue. A count this high usually means that a large publish or deployment has taken place, which should always trigger a full index rebuild.

This behavior applies only when the following property in the configuration file is set to true, which is the default:

<CheckForThreshold>true</CheckForThreshold>

If you set this property to true, we recommend that you also use the SwitchOnRebuildLuceneIndex or the SwitchOnRebuildSolrSearchIndex implementation for any index that uses this strategy.

The Indexing.FullRebuildItemCountThreshold setting defaults to 100000.
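This setting is not included in the configuration files that Sitecore delivers, so to change the threshold you add it in an include file. In this minimal sketch, 50000 is an arbitrary example value, not a recommendation. If your configuration already defines the setting, override its value with patch:attribute instead.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Force a full rebuild when more than 50000 history entries are pending (example value) -->
      <setting name="Indexing.FullRebuildItemCountThreshold" value="50000" />
    </settings>
  </sitecore>
</configuration>
```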

Attaching the OnPublishEndAsync strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...

Best practice

Do not combine this strategy with any of these strategies:

  • Synchronous
  • IntervalAsynchronous

You can combine it with these strategies:

  • RebuildAfterFullPublish
  • RemoteRebuild

Use this strategy in multiserver or multi-instance environments where you have already enabled the EventQueue.
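As a sketch, a web index on a content delivery server could combine this strategy with both compatible strategies. RebuildAfterFullPublish is registered first, as described for that strategy, and the three entries stay within the recommended limit of three strategies per index. The sitecore_web_index id is illustrative.

```xml
<index id="sitecore_web_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <!-- Registered first so that it is triggered before onPublishEndAsync -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
      <!-- Incremental updates after each publish, read from the web database -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
      <!-- Rebuilds this copy when the same index is rebuilt on another instance -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/remoteRebuild" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...
</index>
```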

IntervalAsynchronous strategy

This strategy is defined in the following way in the configuration file:

<intervalAsyncMaster type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">master</param>
   <param desc="interval">00:00:10</param>
   <CheckForThreshold>true</CheckForThreshold>
</intervalAsyncMaster>

  • The database parameter specifies the database in which the strategy looks up item changes.
  • The interval parameter specifies how often the strategy is triggered.

When you attach this strategy to an index and it is initialized, you can see the following message in the CrawlingLog file:

Initializing IntervalAsynchronousUpdateStrategy for index '<index_name>'.

When this strategy is triggered, you can see the following message in the CrawlingLog file:

IntervalAsynchronousUpdateStrategy triggered on index '<index_name>'

Processing

This strategy is triggered by a time interval and not the OnPublishEnd event. It depends on the History Engine Store to process item changes. The following conditions must be fulfilled for the strategy to work:

  • The referenced database must exist.
  • The referenced database must have History Engine enabled.
  • The History Engine must have entries that are dated later than the last update timestamp of the index.
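If the referenced database does not yet have History Engine enabled, the following sketch shows one way to add it through an include patch. It is modeled on the History Engine storage block in the default master database definition of Sitecore 8.x; the web database and the 30-day EntryLifeTime are example values. Compare the element names and storage type with the master database definition in your own installation before you use it.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <databases>
      <database id="web">
        <!-- Store item history for this database so that history-based strategies can read the changes -->
        <Engines.HistoryEngine.Storage>
          <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
            <param connectionStringName="$(id)" />
            <!-- Example retention period for history entries: 30 days -->
            <EntryLifeTime>30.00:00:00</EntryLifeTime>
          </obj>
        </Engines.HistoryEngine.Storage>
      </database>
    </databases>
  </sitecore>
</configuration>
```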

The strategy uses an internal timer that is initialized with a predefined interval value. The strategy is triggered when the timer fires. In this example, the timer is set to fire every 10 seconds:

<intervalAsync type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">web</param>
   <param desc="interval">00:00:10</param>
   <CheckForThreshold>true</CheckForThreshold>
</intervalAsync>

The strategy forces a full index rebuild when the number of entries in the history table exceeds the value of the Indexing.FullRebuildItemCountThreshold setting. This usually means that a large publish or deployment has taken place, which should always trigger a full index rebuild.

This behavior applies only when this property is set to true, which is the default:

<CheckForThreshold>true</CheckForThreshold>

If this property is set to true, you should use the SwitchOnRebuildLuceneIndex or the SwitchOnRebuildSolrSearchIndex implementation for the index.
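For a Solr-based master index, that could look like the following sketch. It assumes the Sitecore 8.x Solr provider, where SwitchOnRebuildSolrSearchIndex takes an extra rebuildcore parameter that names a second Solr core. The sitecore_master_index_sec core name is hypothetical, and that core must already exist in Solr.

```xml
<index id="sitecore_master_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   <param desc="name">$(id)</param>
   <param desc="core">$(id)</param>
   <!-- Hypothetical secondary core that receives the full rebuild before the cores are switched -->
   <param desc="rebuildcore">sitecore_master_index_sec</param>
   <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/intervalAsyncMaster" />
   </strategies>
   ...
</index>
```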

The Indexing.FullRebuildItemCountThreshold setting is not included in the configuration files that Sitecore delivers. When it is not set, its value defaults to 100000.

Attaching the IntervalAsynchronous strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/intervalAsync" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...

Best practice

Do not combine this strategy with these strategies:

  • Synchronous
  • OnPublishEndAsync

You can combine it with these strategies:

  • RebuildAfterFullPublish
  • RemoteRebuild

You should use this strategy for the master database indexes and for single-server environments where you want to use as few resources as possible.

This strategy is also useful for less critical indexes that you do not need to be updated frequently. You can adjust the interval to fit your needs.
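For example, to update a less critical master index every five minutes instead of every 10 seconds, you could define an additional instance of the strategy with a longer interval. The intervalAsyncMasterSlow node name is hypothetical; the interval uses the same hh:mm:ss format as the shipped definitions.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <indexUpdateStrategies>
          <!-- Hypothetical strategy: same type as intervalAsyncMaster, but fires every 5 minutes -->
          <intervalAsyncMasterSlow type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
            <param desc="database">master</param>
            <param desc="interval">00:05:00</param>
            <CheckForThreshold>true</CheckForThreshold>
          </intervalAsyncMasterSlow>
        </indexUpdateStrategies>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

A less critical index would then reference contentSearch/indexConfigurations/indexUpdateStrategies/intervalAsyncMasterSlow instead of intervalAsyncMaster.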

The default Sitecore configuration defines this strategy for the core and master databases:

<intervalAsyncCore type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">core</param>
   <param desc="interval">00:01:00</param>
   <CheckForThreshold>true</CheckForThreshold>
</intervalAsyncCore>
<intervalAsyncMaster type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">master</param>
   <param desc="interval">00:00:10</param>
   <CheckForThreshold>true</CheckForThreshold>
</intervalAsyncMaster>

Synchronous strategy

This is the index update strategy that comes closest to real time. It is also the most expensive strategy in terms of CPU and I/O.

Before you use this strategy, read the best practices for it.

You specify this strategy in the following way:

<sync type="Sitecore.ContentSearch.Maintenance.Strategies.SynchronousStrategy, Sitecore.ContentSearch" />

When you attach this strategy to an index and it is initialized, you will see the following message in the CrawlingLog file:

Initializing SynchronousStrategy for index '<index_name>'.

When this strategy is triggered, you see this message in the CrawlingLog file:

SynchronousStrategy triggered on index '<index_name>'

Processing

This strategy subscribes to low-level DataEngine events, such as ItemSaved and ItemSavedRemote. When you use it on a single-server instance, it guarantees an index update immediately after an item update.

In a multiserver environment, the strategy relies on the EventQueue to broadcast ItemSavedRemote events. When an item is published and the ItemSavedRemote event is raised, the strategy is triggered.

Attaching the Synchronous strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/sync" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...

Best practice

Use this strategy if you need immediate index updates and you have a dedicated indexing server infrastructure that has plenty of processing resources. You should only use the Synchronous strategy on CM servers for the indexes that process the master database and where the timing of the index update is critical.

If you use this strategy on a CM server where many items are added and changed, it can severely degrade system performance. In most cases, the IntervalAsynchronous strategy configured for the master database is sufficient.

You can only combine this strategy with the following strategy:

  • RemoteRebuild
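A sketch of that combination on a CM server master index where the timing of updates is critical. The sitecore_master_index id is illustrative.

```xml
<index id="sitecore_master_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <!-- Near real-time updates driven by item save events -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/sync" />
      <!-- The only other strategy that can be combined with sync -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/remoteRebuild" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...
</index>
```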

The strategy has these prerequisites:

  • You must enable the EventQueue.

RemoteRebuild strategy

This strategy subscribes to the OnIndexingEndedRemote event. This event is triggered when a particular index is rebuilt. The strategy is only activated when a full index rebuild takes place.

You use this mechanism to rebuild remote indexes when you force an index rebuild. You specify this strategy like this:

<remoteRebuild type="Sitecore.ContentSearch.Maintenance.Strategies.RemoteRebuildStrategy, Sitecore.ContentSearch" />

Attaching the RemoteRebuild strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/remoteRebuild" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...

Best practice

You can combine this strategy with any other strategy. You use it in multiserver environments, where each Sitecore instance maintains its own copy of the index. You can then trigger a full rebuild from one CM server, and all remote servers where the index is configured with this strategy will rebuild.

The strategy has these prerequisites:

  • The name of the index on the remote server must be identical to the name of the index that you forced to rebuild.
  • You must enable the EventQueue.
  • The database you assign for system event queue storage (core by default) must be shared between the Sitecore instance where the rebuild takes place and the other instances.
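The shared-database prerequisite is met through the connection strings. In this sketch, the CM server and every remote server point the core connection string at the same database. The server name, database name, and credentials are hypothetical, and the other connection strings are omitted.

```xml
<!-- ConnectionStrings.config on the CM server and on every remote server -->
<connectionStrings>
  <!-- The same core database on every instance, so all instances share the system event queue -->
  <add name="core" connectionString="user id=sitecore_user;password=********;Data Source=sql-shared;Database=Sitecore_Core" />
</connectionStrings>
```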

TimedIndexRefresh strategy

When you use observable crawlers, you need to consider the implications of an index operation. Observable crawlers constantly listen for new items. When a crawler receives notice of a new item, it will cache the item in memory until an index operation is processed. This has the following implications:

  • Rebuilding: If you rebuild your index (clear all existing content and index new items), only the items currently held by the crawler will be inserted.
  • Index frequency: Because each crawler caches every crawled item in memory, you should index at a consistent frequency to make sure that items are flushed from memory before memory usage becomes too large.
  • Update only: Because your source of data is only a feed and you do not have access to all the data at once, you should only call update methods on your index.

You can use the TimedIndexRefresh strategy on an index to resolve these issues. This strategy refreshes an index with data from the crawlers, but it does not cause the index to be reset.

Configuration

Add the following to the configuration file to configure this strategy:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes hint="list:AddIndex">
          <index id="sitecore_atlas_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <!-- ... other configuration ... -->
            <strategies hint="list:AddStrategy">
              <timed type="Sitecore.ContentSearch.Analytics.TimedIndexRefreshStrategy, Sitecore.ContentSearch.Analytics">
                <param desc="interval">00:01:00</param>
              </timed>
            </strategies>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.Analytics.AnalyticsObserverCrawler, Sitecore.ContentSearch.Analytics">
                <ObservableName>DefaultObservable</ObservableName>
                <CrawlerName>Lucene Crawler</CrawlerName>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

You must provide the interval parameter. In this example, it is configured to run every minute.

Manual strategy

This strategy disables any automatic index updates. When you use this strategy for an index, you must rebuild this index manually.

You specify this strategy like this:

<manual type="Sitecore.ContentSearch.Maintenance.Strategies.ManualStrategy, Sitecore.ContentSearch" />

When you attach this strategy to an index and it is initialized, you see the following message in the CrawlingLog file:

Initializing ManualStrategy for index '<index_name>'.

Index will have to be rebuilt manually

Attaching the Manual strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/manual" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...

Best practice

Do not combine this strategy with any other strategy. Reserve it for special situations, for example when you delegate the whole indexing process to a dedicated server and do not want any index updates on other Sitecore instances.