Multiple indexes (sharding)

Last updated Friday, February 26, 2016 in Sitecore Experience Platform for Developer, Administrator

Index sharding is a process that splits the documents in an index into smaller partitions. These smaller partitions are called shards. The result is that instead of all documents being in one large index, documents are distributed between shards. The sharding process has logic (the "sharding strategy") that decides how the documents are allocated to the shards.

[Figure: the basic sharding process]

One index satisfies the needs of most Sitecore solutions, but multiple indexes scale better when needed.

Sharding and Solr

When you use Solr, Sitecore does not handle the sharding. Instead, the SolrCloud feature of the Solr application handles the sharding.

Solr can automatically assign documents to shards (similar to what Sitecore can do for Lucene) and it has extra features, such as replicated shards. Replicated shards are useful for handling failure and failover scenarios.

The Sitecore implementation of Solr handles a sharded endpoint in the same way it handles an unsharded endpoint. You do not need any extra configuration to work with Solr sharded indexes.

Note

Sitecore does not fully support failover. Specifically, Sitecore (as a Solr client) cannot switch between Solr servers (Solr replicas) if the current server (leader) goes down.

For more information about configuring SolrCloud, go to https://cwiki.apache.org/confluence/display/solr/SolrCloud

Sharding and Lucene

When you use Lucene, the data from each of the three Sitecore databases (master, web, and core) is, by default, stored in a single search index. As your search index grows, you can implement a sharding strategy to store the data from each database in its own separate search index.

You can also shard in other ways. For example, you can have a separate index for the media library.

If you use buckets and have thousands or millions of items, sharding is an approach you can use if you want to continue using Lucene. If your search indexes continue to grow and become too large for this strategy, you should switch to using Solr.

Note

If you use sharding, you must disable the other Lucene index configuration files, because leaving them enabled creates redundant indexes.

Configure multiple search indexes

Sitecore provides the following example configuration files that help you create an index for each database:

Sitecore.ContentSearch.Lucene.Indexes.Sharded.Core.config.example

Sitecore.ContentSearch.Lucene.Indexes.Sharded.Master.config.example

Sitecore.ContentSearch.Lucene.Indexes.Sharded.Web.config.example

These files are stored in the Include folder (wwwroot\<site name>\Website\App_Config\Include).

If these example files do not split the indexes finely enough, you can change the configuration to fit your needs.

Use the following code sample and table to see what you need to add:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, 
       Sitecore.ContentSearch.LuceneProvider">
        <indexes hint="list:AddIndex">
          <index id="sitecore_core_index" 
           type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, 
            Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <!-- This initializes index property store. Id has to be set to the index id -->
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" 
             param1="$(id)" />
            <strategies hint="list:AddStrategy">
              <!-- NOTE: the order of these controls the execution order -->
              <strategy ref="contentSearch/indexUpdateStrategies/intervalAsyncCore" />
            </strategies>
            <commitPolicy hint="raw:SetCommitPolicy">
              <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, 
               Sitecore.ContentSearch" />
            </commitPolicy>
            <commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
              <policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, 
               Sitecore.ContentSearch" />
            </commitPolicyExecutor>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, 
               Sitecore.ContentSearch.LuceneProvider">
                <Database>core</Database>
                <Root>/sitecore</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

Name: <Root>
Description: Specifies the root node of the content tree to include in the index.
Example: <Root>/sitecore/media library</Root>

Name: <name>
Description: Name of the search index.
Example: <param desc="name">$(id)</param>

Name: <Database>
Description: Database name.
Example: <Database>core</Database>

Name: <strategies>
Description: List of index update strategies to run.
Example:
<strategies hint="list:AddStrategy">
  <strategy ref="contentSearch/indexUpdateStrategies/intervalAsyncCore" />
</strategies>

Name: <commitPolicy>
Description: Controls when the index commits what it has in memory or in temporary files to disk. This can be time based or document count based.
Example:
<commitPolicy hint="raw:SetCommitPolicy">
  <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
</commitPolicy>

Name: <commitPolicyExecutor>
Description: The class that executes the commit.
Example:
<commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
  <policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch" />
</commitPolicyExecutor>
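
To illustrate how these elements combine, the following is a minimal sketch of a separate index for the media library, adapted from the sample configuration. The index id sitecore_master_media_index, the choice of the master database, and the syncMaster strategy reference are assumptions made for illustration; substitute the ids and strategies that are defined in your own configuration.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
        <indexes hint="list:AddIndex">
          <!-- Hypothetical index id: pick a name that is unique in your solution -->
          <index id="sitecore_master_media_index"
                 type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
            <strategies hint="list:AddStrategy">
              <!-- Assumed strategy reference: use one that exists in your configuration -->
              <strategy ref="contentSearch/indexUpdateStrategies/syncMaster" />
            </strategies>
            <commitPolicy hint="raw:SetCommitPolicy">
              <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
            </commitPolicy>
            <commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
              <policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch" />
            </commitPolicyExecutor>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, Sitecore.ContentSearch.LuceneProvider">
                <!-- Only items under this root in this database are crawled into the index -->
                <Database>master</Database>
                <Root>/sitecore/media library</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

Compared with the sample configuration, only the index id, the strategy reference, and the <Database> and <Root> values differ.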

Index context switcher

If you use sharding, Sitecore uses the <Root> element in relation to the Context.Item to determine which index to use. This index switching is automatic.

Note

The more specific your <Root> is, the higher it needs to be listed in the configuration file. The index context switcher uses the indexes in the order that they are listed.

For example, an index with a <Root> element of /sitecore/content/Home must be listed below an index with a <Root> element of /sitecore/content/Home/Flights. Each index is defined with the same structure as the following sample:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, 
       Sitecore.ContentSearch.LuceneProvider">
        <indexes hint="list:AddIndex">
          <index id="sitecore_core_index" 
           type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, 
            Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <!-- This initializes index property store. Id has to be set to the index id -->
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" 
             param1="$(id)" />
            <strategies hint="list:AddStrategy">
              <!-- NOTE: the order of these controls the execution order -->
              <strategy ref="contentSearch/indexUpdateStrategies/intervalAsyncCore" />
            </strategies>
            <commitPolicy hint="raw:SetCommitPolicy">
              <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, 
               Sitecore.ContentSearch" />
            </commitPolicy>
            <commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
              <policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, 
               Sitecore.ContentSearch" />
            </commitPolicyExecutor>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, 
               Sitecore.ContentSearch.LuceneProvider">
                <Database>core</Database>
                <Root>/sitecore</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
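
The ordering rule from the note can be sketched as follows. This fragment shows only the <indexes> list, with the index for the more specific root (/sitecore/content/Home/Flights) listed before the index for /sitecore/content/Home. The index ids, the web database, and the onPublishEndAsync strategy reference are hypothetical values chosen for illustration, and the commit policy elements are omitted for brevity.

```xml
<indexes hint="list:AddIndex">
  <!-- More specific root: listed first so that the context switcher tries it first -->
  <index id="sitecore_web_flights_index"
         type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
    <param desc="name">$(id)</param>
    <param desc="folder">$(id)</param>
    <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
    <strategies hint="list:AddStrategy">
      <!-- Assumed strategy reference: use one that exists in your configuration -->
      <strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsync" />
    </strategies>
    <locations hint="list:AddCrawler">
      <crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, Sitecore.ContentSearch.LuceneProvider">
        <Database>web</Database>
        <Root>/sitecore/content/Home/Flights</Root>
      </crawler>
    </locations>
  </index>
  <!-- Less specific root: listed below the Flights index -->
  <index id="sitecore_web_home_index"
         type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
    <param desc="name">$(id)</param>
    <param desc="folder">$(id)</param>
    <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
    <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsync" />
    </strategies>
    <locations hint="list:AddCrawler">
      <crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, Sitecore.ContentSearch.LuceneProvider">
        <Database>web</Database>
        <Root>/sitecore/content/Home</Root>
      </crawler>
    </locations>
  </index>
</indexes>
```

If the two <index> elements were swapped, the Home index would be listed first, which is the ordering the note warns against.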

Default sharding strategy

Sitecore provides a default sharding strategy called LucenePartitionShardingStrategy. This strategy calculates a hash of the document ID to determine which shard the document goes into. The hashing is very fast and does not rely on any shared state or ID generation. This approach does not guarantee a completely even distribution (for example, 100 documents in two shards are not necessarily split 50/50), but it improves performance considerably.

This strategy has only one option: the shardDistribution parameter, which specifies how many shards the index is split into. You must set this parameter to a power of 2 (2, 4, 8, 16, …).
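
As a worked illustration of hash-based shard assignment in general, the expressions below show one common scheme: reduce the hash modulo the shard count N. When N is a power of two (2, 4, 8, 16, …), the modulo is equivalent to a bit mask. This is an assumption for illustration only; the source does not state how LucenePartitionShardingStrategy maps a hash to a shard, and the symbols h, d, and N are introduced here.

```latex
% Assumed scheme, not confirmed for LucenePartitionShardingStrategy:
% h is a hash function, d a document, N the shard count.
\[
  \operatorname{shard}(d) = h(\mathrm{id}_d) \bmod N,
  \qquad
  N = 2^k \;\Longrightarrow\;
  \operatorname{shard}(d) = h(\mathrm{id}_d) \mathbin{\&} (N - 1)
\]
% Worked example with N = 4 and a hash value of 47:
\[
  47 \bmod 4 = 3,
  \qquad
  101111_2 \mathbin{\&} 000011_2 = 000011_2 = 3
\]
```

Under this scheme a document always lands in the same shard for a given N, and no coordination between shards is needed, which matches the "no shared state" property described above.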

Create a new sharding strategy

If the default strategy does not meet your needs, you can implement your own. To do this, implement the Sitecore.ContentSearch.Sharding.IShardingStrategy interface and pass the implementation to the index.

You should rebuild your index after applying a strategy. It is not essential, but it will give the index a more even distribution of documents.
