The Sitecore Azure Search provider integrates the Sitecore Search engine with the Microsoft Azure Search service.
Azure Search can replace Solr in distributed installations for on-premise, Azure IaaS, and Azure PaaS solutions. Although Azure Search supports the Lucene query syntax, its behavior is different, so migrating a solution might require additional steps.
This topic describes:
- Functional differences in configuration
- Fields with the same name and type
- Fields limitation in the index
- Queries and the behavior of predicates
- Paging support
- Language support
- Recommended migration steps
The Azure Search service has a few limitations that are not present in Solr or Lucene. Therefore, ensure you are familiar with the following topics:
Functional differences in configuration
There are a few differences in functionality between Lucene, Solr, and Sitecore Azure Search; for example, unlike Lucene or Solr, Azure Search requires a defined schema for all indices. The Sitecore Azure Search provider automatically creates these schemas while indexing. Unlike Solr, you do not need to create a schema manually.
If no configuration is present for a field, the field definition is resolved from the incoming field values. To ensure predictable behavior, define the fields that you index in the configuration.
You can configure fields in two ways:
The field node in both cases can include the following cloud-specific attributes:
cloudFieldName - defines the name used in the stored document. You can only define this attribute for the field node in the
searchable, retrievable, facetable, filterable, sortable - these attributes instruct the Azure Search service how to handle the field.
cloudAnalyzer - the type of analyzer to apply. Only two predefined analyzers are currently supported:
lowercase_keyword - the same as the Lucene lowercase keyword analyzer.
language - the analyzer for culture-specific data.
<field fieldName="_fullpath" cloudFieldName="fullpath_1" searchable="YES" retrievable="YES" facetable="YES" filterable="YES" sortable="YES" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure" cloudAnalyzer="lowercase_keyword" />
Fields with the same name and type
Lucene groups fields that have the same name in a document and effectively stores them in an array. However, Solr and Azure Search store fields that have the same name and type only once, skipping the duplicates and saving a single value. We recommend that you avoid having multiple fields with the same name in an item or document.
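If you want to find such duplicates before migrating, you can check each document's fields programmatically. The following C# sketch assumes a simplified, hypothetical model in which a document is a list of IndexedField records (name, type, value); neither IndexedField nor DuplicateFieldCheck is part of the Sitecore API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified representation of one field in a document to be indexed.
public record IndexedField(string Name, Type FieldType, object Value);

public static class DuplicateFieldCheck
{
    // Returns the names that occur more than once with the same type.
    // Lucene would keep every value; Solr and Azure Search would keep only one.
    public static IReadOnlyList<string> FindDuplicateFieldNames(IEnumerable<IndexedField> document) =>
        document
            .GroupBy(field => (field.Name, field.FieldType))
            .Where(group => group.Count() > 1)
            .Select(group => group.Key.Name)
            .Distinct()
            .ToList();
}
```

Grouping on both name and type mirrors the rule above: only fields that share a name and a type collapse to a single stored value.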
Fields limitation in the index
Azure Search is limited to 1,000 fields per index, which applies to all available service tiers. The Sitecore Azure Search component includes an exclude field list for content indexes that you can extend with specific items, for example:
You can also move solution-related content from the master/web indices into a dedicated one.
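To estimate how close an index is to the limit, you can count the distinct field names that your documents produce. This C# sketch models each document, hypothetically, as the sequence of field names it emits, and treats the excludedFields set as a stand-in for the exclude field list; it is not how the Sitecore provider computes the schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FieldLimitCheck
{
    // Azure Search allows at most 1,000 fields per index on every service tier.
    public const int MaxFieldsPerIndex = 1000;

    // Counts the distinct field names across a sample of documents, where each
    // document is modelled as the sequence of field names it would emit.
    public static int CountDistinctFields(IEnumerable<IEnumerable<string>> documents) =>
        documents.SelectMany(fields => fields)
                 .Distinct(StringComparer.Ordinal)
                 .Count();

    // Returns how many more distinct fields fit under the limit after the
    // excluded fields are removed (a negative value means the limit is exceeded).
    public static int RemainingFieldBudget(
        IEnumerable<IEnumerable<string>> documents, ISet<string> excludedFields) =>
        MaxFieldsPerIndex - CountDistinctFields(
            documents.Select(fields => fields.Where(name => !excludedFields.Contains(name))));
}
```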
Queries and the behavior of predicates
When using queries and predicates with Azure Search, consider the following:
- Where predicates transform into the same query strings. Currently, there is no way to force the
- Filters against values containing phrase tokens can return more documents than expected. For example, if an index has a Language field that contains en-gb across multiple documents, the following query, which filters on en, returns all of the en-gb documents:
queryable.Filter(item => item.Language.Equals("en"))
- For language-specific queries, you can filter on the ParsedLanguage field, for example:
queryable.Filter(item => item.ParsedLanguage.Equals("japanese_japan"))
- Boosting is only supported at query time, so if you define boosting in the configuration (index-time boosting), you must move it to the query.
- Predicates such as Contains and EndsWith can return more records than expected for conditions with multiple words, because the conditions are translated into regular expressions. For example, if three documents contain the text apple, pineapple, and pineapple is not actually an apple respectively, a query with the condition Text.EndsWith("an apple") returns all three documents.
- Fuzzy query semantics are different in Azure Search. When you use Like for a pattern or similarity query, Azure Search interprets the similarity parameter as the Damerau-Levenshtein distance, an integer from 0 to 2. In the Sitecore Lucene provider, by contrast, the similarity parameter is a floating-point ratio between 0 and 1 that Lucene implements by using the BM25 similarity.
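The phrase-token behavior of filters can be approximated by looking at how a value is split into tokens. This C# sketch assumes, for illustration only, that the field is analyzed roughly like a standard analyzer (lowercased and split on non-alphanumeric characters) and that a filter matches when its value equals any single token; the TokenFilterDemo class is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenFilterDemo
{
    // Approximates a standard analyzer (an assumption): lowercase the value and
    // split it on every run of characters that are not letters or digits.
    public static IReadOnlyList<string> Tokenize(string value) =>
        Regex.Split(value.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
             .Where(token => token.Length > 0)
             .ToList();

    // Models a filter that matches when the filter value equals any single token
    // of the field value, rather than the whole value.
    public static bool TermFilterMatches(string fieldValue, string filterValue) =>
        Tokenize(fieldValue).Contains(filterValue.ToLowerInvariant());
}
```

Under these assumptions, TermFilterMatches("en-gb", "en") is true because en-gb is split into en and gb, which is why a filter on en also returns en-gb documents.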
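When you convert fuzzy queries, it can help to compute the distance between a search term and the values you expect it to match. The following C# sketch implements the restricted form of the Damerau-Levenshtein distance (optimal string alignment), in which swapping two adjacent characters counts as one edit. Azure Search's own fuzzy matching may differ in details, and the EditDistance class is hypothetical.

```csharp
using System;

public static class EditDistance
{
    // Restricted Damerau-Levenshtein distance (optimal string alignment):
    // insertions, deletions, substitutions, and transpositions of adjacent
    // characters each cost 1.
    public static int Osa(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,   // deletion
                                            d[i, j - 1] + 1),  // insertion
                                   d[i - 1, j - 1] + cost);    // substitution
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1); // transposition
            }
        }
        return d[a.Length, b.Length];
    }

    // A fuzzy term matches a candidate when the distance does not exceed
    // maxEdits, which Azure Search restricts to 0, 1, or 2.
    public static bool WithinDistance(string term, string candidate, int maxEdits)
    {
        if (maxEdits < 0 || maxEdits > 2)
            throw new ArgumentOutOfRangeException(nameof(maxEdits), "Azure Search supports 0 to 2.");
        return Osa(term, candidate) <= maxEdits;
    }
}
```

For example, EditDistance.Osa("sitecore", "siteocre") is 1 (one transposition), so that pair matches with a maximum distance of 1.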
Paging support
Starting with Sitecore 8.2 Update-7 and Sitecore 9.0 Update-2, a single search request returns up to 1,000 documents by default. If the query finds more documents, CloudSearchResults automatically iterates through all of them.
You can limit the number of documents returned by a single request to 50 by setting ContentSearch.Azure.LimitSearchResultsPerRequest to true, or you can implement your own iterator with the Skip and Take LINQ extension methods.
The maximum value supported by Skip is 100,000.
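If you implement your own iterator, a loop over Skip and Take is the usual shape. This C# sketch works with any IQueryable<T>; the page size of 50 and the 100,000 Skip ceiling mirror the values above, and the Paging class is hypothetical rather than part of the Sitecore provider. The query should have a stable sort order, otherwise pages can overlap or miss results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Maximum Skip value that Azure Search supports.
    public const int MaxSkip = 100_000;

    // Enumerates a query page by page with Skip and Take. The default page size
    // of 50 mirrors the per-request limit; here it is a plain parameter.
    public static IEnumerable<T> ReadAll<T>(IQueryable<T> query, int pageSize = 50)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        for (int skip = 0; ; skip += pageSize)
        {
            // Fail loudly instead of silently truncating the result set.
            if (skip > MaxSkip)
                throw new InvalidOperationException(
                    "More than 100,000 results: partition the query by a search field.");
            List<T> page = query.Skip(skip).Take(pageSize).ToList();
            foreach (T item in page) yield return item;
            if (page.Count < pageSize) yield break; // last (partial or empty) page
        }
    }
}
```

When ReadAll reaches the Skip ceiling, it throws rather than returning a truncated result, which is the point at which you would partition the query instead.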
Language support
To search with a language-specific context, you must use a corresponding language analyzer during indexing. For fields that you want to index with a language context, set the cloudAnalyzer="language" attribute in the field configuration. The supported languages are limited to those for which Azure Search provides a language analyzer.
Azure Search automatically picks up specific language analyzers during indexing by using the configuration defined in the cloudCultureBasedAnalyzerConfiguration section of the
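To reason about which analyzer a regional culture ends up with, it can help to model the lookup as a fallback from a specific culture name to its neutral parent. The following C# sketch is hypothetical: the table stands in for the cloudCultureBasedAnalyzerConfiguration section, and the fallback from en-GB to en is an assumption about how you might resolve regional variants, not documented provider behavior. The analyzer names follow the Azure Search naming pattern, such as en.microsoft.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class AnalyzerResolver
{
    // Hypothetical culture-to-analyzer table standing in for the
    // cloudCultureBasedAnalyzerConfiguration section.
    private static readonly Dictionary<string, string> Analyzers =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["en"] = "en.microsoft",
            ["de"] = "de.microsoft",
            ["ja"] = "ja.microsoft",
        };

    // Tries the full culture name first (for example, en-GB), then drops the
    // last subtag and retries (en), until an entry is found or nothing is left.
    public static string? Resolve(string cultureName)
    {
        string name = cultureName;
        while (name.Length > 0)
        {
            if (Analyzers.TryGetValue(name, out var analyzer))
                return analyzer;
            int dash = name.LastIndexOf('-');
            name = dash < 0 ? string.Empty : name.Substring(0, dash);
        }
        return null; // no analyzer configured for this culture
    }
}
```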
Recommended migration steps
When migrating from Lucene or Solr to Azure Search, ensure you:
- Add the necessary field definitions to the search index configurations in Sitecore for all related Sitecore instances, for example, Content Management and Content Delivery.
- Review the fields that are stored by default to avoid reaching the 1,000-field limit.
- Review your search queries and verify the behavior of queries that use Contains with multiple words. Consider rewriting such queries to refine the returned results on the client side.
For example, to pre-filter results on the service side, change the following query:
var results = from item in index where item.Field.StartsWith("sitecore example") select item;
to a broader query that returns a limited set of candidates:
var intermediateResults = (from item in index where item.Field.Contains("sitecore example") select item).Take(100);
Then, to return precise results, filter the candidates in memory on the client side:
var results = intermediateResults.AsEnumerable().Where(item => item.Field.StartsWith("sitecore example"));
- Avoid using StartsWith to match the FullPath prefix when you want to return the descendants of an item. Instead, use the built-in _path field (exposed as Paths on SearchResultItem), which contains the IDs of an item and its ancestors, together with the item ID, for example:
var descendants = from item in index where item.Paths.Contains(rootItem.ID) select item;
- Review queries that use fuzzy search and replace the similarity ratio (a floating-point value between 0 and 1) with the Damerau-Levenshtein distance (an integer between 0 and 2).
- Review queries that iterate over large numbers of results (for example, when using the List Manager API). To avoid iterating over more than 100,000 results in a single query, rewrite the queries to partition results using one of the search fields.
- If your search page lets users select the language of its results, review queries to ensure that they return precise results for regional dialects, for example, as
- If you must routinely search over a large set of domain-specific items, such as news articles, products, or events, consider moving indexing and search into a dedicated index and define a precise index schema in the Sitecore configuration. Use ExcludeTemplateField to control which items and fields are included in the index. This helps keep the number of fields in the index under 1,000.
- If you use index-time boosting on fields and want equivalent behavior in Azure Search, consider rewriting the queries so that they include boost weights.
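When you replace similarity ratios with distances, one rough rule of thumb is to allow about (1 - ratio) times the term length in edits, capped at the Azure Search maximum of 2. The C# sketch below encodes that heuristic; it is an assumption for planning the migration, not a documented Sitecore or Azure Search conversion, and FuzzyMigration is a hypothetical name. Check the resulting distances against real queries.

```csharp
using System;

public static class FuzzyMigration
{
    // Azure Search accepts a maximum edit distance of 0, 1, or 2.
    public const int MaxAzureEdits = 2;

    // Hypothetical heuristic: allow about (1 - similarity) * termLength edits,
    // rounded to six decimals to absorb floating-point noise, then floored and
    // capped at 2.
    public static int RatioToMaxEdits(double similarity, int termLength)
    {
        if (similarity < 0.0 || similarity > 1.0)
            throw new ArgumentOutOfRangeException(nameof(similarity), "Expected a ratio between 0 and 1.");
        if (termLength < 0)
            throw new ArgumentOutOfRangeException(nameof(termLength));

        double edits = Math.Floor(Math.Round((1.0 - similarity) * termLength, 6));
        return (int)Math.Min(edits, MaxAzureEdits);
    }
}
```

Under this heuristic, a ratio of 0.8 on a five-letter term maps to one edit, while a ratio of 0.5 on an eight-letter term is capped at two.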
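To partition a large query, you can run one query per disjoint range of a search field and concatenate the results. The C# sketch below assumes an integer-valued field (for example, a hypothetical year or numeric ID field) and works with any IQueryable<T>; QueryPartitioning and its methods are hypothetical, and each partition must stay under 100,000 results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryPartitioning
{
    // Largest number of results a single partition may return.
    public const int MaxResultsPerQuery = 100_000;

    // Builds half-open ranges [lower, upper) over an integer key as predicate
    // expressions, so that a LINQ provider can translate them into filters.
    public static IEnumerable<Expression<Func<T, bool>>> IntRanges<T>(
        Expression<Func<T, int>> key, int from, int to, int step)
    {
        if (step <= 0) throw new ArgumentOutOfRangeException(nameof(step));
        for (int lower = from; lower < to; lower += step)
        {
            int upper = Math.Min(lower + step, to);
            var body = Expression.AndAlso(
                Expression.GreaterThanOrEqual(key.Body, Expression.Constant(lower)),
                Expression.LessThan(key.Body, Expression.Constant(upper)));
            yield return Expression.Lambda<Func<T, bool>>(body, key.Parameters);
        }
    }

    // Runs one query per partition and concatenates the results. The partitions
    // should be disjoint and together cover every result you need.
    public static IEnumerable<T> ReadPartitioned<T>(
        IQueryable<T> query, IEnumerable<Expression<Func<T, bool>>> partitions)
    {
        foreach (var partition in partitions)
        {
            IQueryable<T> part = query.Where(partition);
            if (part.Count() > MaxResultsPerQuery)
                throw new InvalidOperationException("Partition exceeds 100,000 results; use narrower ranges.");
            foreach (T item in part)
                yield return item;
        }
    }
}
```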