Components of Enterprise Search - SharePoint 2010

Enterprise search comprises many components. Some were mentioned briefly earlier; others have been specifically designed to provide enterprise-class search functionality and are covered here.

Before we delve into the components that we, as developers, are most likely to use to meet the specific requirements of our application, let’s take a brief look at how enterprise search works in SharePoint. Every search solution has three main elements: the front-end web server, the query architecture, and the crawl architecture. In that respect, SharePoint 2010 is not remarkably different from MOSS 2007. However, as you’ll see when we drill down a bit, the way in which these three elements are implemented by SharePoint Server 2010 offers a higher degree of flexibility.

The first element, in which most of our development work will be done, is the front-end web server. The web server acts as the presentation layer for our search solution and hosts the pages and controls that will be used to capture queries and display results.

The next element in the solution is the query architecture, which consists of one or more query servers, each responsible for directly servicing all or part of a search query. This is the business logic layer of our search solution.

The final element is the crawl architecture. While the query architecture is responsible for servicing end-user queries, the crawl architecture is responsible for scanning connected data sources and producing indexes of the content found. In addition to producing the index, the crawl architecture also generates a property database. As you’ll see when we get into how queries are executed, there’s a big difference between the index file and the property database.

Enhancements in SharePoint 2010
Within MOSS 2007, although enterprise search was capable of supporting a large corpus and tens of thousands of users, the overall topology suffered a few problems. For example, each shared service provider could use only a single index server. Notwithstanding the hardware requirements for this single server on very large farms, this was a major issue because the index server became a single point of failure. Another major drawback was the physical size of the index files. Although this was somewhat mitigated by using a number of smaller shadow index files, ultimately all the index files had to be merged into a single master file, and the master file had to be present on all query servers in the farm. The physical hardware required to support this was significant.

With SharePoint 2010, these problems have been addressed by subdividing the query architecture and the crawl architecture into a number of smaller, more scalable components. For example, rather than a single index server, SharePoint 2010 introduces the concept of a crawl component. Using crawl components, you can add multiple index servers to a farm, each running one or more crawl components.

Indexing Components
Now that you understand how an enterprise search solution is implemented using SharePoint Server 2010, let’s look at some of the configurable components—starting with how the crawl architecture can be extended by adding indexing components.

Search Connector Framework
From our perspective as software developers, one of the most significant aspects of the crawl architecture is the Search Connector Framework, the preferred mechanism used by the crawler component when accessing data to be indexed. The Search Connector Framework should be familiar to you at this point in the book, since it’s based on Business Connectivity Services (BCS).

A number of different properties can be attached to Business Data Connectivity (BDC) metadata when you’re creating a search connector. At a minimum, however, the following additions need to be made to configure a search connector properly. First, to make sure that the BDC adaptor is visible in the Search user interface, the ShowInSearchUI property must be set on the LobSystemInstance object.

  1. In the BDC Explorer pane, select the LobSystemInstances node for the model, and then click the ellipsis next to Custom Properties in the Properties window.
  2. In the Property Editor, add a new property named ShowInSearchUI with a Type of System.String and a Value of x.

The next property that needs to be added is RootFinder, which is attached to the finder method that will be called to enumerate the items to be crawled. For example, if our search connector were crawling data in a database table, the RootFinder method would return a list of identifiers for all items to be crawled. The crawl process would then make use of the SpecificFinder method to perform the actual crawl of each field in the row.

  1. In the BDC Explorer pane, select the ReadList node (or whichever finder method you’re planning to use for enumeration), and then click the ellipsis next to Custom Properties in the Properties window.
  2. In the Property Editor, add a new property named RootFinder with a Type of System.String and a Value of x.

To support incremental crawls, entities should include a last-modified time stamp column. So that the crawler knows which column is the time stamp, the LastModifiedTimeStampField property should be added to the finder method instance.

  1. Select the appropriate finder method instance, and then open the Property Editor.
  2. Add a RootFinder property as above, and then add an additional property named LastModifiedTimeStampField with a Type of System.String and a Value of x.
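
The division of labor between the enumerating finder, the specific finder, and the time stamp column can be sketched in a few lines. This is only a language-neutral model in Python, not BCS code: the in-memory ROWS table, the function names, and the last_modified key are all assumptions made for illustration, and a real crawler may well compare time stamps before fetching each item.

```python
from datetime import datetime

# Hypothetical in-memory "table" standing in for the external data source.
ROWS = {
    1: {"name": "Widget", "last_modified": datetime(2010, 5, 1)},
    2: {"name": "Gadget", "last_modified": datetime(2010, 6, 15)},
    3: {"name": "Gizmo", "last_modified": datetime(2010, 7, 4)},
}


def root_finder():
    """Enumerate the identifiers of every item that could be crawled."""
    return sorted(ROWS)


def specific_finder(item_id):
    """Fetch the full row for a single identifier."""
    return ROWS[item_id]


def crawl(last_crawl=None, timestamp_field="last_modified"):
    """Return the names to index; an incremental crawl skips unchanged rows."""
    indexed = []
    for item_id in root_finder():
        row = specific_finder(item_id)
        # A full crawl (last_crawl is None) takes everything.
        if last_crawl is None or row[timestamp_field] > last_crawl:
            indexed.append(row["name"])
    return indexed


full = crawl()
incremental = crawl(last_crawl=datetime(2010, 6, 1))
```

Here the full crawl visits every row, while the incremental crawl picks up only the rows modified after the previous crawl time.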

You can then configure a Search Connector for a content source as follows:

  1. In Central Administration, navigate to the Search Service Application management page. Select Content Sources from the Crawling menu on the left-hand side.
  2. Click New Content Source.
  3. In the Add Content Source page, enter a suitable name for the new content source, and then select Line of Business Data from the list of Content Source Type options.
  4. Select the appropriate Business Data Catalog Service application, and then select the external data source.


Protocol Handlers and IFilters
In earlier versions of SharePoint, the index server made use of components known as protocol handlers to connect to content to be indexed. Although protocol handlers are still used in SharePoint 2010, their inclusion is mainly for backward compatibility. The preferred solution for accessing external content is via the Search Connector Framework.

TIP: In SharePoint 2010, when you’re configuring search connectors, the terminology used in the user interface can be confusing. For example, when you’re adding a new content source, the options available include Line of Business Data and Custom Repository. It would be reasonable to assume that selecting Custom Repository would be the correct way to use the Search Connector Framework; however, this isn’t the case. The Custom Repository option is used to reference custom protocol handlers. To make matters more confusing, the user interface often refers to protocol handlers as “custom connectors.” The thing to bear in mind is that the Search Connector Framework is a set of extensions to BCS, and from a user interface perspective, we’re still using BCS. As shown in the steps earlier, the correct way to use the Search Connector Framework is to select a Content Source Type of Line of Business Data.

Working with Content Sources
Building on our understanding of how connections are made to index content physically, let’s look at what happens to that content as part of the indexing process.

You know that the Search Connector Framework can be used to crawl and index content from a wide variety of sources. Each source is defined as a separate entity within SharePoint known as a content source. As well as defining content retrieved via specific connectors, content sources can also be used to subcategorize content within the wider SharePoint farm. For example, a farm may use a content source to define a set of data from a particular site collection.

You should be aware that it’s impossible to create overlapping content sources. For example, it’s impossible to create a content source with a particular start address and then create another content source whose start address falls within the first.

TIP: When configuring a search on larger farms, it’s important that you determine which content is most likely to be updated frequently. Since content crawls run on a schedule, it’s good practice to split the corpus into a number of smaller content sources. Doing this will allow greater control over how frequently particular content is indexed and therefore how current search results are for that content.

As well as content sources, which define the starting point of any search crawl, SharePoint also allows us to define crawl rules. You can use crawl rules to exclude certain files or folders, or to specify that particular credentials should be used when accessing particular files or folders. An important new feature in SharePoint 2010 is the ability to use regular expressions when defining crawl rules.
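
To give a feel for what a regular-expression exclusion rule does, here is a minimal sketch. It uses Python’s re module, so the pattern syntax is Python’s and may differ from what the crawl rule page accepts; the rules and URLs are invented for illustration.

```python
import re

# Hypothetical exclusion rules; each pattern must match the whole URL.
EXCLUDE_RULES = [
    re.compile(r".*/archive/.*"),                  # anything under an archive folder
    re.compile(r".*\.(tmp|bak)$", re.IGNORECASE),  # temporary and backup files
]


def should_crawl(url):
    """Return False when any exclusion rule matches the URL."""
    return not any(rule.fullmatch(url) for rule in EXCLUDE_RULES)


urls = [
    "http://intranet/docs/spec.docx",
    "http://intranet/archive/old-spec.docx",
    "http://intranet/docs/scratch.TMP",
]
crawlable = [u for u in urls if should_crawl(u)]
```

Only the first URL survives: the second sits under an excluded folder and the third has an excluded extension.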

Working with Managed Properties
As mentioned earlier, when content is crawled, an index of the content is created along with a property database. Generally speaking, the index contains the main body of the content, whereas the property database contains metadata. So to give an example, when a Word document is crawled, the contents of the document are included in the index, and any properties of the document such as the title, author, or the creation date are added to the property database. The property database contains details of all metadata properties for each item indexed by the crawler process. However, since different items may define the same metadata in different ways, SharePoint incorporated the notion of managed properties.

A managed property is a logical grouping of crawled properties from one or more indexed content types. For example, when an Excel spreadsheet is crawled, author metadata will be retrieved and stored in the property database; however, if an MP3 file is crawled, artist metadata will be retrieved. Logically, both artist and author could be grouped into a managed property named Creator, for example. By making the grouping, it becomes possible for you to search multiple content types using a common set of attributes without your having to understand how those attributes map to the underlying metadata of the content.
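
The grouping described above amounts to a lookup from one managed name to several crawled names. The following sketch models that idea in plain Python; the MANAGED_PROPERTIES table and the sample items are assumptions for illustration and do not reflect how SharePoint stores the mapping.

```python
# Hypothetical mapping from one managed property to its crawled properties.
MANAGED_PROPERTIES = {
    "Creator": ["author", "artist"],
}


def managed_value(item, managed_name):
    """Return the first crawled value mapped to the managed property, if any."""
    for crawled_name in MANAGED_PROPERTIES.get(managed_name, []):
        if crawled_name in item:
            return item[crawled_name]
    return None


items = [
    {"file": "budget.xlsx", "author": "Alice"},  # spreadsheet exposes "author"
    {"file": "song.mp3", "artist": "Bob"},       # MP3 exposes "artist"
    {"file": "notes.txt"},                       # no creator metadata at all
]

creators = [managed_value(i, "Creator") for i in items]
by_alice = [i["file"] for i in items if managed_value(i, "Creator") == "Alice"]
```

A query against Creator works across both file types without the caller knowing which underlying property each one uses.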

Mapping crawled properties to managed properties is particularly important when you’re indexing SharePoint content, since each column in a list or library is stored as a crawled property. When it comes to properties such as Title or Created By, the mapping is straightforward, since these properties are present on every item and therefore the mapping is simply one-to-one. However, as you create custom content types to accommodate your application data structure, the mapping becomes a bit more involved. Mapping crawled properties to managed properties does not occur automatically. If, for example, you have a list named Product containing a Product Name column and a second list named Orders also containing a column named Product Name, these two columns will not be automatically mapped to a managed property. You would need to map both crawled properties manually to a new managed property.

The key thing to be aware of with respect to managed properties versus crawled properties is that only managed properties can be displayed in search results or used for filtering or refining results.

Working with Scopes
We’ve covered how content can be split up using content sources and how metadata can be used by creating managed properties; let’s move on to consider one important use of content sources and managed properties: the creation of scopes. For now let’s build up an understanding of what scopes are and why we might use them.

When a query is executed using the Query Object Model, it’s performed against the entire search index. Sometimes this behavior doesn’t make sense for a number of reasons: if we already know the type of content that we’re looking for, it makes sense to search content of only that type, or if we already know which web site contains the content that we’re looking for, it doesn’t make sense to search all web sites.

Search scopes allow us to define subsets of the search index based on a series of rules. These rules can include only content from a particular content source, only content where a managed property has a specific value, or only content from a specific URL. Additionally, complex combinations of rules can be created to restrict the scope to the content that is appropriate for our search application. For example, if we were implementing a search feature for retrieving technical specification documents, and we knew that these documents existed only within the engineering department web site, we could define a scope that included only content of type technical specification and included only results from the engineering department content source.

We could refine this example further if necessary. Let’s say that some of the technical specifications were flagged as confidential. We could exclude those from search results by creating a managed property that referred to the confidential flag, and then using that managed property in a rule that specifically excluded those documents from the scope. As you can see, by using scopes, you can increase the relevance of search results by restricting the search area to an appropriate subset of the entire index.
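
The technical specification example can be modeled as a chain of include and exclude rules. This is a sketch only: the dictionary keys, the scope name, and the sample index are assumptions, and real scope rules are configured in the search administration pages rather than written as code.

```python
def in_scope(item):
    """Apply the rules of a hypothetical 'Technical Specifications' scope."""
    if item["content_source"] != "Engineering":
        return False  # include rule: content source
    if item["content_type"] != "Technical Specification":
        return False  # include rule: managed property value
    if item.get("confidential", False):
        return False  # exclude rule: confidential flag
    return True


index = [
    {"title": "Pump spec", "content_source": "Engineering",
     "content_type": "Technical Specification"},
    {"title": "Valve spec", "content_source": "Engineering",
     "content_type": "Technical Specification", "confidential": True},
    {"title": "Pump brochure", "content_source": "Marketing",
     "content_type": "Brochure"},
]

scoped = [item["title"] for item in index if in_scope(item)]
```

Only the non-confidential engineering specification remains in the scope, so a query run against it never has to consider the other two items.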

Query Components
As you can imagine, the most important requirement for a search solution is the ability to perform queries. Let’s move on to take a look through the various components available to us as developers. As you’ll see, SharePoint 2010 delivers a number of interfaces covering a range of scenarios.

Query Object Model
Enterprise search in SharePoint 2010 provides a Query Object Model that allows developers to use the capabilities of search programmatically within custom applications. The core class of the Query Object Model is the Query abstract class, which has two concrete implementations: the FullTextSqlQuery class, which can be used to issue full-text SQL syntax queries to the search provider, and the KeywordQuery class, which can be used to issue keyword syntax queries. The Query Object Model can be used to query any SharePoint Search application, whether it’s a default SharePoint Search provider or a FAST Search for SharePoint provider.

One thing to bear in mind is that SQL syntax queries are supported only when using SharePoint Search. The examples that follow focus on keyword syntax queries.

Using the Query Object Model is relatively straightforward, as this example illustrates:

Notice a few interesting things about this code sample. First, take a look at the ResultTableCollection object that’s returned by the Execute method. The ResultTableCollection is an IEnumerable collection of ResultTable objects. Each query can therefore return multiple result sets as defined by the ResultTypes property of the Query class. In this code sample, only RelevantResults are selected, but multiple result sets can be retrieved by performing a bitwise combination of two or more ResultType enumeration values, for example ResultType.RelevantResults | ResultType.SpecialTermResults.

Result Types Returned by the Query Object Model
Let’s take a look at the various types of results that can be included in the ResultTableCollection:

  • None The query is performed but no results are returned.
  • RelevantResults A result set containing the main search results from the content index matching the search query is returned.
  • SpecialTermResults A result set containing best bet results matching the search query is returned. Best Bet results are manually configured mappings between keywords and specific results. For example, users may frequently search for “permission” to find documentation on how to obtain permissions for a particular resource. Since many documents may contain the word “permission,” it may not be easy to find the relevant document. Best Bets allow the administrator to specify that a particular document is always returned in the search results for a particular keyword.
  • HighConfidenceResults High-confidence results are generated when the keywords entered exactly match items in the search index.
  • DefinitionResults A result set containing definitions for keywords matching the search query is returned.
  • VisualBestBetsResults A result set containing Visual Best Bets matching the search query is returned. Visual Best Bets work like Best Bet results, except that an image is displayed rather than a text result. Visual Best Bets are available only when FAST Search is configured.
  • RefinementResults A result set containing refined results matching the search query is returned. Refinements are a new addition in SharePoint 2010 and make use of property filters to refine search results further. The important difference in SharePoint 2010 is that refinements now have a specific user interface, whereas previously property filters had to be included as part of the search query.
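
Because result types are combined bitwise, a single value can request several result sets at once. The sketch below models that with Python’s enum.Flag. The member names mirror a few of the result types listed above, but the class is purely illustrative and its numeric values are not those of the real enumeration.

```python
from enum import Flag, auto


class ResultType(Flag):
    """Illustrative flags only; values do not match the real enumeration."""
    NONE = 0
    RELEVANT_RESULTS = auto()
    SPECIAL_TERM_RESULTS = auto()
    HIGH_CONFIDENCE_RESULTS = auto()


# Request two result sets with a bitwise OR.
requested = ResultType.RELEVANT_RESULTS | ResultType.SPECIAL_TERM_RESULTS

# Test individual flags with a bitwise AND.
wants_relevant = bool(requested & ResultType.RELEVANT_RESULTS)
wants_special = bool(requested & ResultType.SPECIAL_TERM_RESULTS)
wants_high_confidence = bool(requested & ResultType.HIGH_CONFIDENCE_RESULTS)
```

Each flag occupies its own bit, which is why any combination can be packed into, and read back from, one value.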

Common Query Language
One of the benefits of the Query Object Model is the availability of a common query language that works across all services supported by the Query Object Model. In practice, two query languages are available to the Query Object Model, keyword syntax and SQL syntax, although only keyword syntax is supported across all services.

Keyword syntax is relatively straightforward and will be familiar to users of any search engine. In its simplest form, a query consists of one or more keywords. For example, to return all documents containing the words “SharePoint” or “Search,” a user would enter this:

sharepoint search

If the results contained links pertaining to “MOSS 2007”, the user could exclude these results by changing the query to this:

sharepoint search -"MOSS 2007"

If the result set contained documents relating to Google search, for example, the user could alter the query to return only documents containing the words “SharePoint” and “search” by changing the query like so:

+sharepoint +search -"MOSS 2007"

As you can see, basic keyword syntax is pretty intuitive.
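
The three queries above can be mimicked with a toy matcher, which makes the role of the plus and minus prefixes explicit. This is a rough model under simplifying assumptions (case-insensitive substring matching over a title string, straight double quotes for phrases); it is not how the search engine evaluates queries.

```python
import shlex


def parse(query):
    """Split a keyword query into optional, required (+), and excluded (-) terms."""
    optional, required, excluded = [], [], []
    for token in shlex.split(query):  # keeps "MOSS 2007" together as one term
        if token.startswith("+"):
            required.append(token[1:].lower())
        elif token.startswith("-"):
            excluded.append(token[1:].lower())
        else:
            optional.append(token.lower())
    return optional, required, excluded


def matches(text, query):
    """Return True when the text satisfies the keyword query."""
    optional, required, excluded = parse(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    if required:
        return all(term in text for term in required)
    return any(term in text for term in optional)


docs = ["SharePoint search overview", "Google search tips", "MOSS 2007 SharePoint search"]


def run(query):
    return [d for d in docs if matches(d, query)]


any_word = run("sharepoint search")
without_moss = run('sharepoint search -"MOSS 2007"')
both_words = run('+sharepoint +search -"MOSS 2007"')
```

Each successive query narrows the list: first all three documents, then the two that do not mention MOSS 2007, and finally only the one containing both required words.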

Using Property Filters
As mentioned, when crawling content, an index and a property database are created. From the database of crawled properties, you can create managed properties, which, as you discovered earlier, are logical groupings of crawled properties. One of the main uses of managed properties is for filtering search results. As well as the basic keyword syntax, the common query language allows you to use property filters to return only results in which a managed property is set to a particular value.

So to pick up on the earlier example, if you wanted to return only Word documents matching your keywords, the query could be changed to this:

+sharepoint +search -"MOSS 2007" (FileExtension="doc" OR FileExtension="docx")

This example uses the FileExtension managed property to filter the result set. One important thing to note about property filters is that they apply to the entire result set. So, referring back to the original keyword syntax example, this query

sharepoint search

returns all results matching either “sharepoint” or “search”. However, if the Word document property filter is applied, here’s how it would look:

sharepoint search (FileExtension="doc" OR FileExtension="docx")

You might expect, given the syntax of keyword queries, that this query would return results matching “SharePoint” or “search”, or having a FileExtension of doc or docx. Instead, the query actually returns results containing either “SharePoint” or “search” where the FileExtension is doc or docx. Effectively, the property filter is applied to the result set of the keyword query.
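
The distinction is easier to see when the two stages are written out separately. In this sketch the documents and helper functions are invented for illustration; the point is only that the property filter narrows the keyword result set rather than widening it.

```python
docs = [
    {"title": "SharePoint admin guide", "FileExtension": "docx"},
    {"title": "Search tuning notes", "FileExtension": "pdf"},
    {"title": "Holiday rota", "FileExtension": "doc"},
]


def keyword_hits(items, keywords):
    """Stage 1: keep items whose title contains any of the keywords."""
    return [d for d in items if any(k in d["title"].lower() for k in keywords)]


def apply_filter(results, prop, allowed):
    """Stage 2: keep only results whose property value is in the allowed set."""
    return [d for d in results if d[prop] in allowed]


hits = keyword_hits(docs, ["sharepoint", "search"])
filtered = apply_filter(hits, "FileExtension", {"doc", "docx"})

hit_titles = [d["title"] for d in hits]
filtered_titles = [d["title"] for d in filtered]
```

The holiday rota is a Word document, yet it never appears in the output because it failed the keyword stage; the filter cannot add results, only remove them.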

There are many default managed properties in SharePoint 2010 that allow search results to be filtered using metadata such as CreatedBy, ContentType, Department, or even things like PictureHeight. By using these built-in properties and defining domain-specific properties, you can easily build targeted search queries.

Federation Object Model
Earlier I discussed how the query architecture is responsible for servicing end-user queries. As mentioned, this is done using one or more query components, right? Usually yes, but I have to admit that I wasn’t telling the whole truth. It is true to say that queries performed against content that’s crawled by SharePoint are serviced via query components. However, SharePoint 2010 also incorporates the concept of federation, meaning that search queries can be serviced directly by external search providers.

Strictly speaking, the functionality of the Federation Object Model is implemented on the web front end, but logically it dictates how queries are performed and therefore I’ve listed it as a query component. The Federation Object Model provides a layer of abstraction between the front-end web parts, used to process search queries, and the physical destination of the query server. In plain English, this means that the front-end web parts can be used to query and retrieve results from any search engine that is supported by the Federation Object Model. For example, it’s possible to use the out-of-the-box search web parts to perform queries against an external product catalog. To take the example even further, it’s possible to perform queries against the product catalog as well as the content index from our SharePoint farm and return the highlights of the combined result set in a single web part. Figure 1 illustrates where the Federation Object Model sits relative to other components.

Figure 1: The Federation Object Model relative to other search components

As illustrated, the Federation Object Model exists as an abstraction layer between web parts that implement search functionality and the Query Object Model. As you’ve seen, the Query Object Model provides a standard mechanism for communicating with search service applications within SharePoint. The Federation Object Model takes this abstraction a step further by allowing external search engines to be used to service search queries.

Out of the box, SharePoint 2010 allows you to connect to three types of locations:

  • SharePoint Search The default search provider that’s installed with SharePoint.
  • FAST Search An add-in search provider that enhances the capabilities of SharePoint Search.
  • OpenSearch 1.0/1.1 This is where the real power of federated search comes in. OpenSearch is an open standard for communicating with search engines and is now used by hundreds of search engines under the terms of a Creative Commons license. By using OpenSearch, you can query and retrieve results from practically every major search engine. As an aside, OpenSearch is used in Internet Explorer 8 to add search providers to the Instant Search list.

As discussed earlier, the Query Object Model is a common interface for all SharePoint Search applications. Although the Federation Object Model makes use of distinct runtime classes for SharePoint Search and FAST Search, namely the FASTSearchRuntime class and the SharePointSearchRuntime class, both of these classes use the Query Object Model to communicate with the underlying search application. It’s possible to use the runtime class directly from within our code. One of the benefits of this is that federation location settings are defined at the search service application level and these settings are used automatically by the runtime. This is much simpler than manually configuring each property when using the Query Object Model.
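
The idea of one front end querying several locations through a common interface can be sketched as follows. The Location class, its search method, and the canned results are all hypothetical stand-ins; they are not the runtime classes named above and make no calls to SharePoint.

```python
class Location:
    """A hypothetical federated location backed by a canned result list."""

    def __init__(self, name, results):
        self.name = name
        self._results = results  # list of (title, score) pairs

    def search(self, query):
        """Return the (title, score) pairs whose title contains the query."""
        q = query.lower()
        return [(title, score) for title, score in self._results if q in title.lower()]


def top_federated(locations, query, per_location=1):
    """Collect the highest-scoring hits from each location into one list."""
    combined = []
    for location in locations:
        hits = sorted(location.search(query), key=lambda r: r[1], reverse=True)
        combined.extend((location.name, title) for title, _ in hits[:per_location])
    return combined


farm = Location("SharePoint Search", [("Widget spec", 0.9), ("Widget memo", 0.4)])
catalog = Location("Product catalog", [("Widget Pro", 0.7), ("Gadget", 0.8)])

top = top_federated([farm, catalog], "widget")
```

The caller never needs to know how each location produces its results, which is the essence of the abstraction layer.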

Query Web Service
As shown in Figure 1, the Query Web Service communicates with SharePoint Search via the Query Object Model. As a result of this, search federation functionality is not available when issuing queries via web service.

As well as the Query Web Service, SharePoint 2010 also provides a Really Simple Syndication (RSS) application programming interface (API) for retrieving search results. Again, since the RSS API uses the Query Object Model, federation functionality is not available. Using the RSS API is relatively straightforward. It’s simply a case of building a query string that contains the appropriate search criteria. Following are the main values for the query string:

  • k The search query text (for example, +sharepoint +search(FileExtension=docx))
  • s The search scope to use (for example, All Sites)
  • start The number of the first result to return (for example, 10)
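
Assembling such a query string by hand is error-prone because characters such as the plus sign and the space need encoding. The sketch below builds one with Python’s standard library. BASE_URL is a deliberate placeholder, not the real endpoint, and build_feed_url is a hypothetical helper; only the k, s, and start parameter names come from the list above.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder address: substitute the RSS endpoint of your own site.
BASE_URL = "http://example.invalid/search-rss"


def build_feed_url(keywords, scope=None, start=None):
    """Build a feed URL from the k, s, and start query string values."""
    params = {"k": keywords}
    if scope is not None:
        params["s"] = scope
    if start is not None:
        params["start"] = start
    # urlencode percent-encodes "+" and turns spaces into "+".
    return BASE_URL + "?" + urlencode(params)


url = build_feed_url("+sharepoint +search", scope="All Sites", start=10)
decoded = parse_qs(url.split("?", 1)[1])
```

Decoding the query string again recovers the original values, which confirms that the plus prefixes survive the round trip.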

The RSS API can be accessed. For example, the following URL can be used to create an RSS feed for results matching “sharepoint” and “search”:

Custom Ranking Model
When determining the order of search results, SharePoint uses two types of ranking: query-dependent ranking, also known as dynamic ranking, and query-independent ranking, also known as static ranking. One thing that merits some discussion is the ability to use custom ranking models in SharePoint 2010.

In MOSS 2007, it was possible to alter the ranking model using the Search Administration Object Model. In effect, this meant altering the weights for particular managed properties in the case of query-dependent ranking. The problem with this approach was twofold: it was possible to make changes only programmatically, and the changes applied across the board to all searches performed using a particular Shared Service Provider.

With SharePoint 2010, it’s now possible to create custom ranking models using XML and apply them on an individual query basis. For example, when using the Core Results Web Part, setting the DefaultRankingModelId to the identifier for a custom ranking model will apply that model to all results rendered in the web part.
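
To illustrate what choosing a ranking model per query means, the sketch below scores documents by summing per-property weights and passes a different weight table to each call. This is a deliberately naive stand-in, not SharePoint’s ranking algorithm or its XML format; the property names, weights, and documents are assumptions.

```python
def rank(docs, term, weights):
    """Order document ids by the summed weights of properties containing the term."""
    def score(doc):
        return sum(
            weight
            for prop, weight in weights.items()
            if term in doc.get(prop, "").lower()
        )
    return [d["id"] for d in sorted(docs, key=score, reverse=True)]


docs = [
    {"id": "a", "Title": "Budget", "Body": "widget costs"},
    {"id": "b", "Title": "Widget overview", "Body": "introduction"},
]

# Two hypothetical models; each query chooses which one to apply.
title_heavy = {"Title": 5.0, "Body": 1.0}
body_heavy = {"Title": 1.0, "Body": 5.0}

by_title = rank(docs, "widget", title_heavy)
by_body = rank(docs, "widget", body_heavy)
```

The same query and the same documents produce a different order under each model, which is the flexibility that a farm-wide setting could not offer.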

Front-End Components
To make it quick and easy to generate a search user interface, SharePoint 2010 provides a number of web parts out of the box. All of the web parts target the Federation Object Model and therefore support many different types of search results.

Capturing Search Queries
The following web parts provide a user interface that allows the user to build search queries:

  • SearchBoxEx This web part provides a basic search query interface. It provides a scopes drop-down and a textbox for entering keywords. A link can also be provided to a page containing an AdvancedSearchBox web part.
  • AdvancedSearchBox This web part expands on the user interface of the SearchBoxEx web part to allow the user to create complex queries by selecting from a range of options, including language and result type. The AdvancedSearchBox web part also supports the addition of property filters.

Both query web parts work in a similar fashion—they build up a query string that is then used when redirecting to a results page.

Displaying Search Results
Earlier we looked at the Query Object Model and the various types of results that are returned. The main factor in deciding which web part to use to display search results is the type of results to be displayed and the default formatting of the results. The following web parts can be used to display search results:

  • CoreResultsWebPart Used to render results of type RelevantResults.
  • FederatedResultsWebPart Used to render results of type RelevantResults. The key difference between the FederatedResultsWebPart and the CoreResultsWebPart is that results in the latter can be paged by including a SearchPagingWebPart on the page. Also, the FederatedResultsWebPart requires that a location is specified, whereas, to provide backward compatibility, the CoreResultsWebPart automatically uses the default search provider defined by the Search Service Application.
  • PeopleCoreResultsWebPart Used to render the results of people searches. Derived from the CoreResultsWebPart, results are displayed in a specific format and have different sort options more appropriate to a people search.
  • TopFederatedResultsWebPart Returns an aggregated set of top results from a number of federated locations.
  • VisualBestBetWebPart Displays results of type VisualBestBets. As described earlier, Visual Best Bets are a feature of FAST Search, and although this web part can be added to sites without FAST Search enabled, no results will be displayed.
  • HighConfidenceWebPart Displays results of type HighConfidenceResults as well as SpecialTermResults.
  • SearchStatsWebPart Displays information about the last query executed in the CoreResultsWebPart.
  • SearchSummaryWebPart Includes a summary of the search query. In effect this implements “Did you mean” functionality, whereby if you start entering a keyword, suggested keywords will be shown that will generate more results.
  • SearchPagingWebPart Supports paging of the results displayed in a CoreResultsWebPart.

Shared Query Manager
One significant change between MOSS 2007 and SharePoint 2010 is in the way search web parts are implemented. With SharePoint 2010, each page containing search web parts has a single instance of the SharedQueryManager object that is shared between all web parts on the page. Through this object we can easily create custom web parts that can hook into the query pipeline at a few different points.

The following code snippet shows how to data bind search results to a repeater control rather than using the traditional XSLT-based rendering approach. Such a technique is useful if controls are required within search results that have post-back functionality. For example, in a shopping cart application, an Add to Basket button may be required.

Refining Search Results
The RefinementWebPart can be used to generate a series of refinement filters automatically for a given result set. Refinements are basically property filters that are automatically derived from a result set. So, for example, if a result set contains results of different types such as web pages and Word documents, the refinement web part will show refinement options for web pages or Word documents.
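
Deriving refinement options from a result set boils down to collecting the distinct values of a property along with their counts. The sketch below shows that step in isolation; the results list and the refinements helper are invented for illustration and are unrelated to the web part’s actual classes.

```python
from collections import Counter

results = [
    {"title": "Spec A", "FileExtension": "docx"},
    {"title": "Home", "FileExtension": "aspx"},
    {"title": "Spec B", "FileExtension": "docx"},
]


def refinements(items, prop):
    """Return (value, count) pairs for the distinct values of one property."""
    return Counter(item[prop] for item in items).most_common()


options = refinements(results, "FileExtension")

# Choosing an option is then just a property filter over the same results.
refined = [r["title"] for r in results if r["FileExtension"] == "docx"]
```

The options list is what a refinement panel would display, and selecting one re-filters the result set by that value.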

Let’s look at how the RefinementWebPart works in a bit more detail since a good understanding of the inner workings will make it easier for you to configure.

From the following class diagram, you can see that a number of objects are involved in rendering the RefinementWebPart. The first object to consider is the RefinementManager, the engine that processes the configuration for the web part. Refinement configuration can be done at the web part level, by changing the value of the Filter Category Definition property.

The Filter Category Definition property accepts XML and typically will have the following structure:

Each Category element is represented by the FilterCategory class. When the control is rendered, a Category is displayed as a section title in the refinement panel. The FilterCategory class defines various properties for controlling the presentation of the section title, such as the text to be displayed and the maximum number of child elements to display. Probably the most important property of the FilterCategory class is the FilterType; this property contains the type name of a class derived from RefinementFilterGenerator that is used to generate a list of child elements to appear within the section. So, for example, if the section was titled Modified Date, the child elements would probably be a list of dates. To generate these dates from the result set, a ManagedPropertyFilterGenerator is used. Effectively the filter generator extracts the values of a particular managed property from the query result set and displays the distinct values in the refinement control.

Because managed properties may have a wide range of values, the ManagedPropertyFilterGenerator allows results to be further grouped before displaying them by introducing an additional ManagedPropertyCustomFilter class. So continuing with our Modified Date example, if we wanted to show a refinement for all documents modified within the past 24 hours rather than a list of the exact times of each modification, we could use a ManagedPropertyCustomFilter class configured to show results only within the past day. Here’s a typical example of custom filter configuration:

We’ve looked at the out-of-the-box refinement web part, and you should be able to see how it can be easily configured to generate refinement options automatically using XML. However, with SharePoint 2010, we can take the concept of refinement a step further and create our own refinement controls. The RefinementManager object is shared between all web parts on a page. This means that we can create a web part that picks up a reference to this object and uses it for providing further data visualization support to refinement data.

The following code sample generates a simple web part that renders a pie chart with slices for each type of document returned in the search results. A pie chart is shown in the illustration that follows the code sample.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd