Enterprise search comprises many components. Some were mentioned briefly earlier; others have been designed specifically to provide enterprise-class search functionality and are covered here.
Before we delve into the components that we, as developers, are most likely to use to meet the specific requirements of our application, let’s take a brief look at how enterprise search works in SharePoint. Every search solution has three main elements: the front-end web server, the query architecture, and the crawl architecture. In that respect, SharePoint 2010 is not remarkably different from MOSS 2007. However, as you’ll see when we drill down a bit, the way in which SharePoint Server 2010 implements these three elements offers a higher degree of flexibility.
The first element, and the one where most of our development work will be done, is the front-end web server. The web server acts as the presentation layer for our search solution and hosts the pages and controls used to capture queries and display results.
The next element in the solution is the query architecture, which consists of one or more query servers, each responsible for directly servicing all or part of a search query. This is the business logic layer of our search solution.
The final element is the crawl architecture. While the query architecture is responsible for servicing end-user queries, the crawl architecture is responsible for scanning connected data sources and producing indexes of the content found. In addition to the index, the crawl architecture also generates a property database. As you’ll see when we get into how queries are executed, there’s a big difference between the index file and the property database.
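To make the distinction concrete, here’s a minimal Python sketch of the two outputs of a crawl: an inverted index built from body text and a separate property store holding per-item metadata. The `crawl` function and the item layout are hypothetical illustrations; SharePoint’s real index and property database formats are not modeled.

```python
from collections import defaultdict

def crawl(items):
    """Build a toy full-text index and a separate property store.

    Each item is a dict with an 'id', a 'body' of text content and a
    'properties' dict of metadata. Purely illustrative.
    """
    index = defaultdict(set)   # term -> ids of items whose body contains it
    property_db = {}           # item id -> metadata for that item
    for item in items:
        for term in item["body"].lower().split():
            index[term].add(item["id"])
        property_db[item["id"]] = dict(item["properties"])
    return index, property_db

items = [
    {"id": 1, "body": "SharePoint search architecture",
     "properties": {"Title": "Search overview", "Author": "Ann"}},
    {"id": 2, "body": "Crawl components and search",
     "properties": {"Title": "Crawling", "Author": "Bob"}},
]
index, property_db = crawl(items)

# Matching keywords uses the index; showing metadata uses the property store.
hits = sorted(index["search"])
print(hits)                                      # [1, 2]
print([property_db[i]["Title"] for i in hits])   # ['Search overview', 'Crawling']
```

In this simplified model the author names never enter the index; they are reachable only through the property store.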
Enhancements in SharePoint 2010
Within MOSS 2007, although enterprise search was capable of supporting a large corpus and tens of thousands of users, the overall topology suffered a few problems. For example, each shared service provider could use only a single index server. Notwithstanding the hardware requirements for this single server on very large farms, this was a major issue because the index server became a single point of failure. Another major drawback was the physical size of the index files. Although this was somewhat mitigated by using a number of smaller shadow index files, ultimately all the index files had to be merged into a single master file, and the master file had to be present on all query servers in the farm. The physical hardware required to support this was significant.
With SharePoint 2010, these problems have been addressed by subdividing the query architecture and the crawl architecture into a number of smaller, more scalable components. For example, rather than a single index server, SharePoint 2010 introduces the concept of a crawl component. Using crawl components, you can add multiple index servers to a farm, each running one or more crawl components.
Now that you understand how an enterprise search solution is implemented using SharePoint Server 2010, let’s look at some of the configurable components—starting with how the crawl architecture can be extended by adding indexing components.
Search Connector Framework
From our perspective as software developers, one of the most significant aspects of the crawl architecture is the Search Connector Framework, the preferred mechanism used by the crawler component when accessing data to be indexed. The Search Connector Framework should be familiar to you at this point in the book, since it’s based on Business Connectivity Services (BCS).
A number of different properties can be attached to Business Data Connectivity (BDC) metadata when you’re creating a search connector. At a minimum, however, the following additions need to be made to configure a search connector properly. First, to make sure that the BDC adaptor is visible in the Search user interface, the ShowInSearchUI property must be set on the LobSystemInstance object.
The next property that needs to be added is RootFinder, which is attached to the finder method that will be called to enumerate the items to be crawled. For example, if our search connector were crawling data in a database table, the RootFinder method would return a list of identifiers for all items to be crawled. The crawl process would then make use of the SpecificFinder method to perform the actual crawl of each field in the row.
To support incremental crawls, entities should include a LastModified time stamp column. So that the crawler knows which column is the time stamp, the LastModifiedTimeStampField property should be added to the finder method instance.
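The division of labor between the RootFinder, the SpecificFinder, and the time stamp field can be sketched in Python. The `root_finder` and `specific_finder` functions below are hypothetical stand-ins for the BDC methods playing those roles (they are not BCS APIs), and the in-memory table stands in for an external database; the sketch only shows the control flow of a full crawl versus an incremental one.

```python
from datetime import datetime

# Hypothetical external table: id -> row, including a LastModified time stamp.
TABLE = {
    101: {"Name": "Widget", "LastModified": datetime(2010, 5, 1)},
    102: {"Name": "Gadget", "LastModified": datetime(2010, 6, 15)},
    103: {"Name": "Gizmo", "LastModified": datetime(2010, 7, 2)},
}

def root_finder():
    """Stand-in for the RootFinder: enumerate identifiers of all items to crawl."""
    return list(TABLE.keys())

def specific_finder(item_id):
    """Stand-in for the SpecificFinder: return the full row for one identifier."""
    return TABLE[item_id]

def crawl(last_crawl=None, timestamp_field="LastModified"):
    """Full crawl when last_crawl is None; otherwise only items changed since then."""
    crawled = []
    for item_id in root_finder():
        row = specific_finder(item_id)
        if last_crawl is not None and row[timestamp_field] <= last_crawl:
            continue  # unchanged since the previous crawl, so skip it
        crawled.append(item_id)
    return crawled

print(crawl())                                  # [101, 102, 103]
print(crawl(last_crawl=datetime(2010, 6, 1)))   # [102, 103]
```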
You can then configure a Search Connector for a content source as follows:
Protocol Handlers and IFilters
In earlier versions of SharePoint, the index server made use of components known as protocol handlers to connect to content to be indexed. Although protocol handlers are still used in SharePoint 2010, their inclusion is mainly for backward compatibility. The preferred solution for accessing external content is via the Search Connector Framework.
TIP: In SharePoint 2010, when you’re configuring search connectors, the terminology used in the user interface can be confusing. For example, when you’re adding a new content source, the options available include Line of Business Data and Custom Repository. It would be reasonable to assume that selecting Custom Repository would be the correct way to use the Search Connector Framework; however, this isn’t the case. The Custom Repository option is used to reference custom protocol handlers. To make matters more confusing, the user interface often refers to protocol handlers as “custom connectors.” The thing to bear in mind is that the Search Connector Framework is a set of extensions to BCS, and from a user interface perspective, we’re still using BCS. As you’ll see later, the correct way to use the Search Connector Framework is to select a Content Source Type of Line of Business Data.
Working with Content Sources
Building on our understanding of how connections are made to index content physically, let’s look at what happens to that content as part of the indexing process.
You know that the Search Connector Framework can be used to crawl and index content from a wide variety of sources. Each source is defined as a separate entity within SharePoint known as a content source. As well as defining content retrieved via specific connectors, content sources can also be used to subcategorize content within the wider SharePoint farm. For example, a farm may use a content source to define a set of data from a particular site collection.
You should be aware that content sources can’t overlap. For example, you can’t create one content source with a particular start address and then create a second content source whose start address overlaps the first.
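One way to picture the restriction is a check that rejects a new start address when it is the same as, contains, or is contained by an existing one. The validator below is a hypothetical sketch written to illustrate the rule as stated here; it is not how SharePoint implements the check, and the URLs are made up.

```python
from urllib.parse import urlsplit

def _segments(url):
    """Normalize a start address to (host, list of path segments)."""
    parts = urlsplit(url.lower())
    return parts.netloc, [seg for seg in parts.path.split("/") if seg]

def overlaps(a, b):
    """True if one start address is the same as, or nested inside, the other."""
    host_a, path_a = _segments(a)
    host_b, path_b = _segments(b)
    if host_a != host_b:
        return False
    shorter = min(len(path_a), len(path_b))
    return path_a[:shorter] == path_b[:shorter]

def add_content_source(existing, start_address):
    """Hypothetical validator: refuse a start address that overlaps an existing one."""
    for other in existing:
        if overlaps(start_address, other):
            raise ValueError(f"{start_address} overlaps {other}")
    existing.append(start_address)
    return existing

sources = ["http://intranet/sites/hr"]
add_content_source(sources, "http://intranet/sites/finance")   # accepted
try:
    add_content_source(sources, "http://intranet/sites/hr/policies")
except ValueError as err:
    print(err)   # http://intranet/sites/hr/policies overlaps http://intranet/sites/hr
```

Comparing whole path segments rather than raw string prefixes keeps `/sites/hr` from being treated as overlapping `/sites/hrx`.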
TIP: When configuring search on larger farms, it’s important that you determine which content is most likely to be updated frequently. Since content crawls run on a schedule, it’s good practice to split the corpus into a number of smaller content sources. Doing this allows greater control over how frequently particular content is indexed and therefore how current the search results for that content are.
As well as content sources, which define the starting point of any search crawl, SharePoint also allows us to define crawl rules. You can use crawl rules to exclude certain files or folders, or to specify that particular credentials should be used when accessing particular files or folders. An important new feature in SharePoint 2010 is the ability to use regular expressions when defining crawl rules.
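Regular-expression crawl rules can be pictured as an ordered list of patterns, each marking matching URLs as included or excluded. The sketch below assumes first-match-wins evaluation and includes any URL no rule matches; the rule list and URLs are hypothetical, and SharePoint’s own rule options (such as per-rule credentials) are not modeled.

```python
import re

# Hypothetical ordered crawl rules: (compiled regular expression, include?).
CRAWL_RULES = [
    (re.compile(r"^http://intranet/sites/hr/archive/.*"), False),  # skip archived HR content
    (re.compile(r".*\.(tmp|bak)$", re.IGNORECASE), False),         # skip temp/backup files
    (re.compile(r"^http://intranet/.*"), True),                    # crawl the rest of the intranet
]

def should_crawl(url, rules=CRAWL_RULES, default=True):
    """Apply rules in order; the first matching rule decides, else use the default."""
    for pattern, include in rules:
        if pattern.match(url):
            return include
    return default

for url in [
    "http://intranet/sites/hr/archive/2008/report.docx",
    "http://intranet/sites/hr/current/plan.BAK",
    "http://intranet/sites/hr/current/plan.docx",
]:
    print(url, should_crawl(url))
```

Because order matters under this assumption, the narrow exclusions sit above the broad inclusion.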
Working with Managed Properties
As mentioned earlier, when content is crawled, an index of the content is created along with a property database. Generally speaking, the index contains the main body of the content, whereas the property database contains metadata. So to give an example, when a Word document is crawled, the contents of the document are included in the index, and any properties of the document such as the title, author, or the creation date are added to the property database. The property database contains details of all metadata properties for each item indexed by the crawler process. However, since different items may define the same metadata in different ways, SharePoint incorporated the notion of managed properties.
A managed property is a logical grouping of crawled properties from one or more indexed content types. For example, when an Excel spreadsheet is crawled, author metadata will be retrieved and stored in the property database; however, if an MP3 file is crawled, artist metadata will be retrieved. Logically, both artist and author could be grouped into a managed property named Creator, for example. By making the grouping, it becomes possible for you to search multiple content types using a common set of attributes without your having to understand how those attributes map to the underlying metadata of the content.
Mapping crawled properties to managed properties is particularly important when you’re indexing SharePoint content, since each column in a list or library is stored as a crawled property. When it comes to properties such as Title or Created By, the mapping is straightforward, since these properties are present on every item and therefore the mapping is simply one-to-one. However, as you create custom content types to accommodate your application data structure, the mapping becomes a bit more involved. Mapping crawled properties to managed properties does not occur automatically. If, for example, you have a list named Product containing a Product Name column and a second list named Orders also containing a column named Product Name, these two columns will not be automatically mapped to a managed property. Instead, you would need to map both crawled properties explicitly to a new managed property.
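The mapping can be sketched as a table from managed property names to the crawled properties that feed them. The crawled property names below are hypothetical labels, not SharePoint’s internal names, and the sketch assumes one simple policy: take the first non-empty crawled value in the listed order.

```python
# Hypothetical mapping: managed property -> crawled properties that populate it.
MANAGED_PROPERTIES = {
    "Creator": ["Office:Author", "MP3:Artist"],
    "ProductName": ["Products:Product Name", "Orders:Product Name"],
}

def managed_value(crawled, managed_name, mapping=MANAGED_PROPERTIES):
    """Return the first non-empty crawled value mapped to a managed property."""
    for crawled_name in mapping.get(managed_name, []):
        value = crawled.get(crawled_name)
        if value:
            return value
    return None

spreadsheet = {"Office:Author": "Ann"}
song = {"MP3:Artist": "The Band"}
order = {"Orders:Product Name": "Widget", "Orders:Total": "42"}

print(managed_value(spreadsheet, "Creator"))   # Ann
print(managed_value(song, "Creator"))          # The Band
print(managed_value(order, "ProductName"))     # Widget
# A crawled property with no mapping entry is unreachable by managed name:
print(managed_value(order, "OrderTotal"))      # None
```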
The key thing to be aware of with respect to managed properties versus crawled properties is that only managed properties can be displayed in search results or used for filtering or refining results.
Working with Scopes
We’ve covered how content can be split up using content sources and how metadata can be used to create managed properties; let’s move on to consider one important use of content sources and managed properties: the creation of scopes. For now, let’s build up an understanding of what scopes are and why we might use them.
When a query is executed using the Query Object Model, it’s performed against the entire search index. Sometimes this behavior doesn’t make sense for a number of reasons: if we already know the type of content that we’re looking for, it makes sense to search content of only that type, or if we already know which web site contains the content that we’re looking for, it doesn’t make sense to search all web sites.
Search scopes allow us to define subsets of the search index based on a series of rules. These rules can include only content from a particular content source, only content where a managed property has a specific value, or only content from a specific URL. Additionally, complex combinations of rules can be created to restrict the scope to the content that is appropriate for our search application. For example, if we were implementing a search feature for retrieving technical specification documents, and we knew that these documents existed only within the engineering department web site, we could define a scope that included only content of type technical specification and included only results from the engineering department content source.
We could refine this example further if necessary. Let’s say that some of the technical specifications were flagged as confidential. We could exclude those from search results by creating a managed property that referred to the confidential flag, and then using that managed property in a rule that specifically excluded those documents from the scope. As you can see, by using scopes, you can increase the relevance of search results by restricting the search area to an appropriate subset of the entire index.
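The engineering example can be expressed as a small rule evaluator. The sketch assumes three rule behaviors (include, require, exclude) evaluated over item metadata; the property names and values are hypothetical, and this is a conceptual model rather than how SharePoint compiles scopes.

```python
# Each rule: (behavior, property name, required value).
ENGINEERING_SPECS_SCOPE = [
    ("require", "ContentSource", "Engineering"),
    ("require", "ContentType", "Technical Specification"),
    ("exclude", "Confidential", True),
]

def in_scope(item, rules):
    """Include: at least one must match (if any exist);
    Require: all must match; Exclude: none may match."""
    def matches(prop, value):
        return item.get(prop) == value

    includes = [(p, v) for b, p, v in rules if b == "include"]
    requires = [(p, v) for b, p, v in rules if b == "require"]
    excludes = [(p, v) for b, p, v in rules if b == "exclude"]

    if includes and not any(matches(p, v) for p, v in includes):
        return False
    if not all(matches(p, v) for p, v in requires):
        return False
    return not any(matches(p, v) for p, v in excludes)

index = [
    {"id": 1, "ContentSource": "Engineering", "ContentType": "Technical Specification"},
    {"id": 2, "ContentSource": "Engineering", "ContentType": "Technical Specification",
     "Confidential": True},
    {"id": 3, "ContentSource": "Engineering", "ContentType": "Meeting Notes"},
    {"id": 4, "ContentSource": "Marketing", "ContentType": "Technical Specification"},
]
print([item["id"] for item in index if in_scope(item, ENGINEERING_SPECS_SCOPE)])  # [1]
```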
As you can imagine, the most important requirement for a search solution is the ability to perform queries. Let’s move on to take a look through the various components available to us as developers. As you’ll see, SharePoint 2010 delivers a number of interfaces covering a range of scenarios.
Query Object Model
Enterprise search in SharePoint 2010 provides a Query Object Model that allows developers to use the capabilities of search programmatically within custom applications. The core class of the Query Object Model is the Query abstract class, which has two concrete implementations: the FullTextSqlQuery class, which can be used to issue full-text SQL syntax queries to the search provider, and the KeywordQuery class, which can be used to issue keyword syntax queries. The Query Object Model can be used to query any SharePoint Search application, whether it’s a default SharePoint Search provider or a FAST Search for SharePoint provider.
One thing to bear in mind is that SQL syntax queries are supported only when using SharePoint Search. The examples that follow focus on keyword syntax queries.
Using the Query Object Model is relatively straightforward, as this example illustrates:
Notice a few interesting things about this code sample. First, take a look at the ResultTableCollection object that’s returned by the Execute method. ResultTableCollection is an IEnumerable collection of ResultTable objects. Each query can therefore return multiple result sets, as defined by the ResultTypes property of the Query class. In this code sample, only RelevantResults are selected, but multiple result sets can be retrieved by performing a bitwise combination of two or more ResultType enumeration values.
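A bitwise combination of result types behaves like any flags enumeration. The Python sketch below mirrors the idea with `enum.Flag`; `RelevantResults` comes from the text, the other two member names are assumed for illustration, the numeric values are illustrative rather than SharePoint’s own, and `execute` is a hypothetical stand-in for a query call.

```python
from enum import Flag

class ResultType(Flag):
    """Illustrative flags enum; values are not SharePoint's actual ones."""
    NONE = 0
    RelevantResults = 1
    HighConfidenceResults = 2
    SpecialTermResults = 4

def execute(result_types):
    """Hypothetical executor: return one (empty) result table per requested type."""
    tables = {}
    for member in (ResultType.RelevantResults,
                   ResultType.HighConfidenceResults,
                   ResultType.SpecialTermResults):
        if member in result_types:      # test a single bit of the combined value
            tables[member.name] = []    # a real query would fill this with rows
    return tables

requested = ResultType.RelevantResults | ResultType.HighConfidenceResults
print(sorted(execute(requested)))   # ['HighConfidenceResults', 'RelevantResults']
```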
Result Types Returned by the Query Object Model
Let’s take a look at the various types of results that can be included in the ResultTableCollection:
Common Query Language
One of the benefits of the Query Object Model is the availability of a common query language that works across all services supported by the Query Object Model. In practice, two query languages are available to the Query Object Model, keyword syntax and SQL syntax, although only keyword syntax is supported across all services.
Keyword syntax is relatively straightforward and will be familiar to users of any search engine. In its simplest form, a query consists of one or more keywords. For example, to return all documents containing the words “SharePoint” or “Search,” a user would enter this:
sharepoint search
If the results contained links pertaining to “MOSS 2007”, the user could exclude these results by changing the query to this:
sharepoint search -"MOSS 2007"
If the result set contained documents relating to Google search, for example, the user could alter the query to return only documents containing the words “SharePoint” and “search” by changing the query like so:
+sharepoint +search -"MOSS 2007"
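To make these semantics concrete, the following Python sketch evaluates the three queries above against a few made-up documents. It models the behavior as this section describes it (unprefixed terms match any, `+` terms are required, `-` terms are excluded, quotes group a phrase); it is not SharePoint’s query parser, which supports many more operators.

```python
import re
import shlex

def parse(query):
    """Split a keyword query into optional, required (+) and excluded (-) terms."""
    optional, required, excluded = [], [], []
    for token in shlex.split(query):   # keeps "quoted phrases" together
        if token.startswith("+"):
            required.append(token[1:].lower())
        elif token.startswith("-"):
            excluded.append(token[1:].lower())
        else:
            optional.append(token.lower())
    return optional, required, excluded

def contains(text, term):
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None

def matches(text, query):
    optional, required, excluded = parse(query)
    if any(contains(text, t) for t in excluded):
        return False
    if not all(contains(text, t) for t in required):
        return False
    return not optional or any(contains(text, t) for t in optional)

docs = {
    "a": "SharePoint search in MOSS 2007",
    "b": "SharePoint 2010 search architecture",
    "c": "Google search tips",
}
for q in ['sharepoint search', 'sharepoint search -"MOSS 2007"',
          '+sharepoint +search -"MOSS 2007"']:
    print(q, "->", sorted(k for k, v in docs.items() if matches(v, q)))
```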
As you can see, basic keyword syntax is pretty intuitive.
Using Property Filters
As mentioned, when crawling content, an index and a property database are created. From the database of crawled properties, you can create managed properties, which, as you discovered earlier, are logical groupings of crawled properties. One of the main uses of managed properties is for filtering search results. As well as the basic keyword syntax, the common query language allows you to use property filters to return only results in which a managed property is set to a particular value.
So to pick up on the earlier example, if you wanted to return only Word documents matching your keywords, the query could be changed to this:
+sharepoint +search -"MOSS 2007" (FileExtension="doc" OR FileExtension="docx")
This example uses the FileExtension managed property to filter the result set. One important thing to note about property filters is that they apply to the entire result set. So, referring back to the original keyword syntax example, this query
sharepoint search
returns all results matching either “sharepoint” or “search”. However, if the Word document property filter is applied, here’s how it would look:
sharepoint search (FileExtension="doc" OR FileExtension="docx")
You might expect, given the syntax of keyword queries, that this query would return results matching “SharePoint” or “search”, or having a FileExtension of doc or docx. Instead, the query actually returns results containing either “SharePoint” or “search” where the FileExtension is doc or docx. Effectively, the property filter is applied to the result set of the keyword query.
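The difference can be shown with a small Python sketch that runs the keyword part first and then narrows that result set with the property filter. The documents and helper names are hypothetical. Document 3 has a matching extension but no matching keyword, so it is not returned, which is exactly where the “OR” reading would go wrong.

```python
import re

def keyword_hits(docs, terms):
    """Ids of items whose text contains any of the (unprefixed) keywords."""
    return {
        doc_id for doc_id, doc in docs.items()
        if any(re.search(rf"\b{re.escape(t)}\b", doc["text"], re.IGNORECASE)
               for t in terms)
    }

def apply_property_filter(docs, hits, prop, allowed):
    """The property filter narrows the keyword result set; it is not ORed with it."""
    return {doc_id for doc_id in hits if docs[doc_id].get(prop) in allowed}

docs = {
    1: {"text": "SharePoint search overview", "FileExtension": "docx"},
    2: {"text": "Search tips", "FileExtension": "pdf"},
    3: {"text": "Quarterly budget", "FileExtension": "doc"},
}
hits = keyword_hits(docs, ["sharepoint", "search"])   # {1, 2}
print(sorted(apply_property_filter(docs, hits, "FileExtension", {"doc", "docx"})))  # [1]
```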
There are many default managed properties in SharePoint 2010 that allow search results to be filtered using metadata such as CreatedBy, ContentType, Department, or even things like PictureHeight. By using these built-in properties and defining domain-specific properties, you can easily build targeted search queries.
Federation Object Model
Earlier I discussed how the query architecture is responsible for servicing end-user queries. As mentioned, this is done using one or more query components, right? Usually yes, but I have to admit that I wasn’t telling the whole truth. It is true to say that queries performed against content that’s crawled by SharePoint are serviced via query components. However, SharePoint 2010 also incorporates the concept of federation, meaning that search queries can be serviced directly by external search providers.
Strictly speaking, the functionality of the Federation Object Model is implemented on the web front end, but logically it dictates how queries are performed, and therefore I’ve listed it as a query component. The Federation Object Model provides a layer of abstraction between the front-end web parts used to process search queries and the physical destination of the query. In plain English, this means that the front-end web parts can be used to query and retrieve results from any search engine that is supported by the Federation Object Model. For example, it’s possible to use the out-of-the-box search web parts to perform queries against a product catalog. To take the example even further, it’s possible to perform queries against the product catalog as well as the content index from our SharePoint farm and return the highlights of the combined result set in a single web part. Figure 1 illustrates where the Federation Object Model sits relative to other components.
Figure: The Federation Object Model relative to other search components
As illustrated, the Federation Object Model exists as an abstraction layer between web parts that implement search functionality and the Query Object Model. As you’ve seen, the Query Object Model provides a standard mechanism for communicating with search service applications within SharePoint. The Federation Object Model takes this abstraction a step further by allowing external search engines to be used to service search queries.
Out of the box, SharePoint 2010 allows you to connect to three types of locations:
As discussed earlier, the Query Object Model is a common interface for all SharePoint Search applications. Although the Federation Object Model makes use of distinct runtime classes for SharePoint Search and FAST Search, namely the SharePointSearchRuntime class and the FASTSearchRuntime class, both of these classes use the Query Object Model to communicate with the underlying search application. It’s possible to use the runtime class directly from within our code. One benefit of doing so is that federation location settings are defined at the search service application level, and the runtime uses these settings automatically. This is much simpler than manually configuring each property when using the Query Object Model.
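The layering can be sketched as a common interface that every location implements, with the front end querying each configured location and merging the top results. All names here (`Location`, `search`, `federated_search`, the two location classes) are hypothetical and only illustrate the abstraction; they are not the Federation Object Model’s classes, and the fixed scores are placeholders.

```python
from abc import ABC, abstractmethod

class Location(ABC):
    """Hypothetical common interface for a federated search location."""
    name = "location"

    @abstractmethod
    def search(self, keywords):
        """Return a list of (title, score) tuples."""

class LocalIndexLocation(Location):
    """Stands in for the farm's own content index."""
    name = "SharePoint index"

    def __init__(self, documents):
        self.documents = documents

    def search(self, keywords):
        return [(t, 1.0) for t in self.documents if keywords.lower() in t.lower()]

class ProductCatalogLocation(Location):
    """Stands in for an external engine such as a product catalog."""
    name = "Product catalog"

    def __init__(self, products):
        self.products = products

    def search(self, keywords):
        return [(p, 0.8) for p in self.products if keywords.lower() in p.lower()]

def federated_search(locations, keywords, top=3):
    """Query every location through the same interface and merge the highlights."""
    merged = []
    for location in locations:
        merged.extend((score, location.name, title)
                      for title, score in location.search(keywords))
    merged.sort(key=lambda row: row[0], reverse=True)
    return merged[:top]

locations = [
    LocalIndexLocation(["Widget design spec", "Holiday policy"]),
    ProductCatalogLocation(["Blue widget", "Red widget", "Gadget"]),
]
for score, source, title in federated_search(locations, "widget"):
    print(f"{score:.1f}  {source:16}  {title}")
```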
Query Web Service
As shown in the figure, the Query Web Service communicates with SharePoint Search via the Query Object Model. As a result, search federation functionality is not available when issuing queries via the web service.
Query RSS API
As well as the Query Web Service, SharePoint 2010 also provides a Really Simple Syndication (RSS) application programming interface (API) for retrieving search results. Again, since the RSS API uses the Query Object Model, federation functionality is not available. Using the RSS API is relatively straightforward. It’s simply a case of building a query string that contains the appropriate search criteria. Following are the main values for the query string:
The RSS API can be accessed directly by URL. For example, the following URL can be used to create an RSS feed for results matching “sharepoint” and “search”:
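Building such a URL is ordinary query-string encoding. The sketch below assumes the feed is served by a page at `/_layouts/srchrss.aspx` and that the keywords travel in a `k` parameter; treat both, and the server name, as assumptions to verify against your own farm rather than documented values.

```python
from urllib.parse import urlencode, urlunsplit

def search_rss_url(server, keywords, extra_params=None, scheme="http",
                   page="/_layouts/srchrss.aspx"):
    """Build an RSS search URL.

    Assumption: the page path and the 'k' keyword parameter are illustrative;
    extra_params passes through any other query-string values unchanged.
    """
    params = {"k": keywords}
    params.update(extra_params or {})
    return urlunsplit((scheme, server, page, urlencode(params), ""))

print(search_rss_url("intranet", "sharepoint search"))
# http://intranet/_layouts/srchrss.aspx?k=sharepoint+search
```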
Custom Ranking Model
When determining the order of search results, SharePoint uses two types of ranking: query-dependent ranking, also known as dynamic ranking, and query-independent ranking, also known as static ranking. One thing that does merit some discussion is the ability to use custom ranking models in SharePoint 2010.
In MOSS 2007, it was possible to alter the ranking model using the Search Administration Object Model. In effect, this meant altering the weights for particular managed properties in the case of query-dependent ranking. The problem with this approach was twofold: changes could be made only programmatically, and they applied across the board to all searches performed using a particular Shared Service Provider.
With SharePoint 2010, it’s now possible to create custom ranking models using XML and apply them on an individual query basis. For example, when using the Core Results Web Part, setting the DefaultRankingModelId to the identifier for a custom ranking model will apply that model to all results rendered in the web part.
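The difference between a farm-wide weighting and a per-query model can be sketched by passing the model in with each query. The weights, property names, and weighted term-frequency score below are hypothetical and far simpler than SharePoint’s actual ranking; the point is only that two queries over the same results can use different models without affecting each other.

```python
# Hypothetical ranking models: weight per managed property.
DEFAULT_MODEL = {"Title": 2.0, "Body": 1.0}
TITLE_HEAVY_MODEL = {"Title": 5.0, "Body": 0.5}

def score(doc, term, model):
    """Weighted term frequency: occurrences in each property times its weight."""
    term = term.lower()
    return sum(weight * doc.get(prop, "").lower().count(term)
               for prop, weight in model.items())

def rank(docs, term, model=DEFAULT_MODEL):
    """Order document ids by score under the model supplied for this query."""
    ordered = sorted(docs, key=lambda d: score(d, term, model), reverse=True)
    return [d["id"] for d in ordered]

docs = [
    {"id": "a", "Title": "Budget", "Body": "widget widget widget costs"},
    {"id": "b", "Title": "Widget guide", "Body": "overview"},
]
print(rank(docs, "widget"))                      # ['a', 'b']
print(rank(docs, "widget", TITLE_HEAVY_MODEL))   # ['b', 'a']
```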
To make it quick and easy to generate a search user interface, SharePoint 2010 provides a number of web parts out of the box. All of the web parts target the Federation Object Model and therefore support many different types of search results.
Capturing Search Queries
The following web parts provide a user interface that allows the user to build search queries:
Both query web parts work in a similar fashion—they build up a query string that is then used when redirecting to a results page.
Displaying Search Results
Earlier we looked at the Query Object Model and the various types of results that are returned. The main factor in deciding which web part to use to display search results is the type of results to be displayed and the default formatting of the results. The following web parts can be used to display search results:
Shared Query Manager
One significant change between MOSS 2007 and SharePoint 2010 is in the way search web parts are implemented. With SharePoint 2010, each page containing search web parts has a single instance of the SharedQueryManager object that is shared between all web parts on the page. Through this object we can easily create custom web parts that can hook into the query pipeline at a few different points.
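The one-manager-per-page pattern can be sketched as a registry keyed by page, with web parts subscribing to the stages they care about. `SharedQueryManagerSketch`, `for_page`, and the stage names are hypothetical; they illustrate the sharing pattern only and are not members of SharePoint’s SharedQueryManager.

```python
from collections import defaultdict

class SharedQueryManagerSketch:
    """One instance per page; every web part on that page shares it."""
    _instances = {}

    def __init__(self):
        self._handlers = defaultdict(list)   # stage name -> callbacks

    @classmethod
    def for_page(cls, page_id):
        """Return the page's manager, creating it on first request."""
        if page_id not in cls._instances:
            cls._instances[page_id] = cls()
        return cls._instances[page_id]

    def subscribe(self, stage, handler):
        self._handlers[stage].append(handler)

    def run_query(self, keywords, search):
        """Let subscribers adjust the query, run it once, then share the results."""
        for handler in self._handlers["before_query"]:
            keywords = handler(keywords)
        results = search(keywords)
        for handler in self._handlers["results_ready"]:
            handler(results)
        return results

rendered = []
# Two web parts on the same page obtain the same manager instance.
search_box = SharedQueryManagerSketch.for_page("results.aspx")
chart = SharedQueryManagerSketch.for_page("results.aspx")
search_box.subscribe("before_query", lambda k: k + " -draft")
chart.subscribe("results_ready", lambda rows: rendered.append(len(rows)))

results = search_box.run_query("budget", search=lambda k: [f"hit for {k}"])
print(results, rendered)   # ['hit for budget -draft'] [1]
```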
The following code snippet shows how to data bind search results to a repeater control rather than using the traditional XSLT-based rendering approach. Such a technique is useful if controls are required within search results that have post-back functionality. For example, in a shopping cart application, an Add to Basket button may be required.
Refining Search Results
The RefinementWebPart can be used to generate a series of refinement filters automatically for a given result set. Refinements are basically property filters that are automatically derived from a result set. So, for example, if a result set contains results of different types such as web pages and Word documents, the refinement web part will show refinement options for web pages or Word documents.
Let’s look at how the RefinementWebPart works in a bit more detail since a good understanding of the inner workings will make it easier for you to configure.
From the following class diagram, you can see that a number of objects are involved in rendering the RefinementWebPart. The first object to consider is the RefinementManager, the engine that processes the configuration for the web part. Refinement configuration can be done at the web part level, by changing the value of the Filter Category Definition property.
The Filter Category Definition property accepts XML and typically will have the following structure:
Each Category element is represented by the FilterCategory class. When the control is rendered, a Category is displayed as a section title in the refinement panel. The FilterCategory class defines various properties for controlling the presentation of the section title, such as the text to be displayed and the maximum number of child elements to display. Probably the most important property of the FilterCategory class is the FilterType; this property contains the type name of a class derived from RefinementFilterGenerator that is used to generate a list of child elements to appear within the section. So, for example, if the section was titled Modified Date, the child elements would probably be a list of dates. To generate these dates from the result set, a ManagedPropertyFilterGenerator is used. Effectively the filter generator extracts the values of a particular managed property from the query result set and displays the distinct values in the refinement control.
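The core job described here, extracting the distinct values of one managed property from a result set, can be sketched with a counter. The function name and sample results are hypothetical, and `max_items` loosely mirrors the maximum-number-of-child-elements setting; presentation concerns are ignored.

```python
from collections import Counter

def distinct_values(results, managed_property, max_items=5):
    """Return (value, count) pairs for a managed property, most common first."""
    counts = Counter(r[managed_property] for r in results if managed_property in r)
    return counts.most_common(max_items)

results = [
    {"Title": "Spec A", "Author": "Ann"},
    {"Title": "Spec B", "Author": "Bob"},
    {"Title": "Spec C", "Author": "Ann"},
    {"Title": "Notes"},                     # no Author value, so not counted
]
print(distinct_values(results, "Author"))   # [('Ann', 2), ('Bob', 1)]
```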
Because managed properties may have a wide range of values, the ManagedPropertyFilterGenerator allows results to be further grouped before displaying them by introducing an additional ManagedPropertyCustomFilter class. So continuing with our Modified Date example, if we wanted to show a refinement for all documents modified within the past 24 hours rather than a list of the exact times of each modification, we could use a ManagedPropertyCustomFilter class configured to show results only within the past day. Here’s a typical example of custom filter configuration:
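Grouping values before display, as in the past-24-hours example, amounts to mapping each value into a named bucket. The labels and boundaries below are hypothetical stand-ins for what a custom filter configuration would define, the buckets are treated as mutually exclusive, and `now` is passed explicitly so the result is deterministic.

```python
from datetime import datetime, timedelta

# Hypothetical buckets: (label, maximum age), checked in order; first match wins.
BUCKETS = [
    ("Past 24 hours", timedelta(days=1)),
    ("Past week", timedelta(weeks=1)),
    ("Past month", timedelta(days=30)),
]

def bucket_counts(modified_dates, now, buckets=BUCKETS, other="Older"):
    """Count results per bucket instead of listing every modification time."""
    counts = {label: 0 for label, _ in buckets}
    counts[other] = 0
    for modified in modified_dates:
        age = now - modified
        label = next((lbl for lbl, max_age in buckets if age <= max_age), other)
        counts[label] += 1
    return counts

now = datetime(2010, 6, 15, 12, 0)
dates = [now - timedelta(hours=3), now - timedelta(days=2),
         now - timedelta(days=3), now - timedelta(days=90)]
print(bucket_counts(dates, now))
# {'Past 24 hours': 1, 'Past week': 2, 'Past month': 0, 'Older': 1}
```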
We’ve looked at the out-of-the-box refinement web part, and you should be able to see how it can be easily configured to generate refinement options automatically using XML. However, with SharePoint 2010, we can take the concept of refinement a step further and create our own refinement controls. The RefinementManager object is shared between all web parts on a page. This means that we can create a web part that picks up a reference to this object and uses it for providing further data visualization support to refinement data.
The following code sample generates a simple web part that renders a pie chart with slices for each type of document returned in the search results. A pie chart is shown in the illustration that follows the code sample.
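The data side of such a chart is simple: count results per document type and convert each count into a share of 360 degrees. This Python sketch computes only the slice data from made-up file extensions; drawing the chart and reading the shared refinement data are left out.

```python
from collections import Counter

def pie_slices(file_extensions):
    """Return (type, count, start_angle, sweep_angle) for each document type."""
    counts = Counter(file_extensions)
    total = sum(counts.values())
    slices, start = [], 0.0
    for doc_type, count in counts.most_common():
        sweep = 360.0 * count / total
        slices.append((doc_type, count, start, sweep))
        start += sweep
    return slices

for s in pie_slices(["docx", "docx", "pdf", "xlsx"]):
    print(s)
# ('docx', 2, 0.0, 180.0)
# ('pdf', 1, 180.0, 90.0)
# ('xlsx', 1, 270.0, 90.0)
```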
All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd