Querying with iFTS - SQL Server 2008

SQL Server provides four ways to query a full-text index. The FREETEXT and CONTAINS predicates canbe used to retrieve rows that match a given search criteria from a table, in much the same way that the EXISTS predicate returns rows that meet specific criteria. The FREE TEXT TABLE and CONTAINS TABLEfunctions return rowsets with two columns: KEY, which is a row identifier (the unique index value specified when the full-text index was created), and RANK, which is a relevance rating. Here, we’ll demonstrate how to use these predicates and functions for iFTS queries.

FREETEXT Predicate Searches

The FREETEXT predicate allows you to search a character-based column, or columns, of a full-text index for words that match inflectional and thesaurus expansion and replacement forms of a freetext search string. The FREETEXT predicate accepts a column name or list of columns, a free-text search string, and an optional LCID, and performs the following steps:

  • It uses a word-breaker to break a free-text string into individual words.
  • It identifies expansions and replacements for words based on the FTS language-specific thesaurus.

Because it is a predicate, FREETEXT can be used in the WHERE clause of a SELECT query. All rows for which the FREETEXT predicate returns true (a match) are returned when FREETEXT is used.

The following is a sample FREETEXT query that uses the full-text index created on the AdventureWorks Production.ProductModel table in the previous section:

SELECT *
FROM Production.ProductModel
WHERE FREETEXT(*, N'bike');

The wildcard character (*) used inside the FREETEXT predicate parentheses indicates that all columns included in the full-text index should be searched for a match. The second FREETEXT parameter is the word you want to match. The query returns ten matching rows from the Production.ProductModel table.

The new integration of the full-text query engine with the SQL Server query engine provides a more efficient full-text search experience than previous versions of FTS. iFTS integration allows SQL Server to take advantage of new operators, such as the Table Valued Function [FulltextMatch] operator (shown in Figure 10-12), which replaces the expensive Remote Scan operator in SQL Server 2005.

FREETEXT Predicate Searches

The FREETEXT predicate finds both exact and approximate matches of words, using two methods:

  • By stemming words to locate inflectional forms of a word
  • By performing word expansions and replacements based on language-specific thesaurus files, as explained in the “Managing Thesaurus Files” section later in this chapter

The FREETEXT predicate and FREETEXTTABLE function (described shortly) automatically stem words to find inflectional forms. The inflectional forms of a word include plural nouns, verb conjugations, and other variants of a word. The following example demonstrates a FREETEXT query that performs automatic stemming of the word ride. The inflectional forms of ride include the verb conjugations rides, rode, ridden, and riding, as well as the nouns rider and riders.

SELECT *
FROM Production.ProductModel
WHERE FREETEXT(CatalogDescription, N'ride', LANGUAGE 1033);

This query retrieves all rows that contain inflectional forms of the word ride in the CatalogDescription column. Inflectional forms that are matched in this query include the plural noun riders and the verb riding. In this FREETEXT query, the CatalogDescription column is identified by name to restrict the search to a single column, and the LANGUAGE specifier is used to indicate LCID 1033, which is US English. The results include six rows that match inflectional forms of the word ride. You can replace the word ride in the FREETEXT predicate with any of the inflectional forms we listed previously, and you will get the same results.

CONTAINS Predicate Searches

The CONTAINS predicate allows more advanced full-text searching options than the FREETEXT predicate.

Like FREETEXT, the CONTAINS predicate accepts a column name or list of columns, a search condition, and an optional language identifier as parameters. The power of CONTAINS comes from its ability to handle sophisticated search conditions, as opposed to the simple search strings that FREETEXT accepts. In addition to searching for a simple word or phrase, you can use CONTAINS to search for a word or phrase prefix, a word in proximity to another word, inflectional generations or thesaurus synonyms of words, or combinations of search criteria.

A simple CONTAINS predicate is a basic keyword search, similar to FREETEXT. Unlike FREETEXT though, CONTAINS will not automatically search for inflectional forms of a word or thesaurus synonyms.

The following example demonstrates a simple CONTAINS query that returns three rows.

SELECT *
FROM Production.ProductModel
WHERE CONTAINS(CatalogDescription, N'ride', LANGUAGE 1033);

To tell CONTAINS to use inflectional forms or thesaurus synonyms, use the FORMSOF generation term in your search condition. The next example performs a CONTAINS search on the Name and CatalogDescription columns of the Production.ProductModel table.

SELECT *
FROM Production.ProductModel
WHERE CONTAINS((Name, CatalogDescription), N'FORMSOF(INFLECTIONAL, shift)',
LANGUAGE 1033);

The results include two rows that contain matches for inflectional forms of the word shift (like shifting). Use the FORMS OF (THESAURUS, ... ) format to return matches for synonyms of words, as defined in your language-specific thesaurus files (discussed in the “Managing Thesaurus Files” section later in this chapter).

The CONTAINS predicate allows you to combine simple search terms by using the AND (&), AND NOT (&!), and OR (|) Boolean operators. (Note that there is no support for an OR NOT Boolean operator.) The following is a CONTAINS query with two search terms combined with the OR keyword.

SELECT *
FROM Production.ProductModel
WHERE CONTAINS(Name, N'FORMSOF(INFLECTIONAL, tour) OR mountain');

The results of this sample query, which retrieves all rows containing inflectional forms of the word tour (like touring) or the word mountain in the Name column, are shown in Figure.

Partial results of CONTAINS query with compound search term

Partial results of CONTAINS query with compound search term

CONTAINS also supports prefix searches using the wildcard asterisk (*) character. Place the search word or phrase in double quotes, followed immediately by the * character to specify a prefix search. Here is a simple prefix search to retrieve all rows that have a word that starts with the prefix road in the Name column.

SELECT *
FROM Production.ProductModel
WHERE CONTAINS(Name, N'"road*"');

The partial results of this query are shown in Figure.

Partial results of CONTAINS prefix search

Partial results of CONTAINS prefix search

The CONTAINS predicate also supports proximity searches via the NEAR keyword (or ~ operator). NEAR will return matches for words that are close to one another in the source columns. The followingdemonstrates a NEAR proximity search that looks for instances of the word aluminum that occur close to the word blueprint. The results consist of four rows that contain these words in proximity to one another.

SELECT *
FROM Production.ProductModel
WHERE CONTAINS(Instructions, N'aluminum NEAR blueprint');

A proximity search using NEAR will fail if the words are too far apart. For the CONTAINS predicate, words are considered close to one another if they are found within approximately 50 words of each other.

FREETEXTTABLE and CONTAINSTABLE Function Searches

The FREETEXTTABLE and CONTAINSTABLE functions both operate like their similarly named predicate counterparts, but return result sets consisting of a table with two columns: KEY and RANK. The KEY column contains the key index values relating back to the unique index to matching rows in thesource table. The RANK column contains relevance rankings as determined by iFTS.

Like the FREETEXT predicate, FREE TEXT TABLE accepts a single column name or column list to limit the search, a free-text search string, and an optional language identifier. FREETEXTTABLE also accepts an optional parameter, top_n_by_rank, which limits the rows returned to a specific number ordered by RANK values. Here is an example of a FREETEXTTABLE query:

SELECT *
FROM FREETEXTTABLE(Production.ProductModel, *, N'aluminum', LANGUAGE 1033, 5) ft
INNER JOIN Production.ProductModel pm
ON ft.[KEY] = pm.ProductModelID;

The FREETEXTTABLE function in this example joins back to the source table, Production. ProductModel, via the KEY column. The query also uses the top_n_by_rank option to limit the results to the top five ranking matches. The query returns five rows in descending order by the RANK column.

The CONTAINSTABLE function has the same search capabilities as the CONTAINS predicate. Like the CONTAINS predicate, the CONTAINSTABLE function accepts the name of the source table, a single column name or list of columns, a CONTAINS-style search condition, and an optional language identifier.

The CONTAINSTABLE function also accepts an optional top_n_by_rank parameter, which limits the results returned to a specific number of rows ordered by RANK. The following example demonstrates the CONTAINSTABLE function in a simple keyword search that retrieves KEY and RANK values for all rows containing an inflectional form of the word tour (like touring). The results are shown in Figure.

SELECT *
FROM CONTAINSTABLE(Production.ProductModel, [Name],
N'FORMSOF(INFLECTIONAL, tour)');

Partial results of the CONTAINSTABLE query with inflectional forms

Partial results of the CONTAINSTABLE query with inflectional forms

Among the host of CONTAINS options supported by CONTAINSTABLE is the ISABOUT term, which assigns weights to the matched words it locates. With ISABOUT, each search word is assigned a weight value between 0.0 and 1.0. CONTAINS TABLE applies the weight to the relevance rankings returned in the RANK column. The following example shows two contrasting CONTAINSTABLE queries:

SELECT ct.[RANK], ct.[KEY], pm.[Name]
FROM CONTAINSTABLE(Production.ProductModel, Instructions,
N'washer OR weld OR polish') ct
INNER JOIN Production.ProductModel pm
ON ct.[KEY] = pm.ProductModelID
ORDER BY ct.[RANK] DESC;
SELECT ct.[RANK], ct.[KEY], pm.[Name]
FROM CONTAINSTABLE(Production.ProductModel, Instructions,
N'ISABOUT(washer WEIGHT(1.0), weld WEIGHT(0.5), polish WEIGHT(0.1))') ct
INNER JOIN Production.ProductModel pm
ON ct.[KEY] = pm.ProductModelID
ORDER BY ct.[RANK] DESC;

The first query returns all products with the words washer, weld, or polish in their Instructions column. The second query uses ISABOUT to assign each of these words a weight between 0.0 and 1.0, which is then applied to the result RANK for each row. The results, shown in Figure, demonstrate how ISABOUT weights can rearrange the rankings of your CONTAINSTABLE query results.

Using ISABOUT to change result set rankings

Using ISABOUT to change result set rankings


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SQL Server 2008 Topics