2

Extensions - Full-Text Search

Posted by Danielle Smith on 16:17 in ,
Good afternoon everyone and Happy Friday!

Today's blog post will cover the Full-Text Search - which is one of the Extended Features of SQL Server 2008. Although it was possible to use a kind of Full-Text Search prior to the 2008 release, this version included a fully integrated solution so there was no longer a need to run an external search service.

What is a Full-Text Search?

A Full-Text Search allows you to search for specific keywords within database records which cannot be traditionally found using the standard LIKE clause. It is possible to write queries which:

  • Return inflectional forms of a verb that you input. 
  • Return synonyms for a particular word. 
  • Returns results from data stored within a PDF file (stored within a varbinary(MAX) database field).

Full-Text Search Terminology

Below, I have listed some of the common terminology that you may find associated to the Full-Text Search (taken and reworded from Microsoft SQL Server 2008 - Database Developer Training Kit: MCTS Exam 70-433 p 335-336, ISBN: 978-0-7356-2639-3):

Term 

The Term is the string input for the full-text query; the word/phrase that you are searching for.

Full-Text Catalogue 

The Full-Text Catalogue represents a group of Full-Text Indexes.

Full-Text Index 

A Full-Text Index is an index that has been optimised especially to search within text database fields. In order to run a full-text search against a column, you must have a full-text index on the column first. You may only have one full-text index per table or view and there must be a unique key on the specific table already.

Word Breaker

The Word Breaker is a function that finds the tokens that defines each individual word in a string.

Token

A token is a word or character string defined by the Word Breaker.

Stemmer

The stemmer conjugates (or converts) verbs based on the linguistic rules of the defined language.

Thesaurus

Synonyms for a particular language can be stored within an XML document called the thesaurus. You can then create full-text searches for those synonyms within that particular language.

Stopword 

A word that is ignored within a search. A perfect example would be to Google a sentence and click on search. It has a list of stopwords which provide no meaning to the result set brought back. These often include "a", "and" and "the".

Types of Full-Text Search

There are 4 main types of Full-Text Search:

CONTAINS

CONTAINS allows you to find both exact matches and also less precise matches to the query input and takes 2 arguments:

IncludedColumns - which can take any of the following:
  • Column_name - This is for when you want to search on a single column.
  • Column_list - This is for when you want to search on multiple columns.
  • (*) - The asterisk states that all full-text-enabled columns should be searched upon.
  • Language -  This is the set language option that is being used by SQL Server.

SearchCondition - which can take any of the following:
  • Simple Term - A word or phrase that you wish to search for.
  • Prefix Term - A word or characters that you wish to search for as a prefix to other words. This should be followed by an asterisk (which symbolises 0 or more characters after the specified prefix). For example "Fish*" will return both "Fish" and "Fishing". 
  • Generation Term - Can be either INFLECTIONAL or THESAURUS:
    • INFLECTIONAL uses the stemmer to find relevant verbs. 
    • THESAURUS uses a a thesaurus file to look for synonyms of the same search term. 
  • Proximity Term - States that the term on the left should be close to the term on the right. Can be either NEAR or ~ (tilde) however both work exactly the same way.
  • Weighted Term - Determines the weight of a weighed query. 
  • Logical Operators
    • AND - Both parts of the query should be met. 
    • OR - Either part of the query should be met. 
    • AND NOT - The first part of the query should be true and the second part must be false.

CONTAINSTABLE

CONTAINSTABLE works for the same purpose and has the same search conditions as CONTAINS, however it will also allow you to return a "relevance value" (RANK) and the full-text key (KEY) for each data row. In addition to the arguments listed about for CONTAINS, it also takes:

  • Table - Which tells SQL Server the name of the table which has the full-text search enabled. 
  • Top_n_by_rank - Indicates that only the top number of rows with the highest ranking should be returned.

FREETEXT

FREETEXT makes it possible to match the actual meaning of the search condition. The word breaker is used to split the string into comprehensible words. Stemming is performed and then the thesaurus file is used to look at synonyms with the same meaning as the search term entered.

It uses the following arguments:

  • Column - This can either be a single column, multiple columns or all columns (*). 
  • Free-text string - This stores the string that is to be searched. 
  • Language - This is the set language option that is being used by SQL Server.

FREETEXTTABLE

FREETEXTTABLE works for the same purpose and has the same search conditions as FREETEXT and CONTAINSTABLE and includes the following:

  • Table - Which tells SQL Server the name of the table which has the full-text search enabled. 
  • Top_n_by_rank - Indicates that only the top number of rows with the highest ranking should be returned.
  • Column - This can either be a single column, multiple columns or all columns (*). 
  • Free-text string - This stores the string that is to be searched. 
  • Language - This is the set language option that is being used by SQL Server.

Overall...

The Full-text Search is an incredibly useful tool, particular when searching for keywords within a PDF file which I have come across as a requirement from a client in the past. Personally, I wouldn't say it's too difficult to implement and it really makes it much easier to search for those pesky keywords. This can and will be expanded on at a later date to show working examples on how to set up a Full-Text Index and a Full-Text Search Query.

What Next...

On Monday, I will be discussing the SQL Server Service Broker, which was initially released in the 2005 version but has been expanded on since. Stay tuned!

As A Final Word...

Thank you for reading today's blog post! If you have any questions/comments/feedback, please leave them in the comments section below and I will get back to you as soon as I can. Alternatively, please like my SQL Genius Facebook Page and leave a message on there.

Have a great weekend everyone! :) 

2 Comments


Hi Danielle,

May I ask from what books / articles this post was taken from please?

Thanks!


Hi Ossie,

Thank you for taking the time to comment on my blog!

I used the Microsoft SQL Server 2008 Database Training Kit (70-433) though unfortunately this was a few years ago now when I wrote this post.

Hope this helps,

Danielle

Post a Comment

Please post any feedback or comments here...

Copyright © 2009 SQL Genius - Personal Development of a Junior All rights reserved. Theme by Laptop Geek. | Bloggerized by FalconHive.