Implementing Full Text search in Sql Server Express - Part 1
by Omar AlBadri on 3/29/2009 2:25:00 PM
Read more about Implementing Full text search in Sql Server Express part 1...
Implementing Full text search in Sql Server Express - Part 1
So you have a blogging site or product catalog with a Sql Express backed for a custom application and you want to include a Rank based Full text search engine to your site. You basically have two options
- 1) Upgrade to Sql Server Enterprise Editions which cost about $10,000 or
- Implement it yourself.
In this I will show you how to build a Full Text search engine that will be sorted by rank of how many times the word occurs. Lets look at each piece step by step
What we are trying to do
In this post we will create a simple database called Posts with one table called posts with a couple of fields. We will be looking for the occurrence of the word "the" in the text and title fields.
Setting up the test data
- 1. Open up SQL Server Express and create a database called "Posts"
- 2. Run the script to create a table called posts
CREATE TABLE [dbo].[Posts]
( [id] [int] IDENTITY(1,1) NOT NULL,
[Title] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Text] [nvarchar](1000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_Posts] PRIMARY KEY CLUSTERED ( [id] ASC )WITH
(PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY]
and populate it with some data
insert into posts(title,text) values('One The','The House')
insert into posts(title,text) values('Two The','The house on the corner')
insert into posts(title,text) values('Three The','The house on the corner with the green roof')
insert into posts(title,text) values('Four The','The house on the corner with the green roof and the blue car')
Simple searching
What is simple searching? Simple searching is using a statement like Select * from posts where text like '%the%' While that will return all rows that contain the word "the" it will not return the number of occurrences of the word and it will not rank them based on the number of occurrences
Implement Word counting
We need a fast method of counting the number of occurrence a word has in a string. Sql server does not provide an easy way to count the number of occurrence in a string so we will create a User defined function (UDF) to implement this.
Finding the number of occurrences quickly
The question is is how are we going to implement this? Sql server does not provide an easy method so we are going to have to use a little trick. Look at the sentence below:
The green, green car
Lets say we count the number of letters in the string using the LEN() function. What we should get is 19. Now lets say we are looking for the amount of times the word "green" occurs in this string. So lets say we replace the word "green" with "greenx" using the Function REPLACE(word1,word2). So now we have the sentence:
The greenx, greenx car
Now if we use the LEN() function again the amount of letter is: 21. So we take this result and subtract it from the result of our first LEN() equation and we get:
21 - 19 = 2
What does 2 represent? The amount of times the word "green" occurred!!!!! Neat little trick, huh. Lets look at implementing this in a UDF:
Create FUNCTION [dbo].[WordCount]
(@WordToFind VARCHAR(15),
@PhraseToSearch VARCHAR(1000))
RETURNS SMALLINT AS
BEGIN
/* If @WordtoFind or @PhraseToSearch is NULL the function returns 0 */
IF @WordtoFind IS NULL OR @PhraseToSearch IS NULL RETURN 0
/* @SearchWord is a string one character longer than @WordtoFind */
DECLARE @SearchWord VARCHAR(21) SELECT @SearchWord = @WordtoFind + 'x'
/* Replace @WordtoFind with @SearchWord in @PhraseToSearch*/
DECLARE @SearchPhrase VARCHAR(2000) SELECT @SearchPhrase = REPLACE (@PhraseToSearch, @WordtoFind, @SearchWord)
/* The length difference between @SearchPhrase and @PhraseToSearch is the number we''re looking for */
RETURN LEN(@SearchPhrase) - LEN(@PhraseToSearch)
END
So the UDF take two variable the word and the phrase that you are trying to search. It then replaces the word you are trying to find and copies it to a new variable, then it compares the two and returns the number of occurrences.
That the first part. Stay tooned for next week when we implement the stored procedure.
