Kilone.com





Implementing Full Text search in Sql Server Express - Part 1

by Omar AlBadri on 3/29/2009 2:25:00 PM

Read more about Implementing Full text search in Sql Server Express part 1...

Implementing Full text search in Sql Server Express - Part 1

So you have a blogging site or product catalog with a Sql Express backed for a custom application and you want to include a Rank based Full text search engine to your site. You basically have two options

In this I will show you how to build a Full Text search engine that will be sorted by rank of how many times the word occurs. Lets look at each piece step by step

What we are trying to do

In this post we will create a simple database called Posts with one table called posts with a couple of fields. We will be looking for the occurrence of the word "the" in the text and title fields.

Setting up the test data

 

 

CREATE TABLE [dbo].[Posts]
( [id] [int] IDENTITY(1,1) NOT NULL,
[Title] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Text] [nvarchar](1000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_Posts] PRIMARY KEY CLUSTERED ( [id] ASC )WITH
(PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY]

and populate it with some data

insert into posts(title,text) values('One The','The House')
insert into posts(title,text) values('Two The','The house on the corner')
insert into posts(title,text) values('Three The','The house on the corner with the green roof')
insert into posts(title,text) values('Four The','The house on the corner with the green roof and the blue car')

Simple searching

What is simple searching? Simple searching is using a statement like Select * from posts where text like '%the%' While that will return all rows that contain the word "the" it will not return the number of occurrences of the word and it will not rank them based on the number of occurrences

Implement Word counting

We need a fast method of counting the number of occurrence a word has in a string. Sql server does not provide an easy way to count the number of occurrence in a string so we will create a User defined function (UDF) to implement this.

Finding the number of occurrences quickly

The question is is how are we going to implement this? Sql server does not provide an easy method so we are going to have to use a little trick. Look at the sentence below:

The green, green car

Lets say we count the number of letters in the string using the LEN() function. What we should get is 19. Now lets say we are looking for the amount of times the word "green" occurs in this string. So lets say we replace the word "green" with "greenx" using the Function REPLACE(word1,word2). So now we have the sentence:

The greenx, greenx car

Now if we use the LEN() function again the amount of letter is: 21. So we take this result and subtract it from the result of our first LEN() equation and we get:

21 - 19 = 2

What does 2 represent? The amount of times the word "green" occurred!!!!! Neat little trick, huh. Lets look at implementing this in a UDF:

Create FUNCTION [dbo].[WordCount]
(@WordToFind VARCHAR(15),
@PhraseToSearch VARCHAR(1000))
RETURNS SMALLINT AS
BEGIN

/* If @WordtoFind or @PhraseToSearch is NULL the function returns 0 */
IF @WordtoFind IS NULL OR @PhraseToSearch IS NULL RETURN 0
 
/* @SearchWord is a string one character longer than @WordtoFind */
DECLARE @SearchWord VARCHAR(21) SELECT @SearchWord = @WordtoFind + 'x'

/* Replace @WordtoFind with @SearchWord in @PhraseToSearch*/
DECLARE @SearchPhrase VARCHAR(2000) SELECT @SearchPhrase = REPLACE (@PhraseToSearch, @WordtoFind, @SearchWord)

/* The length difference between @SearchPhrase and @PhraseToSearch is the number we''re looking for */
RETURN LEN(@SearchPhrase) - LEN(@PhraseToSearch)
END

So the UDF take two variable the word and the phrase that you are trying to search. It then replaces the word you are trying to find and copies it to a new variable, then it compares the two and returns the number of occurrences.

That the first part. Stay tooned for next week when we implement the stored procedure.

Post your comment

Thanks for your comments

Be nice ;)

  • Comment