1. Original Entry + Comments2. Write a Comment3. Preview Comment
New comments for this entry are disabled.


July 11, 2004  |  Blog search AND  |  2187 hit(s)

I've added an AND capability to the blog search. The original search creates a WHERE clause using LIKE (adding % around the term). I've occasionally wanted to refine a search, so I'm playing around with it at the moment.

Rewind. The original search I coded used stored procedures. There's one to search text+title and a second one to search text+title+comments, which joins the blog and comments table. (As Colt pointed out, there's a slight flaw with the latter; you can end up with multiple hits for the same item if the search term appears in both the text and comments of a particular entry, a problem I haven't found a good solution for.) Because I wanted to release the source of the blog and wanted the "installation" to be as simple as possible, I reverted to creating the query in code. It's arguable that this simplifies anything, but that's what I did.

Adding the AND facility introduces a new angle. AFAIK, the only way to search for multiple terms ANDed together is to string together a WHERE clause with AND predicates, like this:
WHERE blogDescription like "%foo%" AND blogDescription like "%bar%"
The number of predicates is not fixed, which complicates the picture if I want to use a stored procedure. Since you can't dynamically create WHERE clauses in an SP, it seems (?) the choices are to create separate SPs that accept different numbers of parameters, or perhaps an SP that has an IF statement in it that tests the number of parameters and branches to an appropriate query. In the blog, I could easily impose an arbitrary limit of, say, three predicates. But the use of SPs for this application doesn't seem that critical, so it seems easier to just do things in code.

I made some decisions up front about what I wanted:
  • I needed AND but not OR.
  • Multiple word chunks are inherently treated as units, so I don't need to support quotation marks. In fact, I don't want 'em.
  • The AND operator would be either the literal AND or the & character.
  • I would not worry about escaping quotation marks, AND, or &.
  • I definitely wanted to use ADO.NET parameters for the actual terms.
I need two bits: the first parses the search string looking for AND or &. The second part dynamically creates a WHERE clause. Here's the parse logic:
Dim searchTerm As String = Server.HtmlEncode(textSearch.Text)
searchTerm = searchTerm.Replace("&", "&")
searchTerm = searchTerm.ToLower() ' normalize
searchTerm = searchTerm.Replace("""", "") ' Strip quotation marks
Dim trimmed As Boolean = False
' Trim double spaces from inside the search string.
' (Is there no .NET function for this?)
While Not trimmed
searchTerm = searchTerm.Replace(" ", " ")
If searchTerm.IndexOf(" ") < 0 Then
trimmed = true
End If
End While
searchTerm = searchTerm.Replace(" and ", "&") ' Normalize to &
searchTerm = searchTerm.Replace(" & ", "&")
Dim searchTermArray() As String
searchTermArray = searchTerm.Split("&")
This leaves me with an array of tidied-up search terms. Constructing the WHERE clause is theoretically easy, but I actually have to search multiple fields -- blogDescription, blogTitle, and optionally, blogComments. Searching blogComments introduces the additional complication that it requires a join, because the comments are in another table. Here's the code for the bit without the optional comments included:
For i As Integer = 0 to searchTermArray.Length - 1
If i = 0 Then
whereClause &= " WHERE (blog.blogID < @hivalue) AND "
Else
whereClause &= " AND "
End If
whereClause &= " (blog.description like @searchterm" & i.ToString() & _
" OR blog.title like @searchterm" & i.ToString() & ") "
aCommand.Parameters.Add("@searchterm" & i.ToString(), _
System.Data.SqlDbType.Char).value _
= "%" & searchTermArray(i) & "%"
Next
aCommand.CommandText = "SELECT TOP " & pageSize & _
" blogid, title, entrydatetime, description " & _
" FROM blog " & _
whereClause & _
" ORDER BY blog.EntryDateTime DESC"
All the concatenation is bad enough, and then I need to add parentheses around each predicate. This is not yer best example of easily maintainable code. At least there's no problem creating the parameter objects dynamically, since their ID is a configurable string. When this code runs for a search with two ANDed terms, it constructs a query that looks like this (but not pretty-fied):
SELECT TOP 10 blogid, title, entrydatetime, description
FROM blog
WHERE (blog.blogID < @hivalue)
AND (blog.description like @searchterm0
OR blog.title like @searchterm0)
AND (blog.description like @searchterm1
OR blog.title like @searchterm1)
ORDER BY blog.EntryDateTime DESC
(The @hivalue stuff is used to support paging.) If there were three search terms, a third AND clause would be generated, etc. Note that the % % delimiters are cooked into the searchterm already. If the optional comment search is added, the query looks like this:
SELECT TOP 10 blog.blogid, blog.title, blog.entrydatetime, blog.description
FROM blog
LEFT OUTER JOIN blogComments ON blog.BlogID = blogComments.blogId
WHERE (blog.blogID < @hivalue)
AND (blog.description like @searchterm0
OR blog.title LIKE @searchterm0
OR blogComments.comment LIKE @searchterm0)
AND (blog.description like @searchterm1
OR blog.title LIKE @searchterm1
OR blogComments.comment LIKE @searchterm1)
ORDER BY blog.EntryDateTime DESC
Messy, eh? It does appear to work, although I haven't tested it extensively.

I've considered adding a category search as well so that I could, for example, search for only aspnet stuff or family stuff. But that's probably enough SQL for one day.