(Comments)
Although a simple concept, searching is something that many of us (developers) try skimping on and hence reduce the overall quality of user experience, due to irrelevant search results and illogical search ranking.
Today we will look into the Full text search functionality that we can leverage when we use the Django-PostGres combo. Please note that Full Text search is only supported if we use Postgres database backend with Django 1.10 or newer.
This tutorial is for developers with some background with PYTHON and Django. Also it is assumed that you are familiar with Querysets, aggregators, annotations etc. Although I'll try explaining it from scratch, I would suggest you get acquainted with the Queryset API reference.
Let's first understand what's a lookup. Field lookups are how you specify the meat of an SQL WHERE clause. They’re specified as keyword arguments to the QuerySet methods filter()
, exclude()
and get()
. For eg:
# Python Code Student.objects.filter(name__contains='Smith') # --> SELECT ... WHERE name LIKE '%Smith%'; Student.objects.filter(age__gt=18) # --> SELECT ... WHERE age > 18;
Here contains
and gt
are lookups. Please visit the official documentation for the complete list of Field Lookups.
For Full Text Search the lookup is search
. It is the simplest method to search a single term against a single column in the database. For eg:
# Python Code Product.objects.filter(name__search='Shiny') # Output: # [<Product: Shiny Shoes>, <Product: Very Shiny LED>]
Note:To use the search lookup, 'django.contrib.postgres'
must be in your INSTALLED_APPS.
filters()
or use Q()
(help!). But a more elegant solution is to use SearchVectors. Let us assume an example where we have the models Book and an Author and search among book title and author name simultaneously; we could use a SearchVector
like so:
# Python Code from django.contrib.postgres.search import SearchVector Book.objects.annotate( search=SearchVector('title', 'author__name'), ).filter(search='Arthur') # Output: # [<Book: The Adventures of Sherlock Holmes>, <Book: Arthur's Eyes>]
The arguments to SearchVector
can be any Expression or the name of a field. Multiple arguments will be concatenated together using a space so that the search document includes them all.
SearchVector
objects can be combined together, allowing you to reuse them. For example:
# Python Code from django.contrib.postgres.search import SearchVector Book.objects.annotate( search=SearchVector('title') + SearchVector('author__name'), ).filter(search='Arthur') # Output: # [<Book: The Adventures of Sherlock Holmes>, <Book: Arthur's Eyes>]
We could also assign weight's to each of the vectors, thereby effectively ranking them acording to relevance, which we will be discussing later in the blog.
SearchQuery translates the terms the user provides into a search query object that the database compares to a search vector.
The advantage of using a SearchQuery
is that by default all the words provided are passed through a stemming alogorithm, before looking for matching terms. Thus giving a much relevant search result accounting for the liguistic usage of that word. Another advantage of using SearchQuery
is we can easily combined logically using &
(AND), |
(OR), and ~
(NOT). So we could try applying SearchQuery in above example as follows:
# Python Code from django.contrib.postgres.search import SearchVector Book.objects.annotate( search=SearchVector('title') + SearchVector('author__name'), ).filter(search = SearchQuery('Arthur') ) # Output: # [<Book: The Adventures of Sherlock Holmes>, <Book: Arthur's Eyes>] # #################################################################### # # could be combined logically as follows from django.contrib.postgres.search import SearchVector, SearchQuery Book.objects.annotate( search=SearchVector('title') + SearchVector('author__name'), ).filter(search = SearchQuery('Arthur') & SearchQuery('Sherlock')) # Output: # [<Book: The Adventures of Sherlock Holmes>,]
So far we were beating around the bushes learning stuff to get to the meat of the matter. The SearchRank orders the results according to the its relevancy to the search term(s) provided by the user. PostgreSQL provides a ranking function which takes into account how often the query terms appear in the document, how close together the terms are in the document, and how important the part of the document is where they occur. The ranking is higher with better match. To order by search ranking, we do the following
# Python Code from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector vector = SearchVector('name') query = SearchQuery('Stevens') Author.objects.annotate( rank=SearchRank(vector, query) ).order_by('-rank') # Output: # [<Author: Fisher Stevens>, <Author: W. Richard Stevens>, <Author: Robert Louis Stevenson>]
Every field may not have the same relevance in a query, so you can set weights of various vectors before you combine them.
We can assign 4 levels of weights to each SearchVector
[A, B ,C ,or D], with A being with the most weight, by convention. It is set by setting the argument weight
in SearchVector
.
In SearchRank
on the other hand we can assign the actual weights for each of these above mentioned letters by passing the value of argument weights
as a list of 4 float numbers. The order for assigning the values are D,C,B,A. The default weights(if you do not pass the weights
argument in SearchRank
) of these letters are D->0.1, C->0.2, B->0.4, A->1.0.
Let's look at an example to understand better:
# Python Code from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector vector = SearchVector('title', weight='A') + SearchVector('author__name', weight='B') query = SearchQuery('Arthur') Book.objects.annotate( rank=SearchRank(vector, query) )filter(rank__gte=0.3).order_by('-rank') # Output: # [<Book: Arthur's Eyes>, <Book: The Adventures of Sherlock Holmes>]
For assigning the weights differently, we do the following
rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8]) Book.objects.annotate( rank=rank )filter(rank__gte=0.3).order_by('-rank')
SearchVectorField
No special database tweaking is needed for using any of the above function per say. But since full text search being a reaource hungry and heavy process, it may have hiccups when searching more than a few hundred records. The fact to be noted is that Databases aren't actually designed for full text searches.
To mitigate this issue to some extent you could create indexes for full text search, which has been documented in the PostgreSQL documentation.
Other method which works even better is adding a SearchVectorField
to your model. You’ll need to keep it populated with triggers, for example, as described in the PostgreSQL documentation. You can then query the field as if it were an annotated SearchVector
:
# Python Code Book.objects.update(search_vector=SearchVector('title')) Book.objects.filter(search_vector='Arthur') # Output: # [<Book: Arthur's Eyes>]
A typo-tolerant method of searching is trigram similarity. It compares the number of trigrams, or three consecutive characters, shared between the search term(s) and the target text.
But unlike other features, we have to make sure that an extention called pg_trgm extention on PostgreSQL is activated first. You would find a helping hand on the topic here.
There are two complementary functions here and they are TrigramSimilarity
and TrigramDistance
. They both essentially give the same information but one returns the amount of similarity and the latter the amount of difference as you may have guessed. Let's see an example of each
# Python Code from django.contrib.postgres.search import TrigramSimilarity, TrigramDistance Author.objects.create(name='Katy Stevens') Author.objects.create(name='Stephen Keats') test = 'Katie Stephens' ############################################################################## # Similarity Author.objects.annotate( similarity=TrigramSimilarity('name', test), ).filter(similarity__gt=0.3).order_by('-similarity') # Output: # [<Author: Katy Stevens>, <Author: Stephen Keats>] ############################################################################## # Distance Author.objects.annotate( distance=TrigramDistance('name', test), ).filter(distance__lte=0.3).order_by('distance') # Output: # [<Author: Katy Stevens>, <Author: Stephen Keats>]
We develop web applications to our customers using python/django/angular.
Contact us at hello@cowhite.com
Comments