Extend the search functionality of Django using Elastic Search and Haystack


In this article, we will discuss how to add search functionality to your website using Haystack and Elasticsearch. Assuming you already have a good working knowledge of the Django web framework, let's get into Haystack and Elasticsearch.

Elasticsearch

Elasticsearch is an open-source, distributed, scalable, enterprise-grade search engine based on Lucene. Its extensive API can power fast searches that support your data discovery applications, and almost any action can be performed through a simple RESTful API using JSON over HTTP. More details can be found on the official Elasticsearch page.
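
To make "JSON over HTTP" concrete, here is a minimal Python 3 sketch that builds (but does not send) a search request using only the standard library. The index name "musicians", the field "instrument" and the helper name build_search_request are assumptions made for this illustration; the node address is the default local one.

```python
import json
from urllib.request import Request

# Assumptions for this sketch: a local node on the default port and a
# hypothetical index called "musicians".
ES_URL = "http://127.0.0.1:9200"
INDEX = "musicians"


def build_search_request(field, value):
    """Build (but do not send) a JSON-over-HTTP search request."""
    # A "match" query asks for documents whose `field` contains `value`.
    body = {"query": {"match": {field: value}}}
    return Request(
        url="{}/{}/_search".format(ES_URL, INDEX),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("instrument", "guitar")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running server. Haystack issues requests of this kind for you, so you normally never write them by hand.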

Haystack

Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends (such as Solr, Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code.

Let's get into setting up and installing Elasticsearch and Haystack.

Installing Elasticsearch

Install Java 8:

Elasticsearch requires Java, so we will install that first. We will install a recent version of Oracle Java 8 because that is what Elasticsearch recommends. It should, however, work fine with OpenJDK if you decide to go that route.

Add the Oracle Java PPA to apt:

sudo add-apt-repository -y ppa:webupd8team/java

Update your apt package database:

sudo apt-get update

Install the latest stable version of Oracle Java 8 with this command (and accept the license agreement that pops up):

sudo apt-get -y install oracle-java8-installer

Now that Java 8 is installed, let's install Elasticsearch.

Download Elasticsearch from the official website. After downloading the archive, unzip it and navigate to the bin directory. Run the elasticsearch executable to start the server with the default configuration, then open 127.0.0.1:9200 in your browser to check whether the server is up.
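
The browser check can also be scripted. The following is a minimal Python 3 sketch using only the standard library; the function name elasticsearch_is_up and the two-second timeout are choices made for this illustration, and it assumes the default address mentioned above.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def elasticsearch_is_up(url="http://127.0.0.1:9200/", timeout=2.0):
    """Return True if `url` answers with HTTP 200 and a JSON body."""
    try:
        with urlopen(url, timeout=timeout) as response:
            # A running node replies to its root URL with a JSON document.
            json.loads(response.read().decode("utf-8"))
            return response.status == 200
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, bad URL, or non-JSON reply.
        return False


if __name__ == "__main__":
    print("Elasticsearch reachable:", elasticsearch_is_up())
```

The function deliberately returns False instead of raising, so it can be used in a deployment script or a startup check without extra error handling.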

You can also specify your own config file when starting the Elasticsearch server. The exact option depends on the Elasticsearch version; for example:

elasticsearch --config=<PATH_TO_YOUR_CONFIG_FILE>/elasticsearch.yml

Elasticsearch can also be installed with a package manager by adding Elastic's package source list.

Run the following command to import the Elasticsearch public GPG key into apt:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

If your prompt seems to hang, it is likely waiting for your user's password (to authorize the sudo command). If this is the case, enter your password.

Create the Elasticsearch source list:

echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list

Update the apt package database again:

sudo apt-get update

Install Elasticsearch with this command:

sudo apt-get -y install elasticsearch

Elasticsearch is now installed. Let's edit the configuration:

sudo nano /etc/elasticsearch/elasticsearch.yml

You will want to restrict outside access to your Elasticsearch instance (port 9200), so outsiders can't read your data or shut down your Elasticsearch cluster through the HTTP API. Find the line that specifies network.host, uncomment it, and replace its value with "localhost" so it looks like this:

/etc/elasticsearch/elasticsearch.yml excerpt (updated)

network.host: localhost

Save and exit elasticsearch.yml.

Now, start Elasticsearch:

sudo systemctl restart elasticsearch

Then run the following commands to start Elasticsearch on boot:

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch

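You will also need to install the Elasticsearch Python client so that Haystack can talk to the server. The client's major version should match the major version of your Elasticsearch server; if pip's default (the latest release) does not match, pin the version instead.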

pip install elasticsearch

Installing Haystack

Haystack can be installed via pip.

pip install django-haystack

After installation, add it to your installed apps.

INSTALLED_APPS = [
    ...
    'haystack',
    ...
]

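Add the following lines to the settings.py file. The engine shown here is Haystack's original Elasticsearch backend; since the steps above install Elasticsearch 2.x, note that Haystack 2.6 and later also provide haystack.backends.elasticsearch2_backend.Elasticsearch2SearchEngine, so check which engine matches your versions.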

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
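
If the server address differs between development and production, one option is to read it from the environment. This is only a sketch: ELASTICSEARCH_URL and HAYSTACK_INDEX_NAME are hypothetical variable names chosen for this example, not settings that Haystack reads by itself.

```python
import os

# ELASTICSEARCH_URL and HAYSTACK_INDEX_NAME are hypothetical environment
# variables; when they are unset, the values from the article are used.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': os.environ.get('ELASTICSEARCH_URL', 'http://127.0.0.1:9200/'),
        'INDEX_NAME': os.environ.get('HAYSTACK_INDEX_NAME', 'haystack'),
    },
}
```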

Now we need to create a SearchIndex so that Haystack knows what to search on. For this example I will use the following Django model, Musician, from my app called artist.

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import models

# Create your models here.


class Musician(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    instrument = models.CharField(max_length=100)

    def __unicode__(self):
        return self.first_name

Now create a file called search_indexes.py in the same directory as your models.py file. In it, create a MusicianIndex class to tell Haystack which data from the Musician model should be stored in the search engine.

from haystack import indexes

from .models import Musician


class MusicianIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    first_name = indexes.CharField(model_attr='first_name')
    last_name = indexes.CharField(model_attr='last_name')
    instrument = indexes.CharField(model_attr='instrument')

    def get_model(self):
        return Musician

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

In your main templates directory, create a file called search/indexes/artist/musician_text.txt. Change the path to use your own app name and model name (here the app is artist and the model is musician; the _text suffix matches the name of the document field), then add the searchable information to the template:

{{ object.first_name }}
{{ object.last_name }}
{{ object.instrument }}
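
To see why a query for a word such as "guitar" later finds a musician, it helps to picture the document that this template produces for the text field. The sketch below imitates the template in plain Python 3; it is not how Haystack renders it (Haystack uses Django's template engine), and the namedtuple and the sample musician are stand-ins for illustration only.

```python
from collections import namedtuple

# Stand-in for the Django model; only the three fields used by the template.
Musician = namedtuple("Musician", ["first_name", "last_name", "instrument"])


def render_document(obj):
    """Mimic the template above: one field value per line."""
    return "\n".join([obj.first_name, obj.last_name, obj.instrument])


document = render_document(Musician("Jimi", "Hendrix", "guitar"))
print(document)
```

Every word in this rendered text ends up in the main document field, so leaving a model field out of the template means a content search will not match on it.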

You can include all fields or only the fields that you need to search. After this, add the Haystack URLs to the urls.py file.

url(r'^search/', include('haystack.urls')),

We have to build the index before we can search. To do that, run the following command:

python manage.py rebuild_index

To learn more about Haystack management commands, visit the official Haystack documentation page.

Querying the Data

Now that we have the search index, let's see how to query the data using the Haystack API. The example below uses the SearchQuerySet class.

from haystack.query import SearchQuerySet
query = SearchQuerySet().filter(content='guitar')

The results can also be iterated over to access individual items, as shown below:

for item in query:
    first_name = item.first_name
    last_name = item.last_name
    instrument = item.instrument
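
If the results need to go into a template context or a JSON response, the loop above can be wrapped in a small helper. This is a sketch in Python 3 under stated assumptions: results_to_dicts is a hypothetical helper name, and SimpleNamespace objects stand in for search results so the example runs without a search backend.

```python
from types import SimpleNamespace

# The fields stored by MusicianIndex.
FIELDS = ("first_name", "last_name", "instrument")


def results_to_dicts(results, fields=FIELDS):
    """Copy the stored fields of each result into a plain dict."""
    return [
        {name: getattr(item, name, None) for name in fields}
        for item in results
    ]


# Stand-in objects; in a real project `results` would be the query above.
fake_results = [
    SimpleNamespace(first_name="Jimi", last_name="Hendrix", instrument="guitar"),
]
print(results_to_dicts(fake_results))
```

Using getattr with a default of None keeps the helper from failing when a result lacks one of the fields.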

If there are multiple SearchIndex classes, we can restrict the search to specific models to speed it up, as shown below:

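from haystack.query import SearchQuerySet

# Assumes the app package is named artist, as in the template path above.
from artist.models import Musician

query = SearchQuerySet().models(Musician).filter(content='guitar')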

For more Haystack filters and other options, see the official Haystack documentation.
