Ordering by relational field without duplicates

(Comments)

Many of the developers might have faced the problem of duplicate results in the queryset when we order by the relational fields. I would I like to explain that using a simple example.

Let us say we have two models User and Order. User is already defined in auth.models so let us create a new model Order.

class Order(models.Model):
    user = models.ForeignKey(User)

Now make migrations and run them. And then open the Django shell using

python manage.py shell

Import the models

from django.contrib.auth.models import User
from app1.models import Order # I placed Order model in the app 'app1'.

Create some users and orders.

for i in range(1, 5):
    user = User.objects.create(username='u%s' % i, email='u%s@localhost.com' % i)
    Order.objects.create(user=user)

In order to show duplicate users, we need more than one order for a single user. So, let us create another order for the user

Order.objects.create(user=User.objects.get(username='u1'))

Now try printing all users

In [46]: User.objects.all() 
Out[46]: [<User: u1>, <User: u2>, <User: u3>, <User: u4>]

Now try printing users order by descending order of "order" id.

In [57]: User.objects.order_by("-order")
Out[57]: [<User: u1>, <User: u4>, <User: u3>, <User: u2>, <User: u1>]

This clearly shows it is returning duplicate users when we order by a relational field "order".

At the same time "distinct" doesn't work here.

So, one solution for this problem is to use annotate.

Let us try that

In [74]: from django.db.models import Max
In [75]: User.objects.annotate(order_temp=Max("order")).order_by("-order_temp") 
Out[75]: [<User: u1>, <User: u4>, <User: u3>, <User: u2>]

Yay ! We got distinct users in the descending order of "order". If you remember when we created orders, we additionally created a new order for the user "u1". So, it is the latest order and it is appearing in the first position.

Now since its working, lets go deep and see whats going on.

Let us print the Sql queries and try to understand it.

In [76]: print User.objects.order_by("-order").query
SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" LEFT OUTER JOIN "app1_order" ON ("auth_user"."id" = "app1_order"."user_id") ORDER BY "app1_order"."id" DESC

In [77]: print User.objects.annotate(order_temp=Max("order")).order_by("-order_temp").query
SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined", MAX("app1_order"."id") AS "order_temp" FROM "auth_user" LEFT OUTER JOIN "app1_order" ON ("auth_user"."id" = "app1_order"."user_id") GROUP BY "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" ORDER BY "order_temp" DESC

Both the above sql queries are looking bigger which makes it difficult to understand. So, let us fetch only one field "id" so query looks smaller (even though it works without limiting to one field).

In [85]: print User.objects.values("id").order_by("-order").query

SELECT "auth_user"."id" FROM "auth_user" LEFT OUTER JOIN "app1_order" ON ("auth_user"."id" = "app1_order"."user_id") ORDER BY "app1_order"."id" DESC

The above sql query clearly shows that it is returning all users with user.id=order.user_id, ordered by "order id". So, we see duplicate users when a user has more than one order.

In [88]: print User.objects.values("id").annotate(order_temp=Max("order")).order_by("-order_temp").query

SELECT "auth_user"."id", MAX("app1_order"."id") AS "order_temp" FROM "auth_user" LEFT OUTER JOIN "app1_order" ON ("auth_user"."id" = "app1_order"."user_id") GROUP BY "auth_user"."id" ORDER BY "order_temp" DESC

The second query fetches users only with the maximum order id for that user. For example, if a user has 3 orders with ids 3, 4, 5 and if another user has order 10, 11, 12 then it fetches users with the order ids 5 and 12 only. (5 is latest order for first user, 12 is latest order for second user) so it doesnt return duplicate records. We dont even need to do distinct here.

Note that, we are assuming here that the record/instance with the maximum id the latest order.

To learn more about annotations, you can take a look at this link https://docs.djangoproject.com/en/1.9/topics/db/aggregation/. In short, it is simply attaching another temporary field to our model User with the maximum(or the latest) order id for that user.

Comments

Recent Posts

Archive

2022
2021
2020
2019
2018
2017
2016
2015
2014

Tags

Authors

Feeds

RSS / Atom