2014년 11월 29일 토요일

Basic find and Index

This is the query:
        docs = list(db.messages.find({'$and':[{'viewed':{'$exists':False}}, {'$or':[{'to_phone':self.phone}, {'to_username':self.username}] }]}).sort('_id', pymongo.DESCENDING).limit(5))

This is the index created:
       db.messages.ensure_index([('viewed', pymongo.DESCENDING),('to_phone',pymongo.DESCENDING),('to_username',pymongo.DESCENDING), ('_id', pymongo.DESCENDING)], unique=False, background=True)

Even after creating an index on all the fields queried on, mongodb is scanning the entire collection, I have the index creation and query both here below in the pastie.

http://pastie.org/9733221

 Can you please tell what is wrong with the index or what is the right index to create?



It's using the index provided for the sort.   If I might suggest trying something:

Change your query to move viewed inside the $or clauses, i.e.

 {'$or':[{'to_phone':self.phone, 'viewed':{'$exists':False} }, {'to_username':self.username, 'viewed':{'$exists':False} }] }

You cannot have a single index satisfy this query, you would need two:

viewed, to_phone, _id
viewed, to_username, _id

except I would flip them:
to_phone, viewed, _id
to_username, viewed, _id



Thanks that really helped!

Can you explain the difference between what I had and your version, as they essentially do the same?

Does each part of OR query needs a separate index?

Do the order of the fields in the index matter?



The order of fields in the index matters if you are not using both fields in the index, or if they are not both equality in queries, or if one of them is frequently queried by itself.

Example:

index on firstName, lastName can be used for query with equality on firstName, equality on firstName and equality, range or sort on lastName.
it cannot be used for a search just on lastName nor can it be *fully* used if you have search with equality on lastName and range or sort on firstName (only range/sort part can use that index on firstName but lastName won't be found using index).

Rewriting the query helps the optimizer since it can run both clause of the $OR in parallel and this way it's more obvious which indexes can be used for each clause.   The way you had it before, the query optimizer would have to re-write the query itself to use indexes in most optimal form.

Lastly, sort should be removed when it's not necessary because query optimizer always takes into account the need for sort (maybe it should be smart enough to know that equality on _id means the sort is pointless, but it's not *that* advanced)...


댓글 없음:

댓글 쓰기