2014년 11월 29일 토요일

Very slow (though simple) query in MongoDB 2.6.4 ?

I am currently experiencing a very strange situation on my production database. I am running mongodb 2.6.4. Here is an example of a slow query I have in my profile database :
{
"$query" : {
"deleted_at" : null,
"_id" : ObjectId("some_id")
},
"$orderby" : {
"created_at" : -1
}
This query took 9671 milliseconds according to the profile data.
It turns out nsscanedObjects is the total number of objects in the database ( roughly 2 million), which is surprising because it should just find the object by its id using the "BtreeCursor _id_" index.
I then tried to run the query from the mongo shell, and it takes several seconds (between 5 and 10). Then when I run explain() and the result it instantly responds with "BtreeCursor _id_" and nscannedObjects = 1.
What is the problem here ? I am not 100% certain but I think this was not so slow when I was using mongodb 2.4.x a few weeks ago. Should I create a compound index with {deleted_at: 1, _id: 1, created_at: -1} ? Does it make sense to create such an index ?



Is it reproducibly slow or was it just slow that couple of times before the explain and now is running quickly? Can you post an explain plan if you can get one for a slow run of the query? Otherwise, it will be hard to confirm why it ran much slower a couple of times.



What is really surprising is that that running explain on the query is a lot faster than running the request itself ...



It is reproducibly slow. Here is the explain plan :

For me it does not mean anything, because the explain command returns really fast, whereas the query itself is really slow: 

db.my_collection.find({_id: ObjectId("546e2ee0e87678987e3ab000189"), deleted_at: null}).sort({created_at: -1})
What is even stranger is that the following query returns instantly:
db.my_collection.find({parent_id: ObjectId("543509b1fffffaaaff0dd"), _id: ObjectId("546e2ee0e87678987e3ab000189"), deleted_at: null}).sort({created_at: -1})

It seems to me that the query optimizer should use the _id primary index whenever _id is part of the query, regardless of any sort command added,or any other parameter in the query, which does not seem to be the case here (except when I run the explain command). 



What's happening is that it's cached the wrong index (created_at) because of the sort - is there a reason you have an order by clause on a query that can only return a single document?



The reason I am using this is because my query is generated by Mongoid. In my code I just run it as Person.find("someid"). I agree that this is wrong but I don't see why mongodb would cache the wrong index. 



In addition to my previous answer, I've inspected Mongoid code to find out why it is behaving like this. Basically the `find`method can be called on top of any query, this ensure that no object is sent when not all criteria are met, even though one of fields is _id. Mongoid obviously assumes that MongoDB will be smart about it and use the right index. What mongoid does which is a bit less smart is also including the sort criteria, which does not make sense in the context of a request for a single document. However, I would expect MongoDB to be a lot smarter about this.



Should I report a bug in Jira ? 



It's already known behavior (see SERVER-14071, SERVER-15802, etc).

A new ticket would be to add an enhancement that discounts the sort when a query is equality on _id field (or any field with a unique index, I suppose) but it's not likely to be too useful an enhance, I think...


댓글 없음:

댓글 쓰기