There are two things you might mean by parallel collection scan: the parallelCollectionScan command and its associated driver helpers, or the general strategy of a parallel collection scan.
The general strategy, however, can be extended in some cases. If you have an indexed field and some knowledge of the range of values in the field, you can efficiently break up a cursor over all of the results into multiple cursors that can be iterated in parallel, using the ordering given by the field. For example, if you had documents with a count field like the one below
{
"_id" : 0,
"count" : 38
}
that ranged from 0-100 and you wanted to do a parallel collection scan for the query { "count" : { "$gte" : 72 } }, you could make 3 cursors:
var c0 = db.test.find({ "count" : { "$gte" : 72, "$lte" : 80 } })
var c1 = db.test.find({ "count" : { "$gte" : 81, "$lte" : 90 } })
var c2 = db.test.find({ "count" : { "$gte" : 91, "$lte" : 100 } })
Clearly this doesn't always work nicely: it breaks down if the index is multikey, if the field changes a lot during the life of the cursors, if the query is complicated enough, or if you don't know how the values of the field are distributed within the ranges of interest. It works best on fields that are roughly uniformly distributed.
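If you'd rather not write the ranges by hand, the splitting can be sketched as a small helper. This is only an illustration: splitRange is a made-up name, not a driver or shell function, and it assumes the field holds integers and that you already know the bounds of interest. It builds only the filter documents; the find() calls are left in a comment because they need a live shell.

```javascript
// Hypothetical helper (not part of MongoDB): split the integer range
// [lo, hi] into at most n contiguous, non-overlapping range filters on `field`.
function splitRange(field, lo, hi, n) {
  const total = hi - lo + 1;             // number of integer values to cover
  const parts = Math.min(n, total);      // never create empty ranges
  const base = Math.floor(total / parts);
  const extra = total % parts;           // the first `extra` ranges get one more value
  const filters = [];
  let start = lo;
  for (let i = 0; i < parts; i++) {
    const size = base + (i < extra ? 1 : 0);
    const end = start + size - 1;
    filters.push({ [field]: { $gte: start, $lte: end } });
    start = end + 1;
  }
  return filters;
}

// Same query as above: count >= 72, with a known maximum of 100.
const filters = splitRange("count", 72, 100, 3);

// In the mongo shell, each filter could then back its own cursor:
// var cursors = filters.map(function (f) { return db.test.find(f); });
```

The ranges come out as evenly sized as possible, which only balances the work if the values really are close to uniformly distributed.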
Thank you for the response. Yes, I wanted to know about both of them.
Are there any plans to have a parallel scan given a query? (for cases which don't fit the example you have given)