Mongodb User Forum News: parallel collection scan for query results

From what I read, parallel collection scan is for getting better throughput for reading an entire collection. Is it possible to use parallel scan for reading the results of a query (subset of a collection) or for performing the query in parallel?

There are two things you might mean by "parallel collection scan": the parallelCollectionScan command and its associated driver helpers, or the general strategy of scanning a collection in parallel.

The parallel collection scan command, parallelCollectionScan, is for reading all documents in a collection by breaking them up into multiple cursors that can be iterated in parallel. The command doesn't take a query parameter, so you can't use it to read the results of a query in parallel.
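As a rough illustration of the command described above, here is a minimal sketch of building its command document in the shell's JavaScript. The helper name parallelScanCommand and the collection name "test" are assumptions made for this example, and the 1 to 10000 bound on numCursors reflects my understanding of the command's documented limit. The actual db.runCommand call is left in a comment because it needs a server version that supports the command.

```javascript
// Hypothetical helper (not part of MongoDB or its drivers): builds the
// command document for parallelCollectionScan. Note there is no place
// to put a query filter; the command always covers the whole collection.
function parallelScanCommand(collectionName, numCursors) {
  if (typeof collectionName !== "string" || collectionName.length === 0) {
    throw new TypeError("collectionName must be a non-empty string");
  }
  var isInt = typeof numCursors === "number" && isFinite(numCursors) &&
              Math.floor(numCursors) === numCursors;
  // Assumption: numCursors must be between 1 and 10000 inclusive.
  if (!isInt || numCursors < 1 || numCursors > 10000) {
    throw new RangeError("numCursors must be an integer from 1 to 10000");
  }
  return { parallelCollectionScan: collectionName, numCursors: numCursors };
}

var cmd = parallelScanCommand("test", 4);

// In a mongo shell connected to a server that supports the command:
//   var res = db.runCommand(cmd);
// The reply is expected to contain a `cursors` array with at most
// numCursors entries, each of which can be iterated separately.
```

numCursors is an upper bound, so the server may hand back fewer cursors than requested.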

The general strategy, however, can be extended in some cases. If you have an indexed field and some knowledge of the range of values in the field, you can efficiently break up a cursor over all of the results into multiple cursors that could be iterated in parallel using the ordering given by the field. For example, if you had a count field like below

{
    "_id" : 0,
    "count" : 38
}

that ranged from 0-100 and you wanted to do a parallel collection scan for the query { "count" : { "$gte" : 72 } }, you could make 3 cursors:

var c0 = db.test.find({ "count" : { "$gte" : 72, "$lte" : 80 } })
var c1 = db.test.find({ "count" : { "$gte" : 81, "$lte" : 90 } })
var c2 = db.test.find({ "count" : { "$gte" : 91, "$lte" : 100 } })

Clearly this doesn't always work well. It breaks down, for example, if the index is multikey, if the field changes frequently during the life of the cursors, if the query is complicated, or if you don't know how the values of the field are distributed within the ranges of interest. It works best on fields that are roughly uniformly distributed.
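The range-splitting idea above could be generalized with a small helper along these lines. This is only a sketch: splitRangeFilters is a hypothetical name, not a MongoDB or driver API, and it assumes you already know integer lower and upper bounds for the field. It uses half-open ranges ($gte / $lt) for all but the last piece so that a non-integer value sitting between two pieces would not be skipped.

```javascript
// Hypothetical helper (not part of MongoDB or its drivers).
// Splits the inclusive range [lo, hi] on `field` into `n` contiguous,
// non-overlapping query filters that together cover the whole range.
function splitRangeFilters(field, lo, hi, n) {
  function isInt(x) {
    return typeof x === "number" && isFinite(x) && Math.floor(x) === x;
  }
  if (!isInt(lo) || !isInt(hi) || lo > hi) {
    throw new RangeError("lo and hi must be integers with lo <= hi");
  }
  var total = hi - lo + 1; // number of integer values in the range
  if (!isInt(n) || n < 1 || n > total) {
    throw new RangeError("n must be an integer between 1 and " + total);
  }
  var base = Math.floor(total / n);
  var extra = total % n; // the last `extra` pieces get one more value
  var filters = [];
  var start = lo;
  for (var i = 0; i < n; i++) {
    var size = base + (i >= n - extra ? 1 : 0);
    var next = start + size; // first value of the following piece
    var bounds = { "$gte": start };
    if (i === n - 1) {
      bounds["$lte"] = hi;   // last piece keeps the inclusive upper bound
    } else {
      bounds["$lt"] = next;  // earlier pieces are half-open
    }
    var filter = {};
    filter[field] = bounds;
    filters.push(filter);
    start = next;
  }
  return filters;
}

var filters = splitRangeFilters("count", 72, 100, 3);

// In a mongo shell, each filter could back its own cursor, e.g.:
//   var cursors = filters.map(function (f) { return db.test.find(f); });
```

Equal-width pieces only balance the work when the values are roughly uniformly distributed; with a skewed field, one cursor would end up with most of the documents.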

Thank you for the response. Yes, I wanted to know about both of them.

Are there any plans to support a parallel scan for a given query, for cases that don't fit the example you gave?


Wednesday, December 24, 2014
