Tuesday, December 2, 2014

Re: mongoDB large array insert

I have the same problem with inserting large arrays into MongoDB. As you pointed out, an array of 100,000 numbers can be inserted instantly using the mongo shell, but the same array takes over a minute to insert using PyMongo! I am just using a simple test document, testdoc = {"_id":"test", "data": range(100000)}, and inserting it using the following command:
db[collection].insert(testdoc, w = 1, j = True)

Do you have any idea why it takes sooooo much longer to insert the document in Python?

Will Berkeley wrote:
How are you doing the insert? Do you have a code snippet or something? I can insert an array of 100,000 numbers into MongoDB instantly on my local machine using the mongo shell.

> big = []
> for (var i = 0; i < 100000; i++) big.push(i)
> db.bigarray.insert({ "x" : big }) // finishes almost instantly

Is the array field "groups" indexed?

In any case, I'd advise against having gigantic array fields if you want to index the entries or if the array is going to be updated frequently. With large enough arrays, you might hit the 16MB BSON document size limit (the numbers 1-100,000 in an array in a BSON document take about 1.5MB). I can't suggest a better way to model the data without knowing the use case, however.
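For what it's worth, the 1.5MB figure can be checked by hand from the BSON layout. The sketch below is only an estimate, not something from the thread: it assumes every array element is stored as a type byte, the decimal index as a NUL-terminated key, and a fixed-width value (8 bytes for the doubles the mongo shell produces, 4 bytes for the int32 values PyMongo uses for small Python ints). The function name bson_array_size is made up for illustration.

```python
def bson_array_size(n, value_bytes):
    """Estimated size in bytes of a BSON array holding n fixed-width numbers."""
    # Each element: 1 type byte + the decimal index as a C string
    # (its digits plus a terminating NUL) + the value itself.
    key_bytes = sum(len(str(i)) + 1 for i in range(n))
    elements = n * (1 + value_bytes) + key_bytes
    # The array is itself a BSON document: int32 length prefix,
    # the elements, then a trailing NUL byte.
    return 4 + elements + 1


shell_bytes = bson_array_size(100000, 8)    # doubles, as inserted from the mongo shell
pymongo_bytes = bson_array_size(100000, 4)  # int32, as PyMongo encodes small ints
print(shell_bytes, pymongo_bytes)
```

Under those assumptions the shell's array comes to 1,488,895 bytes, which matches the "about 1.5MB" above, and the PyMongo version is a little over 1MB. Note that roughly a third of the space goes to the index keys rather than the numbers.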




Oh, and I should also mention that I use the same write concern (w = 1, j = 1) when inserting the document using the mongo shell.



I can't reproduce that performance with PyMongo; it's the same near-instantaneous insert of a 100,000-element array as with the shell. Is the server busy at the time you're sending the insert? Do you have a full script with connection info, etc., that you could share? Also, what versions of the driver and server are you using? I'm using the latest PyMongo with server version 2.6.4 (and Python 3, so technically I had to change your test code, because in Python 3 range is a special sequence type rather than a list, and it doesn't BSON serialize).
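For anyone wanting to time this themselves, here is a rough sketch of the kind of script being asked for. It is not the code either poster ran. It assumes PyMongo 3 or later, where insert_one replaces the insert call used earlier in the thread and the write concern is set with with_options; the connection URI, the "test" database, and the helper names are all placeholders.

```python
import time


def build_testdoc(n=100000):
    # list(...) is required on Python 3: a bare range object
    # is not BSON-serializable.
    return {"_id": "test", "data": list(range(n))}


def timed_insert(collection, doc):
    """Insert doc and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    collection.insert_one(doc)  # PyMongo 3+; the 2.x code in this thread used insert()
    return time.perf_counter() - start


def journaled_collection(uri="mongodb://localhost:27017", db="test", name="bigarray"):
    # Imported lazily so the helpers above can be used without the driver.
    # The URI, database name, and collection name are assumptions.
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    coll = MongoClient(uri)[db][name]
    # Same write concern as the original post: w=1, j=True.
    return coll.with_options(write_concern=WriteConcern(w=1, j=True))
```

With a server running, something like print(timed_insert(journaled_collection(), build_testdoc())) would report the insert time. Because the document has a fixed _id, it has to be removed before the script is run a second time, or the insert fails with a duplicate key error.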

Also, truly, you almost certainly do not want arrays that large in a document. What is driving the desire for such large arrays?

