I recently came across this question: http://stackoverflow.com/q/26950100/383478
In it, the poster dumped a GridFS files database from MongoDB 2.4 and had 86GB; after reimporting it into MongoDB, he had 122GB.

A quick test with a single GridFS chunk:

> var f = db.fs.chunks.findOne();
> print(Object.bsonsize(f));
166069
> db.fs.chunks.stats()
{
    "ns" : "test.fs.chunks",
    "count" : 1,
    "size" : 262128,
    "avgObjSize" : 262128,
    "storageSize" : 4202496,
    "numExtents" : 2,
    "nindexes" : 2,
    "lastExtentSize" : 4194304,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 1,
    "totalIndexSize" : 16352,
    "indexSizes" : {
        "_id_" : 8176,
        "files_id_1_n_1" : 8176
    },
    "ok" : 1
}
The Object.bsonsize() function calculates the size of the BSON representation for a document.
Power of 2 sizes (and padding factor) are related to the storage allocation size for a document (i.e. how large the document can grow in place before it needs to be relocated in storage).
For example, with a document that has a BSON size of 1000 bytes:
- the powerOf2Sizes allocation strategy rounds the allocation request up to the next power of 2, so 1024 bytes are allocated for storage (see: http://docs.mongodb.org/)
- the paddingFactor allocation strategy adds space based on historical growth; for example, with a paddingFactor of 1.1 the storage allocation would be 1100 bytes (see: http://docs.mongodb.org/v2.4/core/record-padding/#padding-factor).
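The arithmetic for the 1000-byte example can be sketched as two small functions. This is a simplified model, not the server's code: it assumes only the rounding rule matters, ignoring the per-record header and any special handling of very large records, and the function names are hypothetical.

```javascript
// Simplified models of the two MMAPv1 allocation strategies.
// Assumption: record header and large-record special cases are ignored.

// powerOf2Sizes: round the request up to the next power of 2.
function powerOf2Allocation(bsonBytes) {
  let size = 1;
  while (size < bsonBytes) {
    size *= 2;
  }
  return size;
}

// paddingFactor: scale the request by the collection's padding factor,
// rounded to whole bytes (the server's exact rounding is not modelled).
function paddingFactorAllocation(bsonBytes, paddingFactor) {
  return Math.round(bsonBytes * paddingFactor);
}

const docSize = 1000; // BSON size from the example
console.log("powerOf2Sizes:", powerOf2Allocation(docSize));             // 1024
console.log("paddingFactor 1.1:", paddingFactorAllocation(docSize, 1.1)); // 1100
```

The Math.round is there because 1000 * 1.1 is not exactly 1100 in floating point.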
I expect the size increase for the 2.4 GridFS dump restored into 2.6 is caused by the powerOf2Sizes allocation strategy, which is enabled by default for new collections in 2.6.
The historical 256 KB chunk size sat right on the powerOf2Sizes boundary: once the other fields in each chunk document and the BSON and record overhead are added, a full 256 KB chunk is slightly larger than 256 KB and gets a 512 KB allocation. That is inefficient because GridFS chunks do not grow. Drivers were updated to reduce the default chunk size to 255 KB (related to https://jira.mongodb.org/browse/SERVER-13331), but this does not affect historical documents.
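A rough sketch of why 256 KB chunks land in a 512 KB allocation while 255 KB chunks do not. It computes the BSON size of a chunk document { _id, files_id, n, data } from the BSON element layout. Two assumptions are not from the thread: files_id is an ObjectId, and each MMAPv1 record carries a 16-byte header that counts toward the power-of-2 rounding. The helper names are hypothetical.

```javascript
// Assumption: 16-byte MMAPv1 record header included in the rounding.
const RECORD_HEADER_BYTES = 16;

// One BSON element: type byte + field name as a cstring + value bytes.
function elementSize(name, valueBytes) {
  return 1 + name.length + 1 + valueBytes;
}

// BSON size of a GridFS chunk document { _id, files_id, n, data }.
function gridfsChunkBsonSize(dataBytes) {
  const framing = 4 + 1;                    // int32 total length + trailing 0x00
  return framing +
    elementSize("_id", 12) +                // ObjectId
    elementSize("files_id", 12) +           // ObjectId (assumed)
    elementSize("n", 4) +                   // int32 chunk index
    elementSize("data", 4 + 1 + dataBytes); // BinData: length + subtype + bytes
}

function nextPowerOf2(n) {
  let size = 1;
  while (size < n) size *= 2;
  return size;
}

for (const kb of [256, 255]) {
  const bson = gridfsChunkBsonSize(kb * 1024);
  const allocation = nextPowerOf2(bson + RECORD_HEADER_BYTES);
  console.log(`${kb} KB chunk: BSON ${bson} bytes, allocation ${allocation} bytes`);
}
// 256 KB chunk: BSON 262206 bytes, allocation 524288 bytes
// 255 KB chunk: BSON 261182 bytes, allocation 262144 bytes
```

Under these assumptions the per-chunk overhead is only 62 bytes plus the header, which is enough to push an exact 256 KB payload over the boundary. Dropping 1 KB from the payload leaves plenty of headroom.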
The recommended fix (as per the answer on SO) would be to disable the powerOf2Sizes allocation strategy on the GridFS fs.chunks collection before importing.
Note: this fix is only required if you are importing legacy GridFS documents or using a driver or chunk size that doesn't take the powerOf2Sizes allocation into consideration.
OK, cool. This seems to have been a misconception on my part: I thought the padding would be artificially added to the document itself when it is saved, but instead it is more "invisible".
I guess that is why I see a storage size of 4202496 but a doc size of 166KB.
Indeed, padding is about record allocation in the storage layer rather than manually padding a document with empty bytes. If you are using a driver or command-line tool like mongorestore, these do not have (or need) direct visibility into the underlying storage representation. It's up to the storage engine to work out how much space to allocate for documents based on the collection settings.
The current allocation strategies (MMAPv1) are described in the manual (http://docs.mongodb.org/manual/core/storage/). If you want to dig into some more technical details, Mathias has a great talk from a few years ago: http://www.mongodb.com/presentations/storage-engine-internals.
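The figures in the earlier stats() output can be reconciled with a little arithmetic. This sketch assumes a 16-byte MMAPv1 record header that is included in the power-of-2 rounding but excluded from the reported "size" (an assumption, not stated in the thread). Under that assumption "size" reflects the padded record allocation, while "storageSize" is the space reserved for the collection's extents rather than a per-document figure.

```javascript
// Values copied from the Object.bsonsize() and db.fs.chunks.stats() output.
const bsonSize = 166069;        // Object.bsonsize(f)
const statsSize = 262128;       // "size"
const storageSize = 4202496;    // "storageSize"
const lastExtentSize = 4194304; // "lastExtentSize"

// Assumption: 16-byte record header, counted in the rounding
// but not in the "size" figure.
const RECORD_HEADER_BYTES = 16;

function nextPowerOf2(n) {
  let size = 1;
  while (size < n) size *= 2;
  return size;
}

const recordAllocation = nextPowerOf2(bsonSize + RECORD_HEADER_BYTES);
console.log(recordAllocation);                                     // 262144
console.log(recordAllocation - RECORD_HEADER_BYTES === statsSize); // true

// storageSize counts whole extents: with numExtents = 2, the extent
// other than the last one accounts for the remainder.
console.log(storageSize - lastExtentSize);                          // 8192
```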