Saturday, December 13, 2014

Heavy GC usage with Java driver

Under high load we see ~90% CPU usage, with the GC spending most of its time collecting garbage. Most of this garbage consists of BasicDBObject instances, which we use only as intermediate objects when mapping from the BSON stream to our domain objects.

I came up with the following, not very clean, solution:

I create a wrapper for BSON:
class BsonDbObject implements DBObject {
    private final byte[] data; // raw BSON bytes of the document

    BsonDbObject(byte[] data) {
        this.data = data;
    }

    // empty implementations for other methods
}

and a custom DBDecoder with one overridden method:
import java.io.IOException;
import java.io.InputStream;

import com.mongodb.DBCollection;
import com.mongodb.DBDecoder;
import com.mongodb.DBObject;

class BsonDbDecoder implements DBDecoder {
    @Override
    public DBObject decode(InputStream in, DBCollection collection) throws IOException {
        // fullRead reads the whole BSON document into a byte array (helper not shown)
        byte[] data = fullRead(in);
        return new BsonDbObject(data);
    }

    // empty implementations for other methods
}

and build the MongoClient with our custom BsonDbDecoderFactory. Later in the code I perform the mapping directly from the BSON stream.
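
For completeness, the wiring looks roughly like this (a minimal sketch against the 2.12.x API; the server address is just a placeholder):

import com.mongodb.DBDecoder;
import com.mongodb.DBDecoderFactory;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

class BsonDbDecoderFactory implements DBDecoderFactory {
    @Override
    public DBDecoder create() {
        return new BsonDbDecoder();
    }
}

MongoClientOptions options = MongoClientOptions.builder()
        .dbDecoderFactory(new BsonDbDecoderFactory())
        .build();
// "localhost" is a placeholder; in 2.12.x new ServerAddress(String) declares UnknownHostException
MongoClient client = new MongoClient(new ServerAddress("localhost"), options);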

This solution is quite complex and not yet stable, e.g. handling of $err responses in DBObject is still not implemented.

Is there a more stable and elegant way to skip creating DBObjects in mongo-java-driver and reduce the load on the GC? (As far as I know, 3.0.x allows it, but we use 2.12.4.)



The driver contains a class like this called LazyDBObject, and an associated encoder and decoder.  Let me know if that works for you.
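
For example, something like this switches a collection over to lazy decoding (a quick sketch; the database and collection names are just placeholders):

import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.LazyDBDecoder;

// "mydb"/"docs" are placeholders; `client` is an existing MongoClient
DBCollection collection = client.getDB("mydb").getCollection("docs");
collection.setDBDecoderFactory(LazyDBDecoder.FACTORY);
DBObject doc = collection.findOne(); // now a LazyDBObject backed by the raw BSON bytes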



It looks like it is almost the same thing as what we do (setting a custom decoder and creating a custom DBObject implementation). So LazyDBObject could work for us, though I see some disadvantages compared to our approach:
- LazyDBObject still creates objects that are unnecessary for us, though far fewer than BasicDBObject: an ElementRecord object with 5 fields for each document field, plus a lot of iterators. I also see that in the 3.0.x version it got even worse: the keySet() and entrySet() methods create HashMaps, which brings us back to the same problems as with BasicDBObject.
- The methods for reading Strings are not optimized, so we will spend a lot of time decoding Strings, whereas the corresponding methods in BasicBSONDecoder are optimized.
- We use lazy loading and lazy mapping for referenced documents, so we hold only a small part of the root document, e.g. a subdocument with 2-3 fields. But LazyDBObject keeps a reference to the top document's BSON byte array when it returns inner LazyDBObjects, so until we drop all references to the root objects and their inner objects, the full BSON byte array of the root document stays in memory.

It also has very little documentation and contains deprecated methods and fields. And I did not find much usage of LazyDBObject either here in the group or elsewhere on the internet.

Could you please share your thoughts? Should we go with our solution, or are there strong advantages to using LazyDBObject, especially since we are planning to migrate to the 3.0.x version when it is released?



I work on a different Java driver (which I suspect would have a similar garbage collection overhead), so I am curious about the cause of the overhead.

I was wondering if you could describe the structure of the documents that is causing such a large garbage collection overhead. The only structure I can think of that would generate enough BasicDBObject instances to impact the garbage collector is a very deeply nested tree. Can you share how many fields there are in a document and how deeply they are nested? Can you talk about the problem you are modeling? Maybe there is a better document structure.



Your assumptions are correct - we have a complex data structure: classes have 20-30 fields, and some documents may have an object tree more than 10 levels deep.
Unfortunately, we can change the data structure only a little, as we are quite constrained: we are migrating these objects to Mongo from another NoSQL solution (a distributed cache), and the objects are widely used not only by our large code base but also by external clients through a public API (sorry, but I'm not sure I can expose the details). Also, the previous solution was very fast, and the current MongoDB-based solution should show similar numbers, so we are concerned about performance.

I suppose it would be useful if the driver provided a streaming interface for reading documents (the 2nd version of mongo-java-driver has callbacks in the decoders, but they are not very convenient to use; the BsonReader class in the 3rd version looks better in this sense).



LazyDBObject is not a good candidate for an application that needs to examine all keys/values in each document, as it will end up doing even more work than BasicDBObject. It's designed more for situations where you want to move a BSON document from one place to another with minimal examination (perhaps just looking at a couple of fields). For example, MongoDB, Inc.'s backup service uses this to transfer encoded documents from server to server. You might also use it to create the equivalent of mongodump in Java.

I agree that the 3.0 driver will be able to do better, using a combination of RawBsonDocument and BsonBinaryReader.
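
For example, field-by-field decoding without building an object tree could look something like this (a sketch against the 3.0 API; rawBytes is assumed to hold a single BSON document):

import java.nio.ByteBuffer;
import org.bson.BsonBinaryReader;
import org.bson.BsonType;

// Sketch: walk one raw BSON document field by field, mapping only what we need.
static void mapDocument(byte[] rawBytes) {
    BsonBinaryReader reader = new BsonBinaryReader(ByteBuffer.wrap(rawBytes));
    reader.readStartDocument();
    while (reader.readBsonType() != BsonType.END_OF_DOCUMENT) {
        String name = reader.readName();
        if (reader.getCurrentBsonType() == BsonType.STRING) {
            String value = reader.readString(); // map (name, value) onto the domain object
        } else {
            reader.skipValue(); // skip field types this mapper does not handle
        }
    }
    reader.readEndDocument();
    reader.close();
}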



Thank you, Jeff. I see, but we need to examine and convert all fields in each document. So I think we can go with our current solution, keeping in mind these classes from the next version, and after migrating to the 3.0 driver just start using BsonBinaryReader.

