My MongoDB model of these data in its current form ended up with 4049 documents with an avgObjSize of 47831 bytes.
The data is structured in 3 layers of nested documents (years, months, days).
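Just to illustrate the shape (the field names and leaf values here are hypothetical, my real schema is much bigger), one such document built with the driver's BsonDocument type would look roughly like this:

// Hypothetical shape only -- one document per series, values nested year -> month -> day.
var doc = new BsonDocument
{
    { "_id", "series-1" },
    { "2014", new BsonDocument
        {
            { "01", new BsonDocument
                {
                    { "15", new BsonArray { 1.23, 4.56, 7.89 } }
                }
            }
        }
    }
};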
Using a .NET 4.5 console application and the 1.9.2 driver from NuGet, the test application works with MongoCollection&lt;BsonDocument&gt;.
I did some profiling with the Concurrency Visualizer and noticed that when ramping up the cursor count, all the time was spent allocating memory (waiting for GC).
So I added the following to app.config:
<runtime>
  <gcServer enabled="true"/>
  <gcConcurrent enabled="false"/>
</runtime>
That brings the time down to less than 4 seconds (using NumberOfCursors=5, which seemed to be the best number on my PC), a more than threefold improvement!
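As a side note, if you want to verify at runtime that the setting was actually picked up, GCSettings will tell you (this check is not part of the timing itself):

using System.Runtime;

// Sanity check only: prints True when <gcServer enabled="true"/> took effect.
Console.WriteLine( "Server GC: {0}, latency mode: {1}",
    GCSettings.IsServerGC, GCSettings.LatencyMode );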
I wanted to see if I could push it further down, so I compiled my own build of the driver and profiled it. Almost all of the remaining time is spent creating the BsonDocuments, specifically in maintaining the _elements and _indexes private fields.
I replaced _indexes with this class:
internal class IndexIntoList
{
    private readonly List<BsonElement> _list;

    public IndexIntoList( List<BsonElement> list )
    {
        _list = list;
    }

    // The element list itself is the index, so there is nothing extra to maintain.
    public void Clear( ) { }
    public void Add( string name, int index ) { }

    public bool ContainsKey( string name )
    {
        foreach( var e in _list ) if( e.Name.Equals( name ) ) return true;
        return false;
    }

    public bool TryGetValue( string name, out int index )
    {
        for( int i = 0; i < _list.Count; i++ )
        {
            var e = _list[i];
            if( e.Name.Equals( name ) ) { index = i; return true; }
        }
        index = 0;
        return false;
    }
}

Using that, combined with pre-allocating _list to size=16, I was able to achieve 2.8 seconds :-)
I also tried pre-allocating _indexes and _elements to size=16, and that gave a minor improvement over the original.
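For reference, that pre-allocation is nothing more exotic than giving the two collections an initial capacity when the BsonDocument is constructed; roughly like this (the field types are inferred from the driver source, the surrounding constructor is omitted):

// Inside BsonDocument (sketch): give both backing collections an initial
// capacity of 16 so small documents never trigger a resize/rehash.
private List<BsonElement> _elements = new List<BsonElement>( 16 );
private Dictionary<string, int> _indexes = new Dictionary<string, int>( 16 );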
I realize that this optimization is heavily dependent on the BsonDocuments having few keys, since it does a linear scan for the key, but perhaps it would be worth considering only allocating a dictionary when _list exceeds a certain threshold?
In my case, it would mean not having to allocate and maintain 7M dictionaries. I didn't do any memory profiling, but watching the working set seemed to indicate significantly lower memory usage.
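To make the threshold idea concrete, here is a minimal sketch of what such a hybrid index could look like (the names and the threshold value are my own, not driver code):

// Hybrid index: linear scan for small documents, switch to a Dictionary
// only once the element count crosses a threshold.
internal class HybridIndex
{
    private const int Threshold = 16;          // hypothetical cut-over point
    private readonly List<BsonElement> _list;  // shared with the document's _elements
    private Dictionary<string, int> _dict;     // allocated lazily

    public HybridIndex( List<BsonElement> list ) { _list = list; }

    public void Add( string name, int index )
    {
        if( _dict == null && _list.Count > Threshold )
        {
            // Build the dictionary once, then keep it in sync from here on.
            _dict = new Dictionary<string, int>( _list.Count * 2 );
            for( int i = 0; i < _list.Count; i++ )
                _dict[_list[i].Name] = i;
        }
        if( _dict != null ) _dict[name] = index;
    }

    public bool TryGetValue( string name, out int index )
    {
        if( _dict != null ) return _dict.TryGetValue( name, out index );
        for( int i = 0; i < _list.Count; i++ )
        {
            if( _list[i].Name.Equals( name ) ) { index = i; return true; }
        }
        index = 0;
        return false;
    }

    public bool ContainsKey( string name )
    {
        int ignored;
        return TryGetValue( name, out ignored );
    }

    public void Clear( ) { _dict = null; }
}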
In any case, I was not aware that using the server GC could result in such a dramatic performance increase, so that was a valuable lesson :-)
For reference, here is my test code.
var client = new MongoClient( connectionString );
var srv = client.GetServer( );
var mongo = srv.GetDatabase( "xxxxxx" );
var seriesColl = mongo.GetCollection<BsonDocument>( "series" );
var cursors = seriesColl.ParallelScan( new ParallelScanArgs
{
    BatchSize = 1,
    NumberOfCursors = 5,
} );
var t = new Stopwatch( );
t.Start( );
// One task per cursor returned by ParallelScan.
var tasks = cursors
    .Select( cursor => Task.Factory.StartNew( ( c ) =>
    {
        using( c as IDisposable )
        {
            var l = new List<object>( 1024 );
            // _stuff is a shared collection field declared elsewhere; it keeps the
            // loaded documents alive so the final working-set number is meaningful.
            lock( _stuff )
                _stuff.Add( l );
            Console.WriteLine( ">{0}", Thread.CurrentThread.ManagedThreadId );
            int i = 0;
            var en = c as IEnumerator;
            while( en.MoveNext( ) )
            {
                i++;
                l.Add( en.Current );
            }
            Console.WriteLine( "<{0} : {1}", Thread.CurrentThread.ManagedThreadId, i );
        }
    }, cursor ) )
    .ToArray( );
Task.WaitAll( tasks );
t.Stop( );
GC.Collect();
Console.WriteLine( "\r\n{0:0.0}\r\n{1:0,0} bytes", t.Elapsed.TotalSeconds, Process.GetCurrentProcess().WorkingSet64 );
What about using something better suited to time series, like kdb? It's a commercial product, but it's insanely quick since it's optimised for time series and is column-based. Most of the investment banks use it.
Thanks so much for the in-depth analysis and solution. We recently pushed this change to the master branch, which will be in our next major release: https://jira.mongodb.
No problem. Sounds like there is something to look forward to in the next version! Is there an expected release date?