I have been testing the new WiredTiger storage engine with Snappy compression and the first numbers look awesome: about 50% compression with no noticeable performance degradation (not a very exhaustive benchmark). I have two questions regarding the WiredTiger storage engine:
1. Is this new storage engine going to support the 'directoryperdb' flag in the stable release? It does not seem possible to activate it now in version rc0, and I think it is an important feature.
2. Is there any benchmark comparing the performance of WiredTiger+Snappy to MMAPv1?
The question that comes to my mind is: is the --directoryperdb config setting even necessary with WiredTiger? The docs say it doesn't work with it, but they don't say whether it is actually needed. It is a valid question, since --directoryperdb is suggested as a way to improve disk throughput with MMAPv1. It is also important for expanding disk capacity, and both improvements are something we need, along with the nice compression algorithms that come with WT.
> I have been testing the new WiredTiger storage engine with Snappy compression and the first numbers look awesome: about 50% compression with no noticeable performance degradation (not a very exhaustive benchmark). I have two questions regarding the WiredTiger storage engine:
> 1. Is this new storage engine going to support the 'directoryperdb' flag in the stable release? It does not seem possible to activate it now in version rc0, and I think it is an important feature.
There is an open issue that you can upvote/watch: https://jira.mongodb.org/browse/SERVER-16132. Currently directoryperdb is only supported in the mmapv1 storage engine.
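For context, a minimal sketch of the option being discussed, as it is enabled under mmapv1 today (YAML option names as of the 2.6/2.8.0-rc0 config format; the path shown is made up, so check the docs for your version):

    # mongod.conf (YAML) - directoryPerDB currently requires the mmapv1 engine
    storage:
      dbPath: /var/lib/mongodb
      directoryPerDB: true

Equivalently you can start mongod with --directoryperdb on the command line; per the answer above, the WiredTiger engine in 2.8.0-rc0 does not accept it.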
> 2. Is there any benchmark comparing the performance of WiredTiger+Snappy to MMAPv1?
As always, benchmarks will vary depending on the data and configuration you use as well as where your resource constraints are.
WiredTiger has a number of configuration options, so I expect more information on tuning settings for different workloads will be available as folks test the integration.
> The question that comes to my mind is: is the --directoryperdb config setting even necessary with WiredTiger? The docs say it doesn't work with it, but they don't say whether it is actually needed. It is a valid question, since --directoryperdb is suggested as a way to improve disk throughput with MMAPv1.
Hi Scott,
The directoryperdb option for mmapv1 provides some administrative flexibility:
- option to use different mount points per db by symlinking into your dbpath
- ability to easily snapshot files for a single DB
For example, you could have different storage types on a server (HDD vs SSD) and use per-db mount points to selectively have some dbs using faster storage.
However, there are some associated sharp edges and administrative overhead:
- some tool & option combinations don't work well with directoryperdb (eg. doing a repair with --repairPath)
- dbpath layout will affect backup/restore strategies
- this introduces different failure modes (eg. one db could be linked to a volume that runs out of space while the others on the same mongod instance have plenty)
The WiredTiger engine currently stores some per-instance metadata in the dbpath, so database files would not be easily self-contained in a directory per db.
With the option of multiple storage engines, there are inevitably going to be different features and configuration to choose between based on your use case... "compression" vs "directoryperdb" support is one current example (at least as at 2.8.0-rc0).
Adding to what other folks have said - while there is no directory per db option, you *can* improve disk IO throughput by putting the journal onto a different device from the data files. You would do it the same way as in mmapv1: symlink the dbpath/journal directory to a directory on another physical drive.
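To make that concrete, here is a minimal sketch (not from the thread; the dbpath and target directory are hypothetical) of relocating the journal and symlinking it back into the dbpath, done while mongod is stopped:

    import os
    import shutil

    dbpath = "/var/lib/mongodb"                   # hypothetical dbpath
    new_home = "/mnt/fast-drive/mongodb-journal"  # hypothetical directory on another physical drive

    journal = os.path.join(dbpath, "journal")
    shutil.move(journal, new_home)   # move the existing journal files; new_home must not already exist
    os.symlink(new_home, journal)    # dbpath/journal is now a symlink to the other drive

On the next start mongod simply follows the symlink, so journal writes land on the other drive, which is the same effect described above for mmapv1.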
Any news from the WT test front?:-)
> Any news from the WT test front?:-)
Perhaps, but at least I spotted this in the commit log:
https://jira.mongodb.org/browse/SERVER-16132
and
https://github.com/mongodb/mongo/commit/23fad5ee3e26b8d107401e4bfad86f9ada7b7d1f
Indicating that something related to the subject is making it in...
Derick is right - there is a directoryPerDB option as well as an option to put indexes into a separate directory from data files (so you could put indexes and journal on different devices from the rest of the data), but to be honest, I'm not sure I've been able to max out my one disk yet - I seem to be more CPU bound at this point :) :) :)
I take it from the smilies being more CPU bound is a good thing? Not sure why though. Could you explain that a little please?:)
I've been doing some testing of mmapv1 vs wiredtiger and I'm seeing very good numbers with default configurations, but I'm holding off publishing anything since I'm testing 2.8.0-rc0 plus a number of fixes in master that are targeting rc1, and I'd rather have a write-up of a released version.
As a general observation, throughput for write-heavy loads is better in WiredTiger, while throughput for read-heavy workloads, or read-heavy workloads with in-place updates, is better in mmapv1, but these are very preliminary results since I've not tuned all possible variables yet.
How much of a difference are you seeing on the read performance? Just a roundabout ballpark figure to get a feeling of what we might expect to see.
I've seen anything from 10-15% worse to 10-15% better for read-heavy loads, depending on various factors.
I wouldn't put too much stock into my WT tests at this point because one of the issues with my early test sets (YCSB) is that they are random binary data and don't compress very well if at all. This means some of the potential benefit in reducing IO is lost and some additional cycles are still spent on checking whether compression will help.
My next round will include both tests with compression options turned off and tests with data that compresses more in line with "normal" (i.e. not random binary) data.
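For anyone who wants to reproduce the compression-off run, the knob involved is WiredTiger's block compressor; a minimal sketch using the 3.0-era option names (double-check them against the exact rc you are running):

    # mongod.conf - assumes the 3.0-era WiredTiger option names
    storage:
      engine: wiredTiger
      wiredTiger:
        collectionConfig:
          blockCompressor: none   # default is snappy; zlib trades more CPU for a better ratio

Note that this setting is picked up by collections created after it takes effect, so existing data would need to be re-created or re-synced to see the difference.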
Sounds good. I'm looking forward to the results.
And what I was hoping to get some news about was performance testing. If WT can be close to MMapV1 in terms of performance, I am going to be a very happy camper.:)
It is like MongoDB was reading my mind and answering my wishes on file size for the databases. This should simplify our efforts to have physical data separation between tenants, without blowing up the storage usage unnecessarily or having to put a group of smaller tenants on a single database. I bet there are quite a few users of Mongo thinking the same too, like those who offer DBaaS.
> I take it from the smilies being more CPU bound is a good thing? Not sure
> why though. Could you explain that a little please?:)
In the past (with mmapv1), generally one of your CPUs was busy and the others were idle. I'm testing on a 32-core machine and I see very little idle CPU. This is a good thing as I'm getting a lot more mileage out of the single physical box. I'm also not maxing out IO because WiredTiger tries to be efficient in writing to disk in larger chunks when it can.
> And what I was hoping to get some news about was performance testing. If WT
> can be close to MMapV1 in terms of performance, I am going to be a very
> happy camper.:)
Just close to MMapV1? I'm afraid I have to disappoint you - I'm finding very few workloads where wiredTiger is close to MMapV1. In most cases, I'm finding wiredTiger significantly ahead of MMapV1. But of course YMMV as different workloads show very different profiles (plus we're not done yet - we recently pushed an improvement to mmapv1 and of course oodles of improvements/fixes to WT).
> It is like MongoDB was reading my mind and answering my wishes on file size
> for the databases. This should simplify our efforts to have physical data
> separation between tenants, without blowing up the storage usage
> unnecessarily or having to put a group of smaller tenants on a single
> database. I bet there are quite a few users of Mongo thinking the same too,
> like those who offer DBaaS.
Hope we keep delivering on those wish lists, Scott. :)
Merry Christmas a bit early!
Thanks Asya. WiredTiger looks like a great Christmas present. LOL! :-D
And merry Christmas to you too! Although I think we'll "see" each other here again before Christmas. :-)