2014년 12월 11일 목요일

Server Crashing: Aborting after invariant() failture

Server: 2x Windows Server 2012
Replication: 2 Nodes + 1 Arbiter
Sharding: None
MongoDB Version: 2.6.4

Problem: We are rebuilding our collections on Saturday morning and lately something is going on where the dataset is getting into a completely compromised state where I need to rebuild the entire replicate in order to get the services back online. (kill DB directories, re-deploy replica set then re-insert data). Not a fun way to spend your Monday mornings.

It would appear that an exception is being thrown during the rebuild process and putting the collection into a unrecoverable state.

FWIW: We are rebuilding about 30gigs worth of data, indices are built in the background, this was working correctly in the past. From what I can tell, the only change was that we starting building the indices in the background.

Log before crash


2014-12-06T06:26:09.377-0500 [clientcursormon] mem (MB) res:37193 virt:74330
2014-12-06T06:26:09.377-0500 [clientcursormon]  mapped (incl journal view):73758
2014-12-06T06:26:09.377-0500 [clientcursormon]  connections:2
2014-12-06T06:26:09.377-0500 [clientcursormon]  replication threads:32
2014-12-06T06:26:28.878-0500 [conn16870] end connection172.16.56.77:53288 (1 connection now open)
2014-12-06T06:26:28.878-0500 [initandlisten] connection accepted from 172.16.56.77:53301 #16872 (2 connections now open)
2014-12-06T06:26:31.909-0500 [conn16871] end connection172.16.56.77:53289 (1 connection now open)
2014-12-06T06:26:31.909-0500 [initandlisten] connection accepted from 172.16.56.77:53302 #16873 (2 connections now open)
2014-12-06T06:26:38.847-0500 [initandlisten] connection accepted from 172.16.70.67:2570 #16874 (3 connections now open)
2014-12-06T06:26:39.191-0500 [repl writer worker 2] build index on: Risk.country_build properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "Risk.country_build" }
2014-12-06T06:26:39.191-0500 [repl writer worker 2]   added index to empty collection
2014-12-06T06:26:39.300-0500 [repl index builder 17] build index on: Risk.country_build properties: { v: 1, unique: true, key: { Code: 1 }, name: "[U] Code", ns: "Risk.country_build", background: true }
2014-12-06T06:26:39.300-0500 [repl index builder 17]   building index in background
2014-12-06T06:26:39.300-0500 [repl index builder 17] build index done.  scanned 50 total records. 0 secs
2014-12-06T06:26:39.316-0500 [repl writer worker 2] CMD: drop Risk.country
2014-12-06T06:26:39.487-0500 [repl writer worker 2] build index on: Risk.coverage_build properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "Risk.coverage_build" }
2014-12-06T06:26:39.487-0500 [repl writer worker 2]   added index to empty collection
2014-12-06T06:26:40.159-0500 [repl index builder 18] Risk.system.indexes Invariant failure nsi.details( descriptor->indexNamespace() ) == NULL src\mongo\db\catalog\index_catalog.cpp 450
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                           mongo::printStackTrace+0x43
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\mongo\util\log.cpp(127)                                  mongo::logContext+0x9c
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\mongo\util\assert_util.cpp(131)                          mongo::invariantFailed+0x108
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\mongo\db\catalog\index_catalog.cpp(450)                  mongo::IndexCatalog::IndexBuildBlock::init+0x3ff
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\mongo\db\catalog\index_catalog.cpp(316)                  mongo::IndexCatalog::createIndex+0x4af
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\mongo\db\index_builder.cpp(88)                           mongo::IndexBuilder::build+0x16e
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\mongo\db\index_builder.cpp(65)                           mongo::IndexBuilder::run+0x358
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\mongo\util\background.cpp(147)                           mongo::BackgroundJob::jobBody+0x11c
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)  boost::`anonymous namespace'::thread_start_function+0x21
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)      _callthreadstartex+0x17
2014-12-06T06:26:40.738-0500 [repl index builder 18] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)      _threadstartex+0x7f
2014-12-06T06:26:40.738-0500 [repl index builder 18] KERNEL32.DLL                                                                   BaseThreadInitThunk+0xd
2014-12-06T06:26:40.738-0500 [repl index builder 18] Risk.system.indexes
2014-12-06T06:26:40.738-0500 [repl index builder 18]
***aborting after invariant() failure

2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                           mongo::printStackTrace+0x43
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    ...\src\mongo\util\signal_handlers.cpp(107)                      mongo::`anonymous namespace'::abruptQuit+0xf2
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\winsig.c(593)        raise+0x1ed
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\abort.c(81)          abort+0x18
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    ...\src\mongo\db\catalog\index_catalog.cpp(450)                  mongo::IndexCatalog::IndexBuildBlock::init+0x3ff
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    ...\src\mongo\db\catalog\index_catalog.cpp(316)                  mongo::IndexCatalog::createIndex+0x4af
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    ...\src\mongo\db\index_builder.cpp(88)                           mongo::IndexBuilder::build+0x16e
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    ...\src\mongo\db\index_builder.cpp(65)                           mongo::IndexBuilder::run+0x358
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    ...\src\mongo\util\background.cpp(147)                           mongo::BackgroundJob::jobBody+0x11c
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)  boost::`anonymous namespace'::thread_start_function+0x21
2014-12-06T06:26:41.223-0500 [repl index builder 18] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)      _callthreadstartex+0x17
2014-12-06T06:26:41.238-0500 [repl index builder 18] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)      _threadstartex+0x7f
2014-12-06T06:26:41.238-0500 [repl index builder 18] KERNEL32.DLL                                                                   BaseThreadInitThunk+0xd
2014-12-06T06:26:41.238-0500 [repl index builder 18] SEVERE: Got signal: 22 (SIGABRT).
Backtrace:

Log after crash -

2014-12-06T19:05:49.233-0500 ***** SERVER RESTARTED *****
2014-12-06T19:05:49.295-0500 Trying to start Windows service 'MongoDB'
2014-12-06T19:05:49.327-0500 Service running
2014-12-06T19:05:49.327-0500 [initandlisten] MongoDB starting : pid=2088 port=27017 dbpath=/mongodb/data/db 64-bit host=PRDRISK01-A
2014-12-06T19:05:49.327-0500 [initandlisten]
2014-12-06T19:05:49.327-0500 [initandlisten] ** WARNING: You are running on a NUMA machine.
2014-12-06T19:05:49.327-0500 [initandlisten] **          We suggest disabling NUMA in the machine BIOS
2014-12-06T19:05:49.327-0500 [initandlisten] **          by enabling interleaving to avoid performance problems.
2014-12-06T19:05:49.327-0500 [initandlisten] **          See your BIOS documentation for more information.
2014-12-06T19:05:49.327-0500 [initandlisten]
2014-12-06T19:05:49.327-0500 [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2014-12-06T19:05:49.327-0500 [initandlisten] db version v2.6.4
2014-12-06T19:05:49.374-0500 [initandlisten] git version: 3a830be0eb92d772aa855ebb711ac91d658ee910
2014-12-06T19:05:49.374-0500 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
2014-12-06T19:05:49.374-0500 [initandlisten] allocator: system
2014-12-06T19:05:49.374-0500 [initandlisten] options: { config: "C:\mongodb\mongod.cfg", replication: { replSetName: "riskRS01" }, service: true, storage: { dbPath: "/mongodb/data/db" }, systemLog: { destination: "file", logAppend: true, path: "/mongodb/log/mongod.log" } }
2014-12-06T19:05:49.436-0500 [initandlisten] journal dir=/mongodb/data/db\journal
2014-12-06T19:05:49.436-0500 [initandlisten] recover begin
2014-12-06T19:05:49.436-0500 [initandlisten] recover lsn: 152537359
2014-12-06T19:05:49.436-0500 [initandlisten] recover /mongodb/data/db\journal\j._17
2014-12-06T19:05:49.470-0500 [initandlisten] recover skipping application of section seq:97328607 < lsn:152537359
2014-12-06T19:05:49.782-0500 [initandlisten] recover skipping application of section seq:97367021 < lsn:152537359
2014-12-06T19:05:49.907-0500 [initandlisten] recover skipping application of section seq:97405415 < lsn:152537359
2014-12-06T19:05:50.422-0500 [initandlisten] recover skipping application of section seq:97443829 < lsn:152537359
2014-12-06T19:05:50.470-0500 [initandlisten] recover skipping application of section seq:97482233 < lsn:152537359
2014-12-06T19:05:50.643-0500 [initandlisten] recover skipping application of section seq:97520617 < lsn:152537359
2014-12-06T19:05:51.319-0500 [initandlisten] recover skipping application of section seq:97559041 < lsn:152537359
2014-12-06T19:05:51.444-0500 [initandlisten] recover skipping application of section seq:97597436 < lsn:152537359
2014-12-06T19:05:52.118-0500 [initandlisten] recover skipping application of section seq:97635850 < lsn:152537359
2014-12-06T19:05:52.525-0500 [initandlisten] recover skipping application of section more...
2014-12-06T19:06:05.015-0500 [initandlisten] recover cleaning up
2014-12-06T19:06:05.015-0500 [initandlisten] removeJournalFiles
2014-12-06T19:06:05.031-0500 [initandlisten] recover done
2014-12-06T19:06:05.937-0500 [initandlisten] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:05.937-0500 [initandlisten] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:05.937-0500 [initandlisten] found 2 index(es) that wasn't finished before shutdown
2014-12-06T19:06:05.937-0500 [initandlisten] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:05.937-0500 [initandlisten] found 4 index(es) that wasn't finished before shutdown
2014-12-06T19:06:05.937-0500 [initandlisten] found 2 index(es) that wasn't finished before shutdown
2014-12-06T19:06:05.937-0500 [initandlisten] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:05.937-0500 [initandlisten] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:06.297-0500 [initandlisten] waiting for connections on port 27017
2014-12-06T19:06:06.547-0500 [IndexRebuilder] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:06.547-0500 [initandlisten] connection accepted from 172.16.56.77:55311 #1 (1 connection now open)
2014-12-06T19:06:06.578-0500 [initandlisten] connection accepted from 172.16.56.76:49214 #2 (2 connections now open)
2014-12-06T19:06:06.594-0500 [rsStart] replSet I am PRDRISK01-A:27017
2014-12-06T19:06:06.594-0500 [rsHealthPoll] replset info PRDRISK01-B:27017 thinks that we are down
2014-12-06T19:06:06.594-0500 [rsHealthPoll] replSet member PRDRISK01-B:27017 is up
2014-12-06T19:06:06.594-0500 [IndexRebuilder] found 1 interrupted index build(s) on Risk.process_snap
2014-12-06T19:06:06.594-0500 [IndexRebuilder] note: restart the server with --noIndexBuildRetry to skip index rebuilds
2014-12-06T19:06:06.594-0500 [IndexRebuilder] going to rebuild: { v: 1, unique: true, key: { ProcessKey: 1, Timestamp: 1 }, name: "[U] ProcessKey > Timestamp", ns: "Risk.process_snap_build", background: true }
2014-12-06T19:06:06.594-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, unique: true, key: { ProcessKey: 1, Timestamp: 1 }, name: "[U] ProcessKey > Timestamp", ns: "Risk.process_snap_build", background: true }
2014-12-06T19:06:06.594-0500 [IndexRebuilder] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:06.594-0500 [rsHealthPoll] replSet member PRDRISK01-B:27017 is now in state PRIMARY
2014-12-06T19:06:06.609-0500 [IndexRebuilder] found 1 interrupted index build(s) on Risk.coverage
2014-12-06T19:06:06.609-0500 [IndexRebuilder] going to rebuild: { v: 1, unique: true, key: { NodeID: 1, SecurityID: 1 }, name: "[U] NodeID > SecurityID", ns: "Risk.coverage_build", background: true }
2014-12-06T19:06:06.609-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, unique: true, key: { NodeID: 1, SecurityID: 1 }, name: "[U] NodeID > SecurityID", ns: "Risk.coverage_build", background: true }
2014-12-06T19:06:06.609-0500 [IndexRebuilder] found 2 index(es) that wasn't finished before shutdown
2014-12-06T19:06:06.625-0500 [IndexRebuilder] Risk.node_snap warning assertion failure n == 1 src\mongo\db\catalog\index_catalog.cpp 905
2014-12-06T19:06:06.797-0500 [conn2] end connection172.16.56.76:49214 (1 connection now open)
2014-12-06T19:06:06.922-0500 [initandlisten] connection accepted from 172.16.56.77:55312 #3 (2 connections now open)
2014-12-06T19:06:06.937-0500 [conn3] end connection172.16.56.77:55312 (1 connection now open)
2014-12-06T19:06:06.937-0500 [initandlisten] connection accepted from 172.16.56.77:55313 #4 (2 connections now open)
2014-12-06T19:06:08.578-0500 [conn1] end connection172.16.56.77:55311 (1 connection now open)
2014-12-06T19:06:08.578-0500 [initandlisten] connection accepted from 172.16.56.77:55315 #5 (2 connections now open)
2014-12-06T19:06:08.609-0500 [rsHealthPoll] replSet member CS-PRDRISK01:27018 is up
2014-12-06T19:06:08.609-0500 [rsHealthPoll] replSet member CS-PRDRISK01:27018 is now in state ARBITER
2014-12-06T19:06:14.626-0500 [conn5] end connection172.16.56.77:55315 (1 connection now open)
2014-12-06T19:06:14.626-0500 [initandlisten] connection accepted from 172.16.56.77:55321 #6 (2 connections now open)
2014-12-06T19:06:17.005-0500 [conn4] end connection172.16.56.77:55313 (1 connection now open)
2014-12-06T19:06:17.005-0500 [initandlisten] connection accepted from 172.16.56.77:55322 #7 (2 connections now open)
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                           mongo::printStackTrace+0x43
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\util\log.cpp(127)                                  mongo::logContext+0x9c
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\util\assert_util.cpp(102)                          mongo::wasserted+0x1b4
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\db\catalog\index_catalog.cpp(905)                  mongo::IndexCatalog::_deleteIndexFromDisk+0x1f2
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\db\catalog\index_catalog.cpp(935)                  mongo::IndexCatalog::getAndClearUnfinishedIndexes+0x1f9
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\db\index_rebuilder.cpp(103)                        mongo::IndexRebuilder::checkNS+0x248
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\db\index_rebuilder.cpp(67)                         mongo::IndexRebuilder::run+0x214
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\util\background.cpp(147)                           mongo::BackgroundJob::jobBody+0x11c
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)  boost::`anonymous namespace'::thread_start_function+0x21
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)      _callthreadstartex+0x17
2014-12-06T19:06:21.808-0500 [IndexRebuilder] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)      _threadstartex+0x7f
2014-12-06T19:06:21.808-0500 [IndexRebuilder] KERNEL32.DLL                                                                   BaseThreadInitThunk+0xd
2014-12-06T19:06:21.808-0500 [IndexRebuilder] Risk.node_snap
2014-12-06T19:06:21.808-0500 [IndexRebuilder] found 2 interrupted index build(s) on Risk.node_snap
2014-12-06T19:06:21.808-0500 [IndexRebuilder] going to rebuild: { v: 1, unique: true, key: { NodeID: 1, TradeDate: -1 }, name: "[U] NodeID > TradeDate", ns: "Risk.node_snap_build", background: true }
2014-12-06T19:06:21.808-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, unique: true, key: { NodeID: 1, TradeDate: -1 }, name: "[U] NodeID > TradeDate", ns: "Risk.node_snap_build", background: true }
2014-12-06T19:06:21.808-0500 [IndexRebuilder] going to rebuild: { v: 1, key: { TradeDate: -1 }, name: "TradeDate", ns: "Risk.node_snap_build", background: true }
2014-12-06T19:06:21.808-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, key: { TradeDate: -1 }, name: "TradeDate", ns: "Risk.node_snap_build", background: true }
2014-12-06T19:06:21.808-0500 [IndexRebuilder] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:21.808-0500 [IndexRebuilder] found 1 interrupted index build(s) on Risk.gics
2014-12-06T19:06:21.808-0500 [IndexRebuilder] going to rebuild: { v: 1, unique: true, key: { GicLevel: 1, ID: 1 }, name: "[U] GicLevel > ID", ns: "Risk.gics_build", background: true }
2014-12-06T19:06:21.808-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, unique: true, key: { GicLevel: 1, ID: 1 }, name: "[U] GicLevel > ID", ns: "Risk.gics_build", background: true }
2014-12-06T19:06:21.808-0500 [IndexRebuilder] found 4 index(es) that wasn't finished before shutdown
2014-12-06T19:06:21.823-0500 [IndexRebuilder] Risk.rmu_snap warning assertion failure n == 1 src\mongo\db\catalog\index_catalog.cpp 905
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                           mongo::printStackTrace+0x43
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\util\log.cpp(127)                                  mongo::logContext+0x9c
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\util\assert_util.cpp(102)                          mongo::wasserted+0x1b4
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\db\catalog\index_catalog.cpp(905)                  mongo::IndexCatalog::_deleteIndexFromDisk+0x1f2
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\db\catalog\index_catalog.cpp(935)                  mongo::IndexCatalog::getAndClearUnfinishedIndexes+0x1f9
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\db\index_rebuilder.cpp(103)                        mongo::IndexRebuilder::checkNS+0x248
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\db\index_rebuilder.cpp(67)                         mongo::IndexRebuilder::run+0x214
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\mongo\util\background.cpp(147)                           mongo::BackgroundJob::jobBody+0x11c
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)  boost::`anonymous namespace'::thread_start_function+0x21
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)      _callthreadstartex+0x17
2014-12-06T19:06:22.287-0500 [IndexRebuilder] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)      _threadstartex+0x7f
2014-12-06T19:06:22.287-0500 [IndexRebuilder] KERNEL32.DLL                                                                   BaseThreadInitThunk+0xd
2014-12-06T19:06:22.287-0500 [IndexRebuilder] Risk.rmu_snap
2014-12-06T19:06:22.303-0500 [IndexRebuilder] rate limiting wassert
2014-12-06T19:06:22.303-0500 [IndexRebuilder] found 4 interrupted index build(s) on Risk.rmu_snap
2014-12-06T19:06:22.303-0500 [IndexRebuilder] going to rebuild: { v: 1, unique: true, key: { NodeID: 1, IsEquity: 1, SecurityID: 1, TradeDate: -1, GroupID: 1, BetID: 1, Strategy: 1 }, name: "[U] NodeID > IsEquity > SecurityID > TradeDate > GroupID > BetID > Strategy", ns: "Risk.rmu_snap_build", background: true }
2014-12-06T19:06:22.303-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, unique: true, key: { NodeID: 1, IsEquity: 1, SecurityID: 1, TradeDate: -1, GroupID: 1, BetID: 1, Strategy: 1 }, name: "[U] NodeID > IsEquity > SecurityID > TradeDate > GroupID > BetID > Strategy", ns: "Risk.rmu_snap_build", background: true }
2014-12-06T19:06:22.303-0500 [IndexRebuilder] going to rebuild: { v: 1, key: { NodeID: 1, IsEquity: 1, TradeDate: -1, GroupID: 1, BetID: 1, Strategy: 1 }, name: "NodeID > IsEquity > TradeDate > GroupID > BetID > Strategy", ns: "Risk.rmu_snap_build", background: true }
2014-12-06T19:06:22.303-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, key: { NodeID: 1, IsEquity: 1, TradeDate: -1, GroupID: 1, BetID: 1, Strategy: 1 }, name: "NodeID > IsEquity > TradeDate > GroupID > BetID > Strategy", ns: "Risk.rmu_snap_build", background: true }
2014-12-06T19:06:22.303-0500 [IndexRebuilder] going to rebuild: { v: 1, key: { SecurityID: 1 }, name: "SecurityID", ns: "Risk.rmu_snap_build", background: true }
2014-12-06T19:06:22.303-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, key: { SecurityID: 1 }, name: "SecurityID", ns: "Risk.rmu_snap_build", background: true }
2014-12-06T19:06:22.303-0500 [IndexRebuilder] going to rebuild: { v: 1, key: { TradeDate: -1 }, name: "TradeDate", ns: "Risk.rmu_snap_build", background: true }
2014-12-06T19:06:22.303-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, key: { TradeDate: -1 }, name: "TradeDate", ns: "Risk.rmu_snap_build", background: true }
2014-12-06T19:06:22.303-0500 [IndexRebuilder] found 2 index(es) that wasn't finished before shutdown
2014-12-06T19:06:22.318-0500 [IndexRebuilder] found 2 interrupted index build(s) on Risk.sec_snap
2014-12-06T19:06:22.318-0500 [IndexRebuilder] going to rebuild: { v: 1, unique: true, key: { SecurityID: 1, TradeDate: -1 }, name: "[U] SecurityID > TradeDate", ns: "Risk.sec_snap_build", background: true }
2014-12-06T19:06:22.318-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, unique: true, key: { SecurityID: 1, TradeDate: -1 }, name: "[U] SecurityID > TradeDate", ns: "Risk.sec_snap_build", background: true }
2014-12-06T19:06:22.318-0500 [IndexRebuilder] going to rebuild: { v: 1, key: { TradeDate: -1 }, name: "TradeDate", ns: "Risk.sec_snap_build", background: true }
2014-12-06T19:06:22.318-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, key: { TradeDate: -1 }, name: "TradeDate", ns: "Risk.sec_snap_build", background: true }
2014-12-06T19:06:22.318-0500 [IndexRebuilder] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:22.318-0500 [IndexRebuilder] found 1 interrupted index build(s) on Risk.currency_snap
2014-12-06T19:06:22.318-0500 [IndexRebuilder] going to rebuild: { v: 1, unique: true, key: { TradeDate: -1, CurrencyCode: 1, ReferenceCurrencyCode: 1 }, name: "[U] TradeDate > CurrencyCode > ReferenceCurrencyCode", ns: "Risk.currency_snap_build", background: true }
2014-12-06T19:06:22.318-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, unique: true, key: { TradeDate: -1, CurrencyCode: 1, ReferenceCurrencyCode: 1 }, name: "[U] TradeDate > CurrencyCode > ReferenceCurrencyCode", ns: "Risk.currency_snap_build", background: true }
2014-12-06T19:06:22.318-0500 [IndexRebuilder] found 1 index(es) that wasn't finished before shutdown
2014-12-06T19:06:22.334-0500 [IndexRebuilder] found 1 interrupted index build(s) on Risk.sec
2014-12-06T19:06:22.334-0500 [IndexRebuilder] going to rebuild: { v: 1, key: { Symbol: 1 }, name: "Symbol", ns: "Risk.sec_build", background: true }
2014-12-06T19:06:22.334-0500 [IndexRebuilder] building index failed: CannotCreateIndex the index spec ns does not match index: { v: 1, key: { Symbol: 1 }, name: "Symbol", ns: "Risk.sec_build", background: true }
2014-12-06T19:06:22.334-0500 [rsStart] replSet STARTUP2
2014-12-06T19:06:22.334-0500 [rsSync] replSet still syncing, not yet to minValid optime 5482e7ef:431e
2014-12-06T19:06:22.334-0500 [rsBackgroundSync] replSet syncing to: PRDRISK01-B:27017
2014-12-06T19:06:22.349-0500 [rsBackgroundSync] replset setting syncSourceFeedback to PRDRISK01-B:27017
2014-12-06T19:06:23.325-0500 [rsSync] replSet still syncing, not yet to minValid optime 5482e7ef:431e
2014-12-06T19:06:31.770-0500 [rsSync] replSet SECONDARY



At a quick glance, it appears that you are naming your index (the one causing a problem):  "[U] Name > Name2 > etc " - I wonder if that may be related to the problem...

Regardless, there probably should be a SERVER ticket filed against this as the server should enforce not allowing naming indexes anything that it wouldn't be able to handle (or we should fix it so that those characters don't cause a problem).  That is, if the naming is related to the problem...

Actually, if you can go to the primary, can you switch to this DB (Risk) and run
db.system.indexes.find()

What does the output look like?



The indices work on the primary (meaning, no errors and queries appear to be optimized).

Here is the output.


/* 0 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.country"
}
/* 1 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "Code" : 1
    },
    "name" : "[U] Code",
    "ns" : "Risk.country",
    "background" : true
}
/* 2 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.coverage"
}
/* 3 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "NodeID" : 1,
        "SecurityID" : 1
    },
    "name" : "[U] NodeID > SecurityID",
    "ns" : "Risk.coverage",
    "background" : true
}
/* 4 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.currency_snap"
}
/* 5 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "TradeDate" : -1,
        "CurrencyCode" : 1,
        "ReferenceCurrencyCode" : 1
    },
    "name" : "[U] TradeDate > CurrencyCode > ReferenceCurrencyCode",
    "ns" : "Risk.currency_snap",
    "background" : true
}
/* 6 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.gics"
}
/* 7 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "GicLevel" : 1,
        "ID" : 1
    },
    "name" : "[U] GicLevel > ID",
    "ns" : "Risk.gics",
    "background" : true
}
/* 8 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.node_snap"
}
/* 9 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "NodeID" : 1,
        "TradeDate" : -1
    },
    "name" : "[U] NodeID > TradeDate",
    "ns" : "Risk.node_snap",
    "background" : true
}
/* 10 */
{
    "v" : 1,
    "key" : {
        "TradeDate" : -1
    },
    "name" : "TradeDate",
    "ns" : "Risk.node_snap",
    "background" : true
}
/* 11 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.process_snap"
}
/* 12 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "ProcessKey" : 1,
        "Timestamp" : 1
    },
    "name" : "[U] ProcessKey > Timestamp",
    "ns" : "Risk.process_snap",
    "background" : true
}
/* 13 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.sec"
}
/* 14 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "SecurityID" : 1
    },
    "name" : "[U] SecurityID",
    "ns" : "Risk.sec",
    "background" : true
}
/* 15 */
{
    "v" : 1,
    "key" : {
        "Symbol" : 1
    },
    "name" : "Symbol",
    "ns" : "Risk.sec",
    "background" : true
}
/* 16 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.sector"
}
/* 17 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "ID" : 1
    },
    "name" : "[U] ID",
    "ns" : "Risk.sector",
    "background" : true
}
/* 18 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.sec_snap"
}
/* 19 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "SecurityID" : 1,
        "TradeDate" : -1
    },
    "name" : "[U] SecurityID > TradeDate",
    "ns" : "Risk.sec_snap",
    "background" : true
}
/* 20 */
{
    "v" : 1,
    "key" : {
        "TradeDate" : -1
    },
    "name" : "TradeDate",
    "ns" : "Risk.sec_snap",
    "background" : true
}
/* 21 */
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "Risk.rmu_snap"
}
/* 22 */
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "NodeID" : 1,
        "IsEquity" : 1,
        "SecurityID" : 1,
        "TradeDate" : -1,
        "GroupID" : 1,
        "BetID" : 1,
        "Strategy" : 1
    },
    "name" : "[U] NodeID > IsEquity > SecurityID > TradeDate > GroupID > BetID > Strategy",
    "ns" : "Risk.rmu_snap",
    "background" : true
}
/* 23 */
{
    "v" : 1,
    "key" : {
        "NodeID" : 1,
        "IsEquity" : 1,
        "TradeDate" : -1,
        "GroupID" : 1,
        "BetID" : 1,
        "Strategy" : 1
    },
    "name" : "NodeID > IsEquity > TradeDate > GroupID > BetID > Strategy",
    "ns" : "Risk.rmu_snap",
    "background" : true
}
/* 24 */
{
    "v" : 1,
    "key" : {
        "SecurityID" : 1
    },
    "name" : "SecurityID",
    "ns" : "Risk.rmu_snap",
    "background" : true
}
/* 25 */
{
    "v" : 1,
    "key" : {
        "TradeDate" : -1
    },
    "name" : "TradeDate",
    "ns" : "Risk.rmu_snap",
    "background" : true
}


Thanks, this might help in trying to build a reproducer.

Can you also include more of the original log where things crashed?  In particular I'm interested in replication worker 18...



I posted the entirety of the log for 18. In addition, this entire process works at first. Meaning, when I run it the first time, everything works fine. It's on the weekend when it's run the second time it blows up.



Can you clarify what's different in the entire process when you run it the first time?  (i.e. when it works)?

When you run it for the second time, what exactly is different?



You bet,
  1. We have a process written in c# that programmatically builds the collection.
    1. Create the collection with a suffix of '<collection name>_build'.
    2. Insert data into '<collection name>_build'
    3. Create indices on '<collection name>_build'
    4. Rename collection by dropping the '_build' suffix to put the collection into production.
  2. We build it the first time just to get things started.
  3. During the week we perform incremental data updates. (upserts)
  4. On Saturday morning we run the same process we used in bullet point 1 above to essentially 'rebuild' it. It's this rebuild that is blowing up. The first one works great! but subsequent runs appear to be blowing things up. The only thing we do differently during the second run is that we drop the original collection before renaming it outlined in step 1.4 above.
I'm really wondering if it's the background indices... that and the name of the indices that has changed recently. Keep in mind, this was all working great in the past.



Yes, I know exactly what must be going on - or at least something somewhat similar to what you describe.  I think you may be renaming the collection before the indexes are built which cases backwards order of things on the secondary.

It may be similar to https://jira.mongodb.org/browse/SERVER-15393 - does that description seem to fit the sequence of events you're seeing?



I'm not sure how you can resolve all of this BUT by removing the IsBackground flag from the index has resolved our problem. It's a bit of a nasty bug as it ultimately destroys the underlying dataset but it is definitely related to the IsBackground flag on the indices.



background index build allows other things to be happening, including things that might drop or rename the collection, etc.
foreground index builds block other operations so there is no chance
of any racy happenings that make the world invalid for this index...


댓글 없음:

댓글 쓰기