Tuesday, December 16, 2014

Aggregation performance much worse with indexes

I am experiencing odd behavior with aggregations and indexes. Without an index on the 'comp_date' field, the aggregation runs reasonably fast (~2 sec); with the indexes, however, it can take over 30 seconds.

I am using 2.6.4, and I have tried various combinations of $project and $sort before the $match in the pipeline.

Thank you for any help.

AGGREGATION QUERY:

db.completions.aggregate([
  {
    $match: {
      comp_date: {
        $gte: new Date(2012,0,1),
        $lte: new Date(2014,11,30)
      }
    }
  },
  {
    $group: {
      _id: {
        year: { $year: "$comp_date" },
        month: { $month: "$comp_date" } 
      },
      count: { $sum: 1 } 
    }  
  },
  {
    $sort: { '_id.year': 1, '_id.month': 1 } 
  }]
)
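For reference, what the $group stage computes can be sketched in plain JavaScript: one counter per {year, month} bucket. The sample dates below are hypothetical, and this is only an illustration of the bucketing logic, not the server-side implementation:

```javascript
// Plain-JS sketch of the $group stage: count documents per
// {year, month} bucket. The sample dates are hypothetical.
const compDates = [
  new Date(2012, 0, 15),  // Jan 2012
  new Date(2012, 0, 20),  // Jan 2012
  new Date(2012, 1, 3)    // Feb 2012
];
const counts = {};
for (const d of compDates) {
  // JS Date months are 0-indexed, so add 1 to match $month's output
  const key = d.getFullYear() + "-" + (d.getMonth() + 1);
  counts[key] = (counts[key] || 0) + 1;
}
console.log(counts); // { '2012-1': 2, '2012-2': 1 }
```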

When I run with 'explain: true', I get either of the following:
(1) "planError": "InternalError No plan available to provide stats"
(2) notice there is no index name after 'BtreeCursor':
  "plan": {
          "cursor": "BtreeCursor ",
          "isMultiKey": false,
          "scanAndOrder": false,
          "indexBounds": {
            "comp_date": [
              [
                ISODate("2014-11-29T18:00:00-0600"),
                ISODate("2012-11-30T18:00:00-0600")
              ]
            ]
          }

When I run just the '$match' portion of the query, I get expected response times:

db.completions.find({comp_date: {$gte: new Date(2012,0,1),$lte: new Date(2014,11,30)}},{_id:0, comp_date: 1}).explain()

{
  "cursor": "BtreeCursor comp_date_1",
  "isMultiKey": false,
  "n": 64844,
  "nscannedObjects": 0,
  "nscanned": 64844,
  "nscannedObjectsAllPlans": 0,
  "nscannedAllPlans": 64946,
  "scanAndOrder": false,
  "indexOnly": true,
  "nYields": 506,
  "nChunkSkips": 0,
  "millis": 47,
  "indexBounds": {
    "comp_date": [
      [
        ISODate("2012-01-01T00:00:00-0600"),
        ISODate("2014-12-30T00:00:00-0600")
      ]
    ]
  },
  "server": "c40.candidate.46:10040",
  "filterSet": false
}

DETAILS ON THE COLLECTION:

db.completions.stats()

{
  "ns": "completed_projects.completions",
  "count": 77493,
  "size": 1037243568,
  "avgObjSize": 13384,
  "storageSize": 1071263744,
  "numExtents": 18,
  "nindexes": 8,
  "lastExtentSize": 280522752,
  "paddingFactor": 1,
  "systemFlags": 0,
  "userFlags": 1,
  "totalIndexSize": 18068960,
  "indexSizes": {
    "_id_": 2534560,
    "location_2d": 2003120,
    "my_id_1": 2583616,
    "purpose_1": 1970416,
    "tracking_no_1": 1994944,
    "status_1": 2191168,
    "comp_date_1": 1954064,
    "comp_date_-1": 2837072
  },
  "ok": 1
}




> I am experiencing odd behavior with aggregations and indexes. Without an index on the 'comp_date' field, the aggregation runs reasonably fast (~2 sec); with the indexes, however, it can take over 30 seconds.
>
> I am using 2.6.4, and I have tried various combinations of $project and $sort before the $match in the pipeline.


Hi Boyd,


> When I run with 'explain: true', I get either of the following:
> (1) "planError": "InternalError No plan available to provide stats"


This is a known issue in 2.6 when trying an aggregation explain on an uncached plan: https://jira.mongodb.org/browse/SERVER-14636. A fix was committed against 2.7.1 and a backport has been requested but not confirmed yet.



> (2) notice there is no index name after 'BtreeCursor':


This is also a known issue in 2.6 when an aggregation explain is run on a cached plan: https://jira.mongodb.org/browse/SERVER-14628.



> When I run just the '$match' portion of the query, I get expected response times:
> db.completions.find({comp_date: {$gte: new Date(2012,0,1),$lte: new Date(2014,11,30)}},{_id:0, comp_date: 1}).explain()


Can you provide the output of this query using explain(true) and without the projection, e.g.:
 db.completions.find({comp_date: {$gte: new Date(2012,0,1),$lte: new Date(2014,11,30)}}).explain(true)

The explain(true) will show details of all candidate plans considered rather than only the winning plan chosen.

Can you also provide the output of: db.completions.getIndexes()?


If you have access to a test/QA environment it would be worth upgrading to 2.6.6 to see if this improves the query performance. There have been some tweaks to the query planner since 2.6.4, and your issue may be related to a bad query plan being selected or cached.
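If the problem does turn out to be a badly cached plan, the 2.6 plan cache can also be inspected and cleared from the mongo shell (this requires a live connection; clearing the cache simply forces the planner to re-evaluate candidate plans on the next query):

```javascript
// mongo shell (2.6+): inspect and clear the collection's plan cache.
db.completions.getPlanCache().listQueryShapes()
db.completions.getPlanCache().clear()
```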



Thank you, Stephen. 

I have not had a chance to try 2.6.6 - it is still on the list.

Answers to your requests are below:

=======================================================
#1
=======================================================

db.completions.find({comp_date: {$gte: new Date(2012,0,1),$lte: new Date(2014,11,30)}}).explain(true)

{
  "cursor": "BtreeCursor comp_date_-1",
  "isMultiKey": false,
  "n": 65214,
  "nscannedObjects": 65214,
  "nscanned": 65214,
  "nscannedObjectsAllPlans": 65315,
  "nscannedAllPlans": 65316,
  "scanAndOrder": false,
  "indexOnly": false,
  "nYields": 7854,
  "nChunkSkips": 0,
  "millis": 8755,
  "indexBounds": {
    "comp_date": [
      [
        ISODate("2014-12-30T00:00:00-0600"),
        ISODate("2012-01-01T00:00:00-0600")
      ]
    ]
  },
  "allPlans": [
    {
      "cursor": "BtreeCursor comp_date_-1",
      "isMultiKey": false,
      "n": 65214,
      "nscannedObjects": 65214,
      "nscanned": 65214,
      "scanAndOrder": false,
      "indexOnly": false,
      "nChunkSkips": 0,
      "indexBounds": {
        "comp_date": [
          [
            ISODate("2014-12-30T00:00:00-0600"),
            ISODate("2012-01-01T00:00:00-0600")
          ]
        ]
      }
    },
    {
      "cursor": "BtreeCursor comp_date_-1_customer_name_1",
      "isMultiKey": false,
      "n": 101,
      "nscannedObjects": 101,
      "nscanned": 102,
      "scanAndOrder": false,
      "indexOnly": false,
      "nChunkSkips": 0,
      "indexBounds": {
        "comp_date": [
          [
            ISODate("2014-12-30T00:00:00-0600"),
            ISODate("2012-01-01T00:00:00-0600")
          ]
        ],
        "customer_name": [
          [
            {
              "$minElement": 1
            },
            {
              "$maxElement": 1
            }
          ]
        ]
      }
    }
  ],
  "server": "server_name",
  "filterSet": false,
  "stats": {
    "type": "FETCH",
    "works": 72962,
    "yields": 7854,
    "unyields": 7854,
    "invalidates": 0,
    "advanced": 65214,
    "needTime": 1,
    "needFetch": 7746,
    "isEOF": 1,
    "alreadyHasObj": 0,
    "forcedFetches": 0,
    "matchTested": 0,
    "children": [
      {
        "type": "IXSCAN",
        "works": 65215,
        "yields": 7854,
        "unyields": 7854,
        "invalidates": 0,
        "advanced": 65214,
        "needTime": 1,
        "needFetch": 0,
        "isEOF": 1,
        "keyPattern": "{ comp_date: -1 }",
        "isMultiKey": 0,
        "boundsVerbose": "field #0['comp_date']: [new Date(1419919200000), new Date(1325397600000)]",
        "yieldMovedCursor": 0,
        "dupsTested": 0,
        "dupsDropped": 0,
        "seenInvalidated": 0,
        "matchTested": 0,
        "keysExamined": 65214,
        "children": [ ]
      }
    ]
  }
}

=======================================================
#2
=======================================================

db.completions.getIndexes()
[
  {
    "v": 1,
    "key": {
      "_id": 1
    },
    "name": "_id_",
    "ns": "db_name.completions"
  },
  {
    "v": 1,
    "key": {
      "location": "2d"
    },
    "name": "location_2d",
    "ns": "db_name.completions"
  },
  {
    "v": 1,
    "key": {
      "my_id": 1
    },
    "name": "my_id_1",
    "ns": "db_name.completions"
  },
  {
    "v": 1,
    "key": {
      "purpose": 1
    },
    "name": "purpose_1",
    "ns": "db_name.completions"
  },
  {
    "v": 1,
    "key": {
      "tracking_no": 1
    },
    "name": "tracking_no_1",
    "ns": "db_name.completions"
  },
  {
    "v": 1,
    "key": {
      "status": 1
    },
    "name": "status_1",
    "ns": "db_name.completions"
  },
  {
    "v": 1,
    "key": {
      "comp_date": -1
    },
    "name": "comp_date_-1",
    "ns": "db_name.completions"
  },
  {
    "v": 1,
    "key": {
      "comp_date": -1,
      "customer_name": 1
    },
    "name": "comp_date_-1_customer_name_1",
    "ns": "db_name.completions"
  }

]



The index hurts performance here because the aggregation matches over 80% of the collection (about 65,000 of 77,493 documents). Instead of skipping a small fraction of the data, the query has to walk the entire index and then still fetch and process nearly every document.
Indexes help with selective queries, not with ones that end up returning most of the data set.
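The selectivity behind that estimate can be checked directly from the figures already reported in this thread (n from the explain output, count from the collection stats):

```javascript
// Selectivity of the $match range, using the numbers quoted above:
// explain reported n = 65,214 matching documents, and
// collection stats reported count = 77,493 documents in total.
const matched = 65214;
const total = 77493;
const selectivity = matched / total;
console.log((selectivity * 100).toFixed(1) + "% of documents match");
// Prints: 84.2% of documents match
```

At that selectivity, a full collection scan that touches every document once is cheaper than an index scan plus a separate fetch for nearly every document.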

