2014년 12월 30일 화요일

MongoDB: It takes quite a long time to save data sometimes using pymongo in Python2.7

 have used pymongo to write data into MongoDB in a project. In this project, we write once every 10minutes.

    for i in range(3):
        try:
            conn = self.client
            db = conn[self.db]
            db[collection_name].save(kwargs)
            break
        except AutoReconnect, error:
            import time; time.sleep(1)
But I got the AutoReconnect exception on a regular basis. And a funny thing is that it becomes normal if attempt to write data every 30 seconds. So I decided to use cProfile to find which function used so much time. As a result, we got the following info (just a snippet).
Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    2    0.000    0.000    0.000    0.000 collection.py:182(database)
    2    0.000    0.000  931.184  465.592 collection.py:221(save)
    1    0.000    0.000  931.167  931.167 collection.py:291(insert)
    2    0.000    0.000    0.000    0.000 collection.py:378(gen)
    1    0.000    0.000    0.017    0.017 collection.py:422(update)
    2    0.000    0.000    0.000    0.000 collection.py:51(__init__)
    4    0.000    0.000    0.000    0.000 common.py:166(validate_positive_float)
    4    0.000    0.000    0.000    0.000 common.py:193(validate_read_preference)
    4    0.000    0.000    0.000    0.000 common.py:206(validate_tag_sets)
    4    0.000    0.000    0.000    0.000 common.py:250(validate_uuid_subtype)
    4    0.000    0.000    0.000    0.000 common.py:330(__init__)
    4    0.000    0.000    0.000    0.000 common.py:351(__init__)
    4    0.000    0.000    0.000    0.000 common.py:393(__set_options)
    4    0.000    0.000    0.000    0.000 common.py:431(__get_write_concern)
    4    0.000    0.000    0.000    0.000 common.py:493(__get_slave_okay)
    4    0.000    0.000    0.000    0.000 common.py:511(__get_read_pref)
    4    0.000    0.000    0.000    0.000 common.py:527(__get_acceptable_latency)
    4    0.000    0.000    0.000    0.000 common.py:552(__get_tag_sets)
    6    0.000    0.000    0.000    0.000 common.py:574(__get_uuid_subtype)
    6    0.000    0.000    0.000    0.000 common.py:589(__get_safe)
    2    0.000    0.000    0.000    0.000 common.py:678(_get_write_mode)
    2    0.000    0.000    0.000    0.000 common.py:692(pop1)
    8    0.000    0.000    0.000    0.000 common.py:73(validate_boolean)
    2    0.000    0.000    0.000    0.000 database.py:122(connection)
    2    0.000    0.000    0.000    0.000 database.py:131(name)
    2    0.000    0.000    0.000    0.000 database.py:193(__getattr__)
    2    0.000    0.000    0.000    0.000 database.py:203(__getitem__)
    2    0.000    0.000    0.000    0.000 database.py:250(_apply_incoming_manipulators)
    2    0.000    0.000    0.000    0.000 database.py:255(_apply_incoming_copying_manipulators)
    1    0.000    0.000    0.000    0.000 database.py:260(_fix_incoming)
    2    0.000    0.000    0.000    0.000 database.py:34(_check_name)
    2    0.000    0.000    0.000    0.000 database.py:50(__init__)
    1    0.000    0.000    0.000    0.000 encoding.py:114(_safe_str)
    1    0.000    0.000    0.000    0.000 encoding.py:31(get_default_encoding_file)
    1    0.000    0.000    0.000    0.000 encoding.py:41(default_encoding)
    1    0.000    0.000    0.000    0.000 encoding.py:77(bytes_to_str)
    1    0.000    0.000    0.000    0.000 encoding.py:96(safe_str)
    1    0.000    0.000    0.000    0.000 errors.py:47(__init__)
    1    0.000    0.000    0.000    0.000 formatters.py:13(format)
    1    0.000    0.000    0.000    0.000 genericpath.py:85(_splitext)
    1    0.000    0.000  932.185  932.185 handlers.py:69(_emit)
    1    0.000    0.000  932.186  932.186 handlers.py:90(emit)
    4    0.000    0.000    0.000    0.000 helpers.py:122(_check_command_response)
    4    0.000    0.000    0.000    0.000 helpers.py:76(_unpack_response)
    1    0.000    0.000    0.000    0.000 log.py:137(format)
    1    0.000    0.000    0.000    0.000 log.py:270(makeRecord)
    1    0.000    0.000    0.000    0.000 member.py:100(is_arbiter)
    1    0.000    0.000    0.000    0.000 member.py:41(__init__)
    1    0.000    0.000    0.000    0.000 mongo_client.py:1027(__check_response_to_last_error)
    2    0.000    0.000    0.000    0.000 mongo_client.py:1068(__check_bson_size)
    2    0.000    0.000  931.175  465.588 mongo_client.py:1089(_send_message)
    9    0.000    0.000  931.176  103.464 mongo_client.py:1138(__receive_data_on_socket)
    5    0.000    0.000  931.176  186.235 mongo_client.py:1153(__receive_message_on_socket)


Please take a look at the last but one line, __receive_data_on_socket() takes about 103 seconds to receive data. And later, I found that the first socket.recv takes almost all of the time, about 900 secondsand finally raised Connection Timed out exception, which in turn the AutoReConnect exception raised. Note that I have not set connection timeout and socket timeout when initializing MongoClient instance.
It was awesome! 3 questions here
  1. Why socket.recv() would raise connection timeout but the prior sock.sendall() succeed?
  2. Why it is 9xx seconds, not others? No connection timeout has been set.
  3. How MongoDB handle the connection that does not alive (do not send/receive data) in a long time (like 10 minutes)?
The related source code as following shown

[mongo_client.py, Line 1108]

    def _send_message(...):
        ......
        sock_info 
= self.__socket(member)
        
try:
            
try:
                
(request_id, data) = self.__check_bson_size(message)
                sock_info
.sock.sendall(data)

                
# Safe mode. We pack the message together with a lastError
                
# message and send both. We then get the response (to the
                
# lastError) and raise OperationFailure if it is an error
                
# response.

                rv 
= None
                
if with_last_error:
                    response 
= self.__receive_message_on_socket(1,request_id,
                                                                sock_info
)
                .....

[mongo_client.py, Line 1150]

    def __receive_message_on_socket(self, operation, rqst_id, sock_info):
        """Receive a message in response to `rqst_id` on `sock`.

        Returns the response data with the header removed.
        """
        header = self.__receive_data_on_socket(16, sock_info)
        ......

[mongo_client.py, Line 1135]
    def __receive_data_on_socket(self, length, sock_info):
        """Lowest level receive operation.

        Takes length to receive and repeatedly calls recv until able to
        return a buffer of that length, raising ConnectionFailure on error.
        """
        message = EMPTY
        while length:
            chunk = sock_info.sock.recv(length)
            ......



Anybody could help to take a look?




Does it because the underlying socket become non-alive after a period of time?



You simply didn't provide enough information for anyone to be able to help you.

You have this line in the original post:

                    db[collection_name].save(kwargs)

What exactly is being sent to the server via the save method?

Without knowing more about what you are sending to the database, we
would be guessing what's happening there.



@Asya, the content sent by db[collection_name].save(kwargs)  is a simple dict like this

                           { 'message': 'start...', 'time': datetime.datetime.now() }

I think this issue is caused by the long-lived idle connection, but I have no enough knowledge to make it clear.


댓글 없음:

댓글 쓰기