← Back to team overview

agesys-dev team mailing list archive

[Bug 1046706] Re: Prepared statements used for all queries

 

BrindleFly, there are some advantages to prepared statements--notably
the streaming string/binary value streaming.  That said, this
performance issue is a major headache.  I've read through the code and
implemented a statement cache.  This should allow us to reap the
perfomance benefits of prepared queries for applications which repeated
statements.  The cache exists at the cursor level. If the application
executes identical statements (presumably with different values) against
the same cursor then this patch should eleminate the double network
command penalty that oursql sees, and should see decreased CPU
utilization because the server doesn't need to prepare the query again
and the client doesn't need to escape values.  That said, if the
application recreates the cursor between queries or uses a large number
of non-identical statements w/ small values then a mysql connector which
uses the mysql_real_query interface (like MySQLdb) rather than
mysql_stmt_* interface  may be a better fit.


Essentially this patch saves the _statement objects into a fixed size LRU queue.  If the query is identical, it will issue a free the results and reuse the statement.  Because the statements are not immediately closed, result sets which have not been completly retrieved will remain allocated on the server side until the statement is reused or the statement falls off the end of the cache.  In the case of large data sets this can be a substantial burden to the server. It is not enabled by default.

I'm still doing some testing for edge cases, but it seems to give a
pretty substantial performance increase.

# Test table
CREATE TABLE `t1` (
  `id` int(11) NOT NULL
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1;

# oursql.wo.cache.py - Without Cache
import oursql
db = oursql.connect(host='localhost', user='root', db='test')
c = db.cursor()
for i in range(100000):
    c.execute('INSERT INTO `t1` VALUES (?)',(i,))

(venv)Codys-MacBook-Air-2:oursql-0.9.3.1 cody$ time python
~/Desktop/oursql.wo.cache.py

real	0m10.896s
user	0m1.915s
sys	0m3.542s


# oursql.w.cache.py - With statement cache 
import oursql
db = oursql.connect(host='localhost', user='root', db='test')
c = db.cursor(statement_cache_size=5)
for i in range(100000):
    c.execute('INSERT INTO `t1` VALUES (?)',(i,))

(venv)Codys-MacBook-Air-2:oursql-0.9.3.1 cody$ time python
~/Desktop/oursql.w.cache.py

real	0m6.401s
user	0m1.438s
sys	0m1.952s


** Patch added: "oursql.statement.cache.patch"
   https://bugs.launchpad.net/oursql/+bug/1046706/+attachment/3467272/+files/oursql.statement.cache.patch

-- 
You received this bug notification because you are a member of Agesys
Team, which is subscribed to oursql.
https://bugs.launchpad.net/bugs/1046706

Title:
  Prepared statements used for all queries

Status in oursql:
  New

Bug description:
  I am new to Python and oursql. In diagnosing why queries were so much
  slower in a Python vs. Ruby application, I noticed you are creating,
  using and then discarding prepared statements for EVERY query (see
  cursor.execute). As best as I can tell, this is to avoid dealing with
  the string interpolation of SQL arguments (e.g. which seems to be
  handled with varying degrees of success in MySQLdb module). It seems
  though this is a defect, since the two step process of a prepared
  statement adds overhead that can be substantial in some use cases.
  This overhead is typically only of use if you plan on actually re-
  using the prepared statement.

  Assuming my understanding above is correct, I would suggest the
  current interface should always be dynamic, since you do not intend to
  cache statements.

  I suspect most Python programmers who switched from MySQLdb do not
  realize the additional overhead they incur for executing a query,
  since this module defaults to dynamic SQL.

To manage notifications about this bug go to:
https://bugs.launchpad.net/oursql/+bug/1046706/+subscriptions


References