agesys-dev team mailing list archive
Message #00119
[Bug 1046706] Re: Prepared statements used for all queries
BrindleFly, there are some advantages to prepared statements--notably
the streaming of string/binary values. That said, this performance
issue is a major headache. I've read through the code and implemented
a statement cache. This should let us reap the performance benefits of
prepared queries for applications which repeat statements. The cache
exists at the cursor level. If the application executes identical
statements (presumably with different values) against the same cursor,
then this patch should eliminate the double network command penalty
that oursql sees, and should also decrease CPU utilization because the
server doesn't need to prepare the query again and the client doesn't
need to escape values. That said, if the application recreates the
cursor between queries or uses a large number of non-identical
statements with small values, then a MySQL connector which uses the
mysql_real_query interface (like MySQLdb) rather than the mysql_stmt_*
interface may be a better fit.
Essentially, this patch saves the _statement objects into a fixed-size
LRU queue. If the query is identical, it will free the results and
reuse the statement. Because statements are not immediately closed,
result sets which have not been completely retrieved will remain
allocated on the server side until the statement is reused or the
statement falls off the end of the cache. In the case of large data
sets this can be a substantial burden on the server, so the cache is
not enabled by default.
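To make the mechanism concrete, here is a minimal pure-Python sketch of a fixed-size LRU statement cache like the one described above. The real patch is C and stores oursql _statement handles; the StatementCache class, its method names, and the dict standing in for a prepared handle are all illustrative assumptions, not the patch's actual API.

```python
from collections import OrderedDict

class StatementCache:
    """Illustrative fixed-size LRU cache keyed by query text."""

    def __init__(self, size):
        self.size = size
        self._cache = OrderedDict()  # query text -> "prepared statement"
        self.prepares = 0            # count of server-side prepares

    def get(self, query):
        stmt = self._cache.get(query)
        if stmt is not None:
            # Cache hit: in the patch, any pending result set is freed
            # and the existing server-side statement handle is reused.
            self._cache.move_to_end(query)
            return stmt
        # Cache miss: a new statement must be prepared on the server.
        self.prepares += 1
        stmt = {'query': query}
        self._cache[query] = stmt
        if len(self._cache) > self.size:
            # Evict the least recently used statement; the patch would
            # close it, releasing any server-side resources it held.
            self._cache.popitem(last=False)
        return stmt
```

With this model, executing the same INSERT 100,000 times costs one prepare, while cycling through more distinct statements than the cache holds misses every time, which is exactly the workload where mysql_real_query-style connectors win.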
I'm still doing some testing for edge cases, but it seems to give a
pretty substantial performance increase.
# Test table
CREATE TABLE `t1` (
`id` int(11) NOT NULL
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1;
# oursql.wo.cache.py - Without cache
import oursql
db = oursql.connect(host='localhost', user='root', db='test')
c = db.cursor()
for i in range(100000):
    c.execute('INSERT INTO `t1` VALUES (?)', (i,))
(venv)Codys-MacBook-Air-2:oursql-0.9.3.1 cody$ time python ~/Desktop/oursql.wo.cache.py
real 0m10.896s
user 0m1.915s
sys 0m3.542s
# oursql.w.cache.py - With statement cache
import oursql
db = oursql.connect(host='localhost', user='root', db='test')
c = db.cursor(statement_cache_size=5)
for i in range(100000):
    c.execute('INSERT INTO `t1` VALUES (?)', (i,))
(venv)Codys-MacBook-Air-2:oursql-0.9.3.1 cody$ time python ~/Desktop/oursql.w.cache.py
real 0m6.401s
user 0m1.438s
sys 0m1.952s
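For what it's worth, a quick calculation on the wall-clock numbers above puts the cached run at roughly 1.7x faster end to end:

```python
# Speedup from the `real` timings reported above.
without_cache = 10.896  # seconds, no statement cache
with_cache = 6.401      # seconds, statement_cache_size=5
speedup = without_cache / with_cache
print('%.2fx' % speedup)  # -> 1.70x
```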
** Patch added: "oursql.statement.cache.patch"
https://bugs.launchpad.net/oursql/+bug/1046706/+attachment/3467272/+files/oursql.statement.cache.patch
--
You received this bug notification because you are a member of Agesys
Team, which is subscribed to oursql.
https://bugs.launchpad.net/bugs/1046706
Title:
Prepared statements used for all queries
Status in oursql:
New
Bug description:
  I am new to Python and oursql. In diagnosing why queries were so much
  slower in a Python application vs. a Ruby one, I noticed you are
  creating, using and then discarding prepared statements for EVERY
  query (see cursor.execute). As best as I can tell, this is to avoid
  dealing with the string interpolation of SQL arguments (which seems
  to be handled with varying degrees of success in the MySQLdb module).
  It seems, though, that this is a defect, since the two-step process
  of a prepared statement adds overhead that can be substantial in some
  use cases. This overhead typically pays off only if you plan on
  actually re-using the prepared statement.
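The two-step overhead described above can be sketched as a simple count of client-to-server commands. The function name and the idea of counting commands are illustrative; the underlying fact is that mysql_stmt_prepare plus mysql_stmt_execute is two round trips per query, while mysql_real_query is one, and a reused statement is prepared only once.

```python
def commands_per_query(queries, reuse=False):
    """Count client->server commands for `queries` executions
    via the prepared-statement interface (illustrative model)."""
    if reuse:
        # Prepare once, then execute the same statement repeatedly.
        return 1 + queries
    # Prepare + execute (then discard) for every single query,
    # which is what oursql's cursor.execute currently does.
    return 2 * queries
```

Under this model, 100,000 one-shot executions cost 200,000 commands, versus 100,001 when the statement is prepared once and reused.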
Assuming my understanding above is correct, I would suggest the
current interface should always be dynamic, since you do not intend to
cache statements.
I suspect most Python programmers who switched from MySQLdb do not
realize the additional overhead they incur for executing a query,
since this module defaults to dynamic SQL.
To manage notifications about this bug go to:
https://bugs.launchpad.net/oursql/+bug/1046706/+subscriptions