Monday, June 27, 2011

Why did it take 0.43 seconds to insert one row?

Jonathan O'Keeffe (jpo) posted this comment on Stupid Error Messages last week:

"Why did it take 0.43 seconds to insert one row? Is there something special about uniqueidentifier that makes this slow?"
The short answers are "it doesn't" and "no".

Jonathan was referring to this output copied from the "SQL Statements" and "Results - Messages" panes displayed by SQL Anywhere's dbisql utility:


INSERT t VALUES ( '4b67abbe3c9246f883db0ab2d2b78f8b' );

Execution time: 0.43 seconds
1 row(s) inserted

I remember thinking at the time "That's an awfully big number!" but I left it in because that's what was actually displayed.

Or was it?

I can't remember (or find) the database where I ran that test, or even remember what version of SQL Anywhere was used.

But wait, it gets worse... I can't reproduce anything that looks even remotely close to that output. The order of the "Execution time:" and "1 row(s) inserted" lines in versions 9 and 10 is different from versions 11 and 12, but in all four versions there are two "Execution time:" numbers displayed because there are two statements executed, CREATE TABLE and INSERT:

Here is the long answer to Jonathan's questions...
  • The "Execution time: 0.43 seconds" is probably a copy-and-paste mistake. It might be for the CREATE TABLE... it could also be for the INSERT, but it probably isn't.

  • No, there is nothing special about the UNIQUEIDENTIFIER data type that makes it slow... under the covers it's just another steenking BINARY string.

  • As far as I know (can't find it in the docs) the "Execution time:" number is a notoriously unreliable client-side view of how long something takes.

  • No matter how fast the client-server interface is, it adds considerable overhead (relatively speaking) to very fast operations like INSERT. For example, if you have a million rows to insert and you care about performance, don't code a client-side loop that sends a million separate INSERT statements to the server. Use INSERT SELECT, call a stored procedure, use LOAD TABLE, do something to push the process down to the server.
    "Any programmer seriously interested in performance runs transactions using a stored-procedure
    interface, rather than SQL commands over ODBC/JDBC."
    - Michael Stonebraker

  • Execution times can vary wildly with the state of the server, especially if (like in this case) absolutely no care has been taken with experimental design fact, in this case, there was no experiment being conducted.

1 comment:

Anonymous said...

Ha. I was mostly just kidding, but thanks for the detailed explanation. Your bullet points cover the (very wide) range of possibilities quite nicely. My own guess was that there might have been a fair amount of overhead involved with inserting the first row into the table (lazy initialization or some such).

And man, you're really rubbing salt in the wound with the client-side loop comment. I've seen a bunch of these lately, and they can be super-ugly. Your comments are right on the mark.