Wednesday, November 16, 2011

Bottleneck: Blocked Connections

Question: Do blocked connections affect performance?

Short Answer: For the user waiting for a response, you betcha!

Long Answer: It's not just the waiting users who are affected; everyone's performance can suffer.

Here's an example of a simple but time-consuming Query From Hell run simultaneously on three SQL Anywhere 12 connections:

SELECT COUNT(*)
  FROM SYSCOLUMN AS A 
       CROSS JOIN SYSCOLUMN AS B
       CROSS JOIN SYSDOMAIN AS C;

Execution time: 36.187 seconds

Execution time: 38.356 seconds

Execution time: 40.052 seconds

Here's what happened when there were 79 other connections running at the same time, each one blocked by a row lock:

SELECT COUNT(*)
  FROM SYSCOLUMN AS A 
       CROSS JOIN SYSCOLUMN AS B
       CROSS JOIN SYSDOMAIN AS C;

Execution time: 110.565 seconds

Execution time: 112.186 seconds

Execution time: 113.905 seconds

Those other 79 connections weren't doing anything at all, yet the Queries From Hell took about three times as long.

But wait, it gets worse...

...a simple SELECT FROM DUMMY on yet another connection took over 2 minutes to execute while all this was going on.

Plus, the 79 blocked connections were only part of the story: the blocked-connection test involved 200 connections in total. Only 79 got as far as being blocked; the remaining 121 never even reached the row lock, they just sat there waiting for a chance to run.
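
For anyone who wants to reproduce that kind of pile-up, here's a minimal sketch of one way to create the same row-lock traffic jam; the table and column names are made up for illustration, they're not from the actual test:

-- Hypothetical one-row table for the connections to fight over.
CREATE TABLE blocker_demo (
   pkey   INTEGER NOT NULL PRIMARY KEY,
   data   VARCHAR ( 10 ) NOT NULL );
INSERT INTO blocker_demo VALUES ( 1, 'x' );
COMMIT;

-- Connection 1: update the row and hold the row lock by not committing.
UPDATE blocker_demo SET data = 'y' WHERE pkey = 1;

-- Connections 2, 3, 4, ...: each one tries to update the same row and,
-- with the default BLOCKING option 'On', simply sits and waits on
-- connection 1's row lock.
UPDATE blocker_demo SET data = 'z' WHERE pkey = 1;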

Here's what Foxhound showed, followed by an explanation of some of the numbers:

[Foxhound Monitor screenshot: samples showing the Heartbeat, Sample, Active Req, Max Req, Waiting Conns and Conns Blocked columns]

The sample at 11:00:11 AM shows a "Heartbeat" time of 2m 14.7s: that's how long it took Foxhound to receive a response to a trivial SELECT FROM DUMMY while all of this was going on. This isn't just a Foxhound effect; separate tests using dbisql confirmed that SELECT FROM DUMMY could take a long time.
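
If you want to run the same kind of heartbeat check yourself from dbisql, a sketch like the following will do; the timing wrapper is just one way to do it (DUMMY's single column is dummy_col):

-- Time a trivial query against DUMMY from a separate dbisql connection.
BEGIN
   DECLARE start_ts      TIMESTAMP;
   DECLARE heartbeat_val INTEGER;
   SET start_ts = CURRENT TIMESTAMP;
   SELECT dummy_col INTO heartbeat_val FROM DUMMY;   -- the "heartbeat" itself
   MESSAGE STRING ( 'Heartbeat msec: ',
      DATEDIFF ( MILLISECOND, start_ts, CURRENT TIMESTAMP ) ) TO CLIENT;
END;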

The subsequent "Sample" time was only 5.1s, even though that step accounts for much more work than the SELECT FROM DUMMY heartbeat. In effect, Foxhound itself was blocked from proceeding, and it issued an "Alert #1" (database unresponsive) as a result.

The "Max Req" column shows that the AutoMultiProgrammingLevel feature had allowed the multiprogramming level to climb to 80... but no further. 79 of those requests show up in the "Waiting Conns" and "Conns Blocked" columns. The other 121 connections that should have been blocked show up instead in the "Waiting Conns" column... not idle, but waiting.

79 of the 80 requests shown in the "Active Req" column are not, in fact, active at all; they are sitting there doing nothing... not active, not idle, but blocked.

...and that's where the problem arises: it looks like there's only one thread left to do any actual work. Short of STOP ENGINE, it's hard to imagine a better example of "bottleneck".
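
Foxhound isn't the only way to see numbers like these; something roughly equivalent can be pulled straight from the server, assuming the long-standing ActiveReq and UnschReq server properties and the BlockedOn column returned by sa_conn_info():

-- Workers currently handling requests, requests queued for a worker,
-- and connections blocked on a lock.
SELECT PROPERTY ( 'ActiveReq' ) AS active_requests,
       PROPERTY ( 'UnschReq' ) AS unscheduled_requests,
       ( SELECT COUNT(*)
           FROM sa_conn_info()
          WHERE BlockedOn <> 0 ) AS blocked_connections;

During the test described above, a query like that should report numbers in the same neighbourhood as the 80, 121 and 79 discussed earlier.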

The bottom line is this: You should pay attention to blocked connections, and not just because of the screaming users waiting for their responses.


2 comments:

Anonymous said...

Breck,

would an increase of the MaxMultiProgrammingLevel property (via sa_server_option) to, say, 100 or more help solve the issue?

Agreed, that would be some kind of "helping the auto-tuning facility"...


Regards
Volker

Breck Carter said...

@Volker: In this case, forcing the multiprogramming level to a higher number would simply allow more of the waiting connections to become blocked. You would have to increase it above 200 to let all the blocked connections reach the "Active Req" state... but then you'd have 200 connections consuming resources without getting any work done. The point is this: blocked connections are very bad BECAUSE they are treated as "Active Requests".