Friday, November 18, 2011

Bottleneck: Latency Versus Throughput

Get three computer experts in a room and ask them for a definition of "latency" and they'll give you six different answers; the same goes for "throughput"... the only thing they'll agree on is the answer to this question: Are latency and throughput important?

Answer: Yes!

OK, let's try that again: What kind of latency is good latency?

Answer: Low latency!

What kind of throughput is good throughput?

Answer: High throughput!

Next question: How do I measure SQL Anywhere latency and throughput?

Answer: You can measure latency and throughput at two levels: at the network level, and closer in at the database server level.

So, for SQL Anywhere, the six different answers can be boiled down to two sets of definitions; first...

Network Latency and Throughput

Here's what Glenn Paulley has to say on the subject in the "Measuring network performance" section of Optimizing Adaptive Server Anywhere Performance Over a WAN:
Latency and throughput can be used together to describe the performance of a network. Latency refers to the time delay between when one machine sends a packet of data and the second machine receives the data (for example, if the second machine receives the data 10 ms later than the first machine sent it, the latency is 10 ms). Throughput refers to the amount of data that can be transferred in a given time (for example, if one machine sends 1000 KB of data, and it takes 5 seconds for all of it to be received by the second machine, the throughput is 200 KB/s). On a LAN, latency is typically less than 1 ms, and throughput is typically more than 1 MB/s. On a WAN, the latency is typically significantly higher (perhaps 5 ms to 500 ms), and the throughput is typically significantly lower (perhaps 4 KB/s to 200 KB/s).

You can measure network latency between two machines by the round trip time reported by the system’s ping utility. The round trip time is the latency to transfer data from one machine to a second machine plus the latency to transfer data from the second machine back to the first machine. You can measure network throughput by copying a file of a known size of at least 200 KB from one machine to a second machine and timing the copy. This copy could be performed as a regular file copy, using FTP, or by downloading a file using an Internet browser.
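Here's a rough way to see those two measurements in action. This is a Python sketch, not part of the paper; it stands in a loopback echo server for the second machine, so the numbers it prints are only meaningful when you point it at a real remote host:

```python
# Sketch: round-trip latency (like ping) and bulk throughput (like a timed
# file copy), measured over a socket. A loopback echo server stands in for
# the second machine so the example is self-contained.
import socket
import threading
import time

def echo_server(listener):
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)   # echo everything back

server = socket.socket()
server.bind(("127.0.0.1", 0))    # any free port
server.listen(1)
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())

# Latency: round-trip time for a tiny payload, like one ping packet.
start = time.perf_counter()
client.sendall(b"x")
client.recv(1)
round_trip_ms = (time.perf_counter() - start) * 1000.0

# Throughput: time a 200 KB transfer, like copying a known-size file.
# Send from a separate thread so sending and receiving can't deadlock.
payload = b"0" * 200_000
start = time.perf_counter()
sender = threading.Thread(target=client.sendall, args=(payload,))
sender.start()
received = 0
while received < len(payload):
    received += len(client.recv(65536))
sender.join()
elapsed = time.perf_counter() - start
throughput_kb_s = (len(payload) / 1024.0) / elapsed

client.close()
print(f"round trip: {round_trip_ms:.3f} ms, throughput: {throughput_kb_s:.0f} KB/s")
```

On loopback both numbers will look absurdly good; the point is the measurement technique, not the values.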

To get reasonable Adaptive Server Anywhere performance on a network that has high latency, but reasonable throughput, the number of requests made by the client must be minimized. If a network has reasonable latency, but low throughput, the amount of data transferred between the client and server must be minimized.
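That advice boils down to a simple back-of-envelope model: total elapsed time is roughly (number of requests times latency) plus (bytes transferred divided by throughput). Here's a sketch with made-up WAN numbers, not measurements:

```python
# Back-of-envelope model of the trade-off described above: total transfer
# time is roughly (requests x latency) + (bytes sent / throughput).
# All numbers below are illustrative, not measurements.
def transfer_time(requests, latency_s, total_bytes, throughput_bytes_s):
    """Estimate elapsed time for a client/server exchange."""
    return requests * latency_s + total_bytes / throughput_bytes_s

# High-latency WAN (100 ms round trips), reasonable throughput (100 KB/s):
chatty = transfer_time(requests=1000, latency_s=0.100,
                       total_bytes=100_000, throughput_bytes_s=100_000)
batched = transfer_time(requests=10, latency_s=0.100,
                        total_bytes=100_000, throughput_bytes_s=100_000)
print(f"1000 small requests: {chatty:.1f}s, 10 batched requests: {batched:.1f}s")
# → 1000 small requests: 101.0s, 10 batched requests: 2.0s
```

Same 100 KB either way; cutting the request count from 1000 to 10 is what buys the speedup, which is exactly why minimizing requests matters on a high-latency network.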

Database Server Latency and Throughput

Here's an excerpt from the Foxhound FAQ which talks about latency and throughput from the server's point of view rather than the client or network side:
Latency, also known as response time or access time, is a measure of how long it takes the database to respond to a single request.

The "Heartbeat / Sample Times" columns are the primary measurements of latency displayed by Foxhound. The Heartbeat is the round-trip elapsed time for a single SELECT dummy_col FROM DUMMY statement issued by Foxhound to the target database; the time is rounded upwards to the nearest tenth of a second so the minimum displayed value is 0.1s.
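Here's a sketch (in Python, with a stand-in for the actual query execution; Foxhound's internals may differ) of how a heartbeat time like that can be measured and rounded upwards to the nearest tenth of a second:

```python
# Sketch of the heartbeat measurement described above: time a trivial query
# and round the elapsed time UP to the nearest tenth of a second, so the
# minimum displayed value is 0.1s. run_query is a hypothetical stand-in for
# executing SELECT dummy_col FROM DUMMY against the target database.
import math
import time

def heartbeat_seconds(run_query):
    start = time.perf_counter()
    run_query()                                    # e.g. SELECT dummy_col FROM DUMMY
    elapsed = time.perf_counter() - start
    return max(0.1, math.ceil(elapsed * 10) / 10)  # round up in 0.1s steps

print(heartbeat_seconds(lambda: None))   # an instant query still shows 0.1
```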

The Sample time is the round-trip time it takes Foxhound to request and receive three sets of performance statistics from the target database. The workload varies with the number of connections on the target database so the sample time is an indication of how long a non-trivial transaction takes rather than a benchmark measurement.

The Heartbeat time is usually smaller than the Sample time, but it is possible for the Heartbeat time to be larger; here is an example:
           Samples  Interval  Heartbeat / Sample Times
May 16 11:00:29 PM   50.1s          39.7s / .9s
The heartbeat query and the sample gathering process are performed separately, one after another, and their elapsed times are calculated separately: the Sample time does not include the Heartbeat time. It is possible that the target database did not respond to the heartbeat query for a long time, but then did respond to the subsequent sample queries on a timely basis.

Throughput, also known as bandwidth, is a measure of how many requests the database can respond to per unit of time.

The following Foxhound Monitor columns provide an indication of throughput:

The "Req" column shows the rate at which the server started or resumed processing requests during the preceding interval. In this context, a request is defined as an atomic unit of work performed for a connection.
The "Commits" column shows the approximate rate at which COMMIT operations were performed in the previous interval. This number is approximate because a connection may issue a commit and disconnect between two Foxhound samples, and that commit won't be counted in this rate. Depending on how the database workload is structured, the commit count may or may not be the same as the transaction count.

The "Bytes In / Out" columns show the rates at which data was received from and sent back to client connections by the server in the previous interval.
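All three of those columns are interval rates, and a rate like that can be computed by sampling cumulative server counters and dividing the deltas by the interval length. Here's a sketch; the counter names are illustrative, not the actual SQL Anywhere property names:

```python
# Sketch of how interval rates like Req, Commits and Bytes In/Out can be
# derived: sample cumulative counters twice and divide each delta by the
# interval length. Counter names here are made up for illustration.
def interval_rates(prev_sample, cur_sample, interval_s):
    """Per-second rates from two cumulative counter snapshots."""
    return {name: (cur_sample[name] - prev_sample[name]) / interval_s
            for name in cur_sample}

prev = {"req": 1_000, "commit": 200, "bytes_in": 50_000, "bytes_out": 400_000}
cur  = {"req": 1_600, "commit": 260, "bytes_in": 80_000, "bytes_out": 700_000}
print(interval_rates(prev, cur, interval_s=10.0))
# req rises by 600 over a 10s interval -> 60.0 requests per second
```

Note the caveat quoted above about Commits: a counter delta only sees connections present at both samples, so work done by a connection that arrives and leaves between samples slips through the cracks.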

Low latency is generally a good thing, but so is high throughput, and the trick is to achieve a balance between the two.
