Friday, October 16, 2009

Product Futures: Grow In-Memory Cache Faster

Maybe I've been living a sheltered life, but until recently I'd never seen SQL Anywhere have a Cache Panic. Not once, not in production, not in testing, not even while testing Foxhound, which is supposed to display the number of Cache Panics on the monitor page.

What is a Cache Panic? It happens when SQL Anywhere needs to allocate a page in the cache but can't find one. Sounds bad, but what with dynamic cache sizing and the temporary file, I'd never seen one.

At one point, I considered removing Cache Panics from the Foxhound monitor page to save valuable screen real estate; if you've seen the Foxhound monitor page, you know what "crowded" means when it comes to displays :) However, a wise voice counseled me to leave it alone, saying that it's an important statistic.

Not important enough, however, for me to include a Cache Panics threshold in the Foxhound email alerts feature... that was a bad decision, about to be fixed... because I've just seen my first Cache Panics, and believe me, they are bad:

[Foxhound Monitor display showing the Cache Panics]

What the Foxhound figures above don't show is how badly performance can be affected when Cache Panics are happening. When you start a SQL Anywhere database using the in-memory feature (dbsrv11 -im nw in this case), there is no temporary file and nothing is ever written to the database file, so SQL Anywhere has nowhere but RAM to put a page of data. If the RAM cache is full, SQL Anywhere has to allocate more RAM.

It gets worse: If you don't specify the -c initial RAM cache size value when you start the server, and if you perform some intense operation like LOAD TABLE a million rows, SQL Anywhere will have to allocate RAM at a furious pace... all the while Cache Panicking when it can't find a page to allocate.

Who cares? If performance is important, you care. In this case adding -c 1200M to the dbsrv11 command line improved LOAD TABLE performance by 165%, from 2,878 rows per second to 7,640 rows per second.
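For the record, here's the arithmetic behind that 165% figure:

```python
slow_rate = 2878   # rows per second with the default initial cache
fast_rate = 7640   # rows per second with -c 1200M
improvement = (fast_rate - slow_rate) / slow_rate * 100
print(f"{improvement:.0f}% faster")   # 165% faster
```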

Grow In-Memory Cache Faster

Here's what the server console log looked like when it had to grow the cache a zillion times, up to 10 times per second:

I. 10/14 15:29:15. Now accepting requests
I. 10/14 15:29:18. Cache size adjusted to 2340K
I. 10/14 15:29:23. Cache size adjusted to 3180K
I. 10/14 15:29:28. Cache size adjusted to 3392K
I. 10/14 15:29:33. Cache size adjusted to 3488K
I. 10/14 15:30:14. DIAG 2009-10-14 15:30:14.421 load table started
I. 10/14 15:30:14. Cache size adjusted to 3748K
I. 10/14 15:30:14. Cache size adjusted to 4012K
I. 10/14 15:30:14. Cache size adjusted to 4276K
...
I. 10/14 15:41:23. Cache size adjusted to 1151416K
I. 10/14 15:41:23. Cache size adjusted to 1151676K
I. 10/14 15:41:23. Cache size adjusted to 1151936K
I. 10/14 15:41:23. Cache size adjusted to 1151724K
I. 10/14 15:41:23. Cache size adjusted to 1152144K
I. 10/14 15:41:23. Cache size adjusted to 1152404K
I. 10/14 15:41:24. Cache size adjusted to 1152668K
I. 10/14 15:41:24. DIAG 2009-10-14 15:41:24.343 load table finished in 669.0 seconds
I. 10/14 15:42:23. Cache size adjusted to 1152388K
I. 10/14 15:42:27. Cache size adjusted to 1152800K

Most of the time, growing the cache step-by-step works just fine... but not when the database is growing rapidly and it's all stored in memory.
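To put a number on "step-by-step": assuming the default 4K page size (so a 64-page increment is 256K), growing from the 2M Windows default to the roughly 1.15G shown in the console log takes thousands of separate allocations:

```python
page_kb = 4                   # assumed default page size
increment_kb = 64 * page_kb   # 64 pages = 256K per growth step
start_kb = 2 * 1024           # 2M default initial cache on Windows
final_kb = 1152800            # last cache size in the console log above
steps = (final_kb - start_kb) // increment_kb
print(steps)                  # roughly 4,500 separate cache-growth operations
```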

The Suggestion...

So here's the Product Futures suggestion: When a SQL Anywhere database is running in-memory, and a lot of Cache Panics are happening, and it's having to grow the cache ten times a second, then the dynamic cache sizing algorithm should pick some increment other than 64 pages when allocating more RAM.

That way, we won't have to fiddle around trying to pick exactly the right -c initial cache size.
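For what it's worth, here's a hypothetical sketch of what such an adaptive policy might look like; this is emphatically not SQL Anywhere's actual algorithm, just one way to pick a bigger increment when growth requests arrive in rapid bursts:

```python
import time

class CacheGrower:
    """Hypothetical adaptive growth policy: double the increment while
    growth requests arrive more than once a second, fall back to the
    base 64-page increment when demand slows down."""

    def __init__(self, base_pages=64, max_pages=16384, burst_seconds=1.0):
        self.base_pages = base_pages
        self.max_pages = max_pages
        self.burst_seconds = burst_seconds
        self.increment = base_pages
        self.last_grow = None

    def next_increment(self, now=None):
        now = time.monotonic() if now is None else now
        if self.last_grow is not None and now - self.last_grow < self.burst_seconds:
            # Growing again within a second: demand is outrunning the
            # cache, so allocate a bigger chunk next time.
            self.increment = min(self.increment * 2, self.max_pages)
        else:
            self.increment = self.base_pages
        self.last_grow = now
        return self.increment

# Five growth requests in under a second escalate the increment:
grower = CacheGrower()
print([grower.next_increment(now=t) for t in (0.0, 0.2, 0.4, 0.6, 0.8)])
# [64, 128, 256, 512, 1024]
```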

...Or, Maybe Not

It's a little-known fact (well, I didn't know it) that picking a reasonable value for -c is a good idea regardless of the cache growth performance issue described here.

For example, if you expect the cache to grow to some size between 1G and 2G, you should specify a value like -c 1G instead of letting it default to the tiny value (on Windows) of 2M. The server uses hash chains to find data in the cache, and the number of hash chains is determined by the initial cache size. A cache that starts at 2M will have fewer hash chains and those chains will grow very long (and slow) as the cache grows to 2G; performance will suffer, even if the Cache Panics problem is solved. And that performance problem will persist even after the cache finishes growing.

On the other hand, a cache that starts at -c 1G and grows to 2G will have more, shorter chains, and better performance. In other words, you don't have to pick an exact value for -c, but you should avoid letting it default to a stup..., er, wildly inappropriate value (like 2M) if you know it's going to grow to at least 1G.
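A toy model makes the chain-length effect easy to see; the bucket counts below are purely illustrative, not SQL Anywhere's real internals:

```python
def avg_chain_length(num_pages, num_buckets):
    # With a uniform hash, each chain holds num_pages / num_buckets
    # entries on average: the number of links a lookup must walk.
    return num_pages / num_buckets

pages_in_2g_cache = (2 * 1024 * 1024) // 4   # 2G cache of 4K pages = 524,288 pages
buckets_for_2m_start = 512                   # hypothetical count for a 2M initial cache
buckets_for_1g_start = 262144                # hypothetical count for a 1G initial cache

print(avg_chain_length(pages_in_2g_cache, buckets_for_2m_start))  # 1024.0 links per lookup
print(avg_chain_length(pages_in_2g_cache, buckets_for_1g_start))  # 2.0 links per lookup
```

Same cache, same data; only the startup size differs, and lookups in the cache that started small walk chains hundreds of times longer.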

And, of course, the Cache Panics problem won't be nearly so noticeable if you pick a half-decent value for -c. Plus, it goes away after the cache finishes growing.

So, maybe it's not worth fixing, just worth knowing about :)

1 comment:

Breck Carter said...

Good thing I didn't (!) use the word "stupid" when talking about 2M default initial cache size on Windows... because in MOST CASES the default is much larger. SQL Anywhere uses a calculation based on the amount of memory available and the size of the database file, all described in the Help. In most cases the default initial value is reasonable.
So here's the bottom line: This article is all about in-memory databases that start out empty and are populated after the engine starts. That's when the 2M default applies.