Friday, December 9, 2011

Product Suggestion: Minimum Row Size

Question: Have you ever set PCTFREE for a table?

Likely Answer Number 1: "Never."

Likely Answer Number 2: "Don't think so, can't remember."

Likely Answer Number 3: "I set it to 100 once, can't remember why."



As SQL Anywhere performance tuning knobs go, the PCTFREE setting is one of most obscure and least used. It's been around forever, but nobody ever uses it, probably because nobody can figure out how to use it properly.

What the %#!@ is PCTFREE?


Here's a code example that sets PCTFREE to 50% for table t:
CREATE TABLE t
   pkey    INTEGER         NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   data    VARCHAR ( 200 ) NOT NULL DEFAULT '',
   PCTFREE 50 );
Here's what the Help says:

PCTFREE clause - Specifies the percentage of free space you want to reserve for each table page. The free space is used if rows increase in size when the data is updated. If there is no free space in a table page, every increase in the size of a row on that page requires the row to be split across multiple table pages, causing row fragmentation and possible performance degradation.

The value percent-free-space is an integer between 0 and 100. The former value specifies that no free space is to be left on each page—each page is to be fully packed. A high value causes each row to be inserted into a page by itself. If PCTFREE is not set, or is later dropped, the default PCTFREE value is applied according to the database page size (200 bytes for a 4 KB (and up) page size). The value for PCTFREE is stored in the ISYSTAB system table.

Just what should PCTFREE be set to? Is 50% good? Is 40% better, or is 60%?

It doesn't help that PCTFREE is a percentage but the default is expressed in bytes... I think you can convert, but who cares? Is 200 bytes good? Or 100? Or 400? ...the answer is the same, "I don't know."

The problem is this: When designing a new cargo ship / warehouse / database, folks generally know how many tonnes / cubic feet / bytes it should store; i.e., the capacity, not the free space.

Taken down to the database page level, folks know (or can figure out) how big the average row is when it's inserted, and how big it will be after it's been updated. What they don't know is how to convert that to an efficient amount of free space to leave after inserting.

Why should we care?


It is a fairly common practice to insert empty rows and then update them later when actual data become available... easy is good.

Once inserted, a row can never be moved to a different page, so when a page becomes full of empty rows the first UPDATE that expands a column will split that row... splits are bad, hence PCTFREE.

Here's a thought...


...instead of (or in addition to) PCTFREE why not give the users a knob called MINROWSIZE which they can calculate as expected average row size after the row has been updated, and is to be used by SQL Anywhere as the minimum amount of table data space to be reserved for each row when it is inserted?

In this case, MINROWSIZE 25 means reserve 4 bytes for the integer and about 20 bytes for the string even though the INSERT is providing an empty row:
CREATE TABLE t
   pkey    INTEGER         NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   data    VARCHAR ( 200 ) NOT NULL DEFAULT '',
   MINROWSIZE 25 );

INSERT t VALUES ( DEFAULT, DEFAULT );
Currently, when a new row is inserted the PCTFREE amount is compared to the current free space in the page when deciding if a new row will fit. The size of each existing row and whether or not it is expected to grow in size is not taken into account, just the total space used and the PCTFREE amount, so depending of the current state of affairs inside the page the ideal PCTFREE could be larger or smaller than the actual value.

With MINROWSIZE, SQL Anywhere would take into account existing rows that have still not used up their MINROWSIZE allocation, as well as the MINROWSIZE allocation for the new row, when making the decision.

With some effort, the user could provide a MINROWSIZE value that is meaningful and accurate... as opposed to picking the number for Lotto PCTFREE.


1 comment:

Justin Willey said...

My answer - 3, but that was on Gupta SQLBase! MINROWSIZE would be much easier to get ones head round. I take it this will appearing on the forum near you soon as a product suggestion!