Wednesday, March 16, 2011

In Praise Of SELECT *

Foxhound has several examples of "The Table From Hell". Some of them are performance nightmares with millions of rows and gigabytes of data being inserted and deleted on a daily basis.

Others are relational nightmares with kludgy designs that seemed like a good idea at the time but have evolved to become perpetual PITAs.

The "alerts criteria" table is an example of the latter, with over 200 columns when there should be about 20. Here's an excerpt:

CREATE TABLE rroad_alerts_criteria (
criteria_set_type VARCHAR ( 100 ) NOT NULL CHECK ( @criteria_set_type IN (
'Factory Settings',
'Saved Defaults',
'Target Specific' ) ),
sampling_id UNSIGNED INTEGER NOT NULL, -- sampling_id = 0 for 'Factory Settings'
-- sampling_id = 0 for 'Saved Defaults'
-- sampling_id > 0 for 'Target Specific'
send_alert_emails VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
mail_recipient LONG VARCHAR NOT NULL,
use_html_in_emails VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
email_url_host_port LONG VARCHAR NOT NULL,
display_alerts VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
previous_sample_set_number_processed UNSIGNED BIGINT NOT NULL,
previous_gathered_sample_finished_at TIMESTAMP NOT NULL,
current_sample_finished_at TIMESTAMP NOT NULL,

-- Alert #1. Foxhound has been unable to gather samples for [a001p1] or longer.

alert1_enabled VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
alert1_is_in_effect VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
alert1_in_effect_at TIMESTAMP NOT NULL,
alert1_all_clear_at TIMESTAMP NOT NULL,
alert1_threshold_lost_sample_accumulated_msec BIGINT NOT NULL CHECK ( @column >= 10000 ), -- a001p1
alert1_actual_lost_sample_accumulated_msec BIGINT NOT NULL,

-- Alert #2. The heartbeat time has been 1s or longer for 10 or more recent samples.

alert2_enabled VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
alert2_is_in_effect VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
alert2_in_effect_at TIMESTAMP NOT NULL,
alert2_all_clear_at TIMESTAMP NOT NULL,
alert2_threshold_canarian_msec BIGINT NOT NULL, -- a002p1
alert2_threshold_canarian_msec_sample_count BIGINT NOT NULL CHECK ( @column >= 1 ), -- a002p2
alert2_actual_canarian_msec_threshold_reached_sample_counter BIGINT NOT NULL,


-- Alert #29. There have been [1] or more cache panics per second for [10] or more recent samples.

alert29_enabled VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
alert29_is_in_effect VARCHAR ( 1 ) NOT NULL CHECK ( @column IN ( 'Y', 'N' ) ),
alert29_in_effect_at TIMESTAMP NOT NULL,
alert29_all_clear_at TIMESTAMP NOT NULL,
alert29_threshold_cache_panics BIGINT NOT NULL CHECK ( @column >= 1 ), -- a029p1
alert29_threshold_cache_panics_sample_count BIGINT NOT NULL CHECK ( @column >= 1 ), -- a029p2
alert29_actual_cache_panics_threshold_reached_sample_counter BIGINT NOT NULL,

PRIMARY KEY ( criteria_set_type, sampling_id ) );

With a table like that, any day you can code SELECT * instead of a SELECT-endless-list is a good day, and that's what I was hoping for when faced with this maintenance chore:
  • Add a button "Force Default Settings On All Targets" to do the following:

    • Retrieve the "default settings" row with primary key [criteria_set_type; sampling_id] = ['Saved Defaults'; 0].

    • Use that row to update all the rows with primary key [criteria_set_type; sampling_id] = ['Target Specific'; 1, 2, 3, ...].

One brute-force solution might be to SELECT FROM the 'Saved Defaults' row into two hundred local variables, then loop through an UPDATE with two hundred SETs.

A slightly-less-brutish solution might be to move the FROM clause into the UPDATE; this avoids the local variables, but still has the two hundred SETs.

But wait, there's a better way...

Go Brute-Free With SELECT *

INTO LOCAL TEMPORARY TABLE temp_alerts_criteria
FROM rroad_alerts_criteria
WHERE rroad_alerts_criteria.criteria_set_type = 'Saved Defaults'
AND rroad_alerts_criteria.sampling_id = 0;

FOR f_alerts_criteria AS c_alerts_criteria INSENSITIVE CURSOR FOR
SELECT rroad_alerts_criteria.sampling_id AS @target_sampling_id
FROM rroad_alerts_criteria
WHERE rroad_alerts_criteria.criteria_set_type = 'Target Specific'
ORDER BY rroad_alerts_criteria.sampling_id

-- Set the temporary table primary key to match the 'Target Specific' row.

UPDATE temp_alerts_criteria
SET temp_alerts_criteria.criteria_set_type = 'Target Specific',
temp_alerts_criteria.sampling_id = @target_sampling_id;

-- Copy all the temporary table columns into the 'Target Specific' row.

INSERT rroad_alerts_criteria
FROM temp_alerts_criteria;


Here's how it works:
  • The SELECT on lines 1 through 5 creates a single-row temporary table holding the 'Saved Defaults' row. This table has exactly the same columns as the permanent rroad_alerts criteria table, and except for the primary key values, the single row holds exactly the same data as each and every 'Target Specific' row is going to hold in the end.

  • The FOR statement on lines 7 through 13 retrieves the sampling_id column for each 'Target Specific' row.

  • The UPDATE on lines 17 through 19 changes the temporary table primary key to match the current 'Target Specific' row.

  • The INSERT ON EXISTING UPDATE statement on lines 23 through 26 replaces the 'Target Specific' with the data from the 'Saved Defaults' row.
The two uses of SELECT * don't just save initial coding effort, they make this block of code immune to most schema changes involving the rroad_alerts_criteria table, like adding or removing data columns.

Of course, none of this would be possible without SQL Anywhere's wonderful INTO LOCAL TEMPORARY TABLE and ON EXISTING UPDATE clauses.

No comments: