Wednesday, July 15, 2009

Danger! NULLs!

It's legal to avoid taxes, but not to evade them. With NULLs, I don't care, I will avoid them, evade them, whatever it takes.

Like Al Gore does with global warming, whenever something bad happens I blame the NULLs.

And here's why: The code I posted in SELECT FROM Excel Spreadsheets was wrong! Because of NULLs!

Well, because of a stupid mistake *I* made involving NULLs. And it isn't really the NULLs' fault. Not really.

Here's the query that was wrong; can you see why?

SELECT proxy_browsers.browser          AS brand_name,
SUM ( proxy_browsers.hits ) AS hit_count,
hit_count / total_hits * 100.0 AS percent
FROM proxy_browsers
CROSS JOIN ( SELECT SUM ( hits ) AS total_hits
FROM proxy_browsers )
AS summary
WHERE browser IS NOT NULL
GROUP BY proxy_browsers.browser,
summary.total_hits
ORDER BY hit_count DESC;

And this is the most embarrassing part: I knew there was someting wrong, because that query gave a different answer from this one which used an OLAP WINDOW instead of a CROSS JOIN:
SELECT DISTINCT 
FIRST_VALUE ( browser ) OVER ( brand_window ) AS brand_name,
SUM ( hits ) OVER ( brand_window ) AS hit_count,
CAST ( hit_count
/ SUM ( hits ) OVER ( everything )
* 100.0 AS DECIMAL ( 11, 4 ) ) AS percent
FROM proxy_browsers
WHERE browser IS NOT NULL
WINDOW brand_window AS ( PARTITION BY browser ),
everything AS ()
ORDER BY hit_count DESC;

But I thought it was the OLAP query giving the wrong answer, not the CROSS JOIN! I didn't post the OLAP query because I thought there was a bug in SQL Anywhere 11!

Arrgh!

What's the answer? Last chance, the contest is closed...

Scroll down to see the answer...

down...

down...

down...

down...

down...

down...

down...

down...

down...

down...

down...

down...

down...

There are two SELECTs in the CROSS JOIN query but only one of them has the necessary WHERE browser IS NOT NULL. Here's the corrected CROSS JOIN query, which now returns the same result set as the (always correct) OLAP version:
SELECT proxy_browsers.browser          AS brand_name,
SUM ( proxy_browsers.hits ) AS hit_count,
hit_count / total_hits * 100.0 AS percent
FROM proxy_browsers
CROSS JOIN ( SELECT SUM ( hits ) AS total_hits
FROM proxy_browsers
WHERE browser IS NOT NULL )
AS summary
WHERE browser IS NOT NULL
GROUP BY proxy_browsers.browser,
summary.total_hits
ORDER BY hit_count DESC;

Jonathan's the clear winner in the Find The Mistake Contest; not only did he provide right answer, but he proposed an alternative solution:

Clearly you're anticipating the possibility that the "browser" column may be null, based on the "browser is not null" clause in the outer query. But your inner query that generates the total hits doesn't include this clause, which means that you're measuring percentages against the total number of hits, not the total of hits from non-null browsers.

This may or may not be considered a bug, depending on what you're trying to present. But if you have hits from null browsers, then your percentages will not sum to 100, and at a minimum this will look odd.

Fix is obviously to add "where browser is not null" to the subquery. A perhaps better fix would be to handle the null values, something like this:
SELECT coalesce(proxy_browsers.browser, 'Unknown') AS brand_name,
SUM ( proxy_browsers.hits ) AS hit_count,
hit_count / total_hits * 100.0 AS percent
FROM proxy_browsers
CROSS JOIN ( SELECT SUM ( hits ) AS total_hits
FROM proxy_browsers )
AS summary
GROUP BY brand_name,
summary.total_hits
ORDER BY hit_count DESC;

Monday, July 13, 2009

Techwave Agenda

This year's Techwave Symposium in Washington DC may be a shadow of former Techwaves, but it's not all bad.

First, there's the price, $200 for a day and a half of technical content, and then there's the content itself: The agenda for the SQL Anywhere and MobiLink sessions has just been published and it looks solid.

Oh, and registration is now open.

Friday, July 10, 2009

Search this blog, plus Glenn Paulley's

      ...search this blog, plus Glenn Paulley's

I've added Glenn Paulley's blog to the Google Custom Search Engine gadget at the top right of this page, and here's why:
  • It's the Number One most popular SQL Anywhere blog, plus

  • Glenn has recently increased the number of code-related posts.
November 9, 1995
Yes, actual code! Which makes searching both blogs at the same time a good idea when you're looking for SQL Anywhere technical topics.

Wednesday, July 8, 2009

Is It Safe?

That's a simple question made memorable by the 1976 movie Marathon Man:

Christian Szell: Is it safe?... Is it safe?
Babe Levy: You're talking to me?
Christian Szell: Is it safe?
Babe Levy: Is what safe?
Christian Szell: Is it safe?
Babe Levy: I don't know what you mean. I can't tell you something's safe or not, unless I know specifically what you're talking about.
Christian Szell: Is it safe?
Babe Levy: Tell me what the "it" refers to.
Christian Szell: Is it safe?
Babe Levy: Yes, it's safe, it's very safe, it's so safe you wouldn't believe it.
Christian Szell: Is it safe?
Babe Levy: No. It's not safe, it's... very dangerous, be careful.
In my case, the question was this:
Is it safe to call GET_IDENTITY ( 'table-name', 0 )?
The SQL Anywhere function call GET_IDENTITY ( 't', 1 ) pre-allocates the next value that would normally be assigned by an INSERT statement to the DEFAULT AUTOINCREMENT column in t, and returns that value to the caller. This is very useful if you need to know what a new AUTOINCREMENT primary key is going to be, before you INSERT the row.

You can pass GET_IDENTITY other numbers, like 2, 3, ..., to have it pre-allocate multiple values and return you the first value.

The Help doesn't talk about passing it zero, but that's what I wanted it to do: Just tell me what the next value is going to be, but don't pre-allocate it... let the next INSERT use it.

Actually, what I really wanted was the last value assigned, which I could get by subtracting:

GET_IDENTITY ( 'table-name', 0 ) - 1

In other words, give me the current AUTOINCREMENT value, the one that was last assigned to some particular table. This is different from @@IDENTITY in two ways:
  • @@IDENTITY returns the last AUTOINCREMENT value assigned to any table, whereas GET_IDENTITY() lets you specify which table.

  • @@IDENTITY only returns values assigned by the current connection, whereas GET_IDENTITY() doesn't care what connection made the assignment. In other words, @@IDENTITY remembers the last value assigned by the current connection, whereas GET_IDENTITY() will return values assigned by other connections.
That last point is one you should consider carefully. If it's important to you, you may be better off calling GET_IDENTITY ( 't', 1 ) before doing the INSERT, because the value that is pre-allocated by GET_IDENTITY ( 't', 1 ) is protected from work done by other connections, and you are safe to specify it in the INSERT.

However, you may be looking for a faster alternative to SELECT MAX ( t.c ), which may be slow because:
  • there's no index on the DEFAULT AUTOINCREMENT column, or

  • there's an index but it's not useful because the DEFAULT AUTOINCREMENT column is not the first column.
It turns out that yes, it is safe to call GET_IDENTITY with zero in the second argument. Here's some code that shows how GET_IDENTITY ( 'table-name', 0 ) - 1 returns the same value as SELECT MAX ( t.c ):

CREATE TABLE t ( c INTEGER DEFAULT AUTOINCREMENT );

SELECT GET_IDENTITY ( 't', 0 ) - 1,
MAX ( t.c )
FROM t;

INSERT t VALUES ( DEFAULT );

SELECT GET_IDENTITY ( 't', 0 ) - 1,
MAX ( t.c )
FROM t;

INSERT t VALUES ( DEFAULT );

SELECT GET_IDENTITY ( 't', 0 ) - 1,
MAX ( t.c )
FROM t;

Here are the three result sets; note that the first SELECT returns NULLs because there are no rows in t yet:

GET_IDENTITY('t',0)-1,MAX(t.c)
(NULL),(NULL)

GET_IDENTITY('t',0)-1,MAX(t.c)
1,1

GET_IDENTITY('t',0)-1,MAX(t.c)
2,2

Monday, July 6, 2009

Foxhound Sends Email Alerts

Fans of Foxhound will appreciate that the next beta of the database monitor will be getting "Email Alerts".

That's where you tell Foxhound what worries you the most about your SQL Anywhere server, and Foxhound tells you when your worst fears come true.

By email.

Like when your database goes offline.

Or the CPU goes above 90% for an hour.

Or when fifty users can't get any work done because their database connections are all blocked.

Here's an example of an alert email:



It's been surprisingly difficult to implement email alerts in a useful way; there's more to it than detecting anomalies and sending emails:

  • Letting the administrator specify criteria for each different alert condition.

  • Specifying how soon to actually issue an alert after the condition is first detected.

  • Deciding how soon to issue an "all clear" after the condition is no longer detected.

  • Deciding what to do when the administrator changes the criteria while an alert is in effect.

  • Displaying the alerts and all clears on the Database Monitor page, when email isn't enough.

  • Displaying an error message when Foxhound can't send an email.

  • Doing all this without making the administrator fill in endless forms.
Here's an example of an alert, followed by an all clear, displayed on the Foxhound Database Monitor page:

Wednesday, June 17, 2009

Techwave Symposium, Men In Black Edition

The first 2009 Techwave Symposium is going to be held on August 26 and 27, 2009 at the Renaissance Mayflower Hotel in Washington DC... a real bargain at $200, plus a discounted hotel rate of $149 per night.

Plus this: "vertical content specifically designed for mid-level and senior level IT staff from civilian agencies, intelligence agencies and the Department of Defense"... will the "Special Event" will be an alien abduction foiled by black helicopters?

Wednesday, May 6, 2009

Is SQL Anywhere Cool?

It's not that hard to get into the ServerFault beta...

[ "How hard can it be, they let you in?" says DoppelGanger ]
...all you need is 100 reputation points in StackOverflow and then follow the instructions on this page.
[ "And it's really easy to get 100 points on StackOverflow, isn't it?" ]
OK, here's why I think ServerFault's gonna be a huge success, just like StackOverflow: it's already got useful content. In fact, it's got the answer to the first question I wanted to ask.
[ "It's better than that, isn't it? It already had the first answer you were going to post, didn't it?" ]
Yes, that's right... I had a question, and I didn't look it up on ServerFault, I thought I would find the answer elsewhere first and post both question and answer, as my first working experience with ServerFault.

So I went and wasted a bunch of time, finally found the poorly indexed Microsoft KB page (it didn't include the message text I was getting), applied the fix, then went over to ServerFault to post the answer... and this is what I saw as soon as I started the "Ask a question" dialog:



My question was "How do I fix "MMC could not create the snap-in" in Windows?", and when ServerFault popped the list of "Related Questions" the very first one was an exact match: "MMC could not create the snap-in".

And yes, the answer was exactly the same as the one I was going to post. I'm guessing it didn't show up earlier in my Google searches because StackOverflow's still in beta.


When I wrote StackOverflow and ServerFault Are The Future I wondered if a site devoted to "people who manage or maintain computers in a professional capacity" would be suitable for asking questions about database servers. The answer is yes, ServerFault already has 34 questions tagged "sqlserver" and 15 tagged "mysql" in the first week of beta.

That means if your SQL Anywhere question is about writing application code that talks to the server, or writing SQL code that runs on the server, then StackOverflow is the place to be, but if you question is about, say, setting up a High Availability configuration, or backing up your 100GB database, then ServerFault is the place.

The question now is, "Will anyone go there, to ask and answer questions about SQL Anywhere?"

More specifically, "Will iAnywhere Solutions staff go there?" ...the vast majority of questions on the NNTP newsgroups are answered by iAnywhere tech support, professional services and engineering folks. Without them, asking a SQL Anywhere question on StackOverflow and ServerFault will be like asking a COBOL question... an experience in loneliness.

Maybe THAT'S the question: "Is SQL Anywhere cool? Or is it COBOL?"