Question: I want to check a text file against a table... can I do that without messing around with CREATE SERVER statements and proxy tables and ODBC drivers?
Answer: Well, sure, you could write a loop in Java and use JDBC instead... oh, you don't want to do that either?
I guess unearthing an ORM solution is out of the question, too... you just want to code a SELECT and be done with it, right?
Acceptable Answer: Use OpenString() in the FROM clause.
But first, here's a table and a text file.
The table is used to generate the SQL Anywhere Technical Documents web page, and the text file is sent to Google for the corresponding Custom Search Engine:
CREATE TABLE DBA.article ( -- 526 rows, 284k total = 248k table + 28k ext + 8k index, 553 bytes per row artificial_id /* PK */ BIGINT NOT NULL DEFAULT autoincrement, article_number DECIMAL ( 30, 6 ) NOT NULL DEFAULT 0, eyecatcher LONG VARCHAR NOT NULL DEFAULT '', display_date LONG VARCHAR NOT NULL DEFAULT '', order_by_date DATE NOT NULL DEFAULT '1901-01-01', versions LONG VARCHAR NOT NULL DEFAULT '', url LONG VARCHAR NOT NULL DEFAULT '', title LONG VARCHAR NOT NULL DEFAULT '', description LONG VARCHAR NOT NULL DEFAULT '', added_on_date DATE NOT NULL DEFAULT '1901-01-01', authors LONG VARCHAR INLINE 256 PREFIX 8 NOT NULL DEFAULT '', updated_at TIMESTAMP NOT NULL DEFAULT timestamp, CONSTRAINT ASA81 PRIMARY KEY ( -- 8k artificial_id ) ); -- Text file... http://www.sybase.com/detail?id=1098527 http://www.sybase.com/detail?id=1067224 http://www.sybase.com/detail?id=1098128 http://www.sybase.com/files/White_Papers/wp-ias-MobiLink12Performance.pdf http://www.sybase.com/detail?id=1096289 http://www.sybase.com/detail?id=1095051 http://www.sybase.com/detail?id=1096331 http://www.sybase.com/detail?id=1095775 http://www.sybase.com/detail?id=1095123 http://www.sybase.com/detail?id=1094921 http://www.sybase.com/detail?id=1094907 http://download.sybase.com/presentation/TW2011/DATSQL16.pdf http://download.sybase.com/presentation/TW2011/DATSQL12_SQLAnywhereOData.pdf http://download.sybase.com/presentation/TW2011/DATSQL11.pdf ...The question now becomes "How do I join article.url with the data in the text file?"
Voila! A single SELECT!
That's all there is to it...SELECT article.artificial_id, article.url, text_file.url FROM article INNER JOIN OpenString ( FILE 'temp_include_sites_in_bulk.txt' ) WITH ( url LONG VARCHAR ) AS text_file ON text_file.url = article.url ORDER BY article.artificial_id; 1 http://www.sybase.com/detail?id=1095775 http://www.sybase.com/detail?id=1095775 2 http://www.sybase.com/detail?id=1095123 http://www.sybase.com/detail?id=1095123 3 http://www.sybase.com/detail?id=1094491 http://www.sybase.com/detail?id=1094491 ... 550 http://download.sybase.com/presentat... http://download.sybase.com/presentat... 551 http://download.sybase.com/presentat... http://download.sybase.com/presentat... 552 http://download.sybase.com/presentat... http://download.sybase.com/presentat...The OpenString() clause on lines 4 to 6 works like this: the FILE clause points to the text file, the WITH clause specifies the layout of the text file (one single string column), and the AS clause gives the OpenString() a table name to be used in the rest of the SELECT.
After that, it's just an ordinary SELECT: the ON clause on line 7 refers to the url column in the two tables (one real table, article, and one not-so-real table, text_file).
The SELECT list, the ORDER BY clause, all business as usual, and the result set (lines 10 to 16) looks like it should.
The REAL question was this: How do I make sure that all the urls in the table also exist in the text file, and vice versa?
Stuff happens when you have the same data stored in two different places: Maybe the table's more up to date than the text file, or maybe the the most recent changes got rolled back, or maybe it got restored from a backup.
Or maybe we're looking at the wrong copy of the text file... a girl can't be too careful these days.
The solution: Change the INNER JOIN to a FULL OUTER JOIN.
Not a LEFT OUTER JOIN, that only tells you about data that's in the table but missing from the text file... a FULL OUTER JOIN tells you about the vice versa too:
SELECT article.artificial_id, article.url, text_file.url FROM article FULL OUTER JOIN OpenString ( FILE 'temp_include_sites_in_bulk.txt' ) WITH ( url LONG VARCHAR ) AS text_file ON text_file.url = article.url WHERE text_file.url IS NULL OR article.url IS NULL ORDER BY article.artificial_id; (NULL) (NULL) http://www.sybase.com/detail?id=109166 327 not available (NULL) 337 not available (NULL) 338 not available (NULL) 339 not available (NULL) 344 not available (NULL) 349 not available (NULL) 362 too large (NULL) 366 too large (NULL) 368 too large (NULL) 379 empty (NULL) 502 not available (NULL) 505 not available (NULL) 506 not available (NULL) 507 not available (NULL) 510 not available (NULL) 511 not available (NULL) 530 not available (NULL) 531 not available (NULL) 532 not available (NULL) 533 not available (NULL)Line 12 shows an url that exists in the file but not the table: that's a real problem, and it makes writing this query a worthwhile endeavour.
The other rows (lines 13 to 32) show entries in the table that don't have urls at all, so they're not going to show up in the text file sent to Google: that's not a problem.
Ooops, almost forgot... the WHERE clause on lines 8 and 9 cuts the output down to just the Possibly Bad Stuff: if the query returns no rows there's nothing to look at, let alone worry about.
Where did "OpenString" come from?
Here's the question as it was asked on sqlanywhere.forum.com:Here's the answer from Glenn Paulley:
Lo these many years, based on nothing more than the prefix "Open", I have assumed that the OpenString function clause was based on some pre-existing feature in common use by some (unknown to me) portion of the computing community.
However, two possibilities now come to mind: first, that the "Open" in OpenString has nothing to do with any utopian philosophy but that it might as well have been called "UnloadString"; and second, that OpenString might be an invention ofWatcom, er,iAnywhere Solutions, er,Sybase, er, SAP.
Is it possible?
That being Anil Goel... and here's some more information from John Smirnios:
As far as I know that is true - OPENSTRING is Anil's invention, I haven't seen anything like it with any other product.
If the Wikipedia entry for "innovation" had a photo (it doesn't), it would be Anil's.
It was an excellent evolution of LOAD TABLE and, not surprisingly, they both share the same code. Many people needed the ability to massage the data coming from a LOAD TABLE so they would insert the data into a temporary table then massage the data in the temp table. That was wasteful and cumbersome. OpenString allowed it to happen in one step. The rewrite of the execution engine (many versions ago) made it much more feasible to do. LOAD TABLE is now very little other than an INSERT (with special properties and special logging) from OpenString.
If I was Anil's boss (I'm not)... or John Smirnios, I'm not his boss either... but if I was, this would not happen...
No comments:
Post a Comment