Q: How do I run SQL SELECT queries against data stored in Excel spreadsheets?
(This is the opposite direction from importing data into a spreadsheet from a database; e.g., via Excel - Data - From Other Sources - From Microsoft Query, etc.)A: You can use SQL Anywhere's proxy table feature to treat an Excel spreadsheet like a relational table, thus bringing the power of SQL SELECT to bear on spreadsheet data.
First, a simple example, then some tips and suggestions. Here is an Excel spreadsheet containing website hit counts by browser (MSIE 7, Firefox 3.03, etc):
Here's the code to answer the question, "Who's winning the browser wars, IE or Firefox?". Hit counts are summarized by browser brand name, getting rid of the version number differences:
CREATE SERVER EXCELHere is what the output looks like; Firefox is winning for this particular website (this blog, actually):
CLASS 'ODBC' USING 'Excel 2007';
CREATE EXISTING TABLE proxy_browsers
SELECT proxy_browsers.browser AS brand_name,
SUM ( proxy_browsers.hits ) AS hit_count,
hit_count / total_hits * 100.0 AS percent
CROSS JOIN ( SELECT SUM ( hits ) AS total_hits
FROM proxy_browsers )
WHERE browser IS NOT NULL
GROUP BY proxy_browsers.browser,
ORDER BY hit_count DESC;
You can read about setting up proxy tables for Excel here:
CREATE EXISTING TABLE DBA.proxy_browsers -- proxyHere's what the data looks like:
-- hits DOUBLE NULL,
-- pct DOUBLE NULL,
-- browser NVARCHAR ( 255 ) NULL,
-- version NVARCHAR ( 255 ) NULL
ORDER BY hits DESC;
Here are some tips and suggestions:
- The official name of the "proxy table" feature is "Remote Data Access"... it helps to know that when you're searching the docs.
- The syntax for the CREATE EXISTING TABLE ... AT clause goes like this:
'<CREATE SERVER name>;<filespec>;;<sheet name>$'
- You may or may not need to code the "magic dollar sign" after the sheet name in the AT clause. If you do need it, and you don't code it, you will see this bogus message bubbling up from the Excel driver:
The remote table 'EXCEL.C:/temp/browsers.xlsx..Sheet1' could not be foundAside: Don't you try to code the AT clause using periods '.' like in that error message, you need to use semicolons ';' to avoid ambiguity with the filespec.
SQLCODE=-666, ODBC 3 State="HY000"
- You only have to set up one ODBC DSN for everything, not one DSN for each sheet or workbook. You do have to use "Select Workbook" to point the DSN at a specific file, but after that the file and sheet names are specified in your SQL code. The file named in the "Select Workbook" doesn't even have to remain in existence.
- Here's how the ODBC Administrator was used to create the "Excel 2007" DSN for this article:
- The ODBC DSN must be set up on the same computer that's running the SQL Anywhere engine. If it's a network server on some other computer, that's where the DSN is located, and (unless you're using a mapped drive or UNC file specifications) that's where the Excel file is located as well.
Plus... if the server is started as a service, the DSN must be a "System DSN", not a "User DSN".
- In related news, the ISQL INPUT and OUTPUT statements no longer support the Excel format in SQL Anywhere 11. Plus, ISQL INPUT cannot read Excel 2007 *.xlsx files in SQL Anywhere 10: "The file may be in an unsupported format, or it could be corrupt."
These changes are no big deal, since proxy tables work so well, except for the fact that any behavior change is a pain in someone's patootie, and the INPUT and OUTPUT commands specify files that are relative to the client whereas (see above) proxy table DSNs are relative to the server.
And before anyone says "LOAD and UNLOAD now have CLIENT FILE clauses!", check out how well LOAD and UNLOAD work with proxy tables:
Feature 'unload from non-SQL Anywhere tables' not implementedWorkarounds are welcome, this blog invites comments!
SQLCODE=-134, ODBC 3 State="HY000"
- You might get away with having the workbook open in Excel at the same time you're running proxy table queries... or not.
- The names in row 1 of the spreadsheet are used to generate (and match) column names in the proxy table. If you see an error like "Too few parameters. Expected 8." it may be because you have included spaces in the row 1 column names.
If you really want spaces, you'll have to code an explicit column name list in the CREATE EXISTING TABLE, with "quoted column names" and data types... DOUBLE and NVARCHAR ( 255 ) seem to be good choices.
- The Microsoft ODBC drivers for Excel 2003 and 2007 seem to impose limits on string lengths: 64 bytes if you're not lucky, 255 bytes if you are. "NVARCHAR" is a good luck talisman here.
- Only the final cell data values are received via the proxy table, not formulas.
- Excel data is not intended for simultaneous access by multiple connections, so don't go trying serve up your spreadsheets in an OLTP system.
- Tip: A quick and dirty way to copy and paste data from the web or some other source into Excel without carrying over fancy formatting is to copy and paste into Notepad first, then copy and paste from Notepad to Excel; e.g., by Ctrl-A and Ctrl-C from Notepad and then Ctrl-V into Excel. This works well if the data pasted into Notepad turns out to be tab-delimited, which it often is when coming from a website.
- Tip: Here's how to turn a SQL Anywhere result set into a tab-delimited text file, suitable copy and paste from Notepad into Excel:
ORDER BY ccc
DELIMITED BY '\X09' HEXADECIMAL OFF ESCAPES OFF QUOTES OFF;
- Proxy tables work with more than Excel; you can join Oracle tables with SQL Server, for example. SQL Anywhere's own Migrate Database Wizard uses proxy tables to set up the schema and suck in the data when you're converting from some other DBMS.
- High volume proxy table queries sometimes run slowly. For example, complex cross-server queries don't get optimized the same way local queries do. Cross-server optimization may improve in the future but it's likely that local optimization will always be better. That's another reason not to serve up proxy tables in an OLTP environment.
- Having said that, folks who like proxy tables like them a LOT. It's been one of the most popular features in SQL Anywhere since 1998.