Monday, December 27, 2010

In-Memory Downloading from SQL Server (5)

This is 5th and last part of an article written in March 2010 with the title "Downloading Data into a SQL Anywhere In-Memory Database from Microsoft SQL Server".

Part 1 is here.

Part 2 is here.

Part 3 is here.

Part 4 is here.



Technique 5: Download Via OPENROWSET


OPENROWSET is a special SQL Server function that provides a lot of the power of linked servers without requiring you to actually define a linked server. The syntax for OPENROWSET is rather funky, but it does make for concise code as shown in Figure 22.

Figure 22: Download Via OPENROWSET
Context: SQL Server
INSERT INTO OPENROWSET ( 'MSDASQL', 
'sa_system_dsn'; 'dba'; 'sql',
dba.sa_target )
SELECT *
FROM mss_source
GO

Line 1 in Figure 22 shows that an OPENROWSET function call may be used as the target of a SQL Server INSERT statement. OPENROWSET calls may also the targets of UPDATE and DELETE statements, and they can appear in the FROM clauses of SELECT statements.

Line 1 also identifies the OLE DB provider to be used to communicate with SQL Anywhere, in this case the same standard SQL Server ODBC provider MSDASQL that was used in the linked server download technique shown earlier. In fact, the OPENROWSET technique is a variation on linked servers as far as the underlying technology is concerned.

Line 2 names the ODBC DSN to be used, plus the user id and password to be passed to SQL Anywhere; again, the same information that's necessary for linked servers. Line 2 is where the funky syntax comes into play: semicolons are used to separate the three parameters on this line rather than commas.

Line 3 names the target table on the SQL Anywhere database. Because there is no linked server involved, the simpler [owner].[table] naming convention is used as opposed to the full [server].[database].[owner].[table] syntax.

Lines 4 through 6 select all the rows from the SQL Server source table for processing by the INSERT.

There is some extra setup required for OPENROWSET, however, as shown in Figure 23.

Figure 23: Setup For OPENROWSET
Context: SQL Server
USE master
GO

sp_configure 'show advanced options', 1
GO

RECONFIGURE
GO

sp_configure 'Ad Hoc Distributed Queries', 1
GO

RECONFIGURE
GO

SELECT *
FROM OPENROWSET ( 'MSDASQL',
'sa_system_dsn'; 'dba'; 'sql',
sys.dummy )
GO

dummy_col
-----------
0

Line 10 in Figure 23 is the central step in the setup: it makes OPENROWSET possible by turning on the Ad Hoc Distributed Queries feature.

Lines 16 through 20 are a simple test to see if the setup worked: a SELECT to display SQL Anywhere's single-row single-column "dummy" table. If OPENROWSET works, you will see the result on lines 22 through 24. If not, you'll see something like "Msg 15281 SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource'".

Revisiting Techniques 4 and 5: The SAOLEDB.11 Provider


SQL Anywhere 11 ships with an alternative OLE DB provider for use with SQL Server, called SAOLEDB.11.

Tip: In theory, also known as "the documentation", you can refer to SAOLEDB.11 by the simpler name "SAOLEDB" but I haven't had any luck with that, finding success only with the full name "SAOLEDB.11".

Caution: The techniques shown in this section will fall over in a heap, also known as a "SQL Server fatal memory leak", if you have any column in your tables that SQL Server regards as a blob. If you're asking the question "Why then use SAOLEDB.11?" the answer is "Because it's faster than MSDASQL"... more on this in the next section called Performance.

Figure 24 shows two versions of the Windows command lines necessary to register the SAOLEDB.11 provider on the computer that's running SQL Server. The two versions are with and without the dialog boxes like the one shown in Figure 25.

Figure 24: Register SAOLEDB.11 Provider
Context: SQL Server
REM With "OK" dialog boxes...
regsvr32 dboledb11.dll
regsvr32 dboledba11.dll
PAUSE

REM Without "OK" dialog boxes...
regsvr32 /s dboledb11.dll
regsvr32 /s dboledba11.dll
PAUSE

Figure 25: RegSvr32 dboledb11
Context: SQL Server


Figure 26 shows some "Magic Settings" that SQL Server needs to enable the SAOLEDB.11 provider.

Note: If the EXEC on lines 7 through 10 in Figure 26 produces the following message, it probably means that DisallowAdHocAccess is already set to zero for SAOLEDB.11, and you may carry on: "RegDeleteValue() returned error 2, 'The system cannot find the file specified.'".

Figure 26: Configure SAOLEDB.11 Provider
Context: SQL Server
EXEC master.dbo.sp_MSset_oledb_prop 
N'SAOLEDB.11', 
N'AllowInProcess', 
1
GO

EXEC master.dbo.sp_MSset_oledb_prop  
N'SAOLEDB.11',  
N'DisallowAdHocAccess',  
0
GO

Here's a tip: You can get the SQL Server Management Studio to help write your scripts by using the Script menu item that appears in many of the property dialog boxes.

For example, consider the task of enabling the "Allow inprocess" which requires you to navigate through the following steps:
Microsoft SQL Server Management Studio
- [servername]
- Server Objects
- Linked Servers
- Providers
- SAOLEDB.11
- Provider Options - SQL Anywhere OLE DB Provider 11
- General page
- check "Enable - Allow inprocess"

Figure 27 shows what the Provider Options dialog for SAOLEDB.11 looks like when you check the Allow Inprocess option and then select Script - Script Action to Clipboard:

Figure 27: Copy Script Action to Clipboard
Context: SQL Server


Here's the code that appears in the clipboard, which you can paste into your script file (a slightly edited version is shown in Figure 26 earlier):
USE [master]
GO
EXEC master.dbo.sp_MSset_oledb_prop N'SAOLEDB.11', N'AllowInProcess', 1
GO

That is truly Magic Code... the stored procedure sp_MSset_oledb_prop is not mentioned anywhere in the SQL Server 2008 Help, let alone documented, nor does any documentation appear on the web... at least none that I can find. All that's out there are a few examples like the code shown above.

Another Tip: The Script Action to Clipboard menu item won't actually generate any code unless you make some kind of change, but it's easy to change an option and then change it back so you can capture the code.

Figure 28 shows how to create a linked server using the SAOLEDB.11 provider instead of MSDASQL when you're using Technique 4: Download Via Linked Server. The code in Figure 28 is very much the same as the earlier Figure 20, with these variations: The @srvproduct on line 3 needs to be filled in with an actual value, the new value for @provider is necessary on line 4, and the @locallogin parameter must be assigned the NULL value on line 11.

Figure 28: Create Linked Server Using SAOLEDB.11
Context: SQL Server
EXEC sp_addlinkedserver
@server = 'mem',
@srvproduct = 'SQL Anywhere OLE DB Provider',
@provider = 'SAOLEDB.11',
@datasrc = 'sa_system_dsn'
GO

EXEC sp_addlinkedsrvlogin
@rmtsrvname = 'mem',
@useself = 'false',
@locallogin = NULL,
@rmtuser = 'dba',
@rmtpassword = 'sql'
GO

Figure 29 shows how to run a quick test to see if your linked server is set up correctly... just select from the single-column single-row "dummy" table on SQL Anywhere.

Figure 29: Test Linked Server Using SAOLEDB.11
Context: SQL Server
SELECT * FROM mem..sys.dummy
GO

Here's what it the right result looks like:
1> SELECT * FROM mem..sys.dummy
2> GO
dummy_col
-----------
0

(1 rows affected)

If you get the following error message blast instead, it may mean you forgot to start the SQL Anywhere database on the target computer:
OLE DB provider "SAOLEDB.11" for linked server "mem" returned message "Database
server not found".
Msg 7399, Level 16, State 1, Server BRECK-PC\TSUNAMI, Line 2
The OLE DB provider "SAOLEDB.11" for linked server "mem" reported an error. Auth
entication failed.
Msg 7303, Level 16, State 1, Server BRECK-PC\TSUNAMI, Line 2
Cannot initialize the data source object of OLE DB provider "SAOLEDB.11" for lin
ked server "mem".

Figure 30 shows the INSERT for Technique 4 using SAOLEDB.11. It's exactly the same code as the INSERT in Figure 20 earlier, except for the fact that a different provider is being used behind the scenes.

Figure 30: Download Via Linked Server Using SAOLEDB.11
Context: SQL Server
INSERT INTO mem..dba.sa_target
SELECT *
FROM mss_source
GO

Once you've configured SAOLEDB.11 you can immediately start using it for Technique 5: Download Via OPENROWSET. Figure 31 shows a simple test you can run to confirm everything's OK so far; another select from dummy, this time using the funky OPENROWSET call.

Figure 31: Test OPENROWSET Using SAOLEDB.11
Context: SQL Server
SELECT *
FROM OPENROWSET ( 'SAOLEDB.11',
'sa_system_dsn'; 'dba'; 'sql',
sys.dummy )
5GO

Figure 32 shows the SAOLEDB.11 version of the OPENROWSET insert shown earlier in Figure 22.

Figure 32: Download Via OPENROWSET Using SAOLEDB.11
Context: SQL Server
INSERT INTO OPENROWSET ( 'SAOLEDB.11', 
'sa_system_dsn'; 'dba'; 'sql',
dba.sa_target )
SELECT *
FROM mss_source
GO

Performance


There are two kinds of performance: the computer's, and yours. Figures 33 and 34 show the first kind.

Figure 33: Download Rows Per Second


Figure 33 clearly shows that

a) the two-step BCP and LOAD TABLE technique is fastest by far,

b) the Proxy Table technique is the slowest,

c) MobiLink is The Best Loser (second place),

d) Linked Server and OPENROWSET techniques have identical performance, which is not surprising since under the hood they are identical techniques,

e) the SAOLEDB.11 provider runs rings around MSDASQL, and

f) the Proxy Table technique isn't THAT much worse than using MSDASQL.

I'll be frank: I expected some of these results, but not MobiLink...I didn't expect MobiLink to be as fast as it is.

Figure 34 shows the performance numbers used to create the graph in Figure 33, plus an extra bit of information: the SQL Anywhere "Row Locks" column.

Figure 34: Download Performance Numbers
Download Via...                          Seconds, Part 1   Seconds, Part 2   Total Seconds   Rows Per Second   Row Locks

Technique 1: BCP and LOAD TABLE             446   274     720   2,674           0
Technique 2: MobiLink                       533   733   1,266   1,521   1,925,469
Technique 3: Proxy Table                  4,230     -   4,230     455   1,925,469
Technique 4: Linked Server - MSDASQL      3,376     -   3,376     570   1,925,469
Technique 4: Linked Server - SAOLEDB.11   1,860     -   1,860   1,035   1,925,469
Technique 5: OPENROWSET - MSDASQL         3,425     -   3,425     562   1,925,469
Technique 5: OPENROWSET - SAOLEDB.11      1,916     -   1,916   1,005   1,925,469

The Row Locks column in Figure 34 shows why LOAD TABLE is a winner: every other technique forces SQL Anywhere to gather and release 1.9 million row locks, but LOAD TABLE is essentially non-transactional from a performance point of view. Couple that with the raw performance of BCP on the SQL Server side and it's unlikely that any other technique will ever come close... when rows per second is your main (or only) measure of success.

The "Seconds, Part 1 and Part 2" columns in Figure 34 apply to the first two techniques: separate times are shown for BCP and LOAD TABLE, and for the two distinct stages of a MobiLink download: Part 1 is for the MobiLink server to SELECT the rows from SQL Server and transmit the download stream (all 1.9 million row images), and Part 2 is for the MobiLink client to INSERT the rows on the SQL Anywhere database.

Comparing human performance is a lot harder; here are some subjective opinions:

a) The BCP and LOAD TABLE technique can be difficult to implement because it uses an external file rather than a direct database connection. Think WAN, think security, encryption, etc.

b) BCP is less than perfectly flexible when it comes to dealing with non-SQL-Server targets and sources. In particular, it can be difficult to specify BCP row and column delimiters that are guaranteed to work with (and not collide with) all possible combinations of characters in the data itself. The good news is, if you can get something to work on the BCP side, the syntax of LOAD TABLE is flexible enough to deal with it.

c) MobiLink is by far the most flexible technique. If you have "interesting" data transformations to perform, like big schema differences between source and target, MobiLink may be your best bet. Plus, MobiLink offers end-to-end encryption.

d) Proxy tables are loved, or hated, there is no middle ground. Performance has been a big issue for many years, not just with complex cross-server joins, but with simple straight-through data transmissions like the one shown in this article.

e) The folks who love proxy tables, love the flexibility. Folks who have used both SQL Anywhere proxy tables and the SQL Server linked server and OPENROWSET techniques tend to have, ahem, mixed opinions about the SQL Server side of things. It's not the syntax that's the problem, it's flexible enough, it's the actual user experience that can leave something to be desired.

Breaking News!


The next version of SQL Anywhere, code-named Innsbruck, entered beta testing while this article was being written. Initial testing of the beta software reveal enormous performance improvements in the processing of proxy tables, at least as far as straight-through data transmissions are concerned.
(Note: Innsbruck is now SQL Anywhere 12, the current Generally Available (GA) version of SQL Anywhere.)
In fact, the proxy table technique using the Innsbruck software is faster than all other techniques shown in this article, even faster than BCP and LOAD TABLE. The proxy table INSERT - SELECT statement shown in Figure 18 took 4,230 seconds to run using SQL Anywhere 11.0.1 software but only 687 seconds using Innsbruck; the effect is clearly shown in Figure 35.

Figure 35: Innsbruck Proxy Table Improvement


Caveats: Figure 35 doesn't show Innsbruck results for the other techniques so the comparison may not be completely fair. Also, performance of the Innsbruck software may change before it becomes generally available

Having said that, the future looks bright for proxy tables; performance has always been the biggest stumbling block in the path of wider acceptance.

But Wait, There's More


The five techniques presented here don't cover all the ways you can copy data back and forth between SQL Server databases and SQL Anywhere in-memory databases. They don't even cover all the download techniques, let alone covering products like Sybase IQ or Oracle.

Here's a partial list of topics-not-covered in this article, just the SQL Server-related ones, for both download and upload:
  • Using SQL Anywhere's FROM OPENSTRING ( FILE ... ) clause to treat the contents of a file as a set of rows.

  • Using SQL Server's FROM OPENROWSET ( BULK ... ) clause to treat the contents of a file as a set of rows.

  • Using the many XML-aware features in both SQL Anywhere and SQL Server to push and pull data back and forth.

  • Using SQL Anywhere's UNLOAD TABLE and UNLOAD SELECT to rapidly create a file from a set of rows.

  • Using SQL Server's bcp.exe utility to rapidly upload data from a file to a table.

  • Using SQL Server's BULK INSERT command to rapidly upload data from a file to a table.

  • Using MobiLink to directly upload data to SQL Server, using the script-driven upload feature to cope with the fact that an in-memory SQL Anywhere database doesn't have the transaction log that is usually used to drive the upload process.

  • Using proxy tables to directly upload data to SQL Server.

  • Using a linked server to directly upload data to SQL Server.

  • Using OPENROWSET to directly upload data to SQL Server.
-- [end] --

No comments: