The LOAD TABLE statement has been a feature of SQL Anywhere since the early days, and the Version 5.5 Help was enthusiastic about it: "LOAD TABLE is more efficient than the ISQL statement INPUT and can be called from any client application."
The official enthusiasm had waned a bit by Version 6: "LOAD TABLE is intended solely for fast loading of large amounts of data. It is not intended for routine use in applications."
The Dark Side of LOAD TABLE was introduced in Version 7: "Inserts are not recorded in the log file. Thus, the inserted rows may not be recovered in the event of a crash. In addition, the LOAD TABLE statement should never be used in a database involved in SQL Remote replication or in databases used as MobiLink clients because these technologies replicate changes through analysis of the log file."
Versions 8, 9 and 10 were positively shrill on the subject: "Inserts are not recorded in the log file, raising the risk that data will be lost in the event of a crash and making this statement unusable with SQL Remote or with MobiLink remote databases."
Version 11, however, dropped the scary rhetoric and introduced three new LOAD TABLE clauses: WITH FILE NAME LOGGING, WITH ROW LOGGING and WITH CONTENT LOGGING.
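Here's roughly what the three variations look like in use; a sketch against the table ttt used later in this article, with the behavior notes paraphrased from the documentation:

-- The default: only the input file specification goes in the transaction log.
LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT' WITH FILE NAME LOGGING;

-- Each inserted row goes in the transaction log as an INSERT.
LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT' WITH ROW LOGGING;

-- The content of the input file goes in the transaction log.
LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT' WITH CONTENT LOGGING;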
WITH FILE NAME LOGGING is the default in Versions 11 through 17, and it is the name given to the way LOAD TABLE worked in all the earlier versions:
- It records the file specification of the input file in the transaction log, not the individual row INSERT statements.
- That makes it the fast method, because it doesn't have to write millions of records to the log, and
- it keeps the log file much smaller.
However, recording only the file name is also the cause of three problems:
- SQL Remote stops working if you use LOAD TABLE WITH FILE NAME LOGGING on any table involved in replication,
- MobiLink stops working if you do the same thing to any remote table involved in synchronization, and
- any database restart recovery that needs to apply transaction log data must have access to the original input file in its original location.
The third problem (LOAD TABLE versus restart recovery) is less well understood.
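To see for yourself what's in a transaction log, you can translate it into SQL with the dbtran utility; a minimal sketch, assuming the default SQL Anywhere 17 install location and the log file name used in this article (the -a option includes rolled-back transactions, which appear in the excerpt below, and -y overwrites the output file without confirmation):

"%SQLANY17%\Bin64\dbtran.exe" -a -y 190104AB.LOG 190104AB.SQL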
For example, here's what an ordinary LOAD TABLE FROM 'file' statement looks like in the SQL Anywhere 17 transaction log:
LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT';
--CHECKPOINT-0000-05896335817-2019-01-04 08:32:43.559
--CONNECT-1004-05896335806-dba-2019-01-04 08:32
--ROLLBACK-1004-05896335851
ROLLBACK WORK
go
--CHECKPOINT-0000-05896335867-2019-01-04 08:38:26.067
--CONNECT-1007-05896335856-dba-2019-01-04 08:38
--BEGIN TRANSACTION-1007-05896335901
BEGIN TRANSACTION
go
--BEGIN LOAD TABLE-1007-05896335904: load into table "dba"."ttt" using file 'C:\\TEMP\\ttt.TXT'
--END LOAD TABLE-1007-05896335966
--ROLLBACK-1007-05896335969
ROLLBACK WORK
go
--CHECKPOINT-0000-05896335972-2019-01-04 08:39:05.645
--BEGIN TRANSACTION-1007-05896336006
BEGIN TRANSACTION
go
--BEGIN LOAD TABLE-1007-05896336009: load into table "dba"."ttt" using file 'C:\\DATA\\TEST\\ttt.TXT'
--SQL-1007-05896336077
begin
   set temporary option "conversion_error" = 'On';
   set temporary option "string_rtruncation" = 'On';
   set temporary option "date_order" = 'YMD';
   set temporary option "nearest_century" = '50';
   set temporary option "default_timestamp_increment" = '1';
   set temporary option "time_zone_adjustment" = '-300';
   load into table "dba"."ttt" using file 'C:\\DATA\\TEST\\ttt.TXT' encoding 'windows-1252';
   set temporary option "conversion_error" = ;
   set temporary option "string_rtruncation" = ;
   set temporary option "date_order" = ;
   set temporary option "nearest_century" = ;
   set temporary option "default_timestamp_increment" = ;
   set temporary option "time_zone_adjustment" = ;
end
go
--END LOAD TABLE-1007-05896336793
--COMMIT-1007-05896336796-2019-01-04 08:41:03.742
COMMIT WORK
go
--START CHECKPOINT-0000-05896336807-2019-01-04 08:41:05.581
--FINISH CHECKPOINT-0000-05896336832-2019-01-04 08:41:26.851
--ROLLBACK-1007-05896336848
ROLLBACK WORK
go
--CHECKPOINT-0000-05896336853-2019-01-04 08:45:55.716

Let's say you do these four things:
- You make a full backup of your database,
- you run the LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT' as shown above,
- you back up the log file, and then
- you delete the data file C:\DATA\TEST\ttt.TXT.
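For reference, the backup steps 1 and 3 might be performed with the dbbackup utility; a sketch, where the connection string and backup folders are placeholder assumptions:

REM Step 1: full backup (database file plus transaction log)
"%SQLANY17%\Bin64\dbbackup.exe" -c "ENG=ddd17;DBN=ddd17;UID=dba;PWD=sql" -y C:\BACKUP\FULL

REM Step 3: back up the transaction log only
"%SQLANY17%\Bin64\dbbackup.exe" -c "ENG=ddd17;DBN=ddd17;UID=dba;PWD=sql" -t -y C:\BACKUP\LOG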
Here's how you might try to recover your database:
- You copy your backup database file to replace the current database, and then
- you run dbsrv17 -ad to apply the backup log file to the restored database file.
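A sketch of that recovery attempt, assuming the restored database file and the backed-up log sit in placeholder folders:

REM Apply the backed-up transaction log(s) in C:\BACKUP\LOG to the restored database
"%SQLANY17%\Bin64\dbsrv17.exe" C:\RESTORE\ddd17.db -ad C:\BACKUP\LOG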
The recovery fails, because restart recovery needs to re-read the input file that no longer exists:

I. 01/04 09:18:32. Starting database "ddd17" (C:\$ web\blog\20181231 WITH FILE NAME LOGGING runs faster, but\ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:32. Performance warning: Database file "C:\$ web\blog\20181231 WITH FILE NAME LOGGING runs faster, but\ddd17.db" consists of 213 disk fragments
I. 01/04 09:18:32. Database recovery in progress
I. 01/04 09:18:32. Last checkpoint at Fri Jan 04 2019 08:29
I. 01/04 09:18:32. Checkpoint log...
I. 01/04 09:18:33. Transaction log: C:\$ web\blog\20181231 WITH FILE NAME LOGGING runs faster, but\190104AB.LOG...
I. 01/04 09:18:33. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:33. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:33. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:33. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:33. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:33. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:33. Transaction log: C:\$ web\blog\20181231 WITH FILE NAME LOGGING runs faster, but\ddd17.log...
I. 01/04 09:18:33. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:34. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:34. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:34. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:34. Parallel recovery enabled
I. 01/04 09:18:34. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
I. 01/04 09:18:34. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:18
E. 01/04 09:18:34. *** ERROR *** Assertion failed: 100948 (17.0.9.4882)
Failed to redo a database operation (id=4, page_no=0x1, offset=0x128) - Error: Cannot access file 'C:\DATA\TEST\ttt.TXT' -- The system cannot find the file specified.

Here are three alternatives to avoid that problem:
1. Take a full backup right after each LOAD TABLE WITH FILE NAME LOGGING statement (see the sketch after this list),
2. preserve all the input files until they are no longer needed for recovery, which is after the next full backup, or
3. use LOAD TABLE WITH ROW LOGGING (or WITH CONTENT LOGGING) instead.
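For example, Alternative 1 can be done entirely in SQL with the BACKUP DATABASE statement; a sketch, with the backup folder as a placeholder:

LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT';  -- WITH FILE NAME LOGGING is the default
BACKUP DATABASE DIRECTORY 'C:\\BACKUP\\FULL';   -- full backup: database file plus transaction log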
If you regularly load millions of rows, however, you might need the speed of Alternatives 1 and 2. Here's how FILE NAME LOGGING and ROW LOGGING compare:
                LOAD TABLE                  LOAD TABLE
                WITH FILE NAME LOGGING      WITH ROW LOGGING
                ======================      ================
Elapsed time    118 seconds                 214 seconds - much slower
Log growth      0 pages                     481,295 pages - huge growth

Here are the test scripts:
Test A: LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT';
BEGIN
   DECLARE @start_at TIMESTAMP;
   DECLARE @seconds  BIGINT;
   DECLARE @pages    BIGINT;
   DECLARE @growth   BIGINT;
   -- Test A: LOAD TABLE
   SET @start_at = CURRENT TIMESTAMP;
   SET @pages = DB_EXTENDED_PROPERTY ( 'FileSize', 'translog' );
   MESSAGE STRING ( CURRENT TIMESTAMP, ' Test A starting; @@VERSION = ', @@VERSION ) TO CONSOLE;
   LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT';
   SET @seconds = DATEDIFF ( SECOND, @start_at, CURRENT TIMESTAMP );
   SET @growth = DB_EXTENDED_PROPERTY ( 'FileSize', 'translog' ) - @pages;
   MESSAGE STRING ( CURRENT TIMESTAMP, ' Test A seconds = ', @seconds, ', log pages growth = ', @growth ) TO CONSOLE;
END;

I. 01/04 08:38:26. 2019-01-04 08:38:26.035 Test A starting; @@VERSION = 17.0.9.4882
I. 01/04 08:38:26. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 08:38
I. 01/04 08:38:26. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 08:38
I. 01/04 08:39:05. 2019-01-04 08:39:05.598 Test A starting; @@VERSION = 17.0.9.4882
I. 01/04 08:39:05. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 08:39
I. 01/04 08:39:05. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 08:39
I. 01/04 08:41:03. 2019-01-04 08:41:03.742 Test A seconds = 118, log pages growth = 0
I. 01/04 08:41:05. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 08:41
I. 01/04 08:41:05. Cache size adjusted to 2270460K
I. 01/04 08:41:10. Cache size adjusted to 2256544K
I. 01/04 08:41:26. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 08:41
Test B: LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT' WITH ROW LOGGING;
BEGIN
   DECLARE @start_at TIMESTAMP;
   DECLARE @seconds  BIGINT;
   DECLARE @pages    BIGINT;
   DECLARE @growth   BIGINT;
   -- Test B: LOAD TABLE WITH ROW LOGGING
   SET @start_at = CURRENT TIMESTAMP;
   SET @pages = DB_EXTENDED_PROPERTY ( 'FileSize', 'translog' );
   MESSAGE STRING ( CURRENT TIMESTAMP, ' Test B starting; @@VERSION = ', @@VERSION ) TO CONSOLE;
   LOAD TABLE ttt FROM 'C:\\DATA\\TEST\\ttt.TXT' WITH ROW LOGGING;
   SET @seconds = DATEDIFF ( SECOND, @start_at, CURRENT TIMESTAMP );
   SET @growth = DB_EXTENDED_PROPERTY ( 'FileSize', 'translog' ) - @pages;
   MESSAGE STRING ( CURRENT TIMESTAMP, ' Test B seconds = ', @seconds, ', log pages growth = ', @growth ) TO CONSOLE;
END;

I. 01/04 09:41:29. 2019-01-04 09:41:29.472 Test B starting; @@VERSION = 17.0.9.4882
I. 01/04 09:41:29. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:41
I. 01/04 09:41:29. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:41
I. 01/04 09:45:03. Starting checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:45
I. 01/04 09:45:03. 2019-01-04 09:45:03.724 Test B seconds = 214, log pages growth = 481295
I. 01/04 09:45:04. Cache size adjusted to 2267588K
I. 01/04 09:45:19. Finished checkpoint of "ddd17" (ddd17.db) at Fri Jan 04 2019 09:45
I. 01/04 09:46:04. Cache size adjusted to 2255732K