Wednesday, December 24, 2008

Loading Folders

Question: How do I load all the JPEG files from a folder into a table?

Answer: With a proxy table based on CLASS 'DIRECTORY' remote server definition.

Included in this article is the code from yesterday's posting, JPEG Width and Height, so that each image's height and width may be saved in the table:

CREATE TABLE jpeg (
file_name LONG VARCHAR NOT NULL PRIMARY KEY,
file_size BIGINT NOT NULL,
file_date_time TIMESTAMP NOT NULL,
file_contents LONG BINARY NOT NULL,
image_height UNSIGNED INTEGER NOT NULL DEFAULT 0,
image_width UNSIGNED INTEGER NOT NULL DEFAULT 0 );
Here's the code that loads all the C:\temp\*.jpg files into the jpeg table:
BEGIN
DECLARE @folder_name LONG VARCHAR;
DECLARE @image_height UNSIGNED INTEGER;
DECLARE @image_width UNSIGNED INTEGER;

SET @folder_name = 'C:\\temp';

EXECUTE IMMEDIATE STRING (
'CREATE SERVER folder CLASS ''DIRECTORY'' ',
'USING ''ROOT=',
@folder_name,
';SUBDIRS=0''' );

CREATE EXTERNLOGIN DBA TO folder;

CREATE EXISTING TABLE folder AT 'folder;;;.';

FOR f_fetch AS c_fetch NO SCROLL CURSOR FOR
SELECT folder.file_name AS @file_name,
folder.size AS @file_size,
folder.modified_date_time AS @file_date_time,
folder.contents AS @file_contents
FROM folder
WHERE folder.file_name LIKE '%.JPG'
ORDER BY folder.file_name
FOR READ ONLY
DO

CALL get_jpeg_dimensions (
@file_contents, -- IN
@image_height, -- OUT
@image_width ); -- OUT

INSERT jpeg VALUES (
@file_name,
@file_size,
@file_date_time,
@file_contents,
@image_height,
@image_width );

COMMIT;

END FOR;

END;
Here are some notes on the code:
  • The EXECUTE IMMEDIATE statement is necessary because the CREATE SERVER statement doesn't allow expressions in the USING clause (at least, I don't think it does, I can't remember trying :)

  • If you hard-coded the CREATE SERVER instead of using the EXECUTE IMMEDIATE, it would look like this:
    CREATE SERVER folder CLASS 'DIRECTORY' 
    USING 'ROOT=C:\\temp;SUBDIRS=0';

  • The CREATE SERVER ... CLASS 'DIRECTORY' syntax is described here.

  • The CREATE EXTERNLOGIN statement is required for CLASS 'DIRECTORY' remote servers... I'm not sure why, but it's the law.

  • The CREATE EXISTING TABLE statement uses 'folder;;;.' as the "AT" location. The first parameter refers back to the CREATE SERVER name "folder". The second and third parameters are empty; no database or owner names are required. The fourth parameter, the "object name" parameter, is a single dot meaning "this folder", referring back to the earlier ROOT parameter.

  • The CREATE EXISTING TABLE statement gives you a table that looks like this, as shown by Foxhound:
    CREATE EXISTING TABLE DBA.folder -- proxy
    -- (
    -- permissions VARCHAR ( 10 ) NOT NULL,
    -- size BIGINT NOT NULL,
    -- access_date_time TIMESTAMP NOT NULL,
    -- modified_date_time TIMESTAMP NOT NULL,
    -- create_date_time TIMESTAMP NOT NULL,
    -- owner VARCHAR ( 20 ) NOT NULL,
    -- file_name VARCHAR ( 260 ) NOT NULL,
    -- contents LONG BINARY NOT NULL
    -- )
    AT 'folder;;;.';

  • The FOR loop fetches 4 column values from the folder table.

  • The CALL to get_jpeg_dimensions (see JPEG Width and Height) fills in two more values.

  • ...and the INSERT puts another row into the jpeg table.
The time spent to execute this code depends on a lot of factors, not just the state of the database cache. That's my way of saying "I can't get consistent timing figures no matter what I do"... which is another way of saying "I don't know what's going on."

So here goes: 36 JPEG files took anywhere from 1.5 to 3.5 seconds to load into a SQL Anywhere 11 table on the same reasonably fast laptop used yesterday.

Here's the result in dbisql:
SELECT jpeg.file_name,
jpeg.file_size AS "size",
jpeg.file_date_time AS "date_time",
jpeg.file_contents,
jpeg.image_height AS "height",
jpeg.image_width AS "width"
FROM jpeg
ORDER BY jpeg.file_name;



And here's a new feature I didn't know about: SQL Anywhere 11's version of dbisql is smart enough to know that jpeg.file_contents is an "(IMAGE)", and when you click on the funky little ellipsis button "..." you get these marvellous displays:





So... dbisql understands JPEGs too!



For completeness, here is the supporting code from yesterday's article JPEG Width and Height:
CREATE PROCEDURE get_jpeg_dimensions (
IN @file_contents LONG BINARY,
OUT @image_height UNSIGNED INTEGER,
OUT @image_width UNSIGNED INTEGER )
BEGIN

DECLARE @hex_string LONG VARCHAR;
DECLARE @current_pos BIGINT;
DECLARE @jpeg_marker VARCHAR ( 4 );
DECLARE @block_length UNSIGNED INTEGER;

SET @image_height = 0; -- not found / error
SET @image_width = 0;

UNLOAD
SELECT @file_contents
INTO VARIABLE @hex_string
HEXADECIMAL ON;

SET @current_pos = 3; -- skip the leading 0x

-- Process the leading FFD8 SOI Start OF Image header.

CALL scan_jpeg_marker (
@hex_string, @current_pos, @jpeg_marker );

IF @jpeg_marker <> 'FFD8' THEN
RETURN; -- error
END IF;

-- Scan the second marker.

CALL scan_jpeg_marker (
@hex_string, @current_pos, @jpeg_marker );

IF @jpeg_marker = '' THEN
RETURN; -- error
END IF;

-- Look for the first FFCx SOFx Start Of Frame x,
-- stopping if FFD9 EOI End Of Image or
-- FFDA SOS Start Of Scanned data is found.

WHILE 1 = 1 LOOP

IF @jpeg_marker IN (
'FFC0', 'FFC1', 'FFC2', 'FFC3',
'FFC5', 'FFC6', 'FFC7',
'FFC9', 'FFCA', 'FFCB',
'FFCD', 'FFCE', 'FFCF' ) THEN

-- Start of Frame header
-- (note exclusions FFC4, FFC8, FFCC)
-- FFCx marker
-- xxxx block length
-- xx data precision - bits per sample
-- xxxx image height
-- xxxx image width
-- xx number of color components

SET @image_height = HEXTOINT ( SUBSTR (
@hex_string, @current_pos + 6, 4 ) );

SET @image_width = HEXTOINT ( SUBSTR (
@hex_string, @current_pos + 10, 4 ) );

RETURN; -- success

ELSEIF @jpeg_marker IN ( 'FFD9', 'FFDA' ) THEN

-- EOI, SOS

RETURN; -- not found

ELSE

-- Skip over this header to the next one.

SET @block_length = HEXTOINT ( SUBSTR (
@hex_string, @current_pos, 4 ) );

SET @current_pos
= @current_pos + ( @block_length * 2 );

CALL scan_jpeg_marker (
@hex_string, @current_pos, @jpeg_marker );

IF @jpeg_marker = '' THEN
RETURN; -- error
END IF;

END IF;

END LOOP; -- WHILE 1 = 1 LOOP

END; -- get_jpeg_dimensions

CREATE PROCEDURE scan_jpeg_marker (
IN @hex_string LONG VARCHAR,
INOUT @current_pos BIGINT,
OUT @jpeg_marker VARCHAR ( 4 ) )
BEGIN

DECLARE @padded_marker LONG VARCHAR;

-- Match one or more FFs followed by any hex
-- pair other than FF.

SET @padded_marker = REGEXP_SUBSTR (
@hex_string,
'(FF)+([0-9A-F][0-9A-E]|[0-9A-E][0-9A-F])',
@current_pos );

IF @padded_marker IS NOT NULL THEN
SET @jpeg_marker = STRING (
LEFT ( @padded_marker, 2 ),
RIGHT ( @padded_marker, 2 ) );
SET @current_pos
= @current_pos + LENGTH ( @padded_marker );
RETURN; -- success
ELSE
SET @jpeg_marker = '';
RETURN; -- error
END IF;

END; -- scan_jpeg_marker

No comments: