Datapump

Preparation

  1. Create a directory on the server. This directory will be used for placing Datapump output/input files.

CREATE DIRECTORY dp_test AS 'c:\expdp';
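
If a user other than the one who created the directory object will run the Datapump jobs, that user also needs read/write access to it. A minimal grant, assuming the test user skt used in the sessions below:

GRANT READ, WRITE ON DIRECTORY dp_test TO skt;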

  2. Create test data. As we are trying to measure the comparative speed of conventional export/import versus Datapump, the data volume has to be high enough to give a measurable difference.

To achieve this we copy the sh.SALES table into the current schema and insert the content into itself multiple times until we have about 14,000,000 rows and the table is about 500 MB.

CREATE TABLE SALES AS SELECT * FROM sh.SALES;

INSERT /*+ append */ INTO SALES SELECT * FROM SALES;
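
Repeating the INSERT above by hand works, but note that a direct-path (APPEND) insert must be committed before the table can be queried or inserted into again. Since sh.SALES ships with 918,843 rows in the 10g sample schema, four doublings of the fresh copy yield exactly the 14,701,488 rows seen in the runs below. A minimal sketch of the loop, run in place of the manual repeats:

-- Double the table four times: 918,843 rows x 16 = 14,701,488 rows
BEGIN
  FOR i IN 1 .. 4 LOOP
    INSERT /*+ append */ INTO sales SELECT * FROM sales;
    COMMIT;  -- required after each direct-path insert
  END LOOP;
END;
/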

 

Datapump export

Although Datapump provides its own API, we will use the Oracle-supplied utilities expdp/impdp for all measurements. The following is the output of the process.

C:\expdp>expdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp

 

Export: Release 10.2.0.1.0 - Production on Monday, 19 September, 2005 13:17:15

 

Copyright (c) 2003, 2005, Oracle. All rights reserved.

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Starting "SKT"."SYS_EXPORT_TABLE_02": skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp

Estimate in progress using BLOCKS method...

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

Total estimation using BLOCKS method: 563 MB

Processing object type TABLE_EXPORT/TABLE/TABLE

Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS

. . exported "SKT"."SALES" 473.9 MB 14701488 rows

Master table "SKT"."SYS_EXPORT_TABLE_02" successfully loaded/unloaded

******************************************************************************

Dump file set for SKT.SYS_EXPORT_TABLE_02 is:

C:\EXPDP\SALES.DP

Job "SKT"."SYS_EXPORT_TABLE_02" successfully completed at 13:19:12

Here we can see that the entire data export took a little less than 2 minutes (13:17:15 to 13:19:12).

Conventional export

To measure the performance difference, we run the same export through the conventional exp utility and time it.

C:\expdp>exp skt/skt@skt tables=sales file=sales.dmp

 

Export: Release 10.2.0.1.0 - Production on Mon Sep 19 13:21:09 2005

 

Copyright (c) 1982, 2005, Oracle. All rights reserved.

 

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Export done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set

 

About to export specified tables via Conventional Path ...

. . exporting table SALES 14701488 rows exported

Export terminated successfully without warnings.

 

C:\expdp>dir

Volume in drive C is System

Volume Serial Number is 804E-B1C2

 

Directory of C:\expdp

 

09/19/2005 01:21p <DIR> .

09/19/2005 01:21p <DIR> ..

09/19/2005 01:19p 1,030 export.log

09/19/2005 01:28p 570,497,024 sales.dmp

09/19/2005 01:19p 497,061,888 SALES.DP

3 File(s) 1,067,559,942 bytes

2 Dir(s) 4,575,313,920 bytes free

 

Here the process took about 7 minutes (13:21 to 13:28, going by the file timestamps above). Clearly, the Datapump export is faster than the conventional export.

Datapump import

The next thing to test is Datapump versus conventional import. Typically, importing data takes far more time and effort. We drop the sales table and fire off the Datapump import job.

C:\expdp>impdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp

 

Import: Release 10.2.0.1.0 - Production on Monday, 19 September, 2005 14:14:43

 

Copyright (c) 2003, 2005, Oracle. All rights reserved.

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Master table "SKT"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded

Starting "SKT"."SYS_IMPORT_TABLE_01": skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp

Processing object type TABLE_EXPORT/TABLE/TABLE

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

. . imported "SKT"."SALES" 473.9 MB 14701488 rows

Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS

Job "SKT"."SYS_IMPORT_TABLE_01" successfully completed at 14:15:55

The entire import took a little more than 1 minute (14:14:43 to 14:15:55)!

Conventional import

To compare, we again drop the table and start a conventional import.

C:\expdp>imp skt/skt@skt tables=sales file=sales.dmp

 

Import: Release 10.2.0.1.0 - Production on Mon Sep 19 14:16:41 2005

 

Copyright (c) 1982, 2005, Oracle. All rights reserved.

 

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

 

Export file created by EXPORT:V10.02.01 via conventional path

import done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set

. importing SKT's objects into SKT

. importing SKT's objects into SKT

. . importing table "SALES" 14701488 rows imported

Import terminated successfully without warnings.

 

C:\expdp>time

The current time is: 14:21:04.53

Enter the new time:

This took about 5 minutes (14:16 to 14:21). The Datapump import process is significantly faster than the conventional one.

New features

The job

The Datapump processes work as a background job in the database. You can run a Datapump session, detach yourself from the process, and later re-attach to the session to monitor it. You can also stop or pause the job. This requires some changes to the Datapump parameters we used earlier.
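
Because the job lives inside the database, it can also be watched from any SQL session while it exists. A minimal monitoring query against the standard 10g dictionary view DBA_DATAPUMP_JOBS:

SELECT owner_name, job_name, operation, state
FROM   dba_datapump_jobs;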

C:\expdp>expdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4

Here we add two new parameters: we name the Datapump job test (job_name=test) and request parallel processing for the export (parallel=4). Neither feature was available in the conventional export process.
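
A side note on parallelism: with several workers it is common to let each worker write its own file by putting the %U substitution variable in the dump file name (standard expdp syntax; sales%U.dp expands to sales01.dp, sales02.dp, and so on). In this test we keep a single file for simplicity, but the command would look like this:

C:\expdp>expdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales%U.dp job_name=test parallel=4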

We get the following output:

Export: Release 10.2.0.1.0 - Production on Tuesday, 20 September, 2005 13:23:38

 

Copyright (c) 2003, 2005, Oracle. All rights reserved.

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Starting "SKT"."TEST": skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4

At this point we press Ctrl+C to get the interactive prompt. (Pressing it before the job banner appears only produces an error message and the client fails; observed on Oracle 10gR2.)

Export> status

 

Job: TEST

Operation: EXPORT

Mode: TABLE

State: EXECUTING

Bytes Processed: 0

Current Parallelism: 4

Job Error Count: 0

Dump File: C:\EXPDP\SALES.DP

bytes written: 4,096

 

Worker 1 Status:

State: EXECUTING

Thus we can query the status of the job, which is still running in the background. Suppose we need to suspend the job; we do it like this:

Export> stop_job

Are you sure you wish to stop this job ([yes]/no): y

And we get back to the command prompt. Now we can re-attach to the job as follows:

 

C:\expdp>expdp skt/skt@skt attach=test

 

Export: Release 10.2.0.1.0 - Production on Tuesday, 20 September, 2005 13:33:08

 

Copyright (c) 2003, 2005, Oracle. All rights reserved.

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

 

Job: TEST

Owner: SKT

Operation: EXPORT

Creator Privs: FALSE

GUID: 474B4B5E5FB24AEC914D9A7AEC3D7D6E

Start Time: Tuesday, 20 September, 2005 13:33:04

Mode: TABLE

Instance: skt

Max Parallelism: 4

EXPORT Job Parameters:

Parameter Name Parameter Value:

CLIENT_COMMAND skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4

State: IDLING

Bytes Processed: 496,972,408

Percent Done: 99

Current Parallelism: 4

Job Error Count: 0

Dump File: c:\expdp\sales.dp

bytes written: 496,979,968

 

Worker 1 Status:

State: UNDEFINED

 

Worker 2 Status:

State: UNDEFINED

 

Worker 3 Status:

State: UNDEFINED

We are attached to the job, but it is currently stopped. Let's start it:

Export> start_job

 

Export> status

 

Job: TEST

Operation: EXPORT

Mode: TABLE

State: COMPLETING

Bytes Processed: 496,972,409

Percent Done: 100

Current Parallelism: 4

Job Error Count: 0

Dump File: c:\expdp\sales.dp

bytes written: 497,037,312

 

Worker 1 Status:

State: WORK WAITING

Had we wanted to detach from the client while keeping the job running, the command would be EXIT_CLIENT.

The full list of parameters available with expdp can be obtained by issuing the command:

C:\expdp>expdp help=y

 

 

Keyword               Description (Default)

ATTACH                Attach to existing job, e.g. ATTACH [=job name].
COMPRESSION           Reduce size of dumpfile contents where valid
                      keyword values are: (METADATA_ONLY) and NONE.
CONTENT               Specifies data to unload where the valid keywords are:
                      (ALL), DATA_ONLY, and METADATA_ONLY.
DIRECTORY             Directory object to be used for dumpfiles and logfiles.
DUMPFILE              List of destination dump files (expdat.dmp),
                      e.g. DUMPFILE=scott1.dmp, scott2.dmp, dmpdir:scott3.dmp.
ENCRYPTION_PASSWORD   Password key for creating encrypted column data.
ESTIMATE              Calculate job estimates where the valid keywords are:
                      (BLOCKS) and STATISTICS.
ESTIMATE_ONLY         Calculate job estimates without performing the export.
EXCLUDE               Exclude specific object types, e.g. EXCLUDE=TABLE:EMP.
FILESIZE              Specify the size of each dumpfile in units of bytes.
FLASHBACK_SCN         SCN used to set session snapshot back to.
FLASHBACK_TIME        Time used to get the SCN closest to the specified time.
FULL                  Export entire database (N).
HELP                  Display Help messages (N).
INCLUDE               Include specific object types, e.g. INCLUDE=TABLE_DATA.
JOB_NAME              Name of export job to create.
LOGFILE               Log file name (export.log).
NETWORK_LINK          Name of remote database link to the source system.
NOLOGFILE             Do not write logfile (N).
PARALLEL              Change the number of active workers for current job.
PARFILE               Specify parameter file.
QUERY                 Predicate clause used to export a subset of a table.
SAMPLE                Percentage of data to be exported.
SCHEMAS               List of schemas to export (login schema).
STATUS                Frequency (secs) job status is to be monitored where
                      the default (0) will show new status when available.
TABLES                Identifies a list of tables to export - one schema only.
TABLESPACES           Identifies a list of tablespaces to export.
TRANSPORT_FULL_CHECK  Verify storage segments of all tables (N).
TRANSPORT_TABLESPACES List of tablespaces from which metadata will be unloaded.
VERSION               Version of objects to export where valid keywords are:
                      (COMPATIBLE), LATEST, or any valid database version.

 

The following commands are valid while in interactive mode.

Note: abbreviations are allowed

 

Command               Description

ADD_FILE              Add dumpfile to dumpfile set.
CONTINUE_CLIENT       Return to logging mode. Job will be re-started if idle.
EXIT_CLIENT           Quit client session and leave job running.
FILESIZE              Default filesize (bytes) for subsequent ADD_FILE commands.
HELP                  Summarize interactive commands.
KILL_JOB              Detach and delete job.
PARALLEL              Change the number of active workers for current job.
                      PARALLEL=<number of workers>.
START_JOB             Start/resume current job.
STATUS                Frequency (secs) job status is to be monitored where
                      the default (0) will show new status when available.
                      STATUS[=interval]
STOP_JOB              Orderly shutdown of job execution and exits the client.
                      STOP_JOB=IMMEDIATE performs an immediate shutdown of the
                      Data Pump job.
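
To illustrate combining several of the parameters above, they can be collected in a parameter file. A hypothetical sales_exp.par (the file name and WHERE clause are examples only; amount_sold is a column of sh.SALES):

directory=dp_test
dumpfile=sales_q.dp
tables=sales
query=sales:"WHERE amount_sold > 100"

It is then invoked as:

C:\expdp>expdp skt/skt@skt parfile=sales_exp.par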

Similar functionality is available in impdp as well.

C:\expdp>impdp help=y

 

 

Keyword               Description (Default)

ATTACH                Attach to existing job, e.g. ATTACH [=job name].
CONTENT               Specifies data to load where the valid keywords are:
                      (ALL), DATA_ONLY, and METADATA_ONLY.
DIRECTORY             Directory object to be used for dump, log, and sql files.
DUMPFILE              List of dumpfiles to import from (expdat.dmp),
                      e.g. DUMPFILE=scott1.dmp, scott2.dmp, dmpdir:scott3.dmp.
ENCRYPTION_PASSWORD   Password key for accessing encrypted column data.
                      This parameter is not valid for network import jobs.
ESTIMATE              Calculate job estimates where the valid keywords are:
                      (BLOCKS) and STATISTICS.
EXCLUDE               Exclude specific object types, e.g. EXCLUDE=TABLE:EMP.
FLASHBACK_SCN         SCN used to set session snapshot back to.
FLASHBACK_TIME        Time used to get the SCN closest to the specified time.
FULL                  Import everything from source (Y).
HELP                  Display help messages (N).
INCLUDE               Include specific object types, e.g. INCLUDE=TABLE_DATA.
JOB_NAME              Name of import job to create.
LOGFILE               Log file name (import.log).
NETWORK_LINK          Name of remote database link to the source system.
NOLOGFILE             Do not write logfile.
PARALLEL              Change the number of active workers for current job.
PARFILE               Specify parameter file.
QUERY                 Predicate clause used to import a subset of a table.
REMAP_DATAFILE        Redefine datafile references in all DDL statements.
REMAP_SCHEMA          Objects from one schema are loaded into another schema.
REMAP_TABLESPACE      Tablespace objects are remapped to another tablespace.
REUSE_DATAFILES       Tablespace will be initialized if it already exists (N).
SCHEMAS               List of schemas to import.
SKIP_UNUSABLE_INDEXES Skip indexes that were set to the Index Unusable state.
SQLFILE               Write all the SQL DDL to a specified file.
STATUS                Frequency (secs) job status is to be monitored where
                      the default (0) will show new status when available.
STREAMS_CONFIGURATION Enable the loading of Streams metadata.
TABLE_EXISTS_ACTION   Action to take if imported object already exists.
                      Valid keywords: (SKIP), APPEND, REPLACE and TRUNCATE.
TABLES                Identifies a list of tables to import.
TABLESPACES           Identifies a list of tablespaces to import.
TRANSFORM             Metadata transform to apply to applicable objects.
                      Valid transform keywords: SEGMENT_ATTRIBUTES, STORAGE,
                      OID, and PCTSPACE.
TRANSPORT_DATAFILES   List of datafiles to be imported by transportable mode.
TRANSPORT_FULL_CHECK  Verify storage segments of all tables (N).
TRANSPORT_TABLESPACES List of tablespaces from which metadata will be loaded.
                      Only valid in NETWORK_LINK mode import operations.
VERSION               Version of objects to export where valid keywords are:
                      (COMPATIBLE), LATEST, or any valid database version.
                      Only valid for NETWORK_LINK and SQLFILE.

The following commands are valid while in interactive mode.

Note: abbreviations are allowed

Command               Description (Default)

CONTINUE_CLIENT       Return to logging mode. Job will be re-started if idle.
EXIT_CLIENT           Quit client session and leave job running.
HELP                  Summarize interactive commands.
KILL_JOB              Detach and delete job.
PARALLEL              Change the number of active workers for current job.
                      PARALLEL=<number of workers>.
START_JOB             Start/resume current job.
                      START_JOB=SKIP_CURRENT will start the job after skipping
                      any action which was in progress when job was stopped.
STATUS                Frequency (secs) job status is to be monitored where
                      the default (0) will show new status when available.
                      STATUS[=interval]
STOP_JOB              Orderly shutdown of job execution and exits the client.
                      STOP_JOB=IMMEDIATE performs an immediate shutdown of the
                      Data Pump job.
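
As a quick illustration of the import-side parameters, the dump created earlier could be loaded into a different schema (skt2 is a hypothetical target user, which must already exist), replacing any existing table:

C:\expdp>impdp skt/skt@skt directory=dp_test dumpfile=sales.dp remap_schema=skt:skt2 table_exists_action=replace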

Don't like expdp/impdp? Use the API to make your own client

Oracle has made the Datapump API available: using the supplied PL/SQL package DBMS_DATAPUMP, it is possible to create your own Datapump client. The only caveat is that any third-party data types have to match Oracle's.

In the next few lines we will perform a sample single-table export by writing a PL/SQL script.

DECLARE
  h1 NUMBER;  -- Data Pump job handle
BEGIN
  -- Create a (user-named) Data Pump job to do a table export.
  h1 := DBMS_DATAPUMP.OPEN(operation   => 'EXPORT',
                           job_mode    => 'TABLE',
                           remote_link => NULL,
                           job_name    => 'EXAMPLE1',
                           version     => 'LATEST');

  -- Specify a single dump file for the job (using the handle just returned)
  -- and a directory object, which must already be defined and accessible
  -- to the user running this procedure.
  DBMS_DATAPUMP.ADD_FILE(handle    => h1,
                         filename  => 'example1.dmp',
                         directory => 'DP_TEST',
                         filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);

  -- We also add a log file.
  DBMS_DATAPUMP.ADD_FILE(handle    => h1,
                         filename  => 'example1.log',
                         directory => 'DP_TEST',
                         filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

  -- A metadata filter is used to specify the schema/table that will be exported.
  DBMS_DATAPUMP.METADATA_FILTER(handle => h1,
                                name   => 'NAME_EXPR',
                                value  => 'IN (''SALES'')');

  -- Start the job. An exception will be generated if something is not set up
  -- properly.
  DBMS_DATAPUMP.START_JOB(h1);

  -- Detach; the job keeps running in the background.
  DBMS_DATAPUMP.DETACH(h1);
END;
/

First we create a Datapump job. We define the type of operation we are going to do (export) and the job mode (table), give the job a name, and set some other parameters that are best left unchanged. One warning: as of this writing, the Oracle documentation for this procedure is wrong about the JOB_MODE parameter; the document calls it MODE, but the actual name is JOB_MODE.

h1 := DBMS_DATAPUMP.OPEN(operation   => 'EXPORT',
                         job_mode    => 'TABLE',
                         remote_link => NULL,
                         job_name    => 'EXAMPLE1',
                         version     => 'LATEST');

People familiar with Windows programming will note that the job handle is stored in a variable for subsequent use.

Next, we attach two files to the job: the first is the export dump file and the second is the log file. Note that the directory object has to be defined in the database, and the actual directory must be accessible to Oracle for writing. So although the program can run on any client PC, the output is always generated on the server itself. Of course you can NFS-mount or net-share a remote directory on the server, but I consider this a limitation.


DBMS_DATAPUMP.ADD_FILE(handle    => h1,
                       filename  => 'example1.dmp',
                       directory => 'DP_TEST',
                       filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);

DBMS_DATAPUMP.ADD_FILE(handle    => h1,
                       filename  => 'example1.log',
                       directory => 'DP_TEST',
                       filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);
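
As noted, the directory object must already be defined in the database. Which directory objects exist, and where they point on the server, can be checked from the dictionary:

SELECT owner, directory_name, directory_path
FROM   dba_directories;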

Then we specify the name(s) of the tables to be exported. We can name individual tables or give a pattern of names. Since this example exports a single table, its name is specified here.

DBMS_DATAPUMP.METADATA_FILTER(handle => h1,
                              name   => 'NAME_EXPR',
                              value  => 'IN (''SALES'')');
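
Since NAME_EXPR accepts any SQL expression valid against object names, a pattern works as well. A hypothetical variant that would export every table whose name starts with SALES:

DBMS_DATAPUMP.METADATA_FILTER(handle => h1,
                              name   => 'NAME_EXPR',
                              value  => 'LIKE ''SALES%''');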

Lastly we start the job and detach from it. The job will run in the background.

DBMS_DATAPUMP.START_JOB(h1);

DBMS_DATAPUMP.DETACH(h1);
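
If, instead of detaching, the script should block until the job finishes, the package also provides WAIT_FOR_JOB. A minimal sketch that attaches to the running job created above and waits for it:

DECLARE
  h2        NUMBER;
  job_state VARCHAR2(30);
BEGIN
  h2 := DBMS_DATAPUMP.ATTACH(job_name => 'EXAMPLE1', job_owner => 'SKT');
  DBMS_DATAPUMP.WAIT_FOR_JOB(h2, job_state);  -- returns when the job completes
  DBMS_OUTPUT.PUT_LINE('Job finished with state: ' || job_state);
END;
/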

For more detailed examples, check out http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14215/dp_api.htm#sthref460.