Datapump

Preparation

  1. Create a directory on the server. This directory will be used for placing Datapump output/input files.

CREATE DIRECTORY dp_test AS 'c:\expdp';
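
If a user other than the one who created the directory object will run the Datapump jobs, that user also needs read/write access to it. A minimal grant, assuming the test user skt used in the sessions below:

GRANT READ, WRITE ON DIRECTORY dp_test TO skt;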

  2. Create test data. As we are trying to measure the comparative speed of conventional export/import versus Datapump, the data volume has to be high enough to give a measurable difference.

To achieve this we copy the sh.SALES table into the current schema and insert the content into itself multiple times until we have about 14,000,000 rows and the table is about 500 MB.

CREATE TABLE SALES AS SELECT * FROM sh.SALES;

INSERT /*+ append */ INTO SALES SELECT * FROM SALES;
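
Repeating the INSERT above by hand works, but note that a direct-path (APPEND) insert must be committed before the table can be queried or inserted into again. Since sh.SALES ships with 918,843 rows in the 10g sample schema, four doublings of the fresh copy yield exactly the 14,701,488 rows seen in the runs below. A minimal sketch of the loop, run in place of the manual repeats:

-- Double the table four times: 918,843 rows x 16 = 14,701,488 rows
BEGIN
  FOR i IN 1 .. 4 LOOP
    INSERT /*+ append */ INTO sales SELECT * FROM sales;
    COMMIT;  -- required after each direct-path insert
  END LOOP;
END;
/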

 

Datapump export

Although Datapump provides its own API, we will use the Oracle-supplied utilities expdp/impdp for all measurements. The following is the output of the process.

C:\expdp>expdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp

 

Export: Release 10.2.0.1.0 - Production on Monday, 19 September, 2005 13:17:15

 

Copyright (c) 2003, 2005, Oracle. All rights reserved.

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Starting "SKT"."SYS_EXPORT_TABLE_02": skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp

Estimate in progress using BLOCKS method...

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

Total estimation using BLOCKS method: 563 MB

Processing object type TABLE_EXPORT/TABLE/TABLE

Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS

. . exported "SKT"."SALES" 473.9 MB 14701488 rows

Master table "SKT"."SYS_EXPORT_TABLE_02" successfully loaded/unloaded

******************************************************************************

Dump file set for SKT.SYS_EXPORT_TABLE_02 is:

C:\EXPDP\SALES.DP

Job "SKT"."SYS_EXPORT_TABLE_02" successfully completed at 13:19:12

Here we can see that the entire data export took a little less than 2 minutes (13:17:15 to 13:19:12).

Conventional export

To measure the performance difference, we run the same export through the conventional exp utility and time it.

C:\expdp>exp skt/skt@skt tables=sales file=sales.dmp

 

Export: Release 10.2.0.1.0 - Production on Mon Sep 19 13:21:09 2005

 

Copyright (c) 1982, 2005, Oracle. All rights reserved.

 

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Export done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set

 

About to export specified tables via Conventional Path ...

. . exporting table SALES 14701488 rows exported

Export terminated successfully without warnings.

 

C:\expdp>dir

Volume in drive C is System

Volume Serial Number is 804E-B1C2

 

Directory of C:\expdp

 

09/19/2005 01:21p <DIR> .

09/19/2005 01:21p <DIR> ..

09/19/2005 01:19p 1,030 export.log

09/19/2005 01:28p 570,497,024 sales.dmp

09/19/2005 01:19p 497,061,888 SALES.DP

3 File(s) 1,067,559,942 bytes

2 Dir(s) 4,575,313,920 bytes free

 

Here the process took about 7 minutes (13:21 to 13:28, going by the file timestamps above). Clearly, the Datapump export is faster than the conventional export.

Datapump import

The next thing to test is Datapump versus conventional import. Typically, importing data takes far more time and effort. We drop the sales table and fire off the Datapump import job.

C:\expdp>impdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp

 

Import: Release 10.2.0.1.0 - Production on Monday, 19 September, 2005 14:14:43

 

Copyright (c) 2003, 2005, Oracle. All rights reserved.

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Master table "SKT"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded

Starting "SKT"."SYS_IMPORT_TABLE_01": skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp

Processing object type TABLE_EXPORT/TABLE/TABLE

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

. . imported "SKT"."SALES" 473.9 MB 14701488 rows

Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS

Job "SKT"."SYS_IMPORT_TABLE_01" successfully completed at 14:15:55

The entire import took a little more than 1 minute (14:14:43 to 14:15:55)!

Conventional import

To compare, we again drop the table and start a conventional import.

C:\expdp>imp skt/skt@skt tables=sales file=sales.dmp

 

Import: Release 10.2.0.1.0 - Production on Mon Sep 19 14:16:41 2005

 

Copyright (c) 1982, 2005, Oracle. All rights reserved.

 

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

 

Export file created by EXPORT:V10.02.01 via conventional path

import done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set

. importing SKT's objects into SKT

. importing SKT's objects into SKT

. . importing table "SALES" 14701488 rows imported

Import terminated successfully without warnings.

 

C:\expdp>time

The current time is: 14:21:04.53

Enter the new time:

This took about 5 minutes (14:16 to 14:21). The Datapump import process is significantly faster than the conventional one.

New features

The job

The Datapump processes work as a background job in the database. You can run a Datapump session, detach yourself from the process, and later re-attach to the session to monitor it. You can also stop or pause the job. This requires some changes to the Datapump parameters we used earlier.
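
Because the job lives inside the database, it can also be watched from any SQL session while it exists. A minimal monitoring query against the standard 10g dictionary view DBA_DATAPUMP_JOBS:

SELECT owner_name, job_name, operation, state
FROM   dba_datapump_jobs;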

C:\expdp>expdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4

Here we add two new parameters: we name the Datapump job test (job_name=test) and request parallel processing for the export (parallel=4). Neither feature was available in the conventional export process.
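
A side note on parallelism: with several workers it is common to let each worker write its own file by putting the %U substitution variable in the dump file name (standard expdp syntax; sales%U.dp expands to sales01.dp, sales02.dp, and so on). In this test we keep a single file for simplicity, but the command would look like this:

C:\expdp>expdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales%U.dp job_name=test parallel=4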

We get the following output:

Export: Release 10.2.0.1.0 - Production on Tuesday, 20 September, 2005 13:23:38

 

Copyright (c) 2003, 2005, Oracle. All rights reserved.

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Starting "SKT"."TEST": skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4

At this point we press Ctrl+C to get the interactive prompt. (Pressing it before the job banner appears only produces an error message and the client fails; observed on Oracle 10gR2.)

Export> status

 

Job: TEST

Operation: EXPORT

Mode: TABLE

State: EXECUTING

Bytes Processed: 0

Current Parallelism: 4

Job Error Count: 0

Dump File: C:\EXPDP\SALES.DP

bytes written: 4,096

 

Worker 1 Status:

State: EXECUTING

Thus we can query the status of the job, which is still running in the background. Suppose we need to suspend the job; we do it like this:

Export> stop_job

Are you sure you wish to stop this job ([yes]/no): y

And we get back to the command prompt. Now we can re-attach to the job as follows:

 

C:\expdp>expdp skt/skt@skt attach=test

 

Export: Release 10.2.0.1.0 - Production on Tuesday, 20 September, 2005 13:33:08

 

Copyright (c) 2003, 2005, Oracle. All rights reserved.

 

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

 

Job: TEST

Owner: SKT

Operation: EXPORT

Creator Privs: FALSE

GUID: 474B4B5E5FB24AEC914D9A7AEC3D7D6E

Start Time: Tuesday, 20 September, 2005 13:33:04

Mode: TABLE

Instance: skt

Max Parallelism: 4

EXPORT Job Parameters:

Parameter Name Parameter Value:

CLIENT_COMMAND skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4

State: IDLING

Bytes Processed: 496,972,408

Percent Done: 99

Current Parallelism: 4

Job Error Count: 0

Dump File: c:\expdp\sales.dp

bytes written: 496,979,968

 

Worker 1 Status:

State: UNDEFINED

 

Worker 2 Status:

State: UNDEFINED

 

Worker 3 Status:

State: UNDEFINED

We are attached to the job, but it is currently stopped. Let's start it:

Export> start_job

 

Export> status

 

Job: TEST

Operation: EXPORT

Mode: TABLE

State: COMPLETING

Bytes Processed: 496,972,409

Percent Done: 100

Current Parallelism: 4

Job Error Count: 0

Dump File: c:\expdp\sales.dp

bytes written: 497,037,312

 

Worker 1 Status:

State: WORK WAITING

Had we wanted to detach from the client while keeping the job running, the command would be EXIT_CLIENT.

The full list of parameters available with expdp can be obtained by issuing the command:

C:\expdp>expdp help=y

 

 

Keyword               Description (Default)

ATTACH                Attach to existing job, e.g. ATTACH [=job name].
COMPRESSION           Reduce size of dumpfile contents where valid
                      keyword values are: (METADATA_ONLY) and NONE.
CONTENT               Specifies data to unload where the valid keywords are:
                      (ALL), DATA_ONLY, and METADATA_ONLY.
DIRECTORY             Directory object to be used for dumpfiles and logfiles.
DUMPFILE              List of destination dump files (expdat.dmp),
                      e.g. DUMPFILE=scott1.dmp, scott2.dmp, dmpdir:scott3.dmp.
ENCRYPTION_PASSWORD   Password key for creating encrypted column data.
ESTIMATE              Calculate job estimates where the valid keywords are:
                      (BLOCKS) and STATISTICS.
ESTIMATE_ONLY         Calculate job estimates without performing the export.
EXCLUDE               Exclude specific object types, e.g. EXCLUDE=TABLE:EMP.
FILESIZE              Specify the size of each dumpfile in units of bytes.
FLASHBACK_SCN         SCN used to set session snapshot back to.
FLASHBACK_TIME        Time used to get the SCN closest to the specified time.
FULL                  Export entire database (N).
HELP                  Display Help messages (N).
INCLUDE               Include specific object types, e.g. INCLUDE=TABLE_DATA.
JOB_NAME              Name of export job to create.
LOGFILE               Log file name (export.log).
NETWORK_LINK          Name of remote database link to the source system.
NOLOGFILE             Do not write logfile (N).
PARALLEL              Change the number of active workers for current job.
PARFILE               Specify parameter file.
QUERY                 Predicate clause used to export a subset of a table.
SAMPLE                Percentage of data to be exported.
SCHEMAS               List of schemas to export (login schema).
STATUS                Frequency (secs) job status is to be monitored where
                      the default (0) will show new status when available.
TABLES                Identifies a list of tables to export - one schema only.
TABLESPACES           Identifies a list of tablespaces to export.
TRANSPORT_FULL_CHECK  Verify storage segments of all tables (N).
TRANSPORT_TABLESPACES List of tablespaces from which metadata will be unloaded.
VERSION               Version of objects to export where valid keywords are:
                      (COMPATIBLE), LATEST, or any valid database version.

 

The following commands are valid while in interactive mode.

Note: abbreviations are allowed

 

Command               Description

ADD_FILE              Add dumpfile to dumpfile set.
CONTINUE_CLIENT       Return to logging mode. Job will be re-started if idle.
EXIT_CLIENT           Quit client session and leave job running.
FILESIZE              Default filesize (bytes) for subsequent ADD_FILE commands.
HELP                  Summarize interactive commands.
KILL_JOB              Detach and delete job.
PARALLEL              Change the number of active workers for current job.
                      PARALLEL=<number of workers>.
START_JOB             Start/resume current job.
STATUS                Frequency (secs) job status is to be monitored where
                      the default (0) will show new status when available.
                      STATUS[=interval]
STOP_JOB              Orderly shutdown of job execution and exits the client.
                      STOP_JOB=IMMEDIATE performs an immediate shutdown of the
                      Data Pump job.
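
To illustrate combining several of the parameters above, they can be collected in a parameter file. A hypothetical sales_exp.par (the file name and WHERE clause are examples only; amount_sold is a column of sh.SALES):

directory=dp_test
dumpfile=sales_q.dp
tables=sales
query=sales:"WHERE amount_sold > 100"

It is then invoked as:

C:\expdp>expdp skt/skt@skt parfile=sales_exp.par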

Similar functionality is available in impdp as well.

C:\expdp>impdp help=y

 

 

Keyword               Description (Default)

ATTACH                Attach to existing job, e.g. ATTACH [=job name].
CONTENT               Specifies data to load where the valid keywords are:
                      (ALL), DATA_ONLY, and METADATA_ONLY.
DIRECTORY             Directory object to be used for dump, log, and sql files.
DUMPFILE              List of dumpfiles to import from (expdat.dmp),
                      e.g. DUMPFILE=scott1.dmp, scott2.dmp, dmpdir:scott3.dmp.
ENCRYPTION_PASSWORD   Password key for accessing encrypted column data.
                      This parameter is not valid for network import jobs.
ESTIMATE              Calculate job estimates where the valid keywords are:
                      (BLOCKS) and STATISTICS.
EXCLUDE               Exclude specific object types, e.g. EXCLUDE=TABLE:EMP.
FLASHBACK_SCN         SCN used to set session snapshot back to.
FLASHBACK_TIME        Time used to get the SCN closest to the specified time.
FULL                  Import everything from source (Y).
HELP                  Display help messages (N).
INCLUDE               Include specific object types, e.g. INCLUDE=TABLE_DATA.
JOB_NAME              Name of import job to create.
LOGFILE               Log file name (import.log).
NETWORK_LINK          Name of remote database link to the source system.
NOLOGFILE             Do not write logfile.
PARALLEL              Change the number of active workers for current job.
PARFILE               Specify parameter file.
QUERY                 Predicate clause used to import a subset of a table.
REMAP_DATAFILE        Redefine datafile references in all DDL statements.
REMAP_SCHEMA          Objects from one schema are loaded into another schema.
REMAP_TABLESPACE      Tablespace objects are remapped to another tablespace.
REUSE_DATAFILES       Tablespace will be initialized if it already exists (N).
SCHEMAS               List of schemas to import.
SKIP_UNUSABLE_INDEXES Skip indexes that were set to the Index Unusable state.
SQLFILE               Write all the SQL DDL to a specified file.
STATUS                Frequency (secs) job status is to be monitored where
                      the default (0) will show new status when available.
STREAMS_CONFIGURATION Enable the loading of Streams metadata.
TABLE_EXISTS_ACTION   Action to take if imported object already exists.
                      Valid keywords: (SKIP), APPEND, REPLACE and TRUNCATE.
TABLES                Identifies a list of tables to import.
TABLESPACES           Identifies a list of tablespaces to import.
TRANSFORM             Metadata transform to apply to applicable objects.
                      Valid transform keywords: SEGMENT_ATTRIBUTES, STORAGE,
                      OID, and PCTSPACE.
TRANSPORT_DATAFILES   List of datafiles to be imported by transportable mode.
TRANSPORT_FULL_CHECK  Verify storage segments of all tables (N).
TRANSPORT_TABLESPACES List of tablespaces from which metadata will be loaded.
                      Only valid in NETWORK_LINK mode import operations.
VERSION               Version of objects to export where valid keywords are:
                      (COMPATIBLE), LATEST, or any valid database version.
                      Only valid for NETWORK_LINK and SQLFILE.

The following commands are valid while in interactive mode.

Note: abbreviations are allowed

Command               Description (Default)

CONTINUE_CLIENT       Return to logging mode. Job will be re-started if idle.
EXIT_CLIENT           Quit client session and leave job running.
HELP                  Summarize interactive commands.
KILL_JOB              Detach and delete job.
PARALLEL              Change the number of active workers for current job.
                      PARALLEL=<number of workers>.
START_JOB             Start/resume current job.
                      START_JOB=SKIP_CURRENT will start the job after skipping
                      any action which was in progress when job was stopped.
STATUS                Frequency (secs) job status is to be monitored where
                      the default (0) will show new status when available.
                      STATUS[=interval]
STOP_JOB              Orderly shutdown of job execution and exits the client.
                      STOP_JOB=IMMEDIATE performs an immediate shutdown of the
                      Data Pump job.
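
As a quick illustration of the import-side parameters, the dump created earlier could be loaded into a different schema (skt2 is a hypothetical target user, which must already exist), replacing any existing table:

C:\expdp>impdp skt/skt@skt directory=dp_test dumpfile=sales.dp remap_schema=skt:skt2 table_exists_action=replace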

Don't like expdp/impdp? Use the API to make your own client

Oracle has made the Datapump API available: using the supplied PL/SQL package DBMS_DATAPUMP, it is possible to create your own Datapump client. The only caveat is that any third-party data types have to match Oracle's.

In the next few lines we will perform a sample single-table export by writing a PL/SQL script.

DECLARE
  h1 NUMBER;  -- Data Pump job handle
BEGIN
  -- Create a (user-named) Data Pump job to do a table export.
  h1 := DBMS_DATAPUMP.OPEN(operation   => 'EXPORT',
                           job_mode    => 'TABLE',
                           remote_link => NULL,
                           job_name    => 'EXAMPLE1',
                           version     => 'LATEST');

  -- Specify a single dump file for the job (using the handle just returned)
  -- and a directory object, which must already be defined and accessible
  -- to the user running this procedure.
  DBMS_DATAPUMP.ADD_FILE(handle    => h1,
                         filename  => 'example1.dmp',
                         directory => 'DP_TEST',
                         filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);

  -- We also add a log file.
  DBMS_DATAPUMP.ADD_FILE(handle    => h1,
                         filename  => 'example1.log',
                         directory => 'DP_TEST',
                         filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

  -- A metadata filter is used to specify the schema/table that will be exported.
  DBMS_DATAPUMP.METADATA_FILTER(handle => h1,
                                name   => 'NAME_EXPR',
                                value  => 'IN (''SALES'')');

  -- Start the job. An exception will be generated if something is not set up
  -- properly.
  DBMS_DATAPUMP.START_JOB(h1);

  -- Detach; the job keeps running in the background.
  DBMS_DATAPUMP.DETACH(h1);
END;
/

First we create a Datapump job. We define the type of operation we are going to do (export) and the job mode (table), give the job a name, and set some other parameters that are best left unchanged. One warning: as of this writing, the Oracle documentation for this procedure is wrong about the JOB_MODE parameter; the document calls it MODE, but the actual name is JOB_MODE.

h1 := DBMS_DATAPUMP.OPEN(operation   => 'EXPORT',
                         job_mode    => 'TABLE',
                         remote_link => NULL,
                         job_name    => 'EXAMPLE1',
                         version     => 'LATEST');

People familiar with Windows programming will note that the job handle is stored in a variable for subsequent use.

Next, we attach two files to the job: the first is the export dump file and the second is the log file. Note that the directory object has to be defined in the database, and the actual directory must be accessible to Oracle for writing. So although the program can run on any client PC, the output is always generated on the server itself. Of course you can NFS-mount or net-share a remote directory on the server, but I consider this a limitation.


DBMS_DATAPUMP.ADD_FILE(handle    => h1,
                       filename  => 'example1.dmp',
                       directory => 'DP_TEST',
                       filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);

DBMS_DATAPUMP.ADD_FILE(handle    => h1,
                       filename  => 'example1.log',
                       directory => 'DP_TEST',
                       filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);
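
As noted, the directory object must already be defined in the database. Which directory objects exist, and where they point on the server, can be checked from the dictionary:

SELECT owner, directory_name, directory_path
FROM   dba_directories;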

Then we specify the name(s) of the tables to be exported. We can name individual tables or give a pattern of names. Since this example exports a single table, its name is specified here.

DBMS_DATAPUMP.METADATA_FILTER(handle => h1,
                              name   => 'NAME_EXPR',
                              value  => 'IN (''SALES'')');
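
Since NAME_EXPR accepts any SQL expression valid against object names, a pattern works as well. A hypothetical variant that would export every table whose name starts with SALES:

DBMS_DATAPUMP.METADATA_FILTER(handle => h1,
                              name   => 'NAME_EXPR',
                              value  => 'LIKE ''SALES%''');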

Lastly we start the job and detach from it. The job will run in the background.

DBMS_DATAPUMP.START_JOB(h1);

DBMS_DATAPUMP.DETACH(h1);
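
If, instead of detaching, the script should block until the job finishes, the package also provides WAIT_FOR_JOB. A minimal sketch that attaches to the running job created above and waits for it:

DECLARE
  h2        NUMBER;
  job_state VARCHAR2(30);
BEGIN
  h2 := DBMS_DATAPUMP.ATTACH(job_name => 'EXAMPLE1', job_owner => 'SKT');
  DBMS_DATAPUMP.WAIT_FOR_JOB(h2, job_state);  -- returns when the job completes
  DBMS_OUTPUT.PUT_LINE('Job finished with state: ' || job_state);
END;
/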

For more detailed examples, check out http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14215/dp_api.htm#sthref460.