Preparation
CREATE DIRECTORY dp_test AS 'c:\expdp';
To get a reasonably large test table, we copy the sh.sales table into the current schema and insert its contents into itself several times until we have about 14,000,000 rows and the table is roughly 500 MB.
CREATE TABLE SALES AS SELECT * FROM sh.SALES;
INSERT /*+ append */ INTO SALES SELECT * FROM SALES;
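Since each direct-path (APPEND) insert must be committed before SALES can be read or written again in the same session, the doubling step is easiest to script. Here is a minimal sketch, assuming the standard sample schema, where SH.SALES holds 918,843 rows, so four doublings give the 14,701,488 rows seen in the exports below.
BEGIN
  FOR i IN 1 .. 4 LOOP
    -- Each direct-path insert doubles the row count.
    INSERT /*+ append */ INTO SALES SELECT * FROM SALES;
    -- A COMMIT is mandatory after an APPEND insert before the table
    -- can be touched again in this session.
    COMMIT;
  END LOOP;
END;
/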
Datapump export
Although Data Pump provides its own API, we will use the Oracle-supplied expdp/impdp utilities for the measurements. The following is the output of the process.
C:\expdp>expdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp
Export: Release 10.2.0.1.0 - Production on
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Starting "SKT"."SYS_EXPORT_TABLE_02": skt/********@skt tables=sales directory=d
p_test dumpfile=sales.dp
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 563 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "SKT"."SALES" 473.9 MB 14701488 rows
Master table "SKT"."SYS_EXPORT_TABLE_02" successfully loaded/unloaded
******************************************************************************
Dump file set for SKT.SYS_EXPORT_TABLE_02 is:
C:\EXPDP\SALES.DP
Job "SKT"."SYS_EXPORT_TABLE_02" successfully completed at
Here we can see that the entire data export took a little less than 2 minutes.
Conventional export
To measure the performance difference, we also run a conventional export and time it.
C:\expdp>exp skt/skt@skt tables=sales file=sales.dmp
Export: Release 10.2.0.1.0 - Production on Mon Sep 19
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set
About to export specified tables via Conventional Path ...
. . exporting table SALES 14701488 rows exported
Export terminated successfully without warnings.
C:\expdp>dir
Volume in drive C is System
Volume Serial Number is 804E-B1C2
Directory of C:\expdp
3 File(s) 1,067,559,942 bytes
2 Dir(s) 4,575,313,920 bytes free
Here the process took about 7 minutes. Clearly, Data Pump export is faster than conventional export.
Datapump import
The next thing to test is Data Pump versus conventional import. Typically, importing data takes much more time and effort. We drop the SALES table and start the Data Pump import job.
C:\expdp>impdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp
Import: Release 10.2.0.1.0 - Production on
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Master table "SKT"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "SKT"."SYS_IMPORT_TABLE_01": skt/********@skt tables=sales directory=d
p_test dumpfile=sales.dp
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "SKT"."SALES" 473.9 MB 14701488 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "SKT"."SYS_IMPORT_TABLE_01" successfully completed at
The entire import took a little more than 1 minute!
Conventional import
To compare, we again drop the table and start a conventional import.
C:\expdp>imp skt/skt@skt tables=sales file=sales.dmp
Import: Release 10.2.0.1.0 - Production on Mon Sep 19
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Export file created by EXPORT:V10.02.01 via conventional path
import done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set
. importing SKT's objects into SKT
. importing SKT's objects into SKT
. . importing table "SALES" 14701488 rows imported
Import terminated successfully without warnings.
C:\expdp>time
The current time is: 14:21:04.53
Enter the new time:
This took about 5 minutes. The Data Pump import is significantly faster than the conventional import.
New features
The job
The Data Pump processes run as a background job in the database. You can start a Data Pump session, detach yourself from the process, and re-attach to the session later to monitor it. You can also stop or pause the job. This requires a small change to the Data Pump parameters we used earlier.
C:\expdp>expdp skt/skt@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4
Here we added two new parameters: we named the Data Pump job test (job_name=test) and requested parallel processing for the export (parallel=4). Neither feature was available in the conventional export utility.
We get the following output:
Export: Release 10.2.0.1.0 - Production on
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Starting "SKT"."TEST": skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4
At this point we press Ctrl+C to get the interactive prompt. Had we tried this before the job started, we would have received an error message and the process would have failed (Oracle 10gR2).
Export> status
Job: TEST
Operation: EXPORT
Mode: TABLE
State: EXECUTING
Bytes Processed: 0
Current Parallelism: 4
Job Error Count: 0
Dump File: C:\EXPDP\SALES.DP
bytes written: 4,096
Worker 1 Status:
State: EXECUTING
Thus we can query the status of the job, which is still running in the background. Suppose we need to suspend the job; we do it like this:
Export> stop_job
Are you sure you wish to stop this job ([yes]/no): y
And we get back to the command prompt. Now we can re-attach to the job as follows
C:\expdp>expdp skt/skt@skt attach=test
Export: Release 10.2.0.1.0 - Production on
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Job: TEST
Owner: SKT
Operation: EXPORT
Creator Privs: FALSE
GUID: 474B4B5E5FB24AEC914D9A7AEC3D7D6E
Start Time:
Mode: TABLE
Instance: skt
Max Parallelism: 4
EXPORT Job Parameters:
Parameter Name Parameter Value:
CLIENT_COMMAND skt/********@skt tables=sales directory=dp_test dumpfile=sales.dp job_name=test parallel=4
State: IDLING
Bytes Processed: 496,972,408
Percent Done: 99
Current Parallelism: 4
Job Error Count: 0
Dump File: c:\expdp\sales.dp
bytes written: 496,979,968
Worker 1 Status:
State: UNDEFINED
Worker 2 Status:
State: UNDEFINED
Worker 3 Status:
State: UNDEFINED
We are attached to the job, but it is currently stopped. Let's restart it:
Export> start_job
Export> status
Job: TEST
Operation: EXPORT
Mode: TABLE
State: COMPLETING
Bytes Processed: 496,972,409
Percent Done: 100
Current Parallelism: 4
Job Error Count: 0
Dump File: c:\expdp\sales.dp
bytes written: 497,037,312
Worker 1 Status:
State: WORK WAITING
Had we wanted to detach from the client while keeping the job running, the command is EXIT_CLIENT.
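Whether or not a client is attached, the job can also be watched from an ordinary SQL session through the data dictionary; a minimal sketch using the DBA_DATAPUMP_JOBS view (USER_DATAPUMP_JOBS shows only your own jobs):
-- Lists every Data Pump job known to the database, with its current state
-- (EXECUTING, IDLING, NOT RUNNING, ...).
SELECT owner_name, job_name, operation, job_mode, state, degree
FROM   dba_datapump_jobs;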
The entire list of parameters available with expdp can be found by issuing the command
C:\expdp>expdp help=y
Keyword                Description (Default)
------------------------------------------------------------------------------
ATTACH                 Attach to existing job, e.g. ATTACH [=job name].
COMPRESSION            Reduce size of dumpfile contents where valid keyword values are: (METADATA_ONLY) and NONE.
CONTENT                Specifies data to unload where the valid keywords are: (ALL), DATA_ONLY, and METADATA_ONLY.
DIRECTORY              Directory object to be used for dumpfiles and logfiles.
DUMPFILE               List of destination dump files (expdat.dmp), e.g. DUMPFILE=scott1.dmp, scott2.dmp, dmpdir:scott3.dmp.
ENCRYPTION_PASSWORD    Password key for creating encrypted column data.
ESTIMATE               Calculate job estimates where the valid keywords are: (BLOCKS) and STATISTICS.
ESTIMATE_ONLY          Calculate job estimates without performing the export.
EXCLUDE                Exclude specific object types, e.g. EXCLUDE=TABLE:EMP.
FILESIZE               Specify the size of each dumpfile in units of bytes.
FLASHBACK_SCN          SCN used to set session snapshot back to.
FLASHBACK_TIME         Time used to get the SCN closest to the specified time.
FULL                   Export entire database (N).
HELP                   Display Help messages (N).
INCLUDE                Include specific object types, e.g. INCLUDE=TABLE_DATA.
JOB_NAME               Name of export job to create.
LOGFILE                Log file name (export.log).
NETWORK_LINK           Name of remote database link to the source system.
NOLOGFILE              Do not write logfile (N).
PARALLEL               Change the number of active workers for current job.
PARFILE                Specify parameter file.
QUERY                  Predicate clause used to export a subset of a table.
SAMPLE                 Percentage of data to be exported.
SCHEMAS                List of schemas to export (login schema).
STATUS                 Frequency (secs) job status is to be monitored where the default (0) will show new status when available.
TABLES                 Identifies a list of tables to export - one schema only.
TABLESPACES            Identifies a list of tablespaces to export.
TRANSPORT_FULL_CHECK   Verify storage segments of all tables (N).
TRANSPORT_TABLESPACES  List of tablespaces from which metadata will be unloaded.
VERSION                Version of objects to export where valid keywords are: (COMPATIBLE), LATEST, or any valid database version.
The following commands are valid while in interactive mode.
Note: abbreviations are allowed
Command                Description
------------------------------------------------------------------------------
ADD_FILE               Add dumpfile to dumpfile set.
CONTINUE_CLIENT        Return to logging mode. Job will be re-started if idle.
EXIT_CLIENT            Quit client session and leave job running.
FILESIZE               Default filesize (bytes) for subsequent ADD_FILE commands.
HELP                   Summarize interactive commands.
KILL_JOB               Detach and delete job.
PARALLEL               Change the number of active workers for current job. PARALLEL=<number of workers>.
START_JOB              Start/resume current job.
STATUS                 Frequency (secs) job status is to be monitored where the default (0) will show new status when available. STATUS[=interval]
STOP_JOB               Orderly shutdown of job execution and exits the client. STOP_JOB=IMMEDIATE performs an immediate shutdown of the Data Pump job.
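To illustrate how several of these parameters combine, a hypothetical schema-level export (not part of the timings above; the %U template, the FILESIZE value and the log file name are just examples) might look like this:
C:\expdp>expdp skt/skt@skt schemas=skt directory=dp_test dumpfile=skt_%U.dp filesize=200M parallel=4 exclude=statistics logfile=skt_export.log
With the %U substitution variable, the output is split across numbered files (skt_01.dp, skt_02.dp, ...) of at most 200 MB each, which lets the four parallel workers write to separate dump files.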
Similar functionality is available in impdp as well.
C:\expdp>impdp help=y
Keyword                Description (Default)
------------------------------------------------------------------------------
ATTACH                 Attach to existing job, e.g. ATTACH [=job name].
CONTENT                Specifies data to load where the valid keywords are: (ALL), DATA_ONLY, and METADATA_ONLY.
DIRECTORY              Directory object to be used for dump, log, and sql files.
DUMPFILE               List of dumpfiles to import from (expdat.dmp), e.g. DUMPFILE=scott1.dmp, scott2.dmp, dmpdir:scott3.dmp.
ENCRYPTION_PASSWORD    Password key for accessing encrypted column data. This parameter is not valid for network import jobs.
ESTIMATE               Calculate job estimates where the valid keywords are: (BLOCKS) and STATISTICS.
EXCLUDE                Exclude specific object types, e.g. EXCLUDE=TABLE:EMP.
FLASHBACK_SCN          SCN used to set session snapshot back to.
FLASHBACK_TIME         Time used to get the SCN closest to the specified time.
FULL                   Import everything from source (Y).
HELP                   Display help messages (N).
INCLUDE                Include specific object types, e.g. INCLUDE=TABLE_DATA.
JOB_NAME               Name of import job to create.
LOGFILE                Log file name (import.log).
NETWORK_LINK           Name of remote database link to the source system.
NOLOGFILE              Do not write logfile.
PARALLEL               Change the number of active workers for current job.
PARFILE                Specify parameter file.
QUERY                  Predicate clause used to import a subset of a table.
REMAP_DATAFILE         Redefine datafile references in all DDL statements.
REMAP_SCHEMA           Objects from one schema are loaded into another schema.
REMAP_TABLESPACE       Tablespace objects are remapped to another tablespace.
REUSE_DATAFILES        Tablespace will be initialized if it already exists (N).
SCHEMAS                List of schemas to import.
SKIP_UNUSABLE_INDEXES  Skip indexes that were set to the Index Unusable state.
SQLFILE                Write all the SQL DDL to a specified file.
STATUS                 Frequency (secs) job status is to be monitored where the default (0) will show new status when available.
STREAMS_CONFIGURATION  Enable the loading of Streams metadata.
TABLE_EXISTS_ACTION    Action to take if imported object already exists. Valid keywords: (SKIP), APPEND, REPLACE and TRUNCATE.
TABLES                 Identifies a list of tables to import.
TABLESPACES            Identifies a list of tablespaces to import.
TRANSFORM              Metadata transform to apply to applicable objects. Valid transform keywords: SEGMENT_ATTRIBUTES, STORAGE, OID, and PCTSPACE.
TRANSPORT_DATAFILES    List of datafiles to be imported by transportable mode.
TRANSPORT_FULL_CHECK   Verify storage segments of all tables (N).
TRANSPORT_TABLESPACES  List of tablespaces from which metadata will be loaded. Only valid in NETWORK_LINK mode import operations.
VERSION                Version of objects to export where valid keywords are: (COMPATIBLE), LATEST, or any valid database version. Only valid for NETWORK_LINK and SQLFILE.
The following commands are valid while in interactive mode.
Note: abbreviations are allowed
Command                Description (Default)
------------------------------------------------------------------------------
CONTINUE_CLIENT        Return to logging mode. Job will be re-started if idle.
EXIT_CLIENT            Quit client session and leave job running.
HELP                   Summarize interactive commands.
KILL_JOB               Detach and delete job.
PARALLEL               Change the number of active workers for current job. PARALLEL=<number of workers>.
START_JOB              Start/resume current job. START_JOB=SKIP_CURRENT will start the job after skipping any action which was in progress when job was stopped.
STATUS                 Frequency (secs) job status is to be monitored where the default (0) will show new status when available. STATUS[=interval]
STOP_JOB               Orderly shutdown of job execution and exits the client. STOP_JOB=IMMEDIATE performs an immediate shutdown of the Data Pump job.
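Again purely as an illustration, the REMAP_SCHEMA and SQLFILE parameters could be used against the dump we created earlier (SKT2 and the file name here are hypothetical; the target schema must already exist and the importing user needs the appropriate import privileges):
C:\expdp>impdp skt/skt@skt directory=dp_test dumpfile=sales.dp remap_schema=skt:skt2 table_exists_action=truncate
C:\expdp>impdp skt/skt@skt directory=dp_test dumpfile=sales.dp sqlfile=sales_ddl.sql
The first command loads the SALES rows into the SKT2 schema, truncating the table first if it already exists; the second writes the DDL contained in the dump to sales_ddl.sql instead of executing it.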
Don't like expdp/impdp? Use the API to build your own client
Oracle has made the Data Pump API available: using the supplied PL/SQL package dbms_datapump, it is possible to create your own Data Pump client. The only caveat is that third-party data types must match Oracle's.
In the next few lines we perform a sample single-table export with a PL/SQL script.
DECLARE
h1 NUMBER; -- Data Pump job handle
BEGIN
-- Create a (user-named) Data Pump job to do a schema export.
h1 := DBMS_DATAPUMP.OPEN(operation=>'EXPORT',
job_mode=>'TABLE',
remote_link=>NULL,
job_name=>'EXAMPLE1',
version=>'LATEST');
-- Specify a single dump file for the job (using the handle just returned)
-- and a directory object, which must already be defined and accessible
-- to the user running this procedure.
DBMS_DATAPUMP.ADD_FILE(handle=>h1,
filename=>'example1.dmp',
directory=>'DP_TEST',
filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);
--- We also add a log file
DBMS_DATAPUMP.ADD_FILE(handle=>h1,
filename=>'example1.log',
directory=>'DP_TEST',
filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);
-- A metadata filter is used to specify the schema/table that will be exported.
DBMS_DATAPUMP.METADATA_FILTER(handle=>h1,
name =>'NAME_EXPR', value =>'IN (''SALES'')');
-- Start the job. An exception will be generated if something is not set up
-- properly.
DBMS_DATAPUMP.START_JOB(h1);
DBMS_DATAPUMP.detach(h1);
END;
/
First we create a Data Pump job. We define the type of operation (export) and the job mode (table), give the job a name, and pass some other parameters that are best left unchanged. One warning: as of this writing, the Oracle documentation for this procedure is wrong about the parameter JOB_MODE; the documentation calls it MODE, but the actual parameter name is JOB_MODE.
h1 := DBMS_DATAPUMP.OPEN(operation=>'EXPORT',
job_mode=>'TABLE',
remote_link=>NULL,
job_name=>'EXAMPLE1',
version=>'LATEST');
People familiar with Windows programming will note that the job handle is stored in a variable for subsequent use.
Next, we attach two files to the job: the first is the export dump file and the second is the log file. Note that the directory object has to be defined in the database, which means the underlying directory must be writable by the Oracle server. So although the program can run on any client PC, the output is always generated on the server itself. Of course you can NFS-mount or net-share a remote directory on the server, but I consider this a limitation.
DBMS_DATAPUMP.ADD_FILE(handle=>h1,
filename=>'example1.dmp',
directory=>'DP_TEST',
filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);
DBMS_DATAPUMP.ADD_FILE(handle=>h1,
filename=>'example1.log',
directory=>'DP_TEST',
filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);
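For completeness, the directory object referenced above was created during the preparation step; a non-privileged user would additionally need read/write access granted on it (a minimal sketch, run as a DBA):
CREATE DIRECTORY dp_test AS 'c:\expdp';
GRANT READ, WRITE ON DIRECTORY dp_test TO skt;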
Then we specify the name(s) of the tables to be exported. We can name individual tables or give a pattern of names. Since in this example we are exporting a single table, its name is specified here.
DBMS_DATAPUMP.METADATA_FILTER(handle=>h1,
name =>'NAME_EXPR', value =>'IN (''SALES'')');
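If instead we wanted every table whose name starts with SALES, the same filter could take a pattern; NAME_EXPR accepts any SQL predicate on the object name (a sketch):
DBMS_DATAPUMP.METADATA_FILTER(handle => h1,
    name  => 'NAME_EXPR',
    value => 'LIKE ''SALES%''');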
Lastly we start the job and detach from it. The job will run in the background.
DBMS_DATAPUMP.START_JOB(h1);
DBMS_DATAPUMP.DETACH(h1);
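If the script should block until the export finishes rather than detaching, DBMS_DATAPUMP.WAIT_FOR_JOB (documented in 10gR2) can be used in place of DETACH; a minimal sketch, assuming a variable v_state VARCHAR2(30) is declared alongside h1:
DBMS_DATAPUMP.START_JOB(h1);
-- Blocks the session until the job completes or stops, then returns its final state.
DBMS_DATAPUMP.WAIT_FOR_JOB(h1, v_state);
DBMS_OUTPUT.PUT_LINE('Job finished with state: ' || v_state);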
For more detailed examples, check out http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14215/dp_api.htm#sthref460.