Applies to:
Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Information in this document applies to any platform.
NOTE: In the images and/or the document content below, the user information and data used represents fictitious data.
Any similarity to actual persons, living or dead, is purely coincidental and not intended in any manner.
Goal
This document explains how the SAMPLE command-line parameter of Data Pump Export (expdp) can be used to export only a percentage of the table data.
The SAMPLE parameter is useful when, for example, a test database must be built from a subset of production data because the full production data set is too large to handle. Smaller amounts of data make testing easier to manage, and this parameter allows a given percentage of the data to be extracted from the tables.
The syntax of the command line parameter is:
SAMPLE=[[schema_name.]table_name:]sample_percent
where:
- schema_name is the owner of the table to be sampled
- table_name is the table to be sampled
- sample_percent is the percentage of rows to be extracted from the table
Both schema_name and table_name are optional clauses. If schema_name is omitted, the table is looked up in the schema of the user running the export. If table_name is omitted, the sample percentage is applied to all tables exported by the current export job.
Note: if you specify schema_name, then table_name is also mandatory.
The sample_percent has no default value, so whenever SAMPLE is used, a sample percentage must be provided. It can range from .000001 up to, but not including, 100.
Note: the sample_percent merely indicates the probability that a row will be selected as part of the sample. It does not mean that the database will retrieve exactly that number of rows from the table.
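The syntax rules above (table_name required when schema_name is given, percentage in the range .000001 up to but not including 100) can be captured in a small validation helper. This is a minimal sketch in Python; `build_sample_value` is a hypothetical helper, not part of any Oracle tooling, and it only builds the text of the parameter without calling expdp.

```python
from typing import Optional

# Bounds as stated in this note: .000001 inclusive, 100 exclusive.
MIN_PERCENT = 0.000001
MAX_PERCENT = 100.0


def build_sample_value(percent: float,
                       table: Optional[str] = None,
                       schema: Optional[str] = None) -> str:
    """Hypothetical helper: return 'SAMPLE=[[schema.]table:]percent'."""
    # schema_name alone is not allowed: table_name becomes mandatory.
    if schema is not None and table is None:
        raise ValueError("table_name is mandatory when schema_name is given")
    # There is no default percentage, and it must lie in [.000001, 100).
    if not (MIN_PERCENT <= percent < MAX_PERCENT):
        raise ValueError("sample_percent must be in [.000001, 100)")
    prefix = ""
    if table is not None:
        prefix = f"{schema}.{table}:" if schema is not None else f"{table}:"
    # Six decimals match the smallest allowed value; trailing zeros are
    # dropped so that 70 is written as "70" rather than "70.000000".
    text = f"{percent:.6f}".rstrip("0").rstrip(".")
    return f"SAMPLE={prefix}{text}"
```

For instance, `build_sample_value(50, "T1", "TEST")` yields `SAMPLE=TEST.T1:50`, which could then be appended to an expdp command line such as the ones in the examples below.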
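To illustrate why the exported row count varies, the following Python sketch simulates the selection under a simplified model, assumed here for illustration: each row is kept independently with probability sample_percent / 100. `simulate_sample` is a hypothetical function, not an Oracle API, and Data Pump's actual sampling mechanism may differ (for example, it may select blocks of rows rather than individual rows).

```python
import random


def simulate_sample(total_rows: int, sample_percent: float, seed: int) -> int:
    """Count rows kept when each row is kept independently with
    probability sample_percent / 100 (simplified, assumed model)."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    p = sample_percent / 100.0
    return sum(1 for _ in range(total_rows) if rng.random() < p)


if __name__ == "__main__":
    # Five simulated "exports" of a 1000-row table with SAMPLE=70.
    counts = [simulate_sample(1000, 70, seed) for seed in range(5)]
    print(counts)
```

Each seed plays the role of one export run; the counts differ from run to run and typically land within a few dozen rows of 700 rather than hitting it exactly.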
Solution
The following examples demonstrate the use of the SAMPLE parameter and its effect:
Example #1
In this example, only about 70% of the rows of a table containing 1000 rows are to be exported, so SAMPLE=70 is used:
SQL> select count(*) from t1;
COUNT(*)
----------
1000
SQL> exit
$ expdp test/<PASSWORD> directory=DATA_PUMP_DIR dumpfile=<DUMPFILE1> tables=t1 sample=70
Export: Release 11.2.0.2.0 - Production on Fri Feb 24 14:41:32 2012
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "TEST"."SYS_EXPORT_TABLE_01": test/******** directory=DATA_PUMP_DIR dumpfile=<DUMPFILE1> tables=t1 sample=70
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 44.79 KB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "TEST"."T1" 9.632 KB 678 rows
Master table "TEST"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for TEST.SYS_EXPORT_TABLE_01 is:
<DUMPFILE1>
Job "TEST"."SYS_EXPORT_TABLE_01" successfully completed at 14:41:59
Only 678 rows (67.8%, roughly 70% of the 1000 rows) were exported from table T1.
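If each row were kept independently with probability p = 0.70 (the per-row model described in the note in the Goal section; this is an assumption, and the server-side mechanism may sample differently), the exported row count X would follow a binomial distribution. Under that assumption, 678 rows is well within normal variation:

```latex
\begin{aligned}
X &\sim \operatorname{Binomial}(n, p), \qquad n = 1000,\; p = 0.70 \\
\mathbb{E}[X] &= np = 700 \\
\sigma_X &= \sqrt{np(1-p)} = \sqrt{1000 \cdot 0.70 \cdot 0.30} = \sqrt{210} \approx 14.5 \\
z &= \frac{678 - 700}{14.5} \approx -1.5
\end{aligned}
```

A shortfall of about 1.5 standard deviations from the expected 700 rows is unremarkable, which is consistent with SAMPLE setting a probability rather than an exact row count.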
Example #2
In this example, SAMPLE names no table and no TABLES parameter is given, so expdp performs a schema-mode export of the exporting user's schema and applies the sample percentage to every table in it. The TEST schema contains three tables, T1, T2, and T3, each with a different number of rows. Again, a sample percentage of 70 is used:
SQL> select count(*) from t1;
COUNT(*)
----------
1000
SQL> select count(*) from t2;
COUNT(*)
----------
2000
SQL> select count(*) from t3;
COUNT(*)
----------
3000
SQL> exit
$ expdp test/<PASSWORD> directory=DATA_PUMP_DIR dumpfile=<DUMPFILE2> sample=70
Export: Release 11.2.0.2.0 - Production on Fri Feb 24 14:43:22 2012
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "TEST"."SYS_EXPORT_SCHEMA_01": test/******** directory=DATA_PUMP_DIR dumpfile=<DUMPFILE2> sample=70
Estimate in progress using BLOCKS method...
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 134.3 KB
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type SCHEMA_EXPORT/TABLE/COMMENT
. . exported "TEST"."T1" 9.976 KB 729 rows
. . exported "TEST"."T2" 14.26 KB 1356 rows
. . exported "TEST"."T3" 19.49 KB 2121 rows
Master table "TEST"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded
******************************************************************************
Dump file set for TEST.SYS_EXPORT_SCHEMA_01 is:
<DUMPFILE2>
Job "TEST"."SYS_EXPORT_SCHEMA_01" successfully completed at 14:44:45
For each table in the TEST schema, approximately 70% of the rows were exported. As noted earlier, the SAMPLE parameter does not specify an exact number of rows; it only sets the probability that rows will be selected for export.
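The observed percentages can be checked with a few lines of arithmetic. The Python sketch below uses only the row counts shown in this example; the `EXAMPLE_2` dictionary and the `observed_percent` helper are illustrative names, not part of Data Pump.

```python
# Row counts from Example #2: table -> (total rows, rows exported).
EXAMPLE_2 = {
    "T1": (1000, 729),
    "T2": (2000, 1356),
    "T3": (3000, 2121),
}


def observed_percent(total: int, exported: int) -> float:
    """Percentage of rows actually exported, rounded to one decimal."""
    return round(100.0 * exported / total, 1)


if __name__ == "__main__":
    for table, (total, exported) in EXAMPLE_2.items():
        print(f"{table}: {observed_percent(total, exported)}% of {total} rows")
    all_total = sum(t for t, _ in EXAMPLE_2.values())
    all_exported = sum(e for _, e in EXAMPLE_2.values())
    print(f"overall: {observed_percent(all_total, all_exported)}%")
```

Run as a script, it reports 72.9% for T1, 67.8% for T2, 70.7% for T3 and 70.1% overall: each individual table deviates from 70% by up to about three percentage points, while the combined figure for this run is close to the requested percentage.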