Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please don't try any post on a production server until you are sure.

Thursday, January 04, 2018

Partitioning Oracle (12c) External Table [Over HDFS]


Partitioned external tables were introduced in Oracle Database 12c Release 2 (12.2), allowing external tables to benefit from partition pruning and partition-wise joins. With the exception of hash partitioning, most partitioning and subpartitioning strategies are supported, with some restrictions. In this post I walk through a test that uses partitioning to get better performance from an external table over HDFS.

Pre-requisites

Configuring NFS Gateway for HDFS [HDP]
Creating Oracle External Table (12c) on HDFS using HDP NFS Gateway


1- Create some data for the external table partitions. I've chosen the sys.user_objects table for the purpose of this post (a sketch for building similar test data follows the queries below).

SQL> select distinct trunc(created) from sys.user_objects order by 1;

TRUNC(CREATED)
------------------
26-JAN-17
18-MAY-17
19-MAY-17
29-MAY-17
...

31-DEC-17
01-JAN-18

21 rows selected.

SQL> select count(*) from sys.user_objects where trunc(created) = '26-JAN-17';

  COUNT(*)
----------
     51020

SQL> select count(*) from sys.user_objects where trunc(created) > '26-JAN-17';

  COUNT(*)
----------
        470
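
The sys.user_objects table queried above is a large test table specific to my environment. If you want comparable test data, a sketch like the following can build it from DBA_OBJECTS; the table name user_objects_test and the row multiplier are illustrative only, not part of the original setup.

-- Hypothetical helper: multiply DBA_OBJECTS to get a larger test table.
-- Increase the CONNECT BY limit to grow the row count.
create table user_objects_test as
select o.*
from   dba_objects o,
       (select level as n from dual connect by level <= 10);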

2- I have selected two date ranges (one per partition) for our test. Now create the dump files: use a CREATE TABLE AS SELECT statement to select the rows belonging to each partition and write them out with the ORACLE_DATAPUMP access driver.
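
The statements below assume a directory object named LOCAL_OS_DIR pointing at the local staging path shown in step 3. If it does not exist yet, a sketch like this (run as a suitably privileged user) creates it; the grant is only needed when the external tables are owned by another user, and the user name is just an example.

-- Assumed setup, not shown in the original post: local staging directory object.
create or replace directory local_os_dir as '/data/ora_ext_tab_dir';
-- grant read, write on directory local_os_dir to scott;  -- example user only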


create table xt_uo_dp_p1
organization external
(type oracle_datapump default directory local_os_dir location('xt_uo_dp_p1.dmp'))
as
select * 
from sys.user_objects p1
where trunc(created) = '26-JAN-17';


At this point the data for partition 1 has been prepared on the local file system. Prepare the data for partition 2 in the same way.



create table xt_uo_dp_p2
organization external
(type oracle_datapump default directory local_os_dir location('xt_uo_dp_p2.dmp'))
as
select * 
from sys.user_objects p2
where trunc(created) > '26-JAN-17';



3- Verify the files created by the above create table statements

SQL> host ls -ls /data/ora_ext_tab_dir/xt*

6836 -rw-r----- 1 oracle oinstall 7000064 Jan  2 08:40 /data/ora_ext_tab_dir/xt_uo_dp_p1.dmp
  76 -rw-r----- 1 oracle oinstall   77824 Jan  2 08:43 /data/ora_ext_tab_dir/xt_uo_dp_p2.dmp




You can select from each of these external tables to verify that it contains the data you intend to copy to HDFS and reference from the new partitioned external table. Once the counts look right, the staging tables can be dropped; dropping an external table does not remove its dump files.

select count(*) from xt_uo_dp_p1;
select count(*) from xt_uo_dp_p2;

drop table xt_uo_dp_p1;
drop table xt_uo_dp_p2;

4- If, after dropping the above two tables, you want for some reason to recreate an external table over an already existing Data Pump access driver file, you can use the statement below.



create table xt_uo_dp_p2
(
OBJECT_NAME                                        VARCHAR2(128)
,SUBOBJECT_NAME                                     VARCHAR2(128)
,OBJECT_ID                                          NUMBER
,DATA_OBJECT_ID                                     NUMBER
,OBJECT_TYPE                                        VARCHAR2(23)
,CREATED                                            DATE
,LAST_DDL_TIME                                      DATE
,TIMESTAMP                                          VARCHAR2(19)
,STATUS                                             VARCHAR2(7)
,TEMPORARY                                          VARCHAR2(1)
,GENERATED                                          VARCHAR2(1)
,SECONDARY                                          VARCHAR2(1)
,NAMESPACE                                          NUMBER
,EDITION_NAME                                       VARCHAR2(128)
,SHARING                                            VARCHAR2(18)
,EDITIONABLE                                        VARCHAR2(1)
,ORACLE_MAINTAINED                                  VARCHAR2(1)
,APPLICATION                                        VARCHAR2(1)
,DEFAULT_COLLATION                                  VARCHAR2(100)
,DUPLICATED                                         VARCHAR2(1)
,SHARDED                                            VARCHAR2(1)
,CREATED_APPID                                      NUMBER
,CREATED_VSNID                                      NUMBER
,MODIFIED_APPID                                     NUMBER
,MODIFIED_VSNID                                     NUMBER
)
organization external
(
type oracle_datapump 
default directory local_os_dir 
location('xt_uo_dp_p2.dmp')
);

5- Copy the .dmp files created earlier to the HDFS location exposed through the NFS Gateway (the path behind the NFS_DIR directory object).


SQL> host cp /data/ora_ext_tab_dir/xt* /data/hdfsloc/data/oraclenfs
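
Step 6 references a directory object named NFS_DIR. It is assumed to have been created as part of the prerequisite posts; if you still need it, a sketch based on the NFS mount path used in the copy above would be:

-- Assumed setup: directory object over the HDFS path exposed by the NFS Gateway.
create or replace directory nfs_dir as '/data/hdfsloc/data/oraclenfs';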

6- Create the new partitioned ORACLE_DATAPUMP external table, partitioned by RANGE on the CREATED column. As noted above, external tables support most partitioning and subpartitioning strategies (RANGE, LIST and composite variants), with hash partitioning being the notable exception.


drop table xt_uo_dp_nfs;

create table xt_uo_dp_nfs
(
OBJECT_NAME                                        VARCHAR2(128)
,SUBOBJECT_NAME                                     VARCHAR2(128)
,OBJECT_ID                                          NUMBER
,DATA_OBJECT_ID                                     NUMBER
,OBJECT_TYPE                                        VARCHAR2(23)
,CREATED                                            DATE
,LAST_DDL_TIME                                      DATE
,TIMESTAMP                                          VARCHAR2(19)
,STATUS                                             VARCHAR2(7)
,TEMPORARY                                          VARCHAR2(1)
,GENERATED                                          VARCHAR2(1)
,SECONDARY                                          VARCHAR2(1)
,NAMESPACE                                          NUMBER
,EDITION_NAME                                       VARCHAR2(128)
,SHARING                                            VARCHAR2(18)
,EDITIONABLE                                        VARCHAR2(1)
,ORACLE_MAINTAINED                                  VARCHAR2(1)
,APPLICATION                                        VARCHAR2(1)
,DEFAULT_COLLATION                                  VARCHAR2(100)
,DUPLICATED                                         VARCHAR2(1)
,SHARDED                                            VARCHAR2(1)
,CREATED_APPID                                      NUMBER
,CREATED_VSNID                                      NUMBER
,MODIFIED_APPID                                     NUMBER
,MODIFIED_VSNID                                     NUMBER
)
organization external
(
type oracle_datapump 
default directory NFS_DIR
)
partition by RANGE (CREATED)
(
  partition p1 VALUES LESS THAN (TO_DATE('27-JAN-2017','DD-MON-YYYY')) location('xt_uo_dp_p1.dmp'),
  partition p2 VALUES LESS THAN (TO_DATE('01-JAN-2018','DD-MON-YYYY')) location('xt_uo_dp_p2.dmp')
);
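
This check is not in the original post, but querying DBA_TAB_PARTITIONS (or USER_TAB_PARTITIONS for the owning schema) is a quick way to confirm that both partitions and their upper bounds were registered:

-- List the partitions and their boundaries for the new external table.
select partition_name, high_value
from   dba_tab_partitions
where  table_name = 'XT_UO_DP_NFS'
order  by partition_position;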

7- Set the degree of parallelism on the external table

alter table xt_uo_dp_nfs parallel (degree 8);
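
To confirm the setting took effect (again my addition, not from the original post), the DEGREE column of DBA_TABLES can be checked:

-- Verify the degree of parallelism recorded for the external table.
select table_name, degree
from   dba_tables
where  table_name = 'XT_UO_DP_NFS';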


8- Check the table with queries

--Query the partition 1

SQL> select count(*) from sys.xt_uo_dp_nfs where created < TO_DATE('27-JAN-2017','DD-MON-YYYY')
  2  /

  COUNT(*)
----------
     51020

Elapsed: 00:00:00.65

--Query partition 2

SQL> select count(*) from sys.xt_uo_dp_nfs where created > TO_DATE('27-JAN-2017','DD-MON-YYYY') and created < TO_DATE('01-JAN-2018','DD-MON-YYYY');

  COUNT(*)
----------
       471

Elapsed: 00:00:00.40
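
To see that the optimizer actually prunes to a single partition for these predicates, an EXPLAIN PLAN check with the standard DBMS_XPLAN package can be used (my addition); the Pstart/Pstop columns in the plan should both show partition 1 for the first query:

explain plan for
select count(*)
from   sys.xt_uo_dp_nfs
where  created < TO_DATE('27-JAN-2017','DD-MON-YYYY');

-- Pstart/Pstop = 1 indicates only partition P1 is scanned.
select * from table(dbms_xplan.display);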

9- If in the future you need to add another partition to the external table, or modify the location of an existing one, you have to drop the external table and create it again, including the newly added partition's dump file or the modified locations. The sketch below shows how to capture the current DDL before doing so.
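
Capturing the existing DDL first with DBMS_METADATA (my addition, assuming the table is owned by SYS as in this test) means the recreate statement only needs the new partition clause appended:

-- Extract the current DDL so the new partition can simply be appended to it.
set long 20000 pagesize 0
select dbms_metadata.get_ddl('TABLE', 'XT_UO_DP_NFS', 'SYS') from dual;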
