Quicktostudy
11/18/2018   
Hive Tutorial

Hive Dynamic Partition


Hive partition

By default, a query in Hive scans the whole table, which slows down queries on large tables. Partitioning solves this: Hive stores each partition of a table in its own directory, so a query that filters on the partition columns reads only the matching partitions.
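As a minimal sketch (the `page_views` table and its columns are hypothetical, not from this tutorial), the query below filters on the partition column, so Hive only needs to read the directory for one day instead of scanning the whole table:

```sql
-- Hypothetical table partitioned by day; each partition is stored
-- in its own subdirectory, e.g. .../page_views/view_date=2018-11-18/
CREATE TABLE page_views (userId INT, url STRING)
PARTITIONED BY (view_date STRING);

-- The WHERE clause on the partition column lets Hive skip every
-- other partition (partition pruning) instead of scanning the table.
SELECT url, COUNT(*) AS hits
FROM page_views
WHERE view_date = '2018-11-18'
GROUP BY url;
```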

Types of partitions in Hive

Hive supports two types of partitions: static partitions and dynamic partitions.

What is a Hive static partition?

Static Partition is used when the values for partition columns are known well in advance of loading the data into a Hive table.
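For illustration, a static-partition load might look like the sketch below. It reuses the `table_partitionLog` table and the source table `log` from the steps later in this tutorial, and it assumes (hypothetically) that `log_date` is stored as a 'yyyy-MM-dd' string. Each partition value is written as a constant, so every day needs its own statement:

```sql
-- Static partition: every partition value is a constant in PARTITION (...),
-- and the SELECT lists only the non-partition columns.
-- Loading two days therefore takes two statements.
INSERT OVERWRITE TABLE table_partitionLog PARTITION (year=2018, month=11, day=17)
SELECT logId, logDescription, logModule FROM log
WHERE log_date = '2018-11-17';

INSERT OVERWRITE TABLE table_partitionLog PARTITION (year=2018, month=11, day=18)
SELECT logId, logDescription, logModule FROM log
WHERE log_date = '2018-11-18';
```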

What is a Hive dynamic partition?

Dynamic Partition is used when the values for partition columns are known only during loading of the data into a Hive table.

When inserting data into a partitioned table, we need to specify the partition columns. With static partitioning, each partition value is written explicitly; with dynamic partitioning, Hive takes the partition values from the query result instead. Dynamic partitions are useful when the data volume is large and the partition values are not known in advance, for example when the date of each record is used as a partition column.

Dynamic partitioning must be enabled with the following property. It is disabled by default in Hive versions before 0.9.0 and enabled by default from 0.9.0 on, so setting it explicitly is harmless:

hive> SET hive.exec.dynamic.partition=true;

By default, the user must specify at least one static partition column. This prevents accidentally overwriting all partitions. To remove this restriction, change the partition mode from the default strict mode to nonstrict before inserting into a dynamic partition:

hive> SET hive.exec.dynamic.partition.mode=nonstrict;
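To illustrate strict mode, here is a sketch of an insert that strict mode accepts because one partition column is static. It reuses `table_partitionLog` and `log` from the steps later in this tutorial; the filter on 2018 is only an example value. The fully dynamic insert in those steps, by contrast, needs nonstrict mode:

```sql
-- Strict mode (the default): at least one partition column must be static.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=strict;

-- Accepted in strict mode: year is static (a constant), month and day
-- are dynamic. Static columns must precede dynamic ones in PARTITION (...),
-- and the SELECT supplies only the dynamic columns, last and in order.
INSERT OVERWRITE TABLE table_partitionLog PARTITION (year=2018, month, day)
SELECT logId, logDescription, logModule, month(log_date), day(log_date)
FROM log
WHERE year(log_date) = 2018;
```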

When to use Hive dynamic partitions?

These are the common scenarios for using Hive dynamic partitions:

  • Load data from an existing table that is not partitioned: The table was not partitioned initially because it was small. As the table grows, queries become slower. A one-time load of the data into a new table with dynamic partitions fixes this.
  • The partition column values are not known in advance: The values come from the data being loaded, for example the date of each record, so they cannot be written as constants in the INSERT statement.
  • Change the number of partition columns: The user initially designs a table with only a few partition columns. As the data grows, performance decreases, so more partition columns are needed. In Hive this is done by: 1. creating a new table with all required partition columns, 2. loading the data into the new table from the existing partitioned table, and 3. dropping the existing partitioned table. Dynamic partitioning is used to perform step 2.
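The three steps for adding a partition column can be sketched as follows. The tables `sales_old` (columns `saleId INT, amount DOUBLE, sale_date STRING`, partitioned by `year INT`) and `sales_new` are hypothetical, and `sale_date` is assumed to be a 'yyyy-MM-dd' string:

```sql
-- Hypothetical tables: sales_old is partitioned by year only;
-- sales_new adds month as a second partition column.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Step 1: create the new table with all required partition columns.
CREATE TABLE sales_new (saleId INT, amount DOUBLE, sale_date STRING)
PARTITIONED BY (year INT, month INT);

-- Step 2: copy the data. year is a partition column of sales_old and
-- can be selected like a regular column; month is derived from
-- sale_date. Hive creates one partition per distinct (year, month).
INSERT OVERWRITE TABLE sales_new PARTITION (year, month)
SELECT saleId, amount, sale_date, year, month(sale_date)
FROM sales_old;

-- Step 3: drop the old table after checking the new one.
DROP TABLE sales_old;
```

Both partition columns are dynamic in step 2, which is why the sketch switches to nonstrict mode first.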

Steps to create a Hive dynamic partition table

  • Create the partitioned table:
    hive> CREATE TABLE table_partitionLog (logId INT, logDescription STRING, logModule STRING)
    > PARTITIONED BY (year INT, month INT, day INT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' ;
    
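  • Insert the data into the dynamic partition table. All three partition columns are dynamic here, so the nonstrict mode shown above is required. The dynamic partition columns must be the last columns in the SELECT, in the same order as in the PARTITION clause:
    hive> INSERT OVERWRITE TABLE table_partitionLog PARTITION (year, month, day)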
    > SELECT logId, logDescription, logModule, year(log_date), month(log_date), day(log_date) FROM log;
    
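After the insert, one way to check which partitions Hive created is `SHOW PARTITIONS`; this is a short sketch against the `table_partitionLog` table from the steps above, and the exact partitions listed depend on the dates present in `log`:

```sql
-- List every partition the dynamic insert created; each line names one
-- partition in the form year=.../month=.../day=...
SHOW PARTITIONS table_partitionLog;

-- Restrict the listing with a partial partition specification.
SHOW PARTITIONS table_partitionLog PARTITION (year=2018);
```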

Quick to study Hive dynamic partition questions

What is a Hive partition?
What is a Hive static partition?
What is a Hive dynamic partition?
When should Hive dynamic partitions be used?
What are the steps to create a Hive dynamic partition table?


Powered by Lorquins Technologies© 2017 QuickToStudy.com. All Rights Reserved