Quicktostudy
11/20/2018   
Hive Tutorial
Home »  Hive » Hive Partition

Hive Partition


By default, a simple query in Hive scans the whole Hive table. This slows down the performance when querying a large-size table. The issue could be resolved by creating Hive Partition, which is very similar to what is in the RDBMS. In Hive, each partition corresponds to a predefined partition column(s) and stores it as a subdirectory in the tables directory in HDFS. When the table gets queried, only the required partition (directory) of data in the table are queried, so the I/O and time of query is greatly reduced. It is very easy to implement Hive partition when the table is created and check the partition created, as follows.

Create partition when creating tables

hive> CREATE TABLE employee_partitioned(name string, work_place ARRAY,
> sex_age STRUCT,skills_score MAP,
> depart_title MAP>)
> PARTITIONED BY (Year INT, Month INT)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':';

--Show partitions
hive> SHOW PARTITIONS employee_partitioned;
+------------+
| partition  |
+------------+
+------------+

The partition is not enabled automatically. We have to use ALTER TABLE ADD PARTITION to add partitions to a table. The ADD PARTITION command changes the table’s metadata, but does not load data. If the data does not exist in the partition’s location, queries will not return any results. To drop the partition including both data and metadata, use the ALTER TABLE DROP PARTITION statement as follows.

--Add multiple partition
hive> ALTER TABLE employee_partitioned ADD
> PARTITION (year=2014, month=11)
> PARTITION (year=2014, month=12);

hive> SHOW PARTITIONS employee_partitioned;
+---------------------+
| partition           |
+---------------------+
| year=2016/month=11  |
| year=2016/month=12  |
+---------------------+

--Drop the partition
hive> ALTER TABLE employee_partitioned
> DROP IF EXISTS PARTITION (year=2016, month=11);

hive> SHOW PARTITIONS employee_partitioned;
+---------------------+
| partition           |
+---------------------+
| year=2016/month=12  |
+---------------------+

To load or overwrite data in partition, we can use the LOAD or INSERT OVERWRITE statements. The statement only overwrites the data in the specified partitions. Although partition columns are subdirectory names, we can query or specify them in the SELECT or WHERE statements to narrow down the result set. The following steps show how to load data to the partition table

Load data to the partition

hive>LOAD DATA LOCAL INPATH
> '/home/quicktostudy/employee.txt'
> OVERWRITE INTO TABLE employee_partitioned
> PARTITION (year=2016, month=12);

The alter table/partition statement for file format, location, protections, and concatenation has the same syntax as the alter table statements and is shown here
ALTER TABLE table_name PARTITION partition_spec SET FILEFORMAT file_format;
ALTER TABLE table_name PARTITION partition_spec ENABLE NO_DROP;
ALTER TABLE table_name PARTITION partition_spec ENABLE OFFLINE;
ALTER TABLE table_name PARTITION partition_spec DISABLE NO_DROP;
ALTER TABLE table_name PARTITION partition_spec DISABLE OFFLINE;
ALTER TABLE table_name PARTITION partition_spec CONCATENATE;


Contact Us| About Us| Terms| Privacy Policies
Powered by Lorquins Technologies© 2017 QuickToStudy.com. All Rights Reserved