Hive Create Table / Create Table in Hive
Hive internal table and Hive external table
Hive internal table : The concept of a table in Hive is very similar to the table in the relational database. Each table associates with a directory configured in /conf/hive-site.xml in HDFS. By default, it is /user/hive/warehouse in HDFS. For example, /user/hive/warehouse/employee is created by Hive in HDFS for the employee table. All the data in the table will be kept in the directory. The Hive table is also referred to as internal or managed table .
Hive external table :When there is data already in HDFS, an external Hive table can be created to describe the data. It is called EXTERNAL because the data in the external table is specified in the LOCATION properties instead of the default warehouse directory. When keeping data in the internal tables, Hive fully manages the life cycle of the table and data. This means the data is removed once the internal table is dropped. If the external table is dropped, the table metadata is deleted but the data is kept. Most of the time, an external table is preferred to avoid deleting data along with tables by mistake. The following are DDLs for Hive internal and external table examples:
Create internal table in hive / Create table in hive / Hive create table /Create hive internal table
hive> CREATE TABLE IF NOT EXISTS employee_internal > (name string,work_place ARRAY
,sex_age STRUCT , > skills_score MAP ,depart_title MAP >) > COMMENT 'This is an internal table' > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' > COLLECTION ITEMS TERMINATED BY ',' > MAP KEYS TERMINATED BY ':' > STORED AS TEXTFILE;
load data into hive table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/employee.txt' > OVERWRITE INTO TABLE employee_internal;
Create external table in hive / Create table in hive / Hive create table /Create hive external table
hive> CREATE EXTERNAL TABLE employee_external > (name string,work_place ARRAY
,sex_age STRUCT , > skills_score MAP ,depart_title MAP >) > COMMENT 'This is an external table' > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|' > COLLECTION ITEMS TERMINATED BY ',' > MAP KEYS TERMINATED BY ':' > STORED AS TEXTFILE > LOCATION '/user/quicktostudy/employee';
load data into external hive table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/employee.txt' > OVERWRITE INTO TABLE employee_external;
Create the table as select in hive (CTAS):
hive> CREATE TABLE ctas_employee > AS SELECT * FROM employee_external;
CTAS copies the data as well as table definitions. The table created by CTAS is atomic; this means that other users do not see the table until all the query results are populated. CTAS has the following restrictions:
The table created cannot be a partitioned table
The table created cannot be an external table
The table created cannot be a list bucketing table
A CTAS statement will trigger a map job for populating the data; even SELECT * itself does not trigger any MapReduce job.
CTAS with Common Table Expression (CTE) can be created as follows:
hive> CREATE TABLE cte_employee AS WITH r1 AS > (SELECT name FROM r2 WHERE name = 'Michael'), > r2 AS (SELECT name FROM employee WHERE sex_age.sex= 'Male'), > r3 AS (SELECT name FROM employee WHERE sex_age.sex= 'Female') > SELECT * FROM r1 UNION ALL select * FROM r3;
CTE is available since Hive 0.13.0. It is a temporary result set derived from a simple SELECT query specified in a WITH clause, followed by SELECT or INSERT keyword to operate this result set. The CTE is defined only within the execution scope of a single statement. One or more CTEs can be used in a nested or chained way with Hive keywords, such as the SELECT, INSERT, CREATE TABLE AS SELECT, or CREATE VIEW AS SELECT statements.
Create Empty table in hive use CTAS
hive> CREATE TABLE empty_ctas_employee AS > SELECT * FROM employee_internal WHERE 1=2;
Create Empty table in hive use LIKE
hive> CREATE TABLE empty_like_employee > LIKE employee_internal;
Quick to study hive questions
How to create internal table in hive?
How to create external table in hive?
What is internal table in hive?
What is external table in hive?
What is external and internal table in hive?