Aries Research Note: Hadoop Study Note: Hive operations

Monday, March 17, 2014

Hadoop Study Note: Hive operations

In the big data era, hadoop is the main-stream architecture to organize and manage data analysis. Hive is a great tool designed for data warehouse building on top of hadoop, and it is very helpful to provide data summarization, query and analysis. Here, I document three main operations in Hive (according to the Apache Hive website), they are: Data Definition Language (DDL), Data Manipulation Language (DML), and SQL operation.

The interaction between DDL, DML and SQL is: DDL defines a meta-structure for tables, DML helps to manage/load the data into the tables, and SQL queries the tables to get insightful analysis.

Part 1. DDL Operation

Create/Drop/Alter Database
Create/Drop/Truncate Table
Alter Table/Partition/Column
Create/Drop/Alter View
Create/Drop/Alter Index
Create/Drop Function
Create/Drop/Grant/Revoke Roles and Privileges
Show
Describe

Part 2. DML Operation

Loading files into tables
Inserting data into Hive Tables from queries
Writing data into the filesystem from queries

There are two primary ways of modifying data in Hive:

LOAD
INSERT

Part 3. SQL Operation

This is quite similar as SQL language, basically it is based on the Select Syntax

WHERE Clause
ALL and DISTINCT Clauses
Partition Based Queries
HAVING Clause
LIMIT Clause
REGEX Column Specification

More about Select Syntax are:

GROUP BY
SORT BY, ORDER BY, CLUSTER BY, DISTRIBUTE BY
JOIN
UNION ALL
TABLESAMPLE
Subqueries
Virtual Columns
Operators and UDFs
LATERAL VIEW
Windowing, OVER, and Analytics
Common Table Expressions

1 comment:

UnknownSeptember 12, 2016 at 7:26 AM
Nice blog..thanks for sharing the info about hive..

Big data training in chennai
ReplyDelete
Replies

Add comment