| Stable release | 0.9.0 / April 24, 2012 |
|---|---|
| Development status | Active |
| Written in | Java |
| Operating system | Cross-platform |
| License | Apache License 2.0 |
| Website | hive.apache.org |
Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.[1] While initially developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix.[2][3] Amazon maintains a software fork of Apache Hive that is included in Amazon Elastic MapReduce on Amazon Web Services.[4]
Apache Hive supports analysis of large datasets stored in Hadoop-compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL while maintaining full support for map/reduce. To accelerate queries, it provides indexes, including bitmap indexes.[5]
By default, Hive stores metadata in an embedded Apache Derby database; client/server databases such as MySQL can optionally be used instead.[6]
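As a rough illustration of that choice, the sketch below renders the two JDBC connection properties that a `hive-site.xml` typically carries for the metastore. The helper names, the host, and the database name are assumptions made for this example; the driver classes and URL shapes are the commonly documented ones and may differ between versions.

```python
import xml.etree.ElementTree as ET


def metastore_properties(backend="derby", host="localhost", database="metastore"):
    """Return metastore connection properties for a backend (hypothetical helper)."""
    if backend == "derby":
        # Embedded Derby: the database lives in a local directory.
        return {
            "javax.jdo.option.ConnectionURL":
                "jdbc:derby:;databaseName=metastore_db;create=true",
            "javax.jdo.option.ConnectionDriverName":
                "org.apache.derby.jdbc.EmbeddedDriver",
        }
    if backend == "mysql":
        # Client/server MySQL: host and database name are placeholders.
        return {
            "javax.jdo.option.ConnectionURL": f"jdbc:mysql://{host}/{database}",
            "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        }
    raise ValueError(f"unknown metastore backend: {backend}")


def to_hive_site_xml(props):
    """Render a property dict as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


mysql_xml = to_hive_site_xml(metastore_properties("mysql", host="db.example.com"))
```

A real deployment would also need credentials and the JDBC driver on the classpath, both of which this sketch leaves out.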
Hive currently supports three file formats: TEXTFILE, SEQUENCEFILE, and RCFILE.[7][8]
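The file format is chosen per table with a `STORED AS` clause. The following minimal sketch builds such a `CREATE TABLE` statement and rejects formats outside the three named above; the helper, the table name, and the columns are hypothetical and serve only to show the shape of the statement.

```python
SUPPORTED_FORMATS = ("TEXTFILE", "SEQUENCEFILE", "RCFILE")


def create_table_ddl(table, columns, file_format="TEXTFILE"):
    """Build a HiveQL CREATE TABLE statement for one of the supported formats."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    # columns is a list of (name, type) pairs, e.g. ("url", "STRING").
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return f"CREATE TABLE {table} ({cols}) STORED AS {fmt}"


ddl = create_table_ddl(
    "page_views",
    [("user_id", "BIGINT"), ("url", "STRING")],
    file_format="rcfile",
)
```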
Other features of Hive[9] include:
While based on SQL, HiveQL does not strictly follow the full SQL-92 standard. HiveQL offers extensions not in SQL, including multitable inserts and create table as select, but offers only basic support for indexes. HiveQL also lacks support for transactions and materialized views, and its subquery support is limited.[10][11]
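To make the two extensions concrete, the sketch below assembles a multitable insert, which scans the source once and writes several tables, and a create table as select statement. The builder functions and all table and column names are assumptions for illustration, not part of Hive.

```python
def multi_table_insert(source, targets):
    """Build one HiveQL statement that reads `source` once and fills several tables."""
    lines = [f"FROM {source}"]
    for table, select in targets:
        # Each target gets its own INSERT clause over the shared FROM.
        lines.append(f"INSERT OVERWRITE TABLE {table} {select}")
    return "\n".join(lines)


def create_table_as_select(table, query):
    """Build a HiveQL CREATE TABLE ... AS SELECT statement."""
    return f"CREATE TABLE {table} AS {query}"


multi_insert = multi_table_insert(
    "page_views",
    [
        ("views_by_user", "SELECT user_id, count(*) GROUP BY user_id"),
        ("views_by_url", "SELECT url, count(*) GROUP BY url"),
    ],
)
ctas = create_table_as_select(
    "long_urls", "SELECT url FROM page_views WHERE length(url) > 100"
)
```

Printing `multi_insert` shows a single `FROM page_views` line followed by two `INSERT OVERWRITE TABLE` clauses, which is what distinguishes the form from two separate standard SQL inserts.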
Internally, a compiler translates HiveQL statements into a directed acyclic graph of MapReduce jobs, which are submitted to Hadoop for execution.[12]
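The idea of a directed acyclic graph of jobs can be sketched with a topological sort: a job may be submitted only after every job it depends on has finished. The plan below is a made-up example, not output of Hive's compiler, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: each key is a job, each value is the set of jobs
# whose output it consumes.
plan = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "group_by": {"join"},
    "order_by": {"group_by"},
}


def execution_order(plan):
    """Return one valid order in which the jobs could be submitted."""
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(plan).static_order())


order = execution_order(plan)
```

The two scan jobs have no dependency on each other, so either may come first and a scheduler could run them in parallel; the remaining jobs form a chain.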