Apache Hive is a data warehouse system for Hadoop, which enables data summarization, querying, and analysis of data by using HiveQL (a query language similar to SQL). Hive can be used to interactively explore your data or to create reusable batch processing jobs. After you define the structure, you can use Hive to query that data without knowledge of Java or MapReduce. Hive understands how to work with structured and semi-structured data, such as text files where the fields are delimited by specific characters. Hive also supports custom serializer/deserializers (SerDe) for complex or irregularly structured data
Hive is developed on top of Hadoop as its data warehouse framework for querying and analysis of data that is stored in HDFS.
There are 3 major components in Hive as shown in the architecture diagram. They are hive clients, hive services and Meta Store. Under hive client, we can have different ways to connect to HIVE SERVER in hive services.
These are Thrift client, ODBC driver and JDBC driver. Coming to thrift client, it provides an easy environment to execute the hive commands from a vast range of programming languages. Thrift client bindings for Hive are available for C++, Java, PHP scripts, python scripts and Ruby. Similarly, JDBC and ODBC drivers can be used for communication between hive client and hive servers for compatible options.