Cloudera Impala

Open Source, Real-time Query for Hadoop

Cloudera Impala is an open source Massively Parallel Processing (MPP) query engine that runs natively on Apache Hadoop. The Apache-licensed Impala project brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is integrated from the ground up as part of the Hadoop ecosystem and leverages the same flexible file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive, Apache Pig and other components of the Hadoop stack.
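The scatter/gather idea behind an MPP engine can be illustrated with a small toy. This is a conceptual sketch only, not Impala's implementation: the partition layout, the function names (`partial_aggregate`, `merge`, `mpp_group_by_sum`) and the sample rows are hypothetical, and threads stand in for worker nodes. Each "node" computes a partial `SUM ... GROUP BY` over its local rows, so the raw data never leaves that node, and a coordinator merges the much smaller partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows: (region, sales_amount). In a real cluster each
# partition would correspond to data stored locally on one worker node.
PARTITIONS = [
    [("east", 10), ("west", 5), ("east", 7)],
    [("west", 3), ("north", 8)],
    [("east", 1), ("north", 2), ("west", 4)],
]


def partial_aggregate(rows):
    """Per-node step: SUM(amount) GROUP BY region over local rows only."""
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def merge(partials):
    """Coordinator step: combine the per-node partial sums."""
    result = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] += subtotal
    return dict(result)


def mpp_group_by_sum(partitions):
    # Scan all partitions in parallel (threads stand in for nodes),
    # then merge the partial aggregates at a single coordinator.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    return merge(partials)


if __name__ == "__main__":
    print(mpp_group_by_sum(PARTITIONS))
```

Only the per-group subtotals travel to the coordinator, which is why this pattern scales with the number of nodes rather than with the size of the raw data.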

With Impala, analysts and data scientists can now perform real-time, “speed of thought” analytics on data stored in Hadoop, either directly in SQL or through Business Intelligence (BI) tools. As a result, large-scale batch processing (via MapReduce) and interactive queries can run on the same system, against the same data and metadata, removing the need to migrate data sets into specialized systems or proprietary formats simply to perform analysis.

Key Benefits of Impala

Speed to Insight
Perform interactive analytics directly on data stored in Hadoop. Get answers as quickly as you can ask questions, without the bottlenecks caused by data movement and jumping between data silos.

Cost Savings
Reduce data movement and avoid duplicating storage in specialized systems by performing interactive analysis directly on full-fidelity data.

Full Fidelity Analysis
Ask questions of all your data, without the loss of fidelity that comes from pre-aggregating it or forcing it into fixed schemas.

Familiarity
Leverage existing BI tools and employee skill sets (SQL) to interact with data stored in Hadoop.

Discoverability
Enable more users to interact with more data by providing a single repository and metadata store from source to analysis.

Unification
Leverage the same file and data formats, metadata, security and resource management frameworks you use for the rest of the Hadoop stack.

Key Features of Impala

  • SQL queries on CDH in seconds
  • Native MPP query engine
  • Integration with leading BI tools
  • Support for HDFS and HBase
  • Support for a wide variety of file formats, including text (plain or LZO-compressed), SequenceFile, Avro, RCFile and Parquet
  • In-memory data transfers
  • Leverages metadata, ODBC driver, SQL syntax and Beeswax GUI (in Hue) from Apache Hive
  • Kerberos authentication
  • 100% open source (Apache licensed)

Get Support for Impala with an RTQ Subscription

The RTQ (Real-time Query) subscription is the best way to leverage the power of Cloudera Impala. RTQ is an add-on subscription to Cloudera Enterprise. When you add RTQ to Cloudera Enterprise, you can take advantage of our market-leading technical support for Impala as well as actively influence the development of the project.
