Cloudera Announces Game-Changing, Real-Time Query on Hadoop and Leads a New Era of Data Management

Oct 24, 2012

Moves Hadoop Beyond Batch, Enabling Data-Driven Enterprises to Ask Bigger Questions at the Speed of Thought

NEW YORK, NY and PALO ALTO, CA--(Marketwire - Oct 24, 2012) - From the Strata + Hadoop World event in New York City, Cloudera, the category leader that sets the standard for Apache Hadoop in the enterprise, today unveiled the industry's first true real-time query engine for Hadoop. This major evolution of the company's Platform for Big Data, Cloudera Enterprise, makes this the first Big Data management solution that allows batch and real-time operations to be performed on any type of data -- unstructured and structured -- within one massively scalable system. This new approach of creating a single, centralized Big Data platform dramatically improves the economics and performance of large-scale data management in the enterprise. For the first time, organizations can process data at petabyte scale and, on the same system, interact with that data in real time to deliver "speed-of-thought" insights.

"Mainstream enterprise adoption of Hadoop will inevitably raise expectations," said Tony Baer, Principal Analyst for Ovum. "Enterprises have grown accustomed to interactive querying and on-the-spot analytics with their existing data warehousing and BI infrastructures and will expect no less of Hadoop. With a real-time query capability powered by its new Impala engine, Cloudera is striving to level the playing field in performance and accessibility with massively parallel SQL platforms."

This latest innovation from Cloudera, which takes the market-leading Apache Hadoop platform decisively "beyond batch," is Cloudera Impala™, an Apache-licensed, real-time query engine for data stored in HDFS (Hadoop Distributed File System) and HBase, and the result of two years of in-house development. Cloudera Enterprise RTQ (Real-Time Query) provides the management and support capabilities needed to effectively operate Cloudera Impala in production environments. Cloudera partners, including Capgemini Financial Services, Karmasphere, MicroStrategy, Pentaho, QlikView, and Tableau, have already validated their solutions with Cloudera Enterprise RTQ powered by Impala.

"Today, the power of global business is increasingly fueled by the need to understand and make decisions in real-time, driving changes in the way enterprises analyze information and make decisions," said Jojy Mathew, Vice President and Chief Solution Strategist, Business Information Management at Capgemini Financial Services. "As a leading provider of BIM services and solutions to the global FS market, we are confident that Cloudera Enterprise Real-Time Query powered by Impala is uniquely responsive to the Big Data management needs of organizations. To that end, to leverage the enhanced performance and scalability that Cloudera offers, we are currently integrating the product with a wide array of proven Big Data use cases that we have developed for such critical domains as product, customer, risk management and finance."

"MicroStrategy is committed to helping enterprises excel when it comes to running smarter and faster Big Data queries," said Sanju Bansal, Executive Vice President, MicroStrategy. "For Cloudera's latest release, MicroStrategy has optimized its queries to take advantage of the analytical and performance benefits associated with Cloudera Impala. Now, business people can use MicroStrategy's intuitive analytics and data visualizations to make better decisions from all their data, structured and unstructured."

Enabling the Truly Data-Driven Enterprise
Big Data represents a significant opportunity for businesses. According to a recent Cloudera survey of over 100 customers, more than 70% of enterprises are actively exploring how to extract value from Big Data as a major business imperative. The survey reported operational IT efficiency and competitive advantage as the main business drivers for adopting Hadoop. However, 78% of customers said they need faster queries on Hadoop.

"We have already seen high levels of interest in, and adoption of, Hadoop by enterprises for low-cost storage and transformational processing of large volumes of data, but have argued that for Hadoop to gain more adoption for analytic workloads we need to see analytic tools taking full advantage of Hadoop's scalable parallel processing architecture," said Matt Aslett, research manager, data management and analytics, 451 Research. "Cloudera Enterprise RTQ and Cloudera Impala look to be a significant step in enabling enterprises to take advantage of existing SQL skills and tools to realize the potential of real-time analytics against large volumes of structured and unstructured data stored in Hadoop."

Apache Hadoop started as an offline, batch processing system. Subsequently, Hadoop was extended to service more interactive online workloads. First among these was HBase, the distributed, tabular data store. Impala, the new open source project for real-time workloads announced by Cloudera today, introduces a scalable, distributed query engine to the Hadoop ecosystem. The technology was developed by Marcel Kornacker, Lead Architect of the Impala project, who previously co-architected the query engine for the F1 project at Google. Cloudera Impala offers a flexible data model, so it can work over more complex data than a data warehouse can, and it efficiently runs interactive queries expressed in industry-standard SQL. It can be used by IT and business analysts across a wide range of data types and data volumes to interact at the speed of thought with data stored in HDFS or HBase.

"Expedia manages over four petabytes of data using Cloudera Enterprise. With the addition of Cloudera Enterprise RTQ powered by Impala, we are able to work on one single platform for Big Data rather than many disparate systems for archiving, ETL and analytics. This evolution of Hadoop has enabled us to reduce our latency by 50% and produce a new real business insight service not previously viable," said Jeff Prather, Director of Global Business Intelligence and Data Warehousing Platforms at Expedia.

"Informatica and Cloudera deliver a proven combination of enterprise-ready data integration on Hadoop. Now with the real-time query execution of Cloudera Impala plus the speed of Informatica's real-time data integration, organizations can analyze data faster than ever by performing real-time analytics on Hadoop," said James Markarian, Chief Technology Officer, Informatica.

Now Business and IT Can Interact with Data in Real-time within Hadoop
The Cloudera vision is to enable enterprises to Ask Bigger Questions™ and get bigger answers from all their structured and unstructured business data. By introducing Cloudera Enterprise RTQ powered by Impala, Cloudera unveils a unique system for driving business insights. Moving beyond batch processing, Cloudera enables enterprises to simplify Big Data management and empower more users, to gain more insight from their data without large added costs, and to let business and IT interact with data in real time in Hadoop.

"Apache Hadoop has already transformed the industry, unlocking value from Big Data for enterprises around the world," said Mike Olson, CEO of Cloudera. "Until now, enterprises had to limit the work they did with Hadoop because batch-mode processing using MapReduce was just too slow for some business problems. With today's release of Cloudera Enterprise Real-Time Query powered by Impala, we solve that problem. Cloudera Impala complements MapReduce and is the latest addition to our one hundred percent open source Big Data platform. You can now store all your data in Hadoop and use the same hardware to do both powerful analytics and run real-time queries using industry-standard tools and the SQL language. This groundbreaking new project delivers the crucial next step in realizing our vision -- to let our customers Ask Bigger Questions of all their data. Cloudera Enterprise with Real-Time Query powered by Impala is a major advance on the Hadoop platform and opens up new possibilities for Big Data in the enterprise."

Game-changing Advancements in Cloudera Enterprise Include:

  • Performance: Typically delivers queries 10x faster than Hive/MapReduce -- and the speedup can be much higher depending on the workload -- with very low latency and a flexible data model.

  • Cost Savings: Saves up to 90% on the incremental infrastructure costs required to process, explore and analyze Big Data. Reduces cost of ownership by running open source community-based technology on standard commodity hardware.

  • Real-time Interaction with Business Data: Cloudera Enterprise RTQ (Real-Time Query) improves usability of workloads in ways not previously possible on Hadoop by collapsing return times for queries from minutes in Hive/MapReduce to seconds, powered by Apache-licensed Cloudera Impala.

Cloudera's Unwavering Commitment to the Open Source Community
In addition to contributing Impala to the community as an open source project, Cloudera recently added three of the main contributors to Apache HBase to its top engineering team, where they will continue to contribute their development work on HBase and Impala back to the community. With the addition of these new team members -- including Vice President of the HBase Project Management Committee, Michael Stack -- Cloudera accelerates its ability to develop and deliver leading-edge technologies that enable and support real-time queries of Hadoop-managed data in the enterprise.

"Our vision at Cloudera has always been to allow our customers to use more than just MapReduce to analyze data stored in HDFS and HBase. We believe Cloudera Impala is the most exciting open source project since Hadoop, and it's the most important framework beyond MapReduce for analyzing data stored in HDFS and HBase. We've taken the hard road to low latency: rather than simply running an inefficient query engine over in-memory data, we've worked diligently to build an efficient query engine that can deliver low latency over data stored either in memory or on disk. Cloudera Impala is unique because it does not require a separate storage manager from HDFS and HBase, and it is unrivalled because it's the first open source, massively parallel query processing engine written from the ground up to incorporate the learnings from the past several decades of research in database systems. Cloudera can now support data ingest, data preparation, reporting and ad hoc query workloads within a single, open source system, and Cloudera Enterprise RTQ powered by Impala will ship having been integrated with key partner software and having already serviced real customer workloads," said Jeff Hammerbacher, cofounder at Cloudera.

About Cloudera
Cloudera, the standard for Apache Hadoop in the enterprise, empowers data-driven enterprises to Ask Bigger Questions™ and get bigger answers from all their data at the speed of thought. Cloudera Enterprise, the platform for Big Data, now with Cloudera Enterprise Real-Time Query powered by Impala, enables organizations to easily derive business value from structured and unstructured data to achieve a significant competitive advantage. Reinventing the economics and performance of Big Data management, Cloudera is the category leader in Apache Hadoop-based software, services and training. CDH (Cloudera's Distribution Including Apache Hadoop), available to download for free at www.cloudera.com/download, is the most comprehensive, tested, stable and widely deployed distribution of Hadoop in commercial and non-commercial environments. For the fastest path to reliably using this completely open source technology in production for Big Data analytics and answering previously un-addressable big questions, organizations can subscribe to Cloudera Enterprise, which comprises Cloudera Manager software and Cloudera Support. As the top contributor to the Apache open source community and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas and gaming, Cloudera's depth of Big Data experience and expertise is unrivaled. www.cloudera.com

Connect with Cloudera
Read the blog: http://www.cloudera.com/blog/
Follow on Twitter: http://twitter.com/cloudera
Visit on Facebook: http://www.facebook.com/cloudera

Contact Information
Media Contact
Hope Nicora
Bhava Communications
cloudera@bhavacom.com
510-984-1527