Cloudera Introduces Fourth Generation of Its Big Data Platform to Drive Ease of Use, Integration and Adoption of Apache Hadoop for the Enterprise

Jun/05/2012

Release of CDH4 and Cloudera Enterprise 4.0 Sets the Standard for Hadoop in the Enterprise

PALO ALTO, CA – June 5, 2012 – Cloudera, the category leader that is setting the standard for Apache Hadoop in the enterprise, today unveiled the fourth generation of its flagship Apache Hadoop data management platform, Cloudera Enterprise. Cloudera Enterprise 4.0 combines the company’s uniquely powerful Cloudera Manager software with comprehensive round-the-clock expert technical support to deliver the first turnkey system for deploying and managing Hadoop in production. The company also announced the general availability of CDH4 (Cloudera’s Distribution Including Apache Hadoop, version 4), following the successful completion of a rigorous beta program that combined testing and feedback from its enterprise customers and partner ecosystem with contributions from Cloudera’s engineering team and the broader Apache open source community. Used together, Cloudera Enterprise 4.0 and CDH4 form an end-to-end solution that enables enterprises to integrate Hadoop with their existing enterprise data management systems for mission-critical applications. Extensive new features support high availability, increased security, improved extensibility and sophisticated automation for managing large-scale Hadoop clusters, delivering unprecedented ease of use.

“Gartner sees increasing numbers of enterprise customers making the transition from experimenting with Hadoop to using it in production, and to support that, key features like high availability and more robust security are a must,” said Merv Adrian, Research VP, Information Management, Gartner. “These important enterprise features are beginning to appear in the Apache stack, and as they mature, will drive accelerated adoption. Enhanced management offerings to support these projects will be another critical maturity requirement in the face of scarce skills for deploying and managing large deployments.”

Big Data is Big Business – Cloudera Leads the Way in Hadoop Experience in the Enterprise

Big Data represents a significant market opportunity: 80 percent of data worldwide is currently unstructured, and the digital universe is expected to reach 35 zettabytes by 2020. Today, more than half of the Fortune 50 run open source Apache Hadoop based on Cloudera. With tens of thousands of nodes in production, Cloudera has established itself as the unequivocal category leader that is setting the standard for Apache Hadoop in the enterprise. For example, 70% of all smartphones in the U.S. are more efficient and more reliable because of Cloudera software, and 80% of all online travel bookings are powered by it. In addition to its market traction, Cloudera has built a rich, multifaceted partner ecosystem that has attracted more than 250 companies across diverse markets.

As a result of its ongoing innovation and strong business execution, Cloudera has more enterprise customers, users and nodes under management than all competitive Hadoop systems combined; is number one in training and certification; and leads the industry in contributors to open source projects across the entire Hadoop stack.

New Cloudera Product Advancements Make Open Source Hadoop Accessible for Every Enterprise

The fourth generation of Cloudera’s Big Data platform makes it easy for enterprises to use, integrate and adopt Apache Hadoop. It is a massively scalable and extensible data platform that integrates with existing data warehousing investments and operates in the predictable, reliable manner that enterprise data managers expect from their enterprise software.

“We have a non-trivial system operations team and any help we can get in terms of operational support and tooling for the management of our cluster is critical. I definitely see the value in Cloudera Enterprise,” said Doug Meil, Chief Architect, Explorys Medical.

CDH4 marks a major leap forward in the evolution of CDH, the world’s most widely deployed distribution of Apache Hadoop in commercial and non-commercial environments. CDH4 is a 100-percent open source solution that combines Apache Hadoop with other open source applications within the Hadoop stack to deliver advanced, enterprise-grade features, including:

  • High Availability: Offers increased usability for mission-critical use cases and applications with a highly available NameNode that eliminates the only remaining single point of failure in HDFS. Heterogeneous clusters minimize downtime and allow users to run different nodes on different versions of Hadoop.
  • Increased Security: Allows for more sensitive data to be stored in CDH with more granular access control to support multi-tenancy. HBase table and column permissions secure which users and groups have access to HBase columns and tables, and Fair Scheduler ACLs secure which groups can administer or submit jobs into different Fair Scheduler pools.
  • Improved Extensibility: Helps solve a broader range of scenarios through coprocessors, which enable more sophisticated real-time applications, and open resource management (a.k.a. MR2), which allows multiple data processing frameworks to run on the same Hadoop cluster and thereby saves on storage costs.
  • Other new features from the Hadoop stack: Common compression codec (Snappy), common file format (Apache Avro), REST over HTTP access to HDFS, web shell (for Apache Pig and Apache HBase), slot-less resource manager, faster and easier user web access to Hadoop systems, 100% gain in filesystem I/O performance, 100% speedup in HBase random reads, 200% improvement in Apache Flume data ingest rate and a 30% faster Apache MapReduce shuffle.
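For readers evaluating the REST over HTTP access to HDFS listed above, the following is a minimal Python sketch, not taken from Cloudera's documentation. It assumes the Apache Hadoop WebHDFS URL layout (`/webhdfs/v1/<path>?op=...`), a NameNode HTTP port of 50070 and a cluster with WebHDFS enabled; the host name and the helper names `webhdfs_url` and `list_names` are hypothetical and for illustration only.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode

# Assumed default NameNode HTTP port for Hadoop 2.x-era clusters; adjust as needed.
DEFAULT_NAMENODE_HTTP_PORT = 50070


def webhdfs_url(host: str, path: str, op: str,
                port: int = DEFAULT_NAMENODE_HTTP_PORT,
                params: Optional[Dict[str, str]] = None) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&..."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /user/alice")
    # The operation name comes first, followed by any extra query parameters
    # such as user.name (simple, non-Kerberos authentication).
    query = urlencode({"op": op.upper(), **(params or {})})
    # quote() percent-encodes unsafe characters but keeps the "/" separators.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_names(liststatus_json: str) -> List[str]:
    """Extract entry names from a LISTSTATUS response body."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    # Each entry's pathSuffix is its name relative to the listed directory.
    return [entry["pathSuffix"] for entry in statuses]
```

For example, `webhdfs_url("namenode.example.com", "/user/alice", "LISTSTATUS")` yields `http://namenode.example.com:50070/webhdfs/v1/user/alice?op=LISTSTATUS`. That URL could be fetched with `urllib.request.urlopen` and the response body passed to `list_names`; keeping URL construction separate from network I/O lets the helpers be checked without a live cluster.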

“We use CDH to process and analyze hundreds of terabytes across over 275 nodes. The technical expertise and management functionality offered by the Cloudera Enterprise package gives us the peace of mind we need to run business critical processes on Hadoop,” said Jeremy Lizt, VP Engineering at Rapleaf.

“Informatica was among the first Cloudera technology partners to become certified on CDH,” said Alex Gorelik, senior vice president, Research and Development, Informatica. “Now Cloudera and Informatica are bringing enterprise-ready Hadoop to the mainstream with the latest release of CDH4 and Informatica 9.5. Together, we are making it easier for enterprises to adopt Hadoop so they can gain unprecedented insights from all their transaction and interaction data by delivering all the enterprise capabilities they need to be successful – ease of use, performance, reliability, security, manageability, data quality and data governance.”

Cloudera Manager 4, the latest version of the management component of Cloudera Enterprise, is the industry’s first and only enterprise-ready management application for Apache Hadoop. With its unprecedented ease of use, enterprises can now store, process and analyze all their data with complete freedom and improved time to value.

“As Hadoop practitioners look to expand successful proof-of-concept deployments into full-scale production, including migrating mission-critical applications to Hadoop environments, it is critical they have confidence in Hadoop as a manageable, secure, enterprise-ready Big Data platform,” said Wikibon analyst Jeff Kelly. “With the addition of hot-failover capabilities, table and column-level user access controls and integration with popular management tools, Cloudera Enterprise 4.0 has taken huge strides towards achieving this end.”

Cloudera Manager 4’s new features include:

  • Easier Deployment and Management: 3-Step HA Configuration guides setup for the NameNode in three simple steps, Multi-Cluster Management allows for management of multiple clusters from a single instance of Cloudera Manager, and Backwards Compatibility offers flexibility in management with support for both CDH3 and CDH4.
  • Rich Visualizations and Sophisticated Automations for Large-Scale Clusters: Heatmaps enable administrators to quickly identify problem nodes within large clusters and take action, while Federated NameNode Management simplifies the process of growing CDH clusters to billions of files across thousands of nodes.
  • Seamless Integration: The Cloudera Manager API integrates smoothly with existing enterprise management and monitoring tools. Cloudera Manager also includes support for LDAP authentication, now supports additional databases, including Oracle and PostgreSQL, and provides new packages for Ubuntu and Debian.
  • Other new features: Comprehensive host monitoring, client configuration management and extensive Hadoop setup readiness checks.

“Cloudera is changing the world one petabyte at a time. Big Data is something that the enterprise can no longer afford to ignore, so Cloudera’s products and support empower organizations to maximize the value of data with confidence and ease,” said Charles Zedlewski, VP of Product at Cloudera. “CDH has been tried and tested in over 65% of the world’s largest commercial Hadoop deployments. With the new enhancements we have made, CDH4 and Cloudera Manager 4 together deliver a proven Hadoop platform that is hardened for enterprise use with deep consideration for high availability, scalability, performance, ease of use and other things enterprises have come to expect from any solution deployed to run mission critical processes. We get you going quickly with the ability to store all your data, control the complete Hadoop stack, and process and analyze all your data easily and reliably.”

To learn more about Cloudera Enterprise 4, visit http://www.cloudera.com/products-services/enterprise/. To learn more about CDH4, visit http://www.cloudera.com/cdh4 or download CDH4 for free at http://www.cloudera.com/hadoop/.

About Cloudera

Cloudera, the leader in Apache Hadoop-based software and services, enables data-driven enterprises to easily derive business value from all their structured and unstructured data. Cloudera's Distribution including Apache Hadoop (CDH), available to download for free at www.cloudera.com/downloads, is the most comprehensive, tested, stable and widely deployed distribution of Hadoop in commercial and non-commercial environments. For the fastest path to reliably using this completely open source technology in production for Big Data analytics and answering previously unaddressable big questions, organizations can subscribe to Cloudera Enterprise, which consists of Cloudera Manager software and Cloudera Support. Cloudera also offers training and certification on Apache technologies, as well as consulting services. As the top contributor to the Apache open source community and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas and gaming, Cloudera's depth of experience and commitment to sharing expertise are unrivaled. www.cloudera.com

Connect with Cloudera

Read the blog: http://www.cloudera.com/blog/
Follow on Twitter: http://twitter.com/cloudera
Visit on Facebook: http://www.facebook.com/cloudera