Frequently Asked Questions (FAQs)
FAQs About Cloudera
- What is Cloudera?
- What is Big Data?
- Who uses Cloudera?
- Why do customers choose Cloudera?
- How does Cloudera work with my existing data management infrastructure?
- What is Cloudera University?
FAQs About Apache Hadoop
- What is Hadoop?
- What are some common use cases for Hadoop?
- What is the best way to get started with Hadoop (for developers and operators)?
FAQs About Cloudera Products
- What are Cloudera's products?
- What makes Cloudera's products unique?
- Why do I need a Cloudera Enterprise subscription?
FAQs About Cloudera and Open Source
- Why does open source matter for customers?
- Is Cloudera's platform open source?
- What does Cloudera’s open-source leadership mean for customers?
Cloudera offers a powerful and integrated Big Data platform comprising software, support, training, professional services, and indemnity. This platform, which has open source Apache Hadoop software at its core, allows customers to store, process, and analyze far more data, of more types and formats, and to do so more affordably than legacy technology -- allowing them to “ask bigger questions”.
Cloudera was founded in 2008 by some of the top minds in Big Data, including Chief Strategy Officer Mike Olson (Sleepycat, Oracle), Chief Scientist Jeff Hammerbacher (Facebook), and Chief Technology Officer Amr Awadallah (Yahoo!). Cloudera's chief architect, Doug Cutting, is a founder of the Hadoop project.
Customer success is Cloudera's highest priority. We’ve enabled long-term, successful deployments for hundreds of customers, with petabytes of data collectively under management, across diverse industries.
Learn more about Cloudera here.
Generally speaking, the term “Big Data” refers to any data that for whatever reason (not just volume) cannot be affordably managed by your traditional systems. The key point is that Big Data is a relative concept and is highly contextual to the environment. For example, even if your organization doesn’t accumulate data on a Facebook-like scale, or even if it primarily collects just one type of data, it may well have Big Data challenges as well as opportunities.
Learn more about Big Data here.
Cloudera’s Big Data platform is the most popular in the world, with hundreds of customers in financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas, and gaming.
Learn more about the companies using Cloudera here.
Cloudera was the first commercial provider of Hadoop-related software and services and has the most customers with enterprise requirements, and the most experience supporting them, in the industry. Cloudera’s combined offering of differentiated software (open and closed source), support, training, professional services, and indemnity brings customers the greatest business value, in the shortest amount of time, at the lowest TCO.
Learn more about why customers choose Cloudera here.
The Cloudera Connect Partner Program, more than 600 companies strong, is designed to champion partner advancement and solution development for the Big Data ecosystem. As the Hadoop vendor with the most partners and the only one with a technology certification program, Cloudera ensures consistency, reliability, and tight integration with enterprise environments.
Learn more about Cloudera’s partners here.
Cloudera University offers public, private, and virtual classroom training and free online resources to give data professionals access to the most comprehensive selection of Hadoop learning materials. It has trained more than 15,000 people and certified more than 5,000 people since 2009, including data professionals from over half of the Fortune 100 and from all 20 of the top global technology firms.
Learn more about training from Cloudera University here.
The Hadoop project, which Doug Cutting founded in 2006, is an effort to create open source implementations of internal systems used by Web-scale companies such as Google, Yahoo!, and Facebook to manage and process massive data volumes. In summary, Hadoop enables distributed, parallel processing of huge amounts of data across industry-standard servers (with storage and processing occurring on the same machines), and it can scale indefinitely.
With Hadoop, you can continually store more data of all varieties in one place, and then add multiple processing and analytic frameworks on top of it -- rather than moving your data to each framework, which is the typical (and expensive) approach.
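To make the idea of "processing where the data lives" more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the node names, sample log lines, and function names are all hypothetical, and threads merely stand in for separate servers. Each "node" counts words in its own local slice of data, and only the small partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "nodes": each one holds its own slice of the data locally.
NODE_DATA = {
    "node-1": ["error disk full", "info job started"],
    "node-2": ["error network down", "info job finished"],
    "node-3": ["warn slow disk", "error disk full"],
}


def count_words_locally(lines):
    """Count words in one node's lines (the work that runs next to the data)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def run_job(node_data):
    """Process every node's slice in parallel, then merge the partial counts."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_words_locally, node_data.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    totals = run_job(NODE_DATA)
    print(totals.most_common(3))
```

In this toy version, adding capacity means adding another entry to NODE_DATA; the per-node function stays the same, which loosely mirrors how a cluster grows by adding servers.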
Learn more about Hadoop here.
Hadoop allows you to process and access more data than ever before, so it has many near-term (operational) as well as long-term (strategic) use cases across multiple industries. Generally, Hadoop workloads fall into these broad categories:
- Transformation: Hadoop helps you transform large amounts of data more quickly, reliably, and affordably (e.g., for loading into the data warehouse).
- Active archiving: Hadoop gives you access to data that would otherwise be taken offline (e.g., to tape) due to the high cost of actively managing it.
- Exploration/Analytics: Hadoop lets you analyze and get value from data that otherwise could not be easily modeled in rigid relational systems.
Learn more about real-world applications for Hadoop here.
The Cloudera QuickStart VM -- which contains a single-node Hadoop cluster based on Cloudera Standard, a guest OS, and examples -- lets users install and run Hadoop on their desktop in minutes. It’s a great way to explore Hadoop concepts as well as how Cloudera’s platform works.
Learn more about QuickStart VM here.
Cloudera also offers an open source SDK called the Cloudera Development Kit (CDK). The CDK is a set of libraries, tools, examples, and documentation that make it easier for developers to build systems on top of the Hadoop stack.
Learn more about CDK here.
Finally, see "What is Cloudera University?" for information about training.
Cloudera’s platform -- available in the form of CDH, Cloudera Standard, or Cloudera Enterprise -- is designed to specifically address customer opportunities and challenges in Big Data. The platform combines Hadoop with other open source ecosystem projects as well as closed source data management components to create a single, massively scalable system. All the integration work is done for you, and the entire solution is thoroughly tested for enterprise requirements and fully documented.
Learn more about Cloudera products here.
Cloudera’s platform has several differentiating attributes that make it unique, including:
- Differences from commercial alternatives: Cloudera offers differentiating capabilities such as production-grade interactive SQL and Search on Hadoop; comprehensive system management with rolling upgrades, automated disaster recovery, centralized security, proactive health checks, and multi-cluster management; and simplified data management with granular auditing and access control capabilities.
- Differences from stock Apache Hadoop: Although Cloudera's platform contains the same code that can be found in the “upstream” Hadoop ecosystem projects, on a regular (quarterly) basis, Cloudera ships (“backports”) new bug fixes and stable features for users of its platform. Thus, Cloudera customers get predictable and regular access to platform improvements, along with the assurances of rigorous testing and upstream compatibility.
Learn more about Cloudera products here.
Cloudera Enterprise subscriptions, which include access to differentiated management software, 8x5 or 24x7 support, and indemnity, are an essential ingredient in any long-term, sustainable deployment. Furthermore, all Cloudera subscriptions are up for renewal annually, so Cloudera must continually re-prove its value to you.
Learn more about Cloudera support here.
Open source licensing and development offer customers powerful benefits, including freedom from lock-in, free no-obligation evaluation, rapid innovation on a global scale, and community-driven development. Freedom from lock-in is particularly important for customers where components that store and process data are involved.
Learn more about why open source matters here.
The core of Cloudera’s platform, CDH, is open source (Apache License), so users always have the option to move their data to an alternative -- and thus Cloudera must continually earn your business based on merit. In fact, Cloudera is an open source leader in Big Data, with its employees collectively contributing more code to the Hadoop ecosystem than those of any other company.
Cloudera complements this open core with closed source management software that provides key enterprise functionality requested by customers, such as support for rolling upgrades, auditing management, and disaster recovery. That software, however, does not store or process data, and thus lock-in is not an issue.
Open source benefits, such as freedom from lock-in, are tangible and time-tested. That said, they are just “table stakes” when deploying a strategic open source platform like Hadoop.
Cloudera also leads the way in ensuring that customer needs for performance, availability, security, and recoverability are met by new features in the Apache code base, and in then shipping and supporting those features for customers in our platform. To make that goal possible, Cloudera employs more ecosystem committers, establishes more successful new ecosystem projects, and contributes more code to that ecosystem than any other vendor.
Learn more about why open source matters here.