Apache Spark

Fast Analytics and Stream Processing

Apache Spark (incubating) is an open source, parallel data processing framework that complements Apache Hadoop to make it easy to develop fast, unified Big Data applications combining batch, streaming, and interactive analytics on all your data. In collaboration with Databricks – the company leading the development of Spark – Cloudera offers commercial support for Spark with Cloudera Enterprise.

Fast, Powerful Data Processing

For analysts and data scientists who rely on iterative algorithms (e.g. clustering and classification), Spark is 10-100x faster than MapReduce. That means faster time to insight on more data, which in turn supports better business decisions and user outcomes.
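Iterative algorithms scan the same dataset on every pass, so keeping that dataset in memory between passes avoids re-reading it from storage each time. The sketch below is plain Python, not Spark code: `CountingSource` and `kmeans_1d` are hypothetical names invented for illustration, and the six data points are made up. It only shows the access pattern, with one load followed by several in-memory iterations.

```python
from statistics import mean


class CountingSource:
    """Hypothetical stand-in for a slow storage read; counts each load."""

    def __init__(self, values):
        self._values = list(values)
        self.loads = 0

    def load(self):
        self.loads += 1
        return list(self._values)


def kmeans_1d(points, centers, iterations):
    """Plain 1-D k-means: every iteration scans the same points."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = {i: [] for i in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(clusters[i]) if clusters[i] else centers[i]
                   for i in range(len(centers))]
    return centers


source = CountingSource([1.0, 1.2, 0.8, 9.0, 9.5, 10.5])
cached_points = source.load()  # read once, then reuse from memory
final_centers = kmeans_1d(cached_points, centers=[0.0, 5.0], iterations=5)
```

Five iterations run against a single load here. A design that re-read the input on every pass would call `load()` five times instead.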

Spark is:

  • Fast: Data processing up to 100x faster than MapReduce in memory, and up to 10x faster on disk
  • Powerful: Write sophisticated parallel applications quickly in Java, Scala, or Python without having to think in terms of only “map” and “reduce” operators
  • Integrated: Spark is deeply integrated with CDH. It can read any data in HDFS and is deployed through Cloudera Manager
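To give a feel for what operators beyond "map" and "reduce" look like, here is a minimal plain-Python sketch. It does not use Spark's API. The helpers `flat_map`, `reduce_by_key`, and `join` are hypothetical local functions written for this illustration, and the input lines and the `owners` lookup are made-up sample data.

```python
def flat_map(func, records):
    """Apply func to each record and flatten the resulting lists."""
    return [out for rec in records for out in func(rec)]


def reduce_by_key(func, pairs):
    """Combine the values of (key, value) pairs that share a key."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


def join(left, right):
    """Inner join of two dicts on their keys."""
    return {k: (left[k], right[k]) for k in left if k in right}


lines = ["spark streaming", "spark batch", "hadoop batch"]

words = flat_map(str.split, lines)                 # one record -> many records
long_words = [w for w in words if len(w) > 5]      # filter
counts = reduce_by_key(lambda a, b: a + b, [(w, 1) for w in words])

owners = {"spark": "analytics", "hadoop": "storage"}  # hypothetical lookup table
joined = join(counts, owners)                      # keeps keys present in both
```

Each step is a named operator chained onto the previous result, rather than a whole job restructured into a single map phase and a single reduce phase.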

Easy, Real-Time Stream Processing

Spark Streaming extends Spark with an API for working with streams, providing exactly-once semantics and full fault tolerance for mission-critical environments. With common code across your batch and streaming applications, you can build sophisticated unified analytic applications quickly and easily.

Spark Streaming is:

  • Easy: Built on Spark’s lightweight yet powerful APIs, Spark Streaming lets you rapidly develop streaming applications
  • Fault tolerant: Unlike other streaming solutions (e.g. Storm), Spark Streaming recovers lost work and delivers exactly-once semantics out of the box with no extra code or configuration
  • Integrated: Reuse the same code for batch and stream processing, even joining streaming data to historical data
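The idea of reusing one piece of logic for batch and streaming data, and combining live data with historical data, can be sketched in plain Python. This is an illustration under stated assumptions rather than Spark Streaming code: `sales_per_item` and `micro_batches` are hypothetical helpers, the records are invented, and the stream is modeled as small batches of records.

```python
from collections import Counter


def sales_per_item(records):
    """Shared business logic: total quantity sold per item."""
    totals = Counter()
    for item, quantity in records:
        totals[item] += quantity
    return totals


def micro_batches(stream, size):
    """Group an incoming stream of records into small batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # emit the final, partially filled batch


historical = [("shoes", 3), ("hats", 1), ("shoes", 2)]   # made-up batch data
live = [("hats", 4), ("shoes", 1), ("scarves", 2)]       # made-up stream

# Batch: run the shared function over the full historical dataset.
batch_totals = sales_per_item(historical)

# Streaming: run the same function on each small batch and fold the
# result into totals that started from the historical figures.
running = Counter(batch_totals)
for batch in micro_batches(live, size=2):
    running.update(sales_per_item(batch))
```

The same `sales_per_item` function serves both paths, so a change to the business logic is made once and applies to historical and live data alike.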

Unified Analytics with Cloudera’s Enterprise Data Hub

Organizations need to use more data and more types of data to increase their competitive edge and reduce costs. Their use of data typically spans multiple use cases: reporting on what has happened, deep interactive analysis and data mining to discover why things are happening, and increasingly sophisticated applications to deliver real-time insights to decision makers.

Web Security
  • Faster Decisions (Interactive): Why is my website slow?
  • Better Decisions (Batch): What are the common causes of performance issues?
  • Real-Time Action (Streaming and Applications): How can I detect and block malicious attacks in real time?

Retail
  • Faster Decisions (Interactive): What are our top selling items across channels?
  • Better Decisions (Batch): What products and services do customers buy together?
  • Real-Time Action (Streaming and Applications): How can I deliver relevant promotions to buyers at the point of sale?

Financial Services
  • Faster Decisions (Interactive): Who opened multiple accounts in the past 6 months?
  • Better Decisions (Batch): What are the leading indicators of fraudulent activity?
  • Real-Time Action (Streaming and Applications): How can I protect my customers from identity theft in real time?

With Cloudera’s enterprise data hub including Spark, you can implement powerful end-to-end analytic workflows, comprising batch data processing, interactive query, deep data mining, and real-time applications, all from a single common platform. There is no need to maintain separate systems – with separate data, metadata, security, and management – that quickly lead to complexity and cost.

Figure: Cloudera Enterprise Data Hub. Spark enables faster batch processing, analytics, and stream processing on Hadoop.

Get Support for Spark with Cloudera Enterprise

Cloudera Enterprise is the best way to leverage the power of Spark in production environments. When you deploy Spark with Cloudera Enterprise Flex Edition or Data Hub Edition as part of an enterprise data hub, you can rely on our unique ability to support Spark, as well as actively influence the future of the project.