New to Hadoop
Knowing where to start can be difficult, but this curated list of resources will help.
1. Read Up on Background
Getting a bit of background information first is always a good idea.
- Ask Bigger Questions: A Round Table Discussion
- Hadoop: What It Is, How It Works, and What It Can Do (via O'Reilly)
- Hadoop FAQ - Getting Started Version (via Gwen Shapira)
- MOOC: Intro to Hadoop and MapReduce
- Google's original MapReduce paper
- Cloudera Glossary
- Video: Overview of Hadoop Platform Components
- Presentation: Apache Hadoop in Theory and Practice (via Adam Kawa)
CDH is Cloudera's 100% open-source, enterprise-ready distro of Apache Hadoop and related projects. Install it directly for the best Hadoop experience, or test-drive it in VM form first (or spin up a cluster in the cloud).
- Explore CDH components (project homepages, docs, blogs, Q&A, downloads)
- Download the QuickStart VM
- How-to: Install CDH and Impala on EC2 using Cloudera Manager Free Edition
- How-to: Deploy a CDH Cluster in Skytap Cloud
- How-to: Create a Hadoop Cluster POC using CDH on EC2 (via Randy Zwitch)
- How-to: Deploy CDH on Windows Azure Virtual Machines using Cloudera Manager (via Thomas Conte)
Working through tutorials and how-to's is the quickest way to become dangerous.
- See Hadoop Tutorial
- See all How-to's
- See Online Learning
- Read Tom White's "How to Hadoop" series in Dr. Dobb's
When "being dangerous" isn't good enough, it's time to train with Cloudera University.
Reading books, or at least keeping them around for reference, is a reliable way to deepen your knowledge progressively.
- Hadoop: The Definitive Guide - by Tom White
- Hadoop Operations - by Eric Sammer
- HBase: The Definitive Guide - by Lars George
- Apache Sqoop Cookbook - by Kathleen Ting & Jarek Cecho
- HBase in Action - by Nick Dimiduk & Amandeep Khurana
- Cloudera Impala (e-book) - by John Russell
Make an impact on the quality and direction of the Hadoop stack by reporting bugs or by becoming an active contributor to a project.
- See the How to Contribute page