Cloudera Takes Hadoop Security to the Next Level With Sentry: Fine-Grained Authorization for Impala and Apache Hive

Jul/24/2013

New, Open Source Security Module for SQL Query Engines Offers Unprecedented User Access Control; Fulfills Enterprise RBAC Requirements

PALO ALTO, CA--(Marketwired - Jul 24, 2013) - Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, today unveiled the next step in the evolution of enterprise-class big data security, introducing Sentry: a new Apache licensed open source project that delivers the industry's first fine-grained authorization framework for Hadoop. An independent security module that integrates with open source SQL query engines Apache Hive and Cloudera Impala, Sentry delivers advanced authorization controls to enable multi-user applications and cross-functional processes for enterprise datasets. This level of granular control, available for the first time in Hadoop, is imperative to meet enterprise Role Based Access Control (RBAC) requirements of highly regulated industries, like healthcare, financial services and government. Sentry alleviates the security concerns that have prevented some organizations from opening Hadoop data systems to a more diverse set of users, extending the power of Hadoop and making it suitable for new industries, organizations and enterprise use cases. Concurrently, the company confirmed it plans to submit the Sentry security module to the Apache Incubator at the Apache Software Foundation later this year.

Apache Hadoop and the Information Security Dilemma
As more enterprises look to embrace the power and benefits of Hadoop, addressing the platform's inherent security vulnerabilities has become paramount. For data safeguards to be deemed compliant with standard data regulatory requirements, each of the four functional areas of information security must be achieved:

    1.  Perimeter: Guarding access to the cluster itself through network security, firewalls and ultimately, authentication to confirm user identities.
    2.  Data: Protecting the data in the cluster from unauthorized visibility through masking and encryption, both at rest and in transit.
    3.  Access: Defining what authenticated users and applications can do with the data in the cluster through file system ACLs and fine-grained authorization.
    4.  Visibility: Reporting on the origins of data and on data usage through centralized auditing and lineage capabilities.

Recent developments by the Hadoop community, as well as integration with solution providers, have addressed the Perimeter and Data elements through authentication, encryption and masking. The release of Cloudera Navigator earlier this year brought Visibility to Hadoop with centralized auditing for files, records and metadata. But the fourth element, Access, has been largely unaddressed until now. While Hadoop has had strong security at the file system level for some time, it lacked the level of granularity required to adequately secure access to data through SQL and BI tools -- cases where assigning privileges at a sub-file level is required. This problem forced companies to make a choice: either leave data unprotected or lock users out entirely. Sentry eliminates that tradeoff by providing the granular, role-based authorization required to provide precise levels of access to the right users and applications.

"As Hadoop crosses over to the enterprise, it will be expected to deliver the same level of security for protecting sensitive data as any mission-critical data platform," said Tony Baer, principal analyst at Ovum. "With announcement of Sentry, Cloudera is addressing an important piece of the puzzle, especially with regard to role-based data access privileges below file level for SQL-on-Hadoop paths like Hive and Impala. It is one of the pieces that must fall into place if Hadoop is to fulfill its promise as a powerful extension of analytic computing environments."

Introducing Sentry: Fine-Grained Authorization for Advanced Hadoop Security
Sentry is the industry's first fine-grained authorization solution for Apache Hadoop, giving database administrators truly holistic, granular user access control that addresses the limitations of previous solutions. With open source authorization for SQL query engines like Apache Hive and Cloudera Impala, Sentry represents a quantum leap forward for Hadoop and Cloudera's Platform for Big Data, enabling enterprises in the public and private sectors to now leverage the power of Hadoop and remain compliant with regulatory requirements like HIPAA, SOX and PCI, among others. Features of the Sentry security module include:

  • Secure Authorization - Gives administrators the ability to prevent authenticated users from accessing data and/or having privileges on data.
  • Fine-grained Authorization: Grants Hadoop administrators unprecedented, comprehensive and precise control to specify user access rights to subsets of data within a database (e.g., blocking credit card numbers while keeping the rest of the data visible).
  • Role-based Authorization: Simplifies permissions management by allowing administrators to create and assign templatized privileges based on functional roles.
  • Multi-tenant Administration: Empowers central administrators to deputize individual administrators to manage security settings for each separate database or schema.
"Today's cyber security applications require a Big Data infrastructure that can deliver granular controls, especially in the federal space," said Wayne Wheeles, Six3 Systems' analytic, infrastructure and enrichment developer for cyber security. "But until now there has been concern over vulnerability of data in Hadoop. Sentry supports sensitivity to highly regulated data and fine-grained authorization, making Hadoop a mature, enterprise-ready environment for cyber security."

A 100% Open Source Authorization Solution Derived from HiveServer2
A primary committer to numerous projects in the Apache Hadoop stack, Cloudera has worked closely with the open source community to expand Hadoop's security capabilities, recently introducing enhanced security features in a new HiveServer2 release. Originally built and released to open source in 2012 by Cloudera as an Apache 2.0 licensed project, HiveServer2 delivers concurrency (enabling system access by multiple users simultaneously) and Kerberos-based authentication (the ability to confirm the identity of users and prevent unauthorized access) for Hadoop. However, until now users have had only two sub-optimal choices to support user authorization: insecure Advisory Authorization (wherein users grant themselves permissions, leaving the door open for malicious misuse of data), or coarse-grained HDFS Impersonation (which offers inflexible control at the file level, where users can either access files in their entirety or not at all).

It was a conundrum that left administrators with no definitive way to fortify and control user access to sensitive data, which can often be interspersed with non-sensitive data in the same file. In many cases, this issue has stalled Hadoop adoption in highly regulated industries, such as healthcare, financial services or government agencies, where fine-grained authorization is required for organizations to ensure regulatory compliance, while keeping non-sensitive data accessible to workers who need it.

"We originally built HiveServer2 to address issues of concurrency, authentication and authorization," said Charles Zedlewski, vice president, Products, Cloudera. "Sentry is the critical piece that provides true fine-grained authorization for Hadoop. As a top committer to Apache Hadoop and related open source projects, Cloudera is dedicated to supporting the continuing efforts of the Apache open source community, where Hive remains an important focus for us. We are proud to be donating our internal work on Sentry to open source under the Apache 2.0 license today, so it can be integrated with other frameworks across the stack. In the near future, we plan to submit the project for incubation with the Apache Software Foundation, where the entire community can contribute to its development and work together to further advance the security of the Hadoop platform for enterprise applications."

Utilizing the Apache Hive metastore, Sentry offers an extensible plug-in for HiveServer2 that expands the foundation for Hadoop security. Sentry's uniquely fine-grained authorization capabilities enable organizations to specify role-based access permissions with server, database, table, or view granularity. Never before possible in Hadoop, this granular level of security provisioning will enable enterprises to more easily secure all of their data -- regardless of where it lives.

Tom Reilly, Cloudera's CEO, recently joined the company with a strong enterprise security background. His aim for many years has been to help customers protect their brand, protect their customers and protect their operations through compliance. With that heritage in mind, Reilly said, "Security is a top priority for large enterprises that are increasingly using Hadoop to manage Big Data. Cloudera continues to lead the way in advancing the Hadoop platform for enterprise use. With Sentry and future releases in our product roadmap, we are continuing to address the complete security picture around Hadoop, delivering on our vision to make the platform safe and compliant for enterprise use, in even the most highly regulated industries."

Availability
Sentry is immediately available for free download as an add-on for CDH 4.3. It can be used in conjunction with Hive and Impala 1.1 and is supported as part of the base Cloudera Enterprise subscription. For more information about Sentry and Impala 1.1, visit: http://cloudera.com/content/cloudera/en/Campaign/introducing-sentry.html.

About Cloudera
Founded in 2008, Cloudera pioneered the business case for Hadoop with CDH, the world's most comprehensive, thoroughly tested and widely deployed 100% open source distribution of Apache Hadoop in both commercial and non-commercial environments. Now, the company is redefining data management with its Platform for Big Data, Cloudera Enterprise, empowering enterprises to Ask Bigger Questions™ and gain rich, actionable insights from all their data, to quickly and easily derive real business value that translates into competitive advantage. As the top contributor to the Apache open source community and leading educator of data professionals with the broadest array of Hadoop training and certification programs, Cloudera also offers comprehensive consulting services. Over 600 partners across hardware, software and services have teamed with Cloudera to help meet organizations' big data goals. With tens of thousands of nodes under management and hundreds of customers across diverse markets, Cloudera is the category leader that has set the standard for Hadoop in the enterprise. www.cloudera.com

Connect with Cloudera
Read our blog: http://www.cloudera.com/blog/
Follow us on Twitter: http://twitter.com/cloudera
Visit us on Facebook: http://www.facebook.com/cloudera

Contact Information

Press Contacts

North America
Hope Nicora
Bhava Communications for Cloudera
cloudera@bhavacom.com
+1-510-984-1527

Europe
Richard Botley
Ketchum for Cloudera
LON-Cloudera@ketchum.com
+44 (0) 20 7611 3788