AWS Lake Formation makes it easy for customers to build, secure, and manage data lakes. You can use Lake Formation to define security, governance, and auditing policies in one place, instead of configuring them separately for each service. You can then enforce these policies for your users across their analytics applications.

Beginning with EMR 5.26, Amazon EMR offers a beta integration with Lake Formation that lets you enforce Lake Formation policies for Spark SQL through EMR Notebooks and Apache Zeppelin.

When combined with EMR’s SAML-based single sign-on (SSO) feature, this integration lets you securely run Spark applications on shared multi-tenant clusters with column-level access control over data stored in Amazon S3.

EMR clusters enabled with Lake Formation offer the following benefits:
  1. Database, table, and column-level access controls for Spark SQL
  2. SSO integration with EMR Notebooks and Apache Zeppelin
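
As a rough illustration of what a column-level access control looks like, the sketch below builds the request for a Lake Formation grant that allows SELECT on only two columns of a table. It assumes the boto3 Lake Formation client's grant_permissions call as the way the grant is submitted. The helper name build_column_grant, the role ARN, and the database, table, and column names are hypothetical placeholders, not part of the EMR integration itself.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Build keyword arguments for a column-level SELECT grant.

    The returned dict follows the request shape of the Lake Formation
    GrantPermissions API (TableWithColumns resource).
    """
    if not columns:
        raise ValueError("at least one column is required for a column-level grant")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical principal, database, table, and column names.
request = build_column_grant(
    "arn:aws:iam::111122223333:role/analyst",
    "sales_db",
    "orders",
    ["order_id", "order_date"],
)

# With AWS credentials configured and boto3 installed, the grant could be sent as:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**request)
```

Under a grant like this, a Spark SQL query from that principal would be expected to see only the listed columns of the table; the other columns remain inaccessible.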

Launching an EMR cluster with Lake Formation is easy. Follow this step-by-step tutorial to get started.

Interested in learning more? Fill out the form below to request a briefing with an Amazon EMR specialist.

“We wanted to create a data platform with the ability to manage the security settings for all the different applications in our environment. With AWS Lake Formation, we can now define policies once and enforce them in the same way, everywhere, for multiple services we use. The enhanced level of control gives us secure access to data and meta-data for columns and tables, not just for bulk objects, which is an important part of our data security and governance standard.”


Anand Desikan

Director of Cloud and Data Services, Panasonic

Resources

  • Tutorial

    Learn how to integrate Amazon EMR with AWS Lake Formation (Beta).

  • Tutorial

    Step-by-step tutorial to learn how to launch an EMR cluster with Lake Formation using the EMR console.

  • Tutorial

    Step-by-step tutorial to learn how to use EMR Notebooks with Lake Formation.