AWS Lake Formation Preview


Try our preview capabilities: Transactions, Row-level Security, and Acceleration

Sign up for an invitation to preview

New AWS Lake Formation capabilities are now available in preview. You need an AWS account number to request access. Please submit the information below to request an invitation to the preview. We will contact you with instructions if you are accepted.

Sign up to preview three new capabilities in AWS Lake Formation: transactions for concurrent updates and consistent query results, row-level security policies for granular access control, and accelerated access through inline filtering, aggregations, and automatic file compaction.

Transactions - Insert, delete, and modify rows concurrently

Data lakes need to show users the correct view of data at all times, even while the data is being updated frequently or in real time. A common pattern in data lakes is to organize data into tables composed of rows that can include structured or semi-structured data. To load streaming data or quickly incorporate changes from source data systems, you need to insert, delete, and modify rows across multiple tables in parallel. Today, developers write custom application code or use open source tools to manage these updates. These solutions are complex and difficult to scale because application code that maintains consistency while concurrently reading and writing the same data is tedious to write, brittle, and error-prone.

AWS Lake Formation introduces new APIs that support atomic, consistent, isolated, and durable (ACID) transactions using a new data lake table type called a ‘governed table.’ A governed table allows multiple users to concurrently insert, delete, and modify rows across tables, while other users simultaneously run analytical queries and machine learning (ML) models on the same data sets and still get consistent, up-to-date results. The ability to update and delete individual rows in governed tables, such as a customer's record after that customer has asked to be forgotten, helps users comply with “right to be forgotten” provisions in privacy laws like GDPR and CCPA.
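To make the transactional behavior described above more concrete, here is a minimal in-memory sketch in Python. It is not the Lake Formation API: the names ToyGovernedTable, ToyTransaction, and ConflictError are hypothetical, and the per-row optimistic conflict check is an assumption chosen for illustration, not a description of how governed tables are implemented.

```python
import copy


class ConflictError(Exception):
    """Raised when a transaction touches a row changed after it began."""


class ToyGovernedTable:
    """Hypothetical in-memory stand-in for a table with atomic commits."""

    def __init__(self):
        self._rows = {}         # committed state: row id -> row dict
        self._row_version = {}  # row id -> version of the last commit touching it
        self._version = 0       # incremented once per successful commit

    def snapshot(self):
        """Return a consistent copy of the committed rows for a reader."""
        return copy.deepcopy(self._rows)

    def begin(self):
        """Start a transaction based on the current committed version."""
        return ToyTransaction(self, self._version)


class ToyTransaction:
    def __init__(self, table, base_version):
        self._table = table
        self._base = base_version
        self._upserts = {}     # staged inserts and modifications
        self._deletes = set()  # staged deletions

    def upsert(self, row_id, row):
        self._deletes.discard(row_id)
        self._upserts[row_id] = dict(row)

    def delete(self, row_id):
        self._upserts.pop(row_id, None)
        self._deletes.add(row_id)

    def commit(self):
        """Apply all staged changes at once, or raise and apply none."""
        table = self._table
        touched = set(self._upserts) | self._deletes
        # Validate first so a conflict leaves the table untouched (atomicity).
        for row_id in touched:
            if table._row_version.get(row_id, 0) > self._base:
                raise ConflictError(f"row {row_id!r} changed since begin()")
        table._version += 1
        for row_id in self._deletes:
            table._rows.pop(row_id, None)
        table._rows.update(self._upserts)
        for row_id in touched:
            table._row_version[row_id] = table._version


# Usage: load two customers, then run two non-overlapping transactions.
table = ToyGovernedTable()
setup = table.begin()
setup.upsert("c1", {"name": "Ana"})
setup.upsert("c2", {"name": "Bo"})
setup.commit()

reader_view = table.snapshot()   # a reader's consistent view
tx_a = table.begin()
tx_b = table.begin()
tx_a.delete("c1")                # e.g. a "right to be forgotten" request
tx_b.upsert("c3", {"name": "Cy"})
tx_a.commit()
tx_b.commit()                    # succeeds: it touched different rows
```

In this sketch the reader's earlier snapshot is unaffected by the later commits, and a second writer that touches a row modified after its transaction began is rejected as a whole rather than partially applied.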

Row-level security

Making sure users have access to only the right data in a data lake is difficult. Some users need access to all data within a dataset, while other users are restricted from seeing columns of sensitive information like social security numbers or rows of data like sales records from other regions. Data lake administrators often maintain multiple copies of data to apply different security policies for different users. This adds complexity, operational overhead, and extra storage costs.

AWS Lake Formation already allows you to set access policies to hide data, such as hiding a column with social security numbers, from users who do not have permission to view that data. With row-level security, you can now set row-level policies in addition to column-level policies. For example, you can now set a policy that gives a regional sales manager access to only the sales data for their region.
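The combination of row-level and column-level policies can be illustrated with a small Python sketch. This is a conceptual model only, not the Lake Formation API: AccessPolicy, apply_policy, and the sample sales rows are hypothetical names and data introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy: which columns are visible and which rows qualify."""
    allowed_columns: frozenset
    row_predicate: Callable[[dict], bool]


def apply_policy(rows, policy):
    """Return only permitted rows, each reduced to the permitted columns."""
    return [
        {col: val for col, val in row.items() if col in policy.allowed_columns}
        for row in rows
        if policy.row_predicate(row)
    ]


# Illustrative data: one shared copy of the sales table.
sales = [
    {"region": "EMEA", "amount": 120, "ssn": "000-00-0001"},
    {"region": "APAC", "amount": 75, "ssn": "000-00-0002"},
    {"region": "EMEA", "amount": 40, "ssn": "000-00-0003"},
]

# A regional sales manager sees only EMEA rows, without the sensitive column.
emea_manager = AccessPolicy(
    allowed_columns=frozenset({"region", "amount"}),
    row_predicate=lambda row: row["region"] == "EMEA",
)

visible = apply_policy(sales, emea_manager)
```

Because the filter is applied when the data is read, every user can be served from the same single copy of the table, which is the point the paragraph above makes about avoiding duplicate datasets.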

Acceleration - Better performance with filtering, aggregations, and automatic file compaction

Analytics performance can suffer when a data lake stores data inefficiently as many small files, which are created automatically as new data is written. Processing these small files adds overhead for analytics services and slows query responses.

With this preview, Lake Formation includes a new storage optimizer that automatically combines small files into larger files to speed up queries by up to 7x. This process, commonly known as compaction, runs in the background so that it does not affect the performance of your production workloads.
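As a rough illustration of what compaction planning involves, the Python sketch below groups small files into batches that could each be rewritten as one larger file. The function plan_compaction, its parameters, and the greedy smallest-first grouping are assumptions made for this example; the source does not describe how the Lake Formation storage optimizer selects or combines files.

```python
def plan_compaction(file_sizes, target_bytes, small_threshold):
    """Group small files into batches of at most target_bytes each.

    file_sizes maps a file name to its size in bytes. Files at or above
    small_threshold are left alone. Batches with a single file are
    dropped because there is nothing to merge.
    """
    small = sorted(
        (size, name) for name, size in file_sizes.items() if size < small_threshold
    )
    batches, current, current_size = [], [], 0
    for size, name in small:
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return [batch for batch in batches if len(batch) > 1]


# Illustrative sizes (arbitrary units); "d" is already large and is skipped.
files = {"a": 10, "b": 20, "c": 30, "d": 500, "e": 45}
plan = plan_compaction(files, target_bytes=64, small_threshold=64)
```

Here the three smallest files fit together under the target size and form one batch, while the remaining small file has no partner and is left as-is until more small files arrive.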

Preview support

In the preview, these new capabilities are available via new, open, and public update and access APIs for data lakes. Once generally available, these APIs may be used by AWS services, third parties, and custom applications that directly read from and write to Amazon S3 data lakes.
