How to Build a Data Lake in Amazon S3 & Amazon Glacier


ON-DEMAND





Broadcast Date: February 1, 2018

Level 200 | Service How To
In this session, we discuss best practices for data ingestion, storage, cataloging, and analysis on Amazon object storage services. We examine ways to reduce or eliminate costly extract, transform, and load (ETL) processes using query-in-place technologies such as Amazon S3 Select, Amazon Glacier Select, Amazon Athena, and Amazon Redshift Spectrum. We also review custom analytics integration using Apache Spark, Apache Hive, Presto, and other technologies in Amazon EMR.

Learning Objectives:
• Understand the options for building an analytics platform that leverages Amazon S3 & Amazon Glacier
• Learn about the key considerations for ETL and other core analytics functions
• Determine whether query-in-place capabilities like Amazon S3 Select, Amazon Glacier Select, Amazon Athena, and Amazon Redshift Spectrum are a good fit for your use case

Suited For: Storage Administrators, Data Scientists, Analytics Professionals

Speaker(s): PD Dutta, Sr. Product Manager, Amazon S3, AWS


