Join us for the AWS Dev Day Online on Data and Analytics!
In this one-day online event you will learn about our latest developments in Data and Analytics.
What you can expect
You can attend five sessions covering different areas of Data and Analytics on AWS. Check out the agenda below.
Who should attend
This content is for data engineers, developers, dev team managers, and solution architects keen on gaining deeper knowledge of the AWS Data & Analytics services. The sessions are level 300, aimed at advanced users.
Analyze & Explore Data Without Managing Servers
In this session we start our data journey, build a serverless data lake and look at the tools and services that allow you to go from raw data to a dataset ready for analysis, including data exploration and preparation, ETL, data catalog, in-place SQL queries, and reporting.
Featured services: AWS Glue DataBrew, AWS Glue Crawlers/Catalog, Amazon Athena, Amazon QuickSight
Speaker: Alex Cassalboni, Snr Developer Advocate, Italy, AWS
Automating your ELT Workflows
ELT using tools like Presto or Athena is a very popular way of ingesting and transforming your data with just SQL. However, automating the end-to-end workflow can be a bit tricky. In this session we will show how to use Apache Airflow in combination with Presto on top of EMR, or with Amazon Athena, to build robust ELT workflows.
Featured open source: Apache Airflow, Presto.
Featured services: Amazon MWAA, Amazon Athena, Amazon EMR
Speaker: Ricardo Sueiras, Prin. Advocate, Open Source, AWS
Beyond the immutable data lake and into the Data Lakehouse
We already know traditional data lakes are very flexible and scalable, but they are designed for immutable historical data rather than for fast queries on changing data. That's why the industry is shifting to the concept of a Data Lakehouse, which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. In this session we will show how a modern data lake on AWS can support transactional workloads and work seamlessly with Amazon Redshift to provide an efficient Data Lakehouse.
Featured services: AWS Lake Formation, Amazon S3, Amazon Redshift. AWS Glue Elastic Views or Amazon Redshift federated queries may also be covered if possible.
Speaker: Suman Debnath, Principal Developer Advocate, AWS
Deep Dive on serverless streaming analytics with SQL
Batch data processing is very powerful, but working with streaming data has its own challenges. In this session we will show you how to combine Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for scenarios like user sessionization, rolling metrics, or anomaly detection in real time, using just SQL and without having to manage any servers.
Featured services: Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for SQL. Amazon QuickSight may be added for visualization, but the focus is on advanced streaming SQL usage.
Speaker: Donnie Prakoso, Senior Developer Advocate, AWS
Real-time analytics with Apache Kafka & Flink
Sometimes you need to do powerful transforms and analytics on top of your streaming data, and SQL might not be enough. Luckily, we can use open source tools like Apache Kafka, Apache Flink, and Open Distro for Elasticsearch to achieve state-of-the-art results. In this session we will demo how to implement a clickstream analytics pipeline using the Amazon-managed versions of those three products.
Featured open source: Apache Kafka, Apache Flink, Open Distro for Elasticsearch.
Featured services: Amazon MSK, Amazon Kinesis Data Analytics for Apache Flink, Amazon Elasticsearch Service
Speaker: Rajeev Chakrabarti, Principal Streaming Architect, AWS