Amazon EMR Notebooks, a managed environment based on Jupyter and JupyterLab notebooks, lets users interactively analyze and visualize data, collaborate with peers, and build applications using EMR clusters. EMR Notebooks is designed for Apache Spark. It supports Sparkmagic kernels, which allow you to remotely run queries and code on your EMR cluster using languages like PySpark, Spark SQL, SparkR, and Scala.

With EMR Notebooks, you have no software or instances to manage. You can either attach the notebook to an existing cluster or provision a new cluster directly from the console. You can attach multiple notebooks to a single cluster, and you can detach notebooks and reattach them to new clusters.

EMR Notebooks allows you to:
  1. Monitor and debug Spark jobs directly from your notebook.
  2. Install notebook-scoped libraries on a running EMR cluster.
  3. Associate Git repositories with your notebook for version control and to simplify code collaboration and reuse.
  4. Compare and merge two notebooks using the nbdime utility.
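
As a rough illustration of item 2, the sketch below shows one way notebook-scoped libraries might be installed from a PySpark notebook cell. It assumes the `sc` SparkContext that the PySpark kernel predefines in an EMR notebook and its `install_pypi_package` method; the helper name `install_notebook_scoped` and the package versions are hypothetical and chosen only for this example.

```python
from typing import Iterable, List


def install_notebook_scoped(sc, packages: Iterable[str]) -> List[str]:
    """Install each package for the current notebook session only.

    `sc` is assumed to be the SparkContext predefined by the PySpark
    kernel in an EMR notebook. A plain SparkContext outside EMR has no
    install_pypi_package method, so that case is rejected up front.
    """
    if not hasattr(sc, "install_pypi_package"):
        raise RuntimeError(
            "this SparkContext does not support notebook-scoped libraries"
        )

    installed = []
    for spec in packages:
        # Each spec is a pip-style requirement such as "pandas==0.25.1".
        sc.install_pypi_package(spec)
        installed.append(spec)
    return installed
```

In a notebook cell this might be called as `install_notebook_scoped(sc, ["pandas==0.25.1", "matplotlib"])`, followed by `sc.list_packages()` to inspect what is available. Libraries installed this way are scoped to the notebook session rather than installed permanently on the cluster.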

There is no additional cost for using EMR Notebooks. You pay only for the EMR cluster attached to the notebook. It’s easy to create multiple notebooks directly from the EMR console. Follow this step-by-step tutorial to get started.

Interested in learning more? Fill out the form below to request a briefing with an Amazon EMR Specialist.

“By leveraging Redshift Spectrum's ability to query data directly into our Amazon S3 data lake, we have been able to easily integrate new data sources in hours, not days or weeks. This has not only reduced our time to insight, but helped us control our infrastructure costs.”


Elliott Cordo

VP of Data Analytics, Equinox Fitness

Resources

  • Blog

    EMR Notebooks: A managed analytics environment based on Jupyter notebooks.

  • Tutorial

    Associate Git repositories with EMR Notebooks

  • Blog

    Install Python libraries on a running cluster with EMR Notebooks.