Monday, September 9, 2024

Data Engineering with AWS (Nanodegree Program)

Colleagues, in the “Data Engineering with AWS” Nanodegree Program you will learn to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets. Skill-based courses include:

1) Data Modeling - create relational and NoSQL data models to fit the diverse needs of data consumers, and use ETL to build databases in PostgreSQL and Apache Cassandra. Lessons include: Introduction to Data Modeling - understand the purpose of data modeling and the strengths and weaknesses of relational databases, and create schemas and tables in Postgres; NoSQL Data Models - learn when to use non-relational databases based on business needs, their strengths and weaknesses, and how to create tables in Apache Cassandra (Project: Data Modeling with Apache Cassandra).

2) Cloud Data Warehouses - create cloud-based data warehouses. You’ll sharpen your data warehousing skills, deepen your understanding of data infrastructure, and be introduced to data engineering on the cloud using Amazon Web Services (AWS). Lessons include: Introduction to Data Warehouses - the business case for data warehouses, as well as architecture; extracting, transforming, and loading data; data modeling; and data warehouse technologies; ELT and Data Warehouse Technology in the Cloud - learn about ELT, the differences between ETL and ELT, and general cloud data warehouse technologies; AWS Data Warehouse Technologies - set up Amazon S3, IAM, VPC, EC2, and RDS, then build a Redshift data warehouse cluster and learn how to interact with it; Implementing a Data Warehouse on AWS (Project: Data Warehouse - build an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for an analytics team).

3) Spark and Data Lakes - learn about the big data ecosystem and how to use Spark to work with massive datasets, including how to store big data in a data lake and query it with Spark. Lessons include: Introduction to Spark and Data Lakes - learn how Spark evaluates code and uses distributed computing to process and transform data, and work in the big data ecosystem to build data lakes and data lakehouses; Big Data Ecosystem, Data Lakes, and Spark - learn about the problems Apache Spark is designed to solve, the greater big data ecosystem, and how Spark fits into it; Spark Essentials - use Spark for wrangling, filtering, and transforming distributed data with PySpark and Spark SQL; Using Spark in AWS - work with data lakes on AWS using S3, AWS Glue, and AWS Glue Studio; Ingesting and Organizing Data in a Lakehouse - build and configure Lakehouse zones in AWS (Project: STEDI Human Balance Analytics - work with sensor data that trains a machine learning model, loading S3 JSON data from a data lake into Athena tables using Spark and AWS Glue).

4) Automate Data Pipelines - build pipelines leveraging Airflow DAGs to organize your tasks along with AWS resources such as S3 and Redshift. Lessons include: Data Pipelines - learn the components of a data pipeline, including Directed Acyclic Graphs (DAGs), and practice creating data pipelines with DAGs and Apache Airflow; Airflow and AWS - create connections between Airflow and AWS by creating credentials, copying S3 data, leveraging connections and hooks, and building an S3-to-Redshift DAG; Data Quality - track data lineage, set up data pipeline schedules, partition data to optimize pipelines, investigate data quality issues, and write tests to ensure data quality; Production Data Pipelines - build pipelines with maintainability, reusability, and monitoring in mind (Project: Data Pipelines - work on a music streaming company’s data infrastructure by creating and automating a set of data pipelines with Airflow, monitoring and debugging production pipelines).
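To give a flavor of the dimensional modeling covered in the Data Warehouse project, here is a minimal sketch in plain Python of splitting staged event records into dimension and fact tables. The record fields and table names are illustrative only, not the course's actual schema (in the course this transformation runs as SQL inside Redshift):

```python
# Illustrative staged event records (in Redshift these would live in staging tables).
staging_events = [
    {"user_id": 1, "name": "Ana", "song_id": "S1", "title": "Song A", "ts": 1000},
    {"user_id": 1, "name": "Ana", "song_id": "S2", "title": "Song B", "ts": 2000},
    {"user_id": 2, "name": "Ben", "song_id": "S1", "title": "Song A", "ts": 3000},
]

# Dimension tables: one deduplicated row per entity, keyed by its id.
users = {e["user_id"]: {"user_id": e["user_id"], "name": e["name"]}
         for e in staging_events}
songs = {e["song_id"]: {"song_id": e["song_id"], "title": e["title"]}
         for e in staging_events}

# Fact table: one row per play event, referencing the dimensions by key.
songplays = [{"ts": e["ts"], "user_id": e["user_id"], "song_id": e["song_id"]}
             for e in staging_events]

print(len(users), len(songs), len(songplays))  # → 2 2 3
```

The point of the star-schema split is visible even at this scale: descriptive attributes are stored once per entity, while the fact table stays narrow and grows with events.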
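The Automate Data Pipelines course centers on Directed Acyclic Graphs. As a taste of the underlying idea (a sketch in plain Python's standard library, not the Airflow API; the task names are made up to resemble a warehouse pipeline), a DAG's tasks can be topologically ordered so each task runs only after all of its upstream dependencies:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each task maps to the tasks it depends on (its upstream edges).
dependencies = {
    "create_tables": [],
    "stage_events": ["create_tables"],
    "stage_songs": ["create_tables"],
    "load_fact_table": ["stage_events", "stage_songs"],
    "run_quality_checks": ["load_fact_table"],
}

# A valid execution order: every task appears after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Airflow does the same dependency resolution at scale, adding scheduling, retries, and operators that talk to S3, Redshift, and other services.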

Enroll today (teams & executives are welcome): https://tinyurl.com/4pxyw9vt


Download your free Data Science - Career Transformation Guide.


Explore our Data-Driven Organizations Audible and Kindle book series on Amazon:


1 - Data-Driven Decision-Making  (Audible) (Kindle)


2 - Implementing Data Science Methodology: From Data Wrangling to Data Viz (Audible) (Kindle)


Much career success, Lawrence E. Wilson - Data Science Academy (share with your team) https://tinyurl.com/hh7bf4m9 
