Gain the skills for building efficient and scalable data pipelines. Explore essential data engineering platforms (Hadoop, Spark, and Snowflake) and learn how to optimize and manage them. Delve into Databricks, a powerful platform for executing data analytics and machine learning tasks, while honing your Python data science skills with PySpark. Finally, discover the key concepts of MLflow, an open-source platform for managing the end-to-end machine learning lifecycle, and learn how to integrate it with Databricks.
This course is designed for learners who want to pursue or advance a career in data science or data engineering, and for software developers or engineers who want to grow their data management skill set. Alongside the technologies, you will learn methodologies that sharpen your project management and workflow skills for data engineering, including Kaizen, DevOps, and DataOps best practices.
With quizzes to test your knowledge throughout, this comprehensive course will help guide your learning journey to become a proficient data engineer, ready to tackle the challenges of today's data-driven world.
In this module, you will learn how to work with different data engineering platforms, such as Hadoop and Spark, and apply their concepts to real-world scenarios. First, you will explore the fundamentals of Hadoop to store and process big data. Next, you will delve into Spark concepts, distributed computing, deferred execution, and Spark SQL. By the end of the week, you will gain hands-on experience with PySpark DataFrames, DataFrame methods, and deferred execution strategies.
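Deferred execution, mentioned above, means that transformations are only recorded when they are called and run later, when a result is requested. The sketch below is a toy model of that idea in plain Python, not Spark's API: the `LazyPipeline` class, its `map`, `filter`, and `collect` methods, and the `rows_read` counter are all hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, Iterator, List


class LazyPipeline:
    """Toy model of deferred execution (hypothetical; not part of PySpark)."""

    def __init__(self, source: Iterable[int]) -> None:
        self._source = source
        # Recorded transformations; nothing runs until collect() is called.
        self._steps: List[Callable[[Iterator[int]], Iterator[int]]] = []
        self.rows_read = 0  # counts how many source rows were actually read

    def map(self, fn: Callable[[int], int]) -> "LazyPipeline":
        # Record the transformation only; no row is touched here.
        self._steps.append(lambda rows: (fn(row) for row in rows))
        return self

    def filter(self, predicate: Callable[[int], bool]) -> "LazyPipeline":
        # Record the filter only; no row is touched here.
        self._steps.append(lambda rows: (row for row in rows if predicate(row)))
        return self

    def _read_source(self) -> Iterator[int]:
        # Generator that counts rows as they are pulled from the source.
        for row in self._source:
            self.rows_read += 1
            yield row

    def collect(self) -> List[int]:
        # The "action": chain the recorded steps and pull rows through them.
        rows: Iterator[int] = self._read_source()
        for step in self._steps:
            rows = step(rows)
        return list(rows)


pipeline = LazyPipeline(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(pipeline.rows_read)  # 0: no rows have been read yet
print(pipeline.collect())  # [0, 4, 16, 36, 64]
print(pipeline.rows_read)  # 10: rows were read only when collect() ran
```

In PySpark the same split appears as transformations (such as `filter` or `select`) versus actions (such as `collect` or `count`).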
What's included
10 videos, 10 readings, 7 assignments, 1 discussion prompt, 2 ungraded labs
10 videos • Total 25 minutes
Meet your Co-Instructor: Kennedy Behrman • 1 minute
Meet your Co-Instructor: Noah Gift • 1 minute
Overview of Big Data Platforms • 2 minutes
Getting Started with Hadoop • 1 minute
Getting Started with Spark • 2 minutes
Introduction to Resilient Distributed Datasets (RDD) • 2 minutes
Resilient Distributed Datasets (RDD) Demo • 4 minutes
Introduction to Spark SQL • 2 minutes
PySpark Dataframe Demo: Part 1 • 3 minutes
PySpark Dataframe Demo: Part 2 • 7 minutes
10 readings • Total 100 minutes
Welcome to Data Engineering Platforms with Python! • 10 minutes
Report a problem with the course • 10 minutes
What is Apache Hadoop? • 10 minutes
What is Apache Spark? • 10 minutes
Use Apache Spark in Azure Databricks (optional) • 10 minutes
Choosing between Hadoop and Spark • 10 minutes
What are RDDs? • 10 minutes
Getting Started: Creating RDD's with PySpark • 10 minutes
Spark SQL, Dataframes and Datasets • 10 minutes
PySpark and Spark SQL • 10 minutes
7 assignments • Total 210 minutes
Big Data Platforms • 30 minutes
Apache Hadoop Concepts • 30 minutes
Apache Spark Concepts • 30 minutes
RDD Concepts • 30 minutes
Spark SQL Concepts • 30 minutes
PySpark Dataframe Concepts • 30 minutes
PySpark • 30 minutes
1 discussion prompt • Total 10 minutes
Meet and Greet (optional) • 10 minutes
2 ungraded labs • Total 120 minutes
Practice: Creating RDD's with PySpark • 60 minutes
Practice: Reading Data into Dataframes • 60 minutes
Snowflake
Module 2
Module details
In this module, you will explore the Snowflake platform, gaining insights into its architecture and key concepts. Through hands-on practice in the Snowflake Web UI, you'll learn to create tables, manage warehouses, and use the Snowflake Python Connector to interact with tables. By the end of this week, you'll solidify your understanding of Snowflake's architecture and practical applications, emerging with the ability to effectively navigate and leverage the platform for data management and analysis.
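Writing to and reading from a table from Python usually follows the connect, cursor, execute, fetch pattern of the Python DB-API (PEP 249), which the Snowflake Python Connector is documented as implementing. The sketch below shows that pattern with the standard library's `sqlite3` module standing in for Snowflake so that it runs without an account; the table name `readings`, its columns, and the helper `write_and_read` are hypothetical. With the real connector, the connection would come from `snowflake.connector.connect(...)` with your own credentials, and details such as the parameter placeholder style may differ.

```python
import sqlite3
from typing import List, Sequence, Tuple

Row = Tuple[str, float]


def write_and_read(conn: sqlite3.Connection, rows: Sequence[Row]) -> List[Row]:
    """Create a table, insert rows, and read them back through one cursor."""
    cur = conn.cursor()
    try:
        # Hypothetical table used only for this illustration.
        cur.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
        # sqlite3 uses "?" placeholders; other DB-API drivers may use a different style.
        cur.executemany("INSERT INTO readings (sensor, value) VALUES (?, ?)", rows)
        conn.commit()
        cur.execute("SELECT sensor, value FROM readings ORDER BY sensor")
        return cur.fetchall()
    finally:
        # Always release the cursor, even if a statement fails.
        cur.close()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
result = write_and_read(conn, [("b", 2.0), ("a", 1.5)])
print(result)  # [('a', 1.5), ('b', 2.0)]
conn.close()
```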
What's included
8 videos, 5 readings, 6 assignments
8 videos • Total 27 minutes
What is Snowflake? • 2 minutes
Snowflake Layers • 2 minutes
Snowflake Web UI • 4 minutes
Navigating Snowflake • 4 minutes
Creating a Table in Snowflake • 5 minutes
Snowflake Warehouses • 4 minutes
Writing to Snowflake • 3 minutes
Reading from Snowflake • 3 minutes
5 readings • Total 50 minutes
Accessing Snowflake • 10 minutes
Detailed View Inside Snowflake • 10 minutes
Snowsight: The Snowflake Web Interface • 10 minutes
Working with Warehouses • 10 minutes
Python Connector Documentation • 10 minutes
6 assignments • Total 180 minutes
Snowflake Architecture • 30 minutes
Snowflake Layers • 30 minutes
Navigating Snowflake • 30 minutes
Creating a Table • 30 minutes
Writing to Snowflake • 30 minutes
Snowflake • 30 minutes
Azure Databricks and MLflow
Module 3
Module details
In this module, you will practice the essential skills for managing machine learning workflows using Databricks and MLflow. First, you will create a Databricks workspace and configure a cluster, setting the stage for efficient data analysis. Next, you will load a sample dataset into the Databricks workspace using PySpark, enabling data manipulation and exploration. Finally, you will install MLflow either locally or within the Databricks environment, gaining the ability to orchestrate the entire machine learning lifecycle. By the end of this week, you will be able to create, track, and manage machine learning experiments within Databricks, with an emphasis on reproducibility.
What's included
16 videos, 7 readings, 4 assignments, 1 ungraded lab
16 videos • Total 72 minutes
Accessing Databricks • 1 minute
Spark Notebooks with Databricks • 5 minutes
Using Data with Databricks • 5 minutes
Working with Workspaces in Databricks • 3 minutes
Advanced Capabilities of Databricks • 2 minutes
PySpark Introduction on Databricks • 7 minutes
Exploring Databricks Azure Features • 4 minutes
Using the DBFS to AutoML Workflow • 4 minutes
Load, Register and Deploy ML Models • 3 minutes
Databricks Model Registry • 3 minutes
Model Serving on Databricks • 2 minutes
What is MLOps? • 13 minutes
Exploring Open-Source MLFlow Frameworks • 6 minutes
Running MLFlow with Databricks • 6 minutes
End to End Databricks MLFlow • 4 minutes
Databricks Autologging with MLFlow • 4 minutes
7 readings • Total 70 minutes
What is Azure Databricks? • 10 minutes
Introduction to Databricks Machine Learning • 10 minutes
What is the Databricks File System (DBFS)? • 10 minutes
Serverless Compute with Databricks • 10 minutes
MLOps Workflow on Azure Databricks • 10 minutes
Run MLFlow Projects on Azure Databricks • 10 minutes
Databricks Autologging • 10 minutes
4 assignments • Total 120 minutes
PySpark SQL • 30 minutes
PySpark DataFrames • 30 minutes
MLFlow with Databricks • 30 minutes
DataBricks • 30 minutes
1 ungraded lab • Total 60 minutes
ETL-Part-1: Keyword Extractor Tool to HashTag Tool • 60 minutes
DataOps and Operations Methodologies
Module 4
Module details
In this module, you will explore the concepts of Kaizen, DevOps, and DataOps and how these methodologies synergistically contribute to efficient and seamless data engineering workflows. Through practical examples, you will learn how Kaizen's continuous improvement philosophy, DevOps' collaborative practices, and DataOps' focus on data quality and integration converge to enhance the development, deployment, and management of data engineering platforms. By the end of this week, you will have the knowledge and perspective needed to optimize data engineering processes and deliver scalable, reliable, and high-quality solutions.
What's included
21 videos, 7 readings, 4 assignments, 1 ungraded lab
21 videos • Total 502 minutes
Kaizen Methodology for Data • 4 minutes
Introducing GitHub CodeSpaces • 9 minutes
Compiling Python in GitHub Codespaces • 18 minutes
Walking through Sagemaker Studio Lab • 29 minutes
Pytest Master Class (Optional) • 166 minutes
What is DevOps? • 2 minutes
DevOps Key Concepts • 36 minutes
Continuous Integration Overview • 32 minutes
Build an NLP in Cloud9 with Python • 43 minutes
Build a Continuously Deployed Containerized FastAPI Microservice • 44 minutes
Hugo Continuous Deploy on AWS • 19 minutes
Container Based Continuous Delivery • 9 minutes
What is DataOps? • 1 minute
DataOps and MLOps with Snowflake • 62 minutes
Building Cloud Pipelines with Step Functions and Lambda • 17 minutes
What is a Data Lake? • 2 minutes
Data Warehouse vs. Feature Store • 2 minutes
Big Data Challenges • 1 minute
Types of Big Data Processing • 1 minute
Real-World Data Engineering Pipeline • 2 minutes
Data Feedback Loop • 1 minute
7 readings • Total 70 minutes
GitHub Codespaces Overview • 10 minutes
Getting Started with Amazon SageMaker Studio Lab • 10 minutes
Teaching MLOps at Scale with GitHub (Optional) • 10 minutes
Getting Started with DevOps and Cloud Computing • 10 minutes
Duke University has about 13,000 undergraduate and graduate students and a world-class faculty helping to expand the frontiers of knowledge. The university has a strong commitment to applying knowledge in service to society, both near its North Carolina campus and around the world.
When will I have access to the lectures and assignments?
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.