In this hands-on, one-hour project-based course, you will master real-time data processing using Apache Spark Structured Streaming. This course is designed for data engineers and developers who want to gain practical experience in building streaming data pipelines. You will begin by setting up the Spark environment and learning how to configure micro-batches and fault-tolerance mechanisms through checkpointing. Next, you'll dive into transforming streaming data by applying filters, maps, and aggregations to extract meaningful insights. You'll also handle out-of-order data with watermarks, ensuring the accuracy of your real-time analytics. The course will introduce you to querying streaming data using SQL, allowing you to perform transformations and aggregations on live data. Finally, you will learn to deploy your streaming pipeline to production by writing results to an external sink such as Parquet files.

This is an intermediate-level project. To succeed in this course, it is recommended that you have a basic understanding of Apache Spark and the PySpark API, proficiency in programming and big data concepts, and some basic knowledge of writing SQL queries. This is the perfect opportunity for anyone looking to dive into real-time data processing and Spark Structured Streaming!
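To give a feel for what the project covers, here is a minimal PySpark sketch of a pipeline with a micro-batch trigger, checkpointing, a watermark for out-of-order data, and a Parquet sink. It uses Spark's built-in rate source as a stand-in for real event data, and the output and checkpoint paths are placeholders rather than paths used in the course workspace.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("structured-streaming-demo").getOrCreate()

# Read a micro-batched stream; the built-in rate source emits (timestamp, value) rows.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Transform: filter, then aggregate over event-time windows.
# The watermark bounds how long late, out-of-order rows are accepted.
windowed_counts = (
    events
    .filter(F.col("value") % 2 == 0)               # keep even values only
    .withWatermark("timestamp", "1 minute")        # tolerate up to 1 minute of lateness
    .groupBy(F.window("timestamp", "30 seconds"))  # 30-second event-time windows
    .count()
)

# Write to a Parquet sink with a checkpoint location for fault tolerance.
# The file sink requires append mode, which the watermarked aggregation allows.
query = (
    windowed_counts.writeStream
    .format("parquet")
    .option("path", "/tmp/stream_output")              # placeholder output path
    .option("checkpointLocation", "/tmp/stream_ckpt")  # placeholder checkpoint path
    .outputMode("append")
    .trigger(processingTime="30 seconds")              # micro-batch every 30 seconds
    .start()
)

query.awaitTermination()
```

In the project itself, the rate source would be replaced by a real streaming source such as a socket or Kafka topic.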


What you'll learn
Set up and configure a real-time data processing pipeline
Perform transformations, aggregations, and SQL queries on streaming data
Implement fault-tolerance mechanisms and ensure the pipeline remains resilient under high workloads and data inconsistencies
Skills you'll practice
Details to know

Add to your LinkedIn profile
Available on desktop only

Learn, practice, and apply job-ready skills in less than 2 hours
- Receive training from industry experts
- Gain hands-on experience solving real-world job tasks
- Build confidence using the latest tools and technologies

About this Guided Project
Learn step-by-step
In a video that plays in a split-screen with your work area, your instructor will guide you through these steps (a short code sketch of the streaming SQL step follows this task list):
Task 1: Setting Up the Environment for Real-Time Data Streaming
Task 2: Managing Triggers and Checkpoints
Task 3: Transforming Streaming Data
Practice Activity
Task 4: Performing Transformations, Aggregations, and Advanced SQL Queries
Task 5: Writing and Deploying the Pipeline
Cumulative Challenge
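As a preview of the SQL-based querying covered in Task 4, the sketch below registers a streaming DataFrame as a temporary view and aggregates it with Spark SQL. The rate source, view name, and column aliases are illustrative, not the exact ones used in the project.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-sql-demo").getOrCreate()

# Stream from the built-in rate source, which produces (timestamp, value) rows.
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Register the streaming DataFrame as a temporary view so it can be queried with SQL.
stream.createOrReplaceTempView("events")

# The SQL aggregation runs incrementally on each micro-batch and returns
# another streaming DataFrame.
summary = spark.sql("""
    SELECT window(timestamp, '1 minute') AS time_window,
           COUNT(*)                      AS event_count,
           AVG(value)                    AS avg_value
    FROM events
    GROUP BY window(timestamp, '1 minute')
""")

# The console sink in complete mode prints the full aggregated result each batch,
# which is handy while developing before switching to a durable sink.
query = summary.writeStream.format("console").outputMode("complete").start()
query.awaitTermination()
```

A production pipeline would typically replace the console sink with a durable sink and a checkpoint location, as covered in Task 5.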
Recommended experience
Experience with Apache Spark and the PySpark API. Python coding skills. SQL query proficiency. Big Data concepts. Kafka basics.

How you'll learn
Skill-based, hands-on learning
Practice new skills by completing job-related tasks.
Expert guidance
Follow along with pre-recorded videos from experts using a unique side-by-side interface.
No downloads or installation required
Access the tools and resources you need in a pre-configured cloud workspace.
Available only on desktop
This Guided Project is designed for laptops or desktop computers with a reliable Internet connection, not mobile devices.
Frequently asked questions
When you purchase a Guided Project, you'll get everything you need to complete it, including access through your web browser to a cloud desktop workspace containing the files and software you need, along with step-by-step video instruction from a subject-matter expert.
Because your workspace contains a cloud desktop sized for a laptop or desktop computer, Guided Projects are not available on mobile devices.
Guided Project instructors are subject-matter experts who are experienced in the project's skill, tool, or domain and are passionate about sharing their knowledge to impact millions of learners around the world.