PySpark in Action: Hands-on Data Processing is a practical course that equips you to work confidently with large-scale data using PySpark and distributed data processing frameworks. You’ll discover the fundamentals of Big Data, Apache Hadoop, and Apache Spark, then build on this knowledge through real-world exercises where you’ll process and analyze massive datasets.


What you'll learn
Explore the fundamental concepts of Big Data and the components of the Hadoop ecosystem.
Explain the architecture and key principles of Apache Spark and its role in big data processing.
Utilize RDD transformations and actions to effectively process large-scale datasets with PySpark.
Execute advanced DataFrame operations, including data manipulation and aggregation techniques.
Details to know

Add to your LinkedIn profile
17 assignments

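Build subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate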

There are 5 modules in this course
This module introduces you to the fundamental concepts of Big Data and Hadoop. You will explore the Hadoop ecosystem, its components, and the Hadoop Distributed File System (HDFS), setting the foundation for understanding big data processing and storage solutions.
What's included
15 videos, 5 readings, 4 assignments, 1 discussion prompt, 1 plugin
Dive into the core of PySpark by learning about Resilient Distributed Datasets (RDDs). This module covers the fundamentals of RDDs, how they work, and their key transformations and actions, enabling efficient distributed data processing in PySpark.
What's included
25 videos, 4 readings, 4 assignments, 3 discussion prompts
This module covers the creation and manipulation of DataFrames in PySpark. You will learn how to perform basic and advanced operations, including aggregation, grouping, and handling missing data, with a focus on optimizing large-scale data processing tasks.
What's included
22 videos, 4 readings, 4 assignments, 1 discussion prompt
In this module, you will explore the SQL capabilities of PySpark. Learn how to perform CRUD operations, execute SQL commands, and merge and aggregate data using PySpark SQL. You'll also discover best practices for using SQL with PySpark to enhance data workflows.
What's included
28 videos, 4 readings, 4 assignments, 2 discussion prompts
This module tests your understanding of the concepts covered throughout the course. You will complete a project that applies these PySpark concepts, followed by a comprehensive quiz that assesses your proficiency in data processing with PySpark.
What's included
1 video, 1 reading, 1 assignment, 1 discussion prompt, 1 plugin
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Frequently asked questions
You will need access to a computer with Python and Apache Spark installed. Detailed setup instructions will be provided at the beginning of the course.
This course is designed for individuals new to big data and PySpark, providing a solid foundation to start working with distributed data processing.
While prior SQL knowledge is beneficial, it is not mandatory. The course will introduce SQL concepts as they relate to PySpark and provide practice with SQL queries.