What will I learn from this course?

In this course, you will become familiar with Big Data, Hadoop, Spark and its architecture, and implementing data processing with PySpark.

What are the prerequisites for this course?

This is an introductory course designed for absolute beginners. Prior knowledge of Python is advantageous but not mandatory.

What is this course about?

This course offers an introduction to data processing with PySpark. It is designed to give learners the knowledge and skills needed to get started.

Who is this course designed for?

This course caters to a diverse audience, including freshers who are new to the field. Data Analysts and Data Scientists will enhance their skills in Big Data processing, while Data Engineers will gain insights into Spark architecture and data processing with PySpark.

What will I get if I purchase the Certificate?

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.

Introduction to PySpark

Instructor: Edureka

7,119 already enrolled

1 module

Gain insight into a topic and learn the fundamentals.

51 reviews

Beginner level

4 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Understand PySpark fundamentals to process big data efficiently using Python APIs.
Apply real-time data processing techniques for actionable insights.
Explore Spark architecture for distributed computing and scalability.
Build hands-on skills with PySpark through practical assignments.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assignments

5 assignments

Taught in English

91% of learners achieved a positive career outcome

There is 1 module in this course

Welcome to Introduction to PySpark, a short course strategically crafted to empower you with the skills needed to assess the concepts of Big Data Management and efficiently perform data analysis using PySpark. Throughout this short course, you will acquire the expertise to perform data processing with PySpark, enabling you to efficiently handle large-scale datasets, conduct advanced analytics, and derive valuable insights from diverse data sources.

During this short course, you will explore the industry-specific applications of PySpark. By the end of this course, you will be able to:

1. Attain a basic understanding of big data, including its characteristics, challenges, and importance in modern data-driven environments.
2. Familiarize yourself with Spark architecture and its components, such as Spark Core and Spark SQL.
3. Familiarize yourself with distributed computing concepts and how they apply to Spark's parallel processing model.
4. Explore PySpark and big data concepts to solve data-related challenges.
5. Write PySpark code to solve real-world data analysis and processing tasks.

This short course is designed for Data Analysts, Data Engineers, Data Scientists, and Big Data Developers seeking to enhance their skills in utilizing PySpark for data processing and analysis. Prior experience with Python and Hadoop is beneficial but not mandatory for this course. Join us on this journey to enhance your PySpark skills and elevate your analytical and design capabilities.
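The distributed computing objective above can be pictured with a small, Spark-free sketch. The code below is only a toy model in plain Python, not PySpark and not course material: the function names, the round-robin partitioning, and the sample sentences are all assumptions made for illustration. It shows the general idea of splitting data into partitions, processing each partition independently, and merging the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Distribute records round-robin across a fixed number of partitions."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Map step: count words in one partition, independently of the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


def parallel_word_count(lines, num_partitions=3):
    """Count words by processing partitions in parallel, then merging."""
    partitions = split_into_partitions(lines, num_partitions)
    # Each worker thread handles one partition (hypothetical stand-in for
    # the worker processes of a real cluster).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical sample data, invented for this illustration.
sample_lines = [
    "spark processes big data",
    "big data needs distributed processing",
    "spark splits data into partitions",
]
word_totals = parallel_word_count(sample_lines)
print(word_totals["data"], word_totals["spark"])
```

The result does not depend on how many partitions are used, which is the property that lets this style of computation scale out across machines.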

Welcome to Introduction to PySpark. In this short course, you will learn the fundamental concepts of PySpark and Big Data, and learn to perform real-time data processing with PySpark to gain useful insights from the data.

What's included

27 videos, 7 readings, 5 assignments, 2 discussion prompts

27 videos, 128 minutes total

Course Introduction (3 minutes)
What is Big Data? (4 minutes)
Applications of Big Data (5 minutes)
What is Hadoop? (5 minutes)
Hadoop ecosystem (2 minutes)
Working of HDFS (5 minutes)
Introduction to Apache Spark (7 minutes)
Apache Spark Architecture (7 minutes)
Master-slave Architecture (2 minutes)
Data Processing with Apache Spark (6 minutes)
Introduction to Directed Acyclic Graph (DAG) (5 minutes)
Introduction to Spark ecosystem (5 minutes)
What is PySpark? (5 minutes)
Key features of PySpark (7 minutes)
Basics of Python (6 minutes)
Introduction to DataFrames in Spark (5 minutes)
Applications of DataFrames (2 minutes)
Basic PySpark operations (5 minutes)
Basic PySpark operations (hands-on) (3 minutes)
DataFrame operations: Selecting, Filtering, Aggregating (5 minutes)
DataFrame operations: Selecting, Filtering, Aggregating (hands-on) (5 minutes)
Advanced DataFrame operations: Joins, Grouping, Sorting (4 minutes)
Advanced DataFrame operations: Joins, Grouping, Sorting (hands-on) (6 minutes)
Initiating Spark session (7 minutes)
Getting data insights using PySpark (3 minutes)
Getting sales patterns using PySpark (7 minutes)
Course Summary of Introduction to PySpark (2 minutes)
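One idea behind the DAG video listed above is that Spark records transformations as a plan and only runs them when an action asks for a result. The sketch below is a toy model of that behavior in plain Python, under stated assumptions: `LazyDataset`, its methods, and the `executed` trace are hypothetical names invented here, not Spark classes, and a linear list of steps stands in for a full graph.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (not a Spark class)."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)  # recorded plan, in execution order

    def map(self, fn):
        # Transformation: record the step and return a new dataset; no work yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also only recorded.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def plan(self):
        """Return the names of the recorded steps."""
        return [name for name, _ in self._steps]

    def collect(self):
        # Action: run every recorded step over the source data.
        rows = iter(self._source)
        for name, fn in self._steps:
            rows = map(fn, rows) if name == "map" else filter(fn, rows)
        return list(rows)


executed = []  # trace of inputs actually processed


def double(value):
    executed.append(value)
    return value * 2


pipeline = LazyDataset(range(1, 6)).map(double).filter(lambda v: v > 4)
ran_before_collect = list(executed)  # still empty: nothing has run yet
result = pipeline.collect()          # the action triggers the recorded steps
print(pipeline.plan(), ran_before_collect, result)
```

Deferring the work until an action is called is what gives an engine the chance to inspect the whole plan before running it.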

7 readings, 43 minutes total

Course Overview (6 minutes)
Case study on Hadoop (5 minutes)
Spark SQL (5 minutes)
Introduction to Python (7 minutes)
PySpark RDD (5 minutes)
Discover more about PySpark DataFrames (5 minutes)
Practice Project: Welmart sales insights (10 minutes)

5 assignments, 32 minutes total

Module-End Quiz: Advanced Analytics with Spark and Python (20 minutes)
Practice Quiz: Big Data Essentials (3 minutes)
Practice Quiz: Apache Spark Fundamentals (3 minutes)
Practice Quiz: PySpark & Python Programming (3 minutes)
Practice Quiz: Spark Optimization and DataFrames (3 minutes)

2 discussion prompts, 5 minutes total

Introduce Yourself (2 minutes)
Describe Your Learning Journey (3 minutes)

Instructor

Instructor ratings

(7 ratings)

Edureka

191 courses, 176,539 learners

Offered by

Edureka

Frequently asked questions

PySpark is used on various platforms, including cloud services like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), as well as on-premises clusters and local machines, providing flexibility for distributed data processing across different environments.

Yes, PySpark is an open-source distributed computing framework that is freely available. It allows users to process large-scale data sets efficiently using Python APIs on Apache Spark's distributed processing engine.

The course lasts approximately three hours and covers topics such as Big Data, Hadoop, Spark architecture, and PySpark.

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
