Machine Learning with PySpark introduces the power of distributed computing for machine learning, equipping learners with the skills to build scalable machine learning models. Through hands-on projects, you will learn how to use PySpark to process data, build models, and evaluate machine learning algorithms.

Recommended experience
Intermediate level
Prior experience with Python and basic machine learning concepts is recommended. Familiarity with distributed computing will be helpful.
What you'll learn
Implement machine learning models using PySpark MLlib.
Implement linear and logistic regression models for predictive analysis.
Apply clustering methods to group unlabeled data using algorithms like K-means.
Explore real-world applications of PySpark MLlib through practical examples.
Details to know

14 assignments

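Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate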

There are 4 modules in this course
This module shows you how to set up an environment for implementing machine learning algorithms with PySpark MLlib. You will gain a fundamental understanding of why machine learning matters in the context of big data and explore how machine learning models are implemented using PySpark.
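As a rough illustration of what a linear regression model (one of the algorithms covered in this module) computes, the sketch below fits a single-feature line by ordinary least squares in plain Python. It is not MLlib code and does not run on Spark; the function name `fit_line` and the sample data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x around its mean.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Sum of products of x and y deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A distributed library solves the same fitting problem over data that is spread across machines; this single-machine version only shows the arithmetic involved.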
What's included
27 videos, 5 readings, 4 assignments, 3 discussion prompts
27 videos • Total 138 minutes
- Course Introduction: Machine Learning with PySpark • 2 minutes
- BigData and Distributed Systems • 7 minutes
- Introduction to MLlib • 4 minutes
- Key Features and Applications of MLlib • 4 minutes
- What is Machine Learning • 6 minutes
- Types of Machine Learning • 7 minutes
- Applications of Machine Learning • 6 minutes
- Why PySpark for ML • 7 minutes
- Supervised Machine Learning • 3 minutes
- Applications of Supervised Machine Learning • 4 minutes
- Unsupervised Machine Learning • 7 minutes
- Significance of Unsupervised Machine Learning • 2 minutes
- Semi-supervised Machine Learning • 5 minutes
- Significance of Semi-supervised Machine Learning • 4 minutes
- Machine Learning Pipelines • 5 minutes
- Benefits of ML Pipeline • 4 minutes
- Linear Regression • 5 minutes
- Linear Regression Use case • 5 minutes
- Implementing Linear Regression • 7 minutes
- Data Ingestion • 6 minutes
- Logistic Regression • 5 minutes
- Applications of Logistic Regression • 3 minutes
- Key Concepts of Logistic Regression • 3 minutes
- Logistic Regression Use case • 6 minutes
- Evaluating Logistic Regression • 7 minutes
- Decision Trees • 7 minutes
- Decision Trees Use case • 6 minutes
5 readings • Total 50 minutes
- Welcome to Machine Learning with PySpark • 10 minutes
- Introduction to Machine Learning with PySpark • 10 minutes
- Machine Learning Algorithms • 10 minutes
- Importance of Machine Learning with PySpark • 10 minutes
- Module Summary: Introduction to PySpark Machine Learning • 10 minutes
4 assignments • Total 38 minutes
- Knowledge Check: Introduction to PySpark MLlib • 20 minutes
- Practice Quiz: Overview of PySpark MLlib • 6 minutes
- Practice Quiz: Introduction to Machine Learning • 6 minutes
- Practice Quiz: Implementing Machine Learning Algorithms • 6 minutes
3 discussion prompts • Total 30 minutes
- Introduce Yourself • 10 minutes
- Working with PySpark • 10 minutes
- Which Machine Learning algorithm do you think is widely used? • 10 minutes
In this module, you will explore the foundations of unsupervised machine learning, focusing on techniques for analyzing unlabeled data. You will work with clustering algorithms such as K-means, learning how to group data points based on their similarity. You will also study association rule mining, which uncovers hidden patterns and relationships in datasets without predefined labels.
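To make the idea of grouping points by similarity more concrete, here is a minimal K-means sketch in plain Python. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. This is a single-machine illustration, not the MLlib implementation; the function name `kmeans`, the starting centroids, and the sample points are assumptions made for the example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid-update steps (Lloyd's algorithm)."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # using Euclidean distance.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if not members:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(old)
                continue
            new_centroids.append(
                tuple(sum(m[d] for m in members) / len(members)
                      for d in range(len(old)))
            )
        if new_centroids == centroids:
            break  # centroids stopped moving, so the result is stable
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(assignments)
```

The starting centroids are fixed here so the run is deterministic; practical implementations usually choose them randomly or with a seeding heuristic, and the number of clusters has to be chosen separately.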
What's included
26 videos, 6 readings, 5 assignments, 1 discussion prompt
26 videos • Total 134 minutes
- What is Unsupervised Learning • 6 minutes
- Applications of Unsupervised Learning • 3 minutes
- How Algorithms differ between Supervised and Unsupervised • 5 minutes
- Unsupervised Machine Learning Algorithms • 5 minutes
- What is Distance metrics • 4 minutes
- Significance of Distance metrics • 3 minutes
- Types of Distance metrics • 7 minutes
- Labeling techniques • 5 minutes
- Significance of Data Labeling • 5 minutes
- One hot encoding • 7 minutes
- What is Clustering • 6 minutes
- Types of Clustering • 4 minutes
- K-means Clustering • 6 minutes
- Applications of K-means Clustering • 4 minutes
- Elbow method • 5 minutes
- Silhouette Method • 4 minutes
- Demonstration of K-means Clustering • 5 minutes
- Model Evaluation for K-means Clustering • 5 minutes
- What is Association Rule Mining • 7 minutes
- Implementing of FP Growth • 4 minutes
- Basic concepts of Association Rule Mining • 5 minutes
- Lift and Frequent Itemsets • 4 minutes
- Applications of Association Rule Mining • 5 minutes
- Use cases of Association Rule Mining • 3 minutes
- Demonstration of Association Rule Mining • 7 minutes
- Association Rule Mining using FP Growth • 7 minutes
6 readings • Total 65 minutes
- Unsupervised Machine Learning with PySpark • 10 minutes
- Explore DBSCAN • 15 minutes
- Dimensionality Reduction • 10 minutes
- Gaussian Mixture Models (GMM) • 10 minutes
- FP-Growth Algorithm in PySpark • 10 minutes
- Module Summary: Advanced PySpark Machine Learning • 10 minutes
5 assignments • Total 44 minutes
- Knowledge Check: Overview of Unsupervised Learning • 20 minutes
- Practice Quiz: Unsupervised Machine Learning • 6 minutes
- Practice Quiz: Data Labeling • 6 minutes
- Practice Quiz: Overview of Clustering • 6 minutes
- Practice Quiz: Overview of Association Rule Mining • 6 minutes
1 discussion prompt • Total 10 minutes
- Have you ever applied dimensionality reduction techniques to high-dimensional data? • 10 minutes
This module equips you with the skills to evaluate machine learning models using performance metrics and techniques available in PySpark MLlib. You will also explore the future scope of MLlib and its potential applications in real-world scenarios, gaining insight into how it can be applied across industries and problem domains. Through case studies, you will analyze practical examples of machine learning implementations.
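Two of the regression metrics covered in this module, RMSE and R-squared, can be written out directly. The sketch below computes both in plain Python from a list of actual values and a list of predictions. It is meant to show the formulas only and does not use MLlib's evaluators; the function names and the sample numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical actual values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

error = rmse(actual, predicted)
score = r_squared(actual, predicted)
print(f"RMSE={error:.4f}, R-squared={score:.2f}")
```

RMSE is expressed in the same units as the target, so lower is better, while R-squared describes the share of variance in the target that the predictions account for.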
What's included
18 videos, 2 readings, 4 assignments, 2 discussion prompts
18 videos • Total 99 minutes
- Evaluating ML Models • 2 minutes
- Steps for Evaluating Models • 7 minutes
- RMSE • 7 minutes
- R-squared • 6 minutes
- Significance of R-Squared • 2 minutes
- Saving and Loading Models • 7 minutes
- Future scope of MLlib • 7 minutes
- Applications of Social Media • 6 minutes
- Applications of Entertainment • 7 minutes
- Applications of Business • 6 minutes
- Applications of Finance • 5 minutes
- Customer churn prediction • 5 minutes
- Model Training for Customer Churn Prediction • 7 minutes
- Model Evaluation for Customer Churn Prediction • 5 minutes
- Market Basket Analysis • 4 minutes
- FP Growth for Market Basket Analysis • 6 minutes
- Predictive Maintenance • 4 minutes
- Model Evaluation in Predictive Maintenance • 5 minutes
2 readings • Total 20 minutes
- Model Evaluation Techniques in Machine Learning • 10 minutes
- Module Summary: Applications and Case-Studies • 10 minutes
4 assignments • Total 38 minutes
- Knowledge Check: Applications and Case studies of MLlib • 20 minutes
- Practice Quiz: Machine Learning Model Evaluation • 6 minutes
- Practice Quiz: Future developments and Applications of MLlib • 6 minutes
- Practice Quiz: Case-studies of MLlib • 6 minutes
2 discussion prompts • Total 20 minutes
- Have you had experience applying machine learning in various industries? • 10 minutes
- In what ways do you think MLlib is utilized in the Health and Medical sector? • 10 minutes
This module tests how well you understand the ideas and lessons covered in this course. You will complete a project based on these PySpark concepts and take a comprehensive quiz that assesses your confidence and proficiency in Machine Learning with PySpark.
What's included
1 video, 2 readings, 1 assignment, 1 discussion prompt
1 video • Total 2 minutes
- Summary: Machine Learning with PySpark • 2 minutes
2 readings • Total 70 minutes
- Project - House Price Prediction • 60 minutes
- Course Overview of Machine Learning with PySpark • 10 minutes
1 assignment • Total 20 minutes
- End Course Knowledge Check: Machine Learning with PySpark • 20 minutes
1 discussion prompt • Total 10 minutes
- Describe your Learning Journey • 10 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Offered by

Edureka is an online education platform focused on delivering high-quality learning to working professionals. We have the highest course completion rate in the industry and we strive to create an online ecosystem for our global learners to equip themselves with industry-relevant skills in today’s cutting-edge technologies.
Frequently asked questions
This course assumes basic knowledge of Python programming, SQL, and an understanding of machine learning concepts. Familiarity with big data and distributed systems will be helpful but is not mandatory.
PySpark MLlib is Apache Spark’s scalable machine learning library, designed for large-scale data processing. Learning PySpark MLlib helps you implement machine learning algorithms in a distributed computing environment, making it essential for big data applications.
While the course provides a foundation in PySpark and machine learning, it is more suitable for learners who have a basic understanding of machine learning concepts and Python programming.
PySpark is specifically designed for distributed computing and big data processing, making it suitable for handling large datasets across multiple machines. Scikit-learn, on the other hand, is used for smaller datasets and single-machine environments. PySpark’s MLlib leverages Apache Spark for parallel processing, while scikit-learn is more focused on traditional machine learning workflows.
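The contrast between parallel and single-machine processing can be sketched without Spark. The example below splits a dataset into partitions, computes a partial result for each one independently, and then combines the partial results into a global mean. It is a single-process analogy only: no Spark API is used, and the helper names and the choice of four partitions are assumptions made for illustration.

```python
from functools import reduce


def split_into_partitions(values, num_partitions):
    """Deal values round-robin into num_partitions lists."""
    return [values[i::num_partitions] for i in range(num_partitions)]


def partial_sum_count(partition):
    """Per-partition work; in a cluster this could run on a separate machine."""
    return sum(partition), len(partition)


def combine(a, b):
    """Merge two partial results; associative, so merge order does not matter."""
    return a[0] + b[0], a[1] + b[1]


values = list(range(1, 101))
partitions = split_into_partitions(values, 4)

# Each partition is reduced on its own, then the small partial
# results are merged into one final answer.
partials = [partial_sum_count(p) for p in partitions]
total, count = reduce(combine, partials)
distributed_mean = total / count
print(distributed_mean)
```

Only the small `(sum, count)` pairs need to be brought together at the end, which is why this pattern scales to data that does not fit on one machine.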
PySpark is a highly in-demand skill in the field of big data analytics and machine learning. Proficiency in PySpark opens up career opportunities in data engineering, data science, and machine learning roles, particularly in organizations dealing with large-scale data.
A machine with at least 8 GB RAM and a quad-core processor is recommended for smooth local practice. You will need Python 3.x, Java 8 or higher, and Apache Spark installed. Alternatively, you can use Dockerized Spark or virtualized environments for ease of setup.
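A quick way to see whether a local machine has the software prerequisites is a small check script such as the sketch below. It only tests that a Python 3 interpreter is running, that a `java` executable is on the PATH, and that the `pyspark` package can be located. It does not verify the Java version, the Spark installation itself, or the available memory, and the function name `check_environment` is hypothetical.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report whether the listed software prerequisites appear to be present."""
    return {
        # The interpreter running this script is Python 3.x.
        "python_3": sys.version_info.major == 3,
        # A `java` executable can be found on the PATH (version not checked).
        "java_on_path": shutil.which("java") is not None,
        # The pyspark package can be located without importing it.
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
    }


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```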
PySpark shines when datasets grow beyond the limits of single-machine processing (typically several gigabytes and above). For very small datasets, libraries like Pandas or scikit-learn may be more efficient.
While job outcomes vary, the skills taught in PySpark, distributed computing, and big data pipelines are highly valued in Data Engineering and Big Data roles. Completing the course will strengthen your portfolio and significantly improve your chances of securing interviews.
Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.
If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You’ll be able to submit assignments once the session starts.
Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.
If you complete the course successfully, your electronic Course Certificate will be added to your Accomplishments page - from there, you can print your Course Certificate or add it to your LinkedIn profile.
This course is currently available only to learners who have paid or received financial aid, when available.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.