
Scalable Machine Learning on Big Data using Apache Spark

This course will empower you with the skills to scale data science and machine learning (ML) tasks on Big Data sets using Apache Spark. Most real-world machine learning work involves very large data sets that exceed the CPU, memory, and storage limits of a single computer. Apache Spark is an open-source framework that uses cluster computing and distributed storage to process extremely large data sets efficiently and cost-effectively. Applied knowledge of Apache Spark is therefore a great asset and a potential differentiator for a machine learning engineer. After completing this course, you will be able to:
- gain a practical understanding of Apache Spark and apply it to machine learning problems involving both small and big data
- understand how to write parallel code capable of running on thousands of CPUs
- use large-scale compute clusters to apply machine learning algorithms to petabytes of data using Apache SparkML Pipelines
- eliminate the out-of-memory errors that traditional machine learning frameworks generate when data doesn't fit in a computer's main memory
- test thousands of different ML models in parallel to find the best-performing one, a technique used by many successful Kagglers
- (Optional) run SQL statements on very large data sets using Apache SparkSQL and the Apache Spark DataFrame API

Enrol now to learn the machine learning techniques for working with Big Data that have been successfully applied by companies like Alibaba, Apple, Amazon, Baidu, eBay, IBM, NASA, Samsung, SAP, TripAdvisor, Yahoo!, Zalando and many others.

NOTE: You will practice running machine learning tasks hands-on on an Apache Spark cluster that IBM provides at no charge during the course and that you can continue to use afterwards.
Prerequisites:
- basic Python programming
- basic machine learning (optional introduction videos are also provided in this course)
- basic SQL skills for the optional content

The following courses are recommended before taking this class (unless you already have the skills):
- https://hua.dididi.sbs/learn/python-for-applied-data-science or similar
- https://hua.dididi.sbs/learn/machine-learning-with-python or similar
- https://hua.dididi.sbs/learn/sql-data-science for the optional lectures

Skills: Data Processing

Skills: Descriptive Statistics

Intermediate course · Hours:

Top reviews

5.0 · Reviewed on Feb 25, 2020

After completing this course you will be able to use Apache Spark to build ML models (e.g., Linear Regression, Gaussian Mixture Model, etc.).

4.0 · Reviewed on Jan 22, 2021

Depending on the student, this can be either an easy or a difficult course. Some parts need updating, and it would be great if there were more explanation of the algorithms.

5.0 · Reviewed on Apr 30, 2020

I liked the examples and the step-by-step tutorials. The explanation of why things are designed the way they are certainly helped me understand the concepts. Kudos.

4.0 · Reviewed on Feb 22, 2020

For the last assignment, we should have had the opportunity to write code in the notebook instead of just running it and reporting the results.

4.0 · Reviewed on Apr 15, 2020

He has good knowledge. Though his language is OK, he covered very important topics in a very short span of time with high quality.

5.0 · Reviewed on Dec 11, 2019

Really really REALLY enjoyed this course! The instructor does a masterful job of going from simple examples and building up complexity in a very logical and thorough way.

4.0 · Reviewed on Mar 27, 2020

I found the concepts difficult to understand; I will certainly have to review the class. Thanks for the dedication in helping us. Itamar

4.0 · Reviewed on Jul 15, 2020

Nice introduction to Big Data processing, No coding skill required. A little more focus on the theory would be nice as the Python coding exercises are a little redundant.

4.0 · Reviewed on Apr 5, 2020

It is a good course for beginners in the domain of Apache Spark and Apache Spark ML. Programming assignments could have been better if they were applied to "Big Data" and not on toy datasets.

4.0 · Reviewed on Jul 25, 2020

Some videos show one thing, and then a prompt tells you to follow something else. It gets a little confusing there.

4.0 · Reviewed on May 31, 2020

It was a very interesting and skillful course. Thanks to IBM and Coursera for such a wonderful course. Special thanks to Mr. Romeo Kienzer for explaining it so well.

4.0 · Reviewed on Feb 17, 2020

I found this course incredibly beneficial. Moving forward, I would like to see a bit more explanation of concepts and a few extra working examples.

All reviews

Showing: 20 of 319

Lewis morris

2.0

Reviewed on Nov 11, 2019

So far the questions and quizzes seem unrelated to machine learning. The videos are poorly laid out, with brief explanations, and the whole thing seems rushed.

Ruslan Idelfonso Magana Vsevolodovna

1.0

Reviewed on Nov 9, 2019

Apache Spark is great and powerful, but the lectures are long and unclear.

Nicolás Fornasari

1.0

Reviewed on Dec 4, 2019

Truly disappointing!! A waste of time and money. Really poor video material and exercises.

I definitely don't recommend this course!!

Benhur Ortiz Jaramillo

2.0

Reviewed on Oct 8, 2019

Too superficial. The python example codes are very cryptic and not very well commented. The programming videos are very difficult to follow because the instructor is literally reading the code instead of explaining it.

Pierre Pocry

2.0

Reviewed on Apr 1, 2020

I am quite disappointed by this course, which could be greatly improved.

Here are a few remarks:

1) Half of the course is about basic statistical distribution analysis. It is always nice to learn about that, but it has nothing to do with Spark and is more general data science knowledge.

2) Most of the topics are only mentioned by the instructor, who never goes into the interesting details. It's always "don't worry if you don't understand that, it doesn't matter". Well, if it's only to learn basic stuff, I don't need Coursera...

3) In several videos, the instructor just records himself coding with Spark for several minutes. How boring! At what point do you think this kind of video is of any interest to your students? What are the practice labs for?

4) The quizzes are annoying! Not that they are "easy", but most of the answers to the questions are either ambiguous or unrelated to the course and must be found elsewhere.

5) The final quiz is a joke: you just have to run a notebook and then copy/paste the answers. Even there, some questions are ambiguous: you are asked to change a model and then "compare the models", but which one? The one we changed or the initial one? And so on.

In conclusion, I learned some things and practiced Apache Spark a bit with this course, but 90% of what I learned came from sources other than Coursera... I have given excellent marks to other courses, so don't think I am just ungrateful.

Syazwan Z

1.0

Reviewed on Dec 8, 2019

Video is outdated. The course material is not structured properly. Lack of explanation on the code. The code reference is confusing.

Batyr

1.0

Reviewed on Jan 24, 2020

A demotivating course, the worst I have studied on Coursera. If not for the "AI Engineering" path, I would have dropped it.

1.0

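Reviewed on Nov 22, 2019

A very low-quality course, not recommended. A very poor course: the instructor's heavy accent I could put up with, but even the English text is sometimes missing or doesn't match, and the assignments are also of poor quality. Not recommended; it is tiring to watch.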

Petr Jaškovský

1.0

Reviewed on Jan 13, 2020

Obsolete presentations with low quality of explanations. I would not recommend this course to anyone.

Jay Pimprikar

1.0

Reviewed on Sep 29, 2019

Horrible

Philippe Desautels

4.0

Reviewed on Oct 23, 2019

The course is interesting but I think a few things could be improved.

For instance, some code examples from the videos are outdated because of a newer Spark version. The video was edited to mention that the GitHub repo was updated, but I was unable to find the updated code.

One (maybe more?) of the videos was recorded in a car; it makes the whole thing feel unprofessional, even though the teacher's skills far exceed the requirements for teaching this course.

As others have mentioned, the teacher's accent can be a bit difficult to understand at times, but to me this does not affect the quality of the course. The teacher always seems interested and is smiling most of the time, which might seem unimportant but still sets a positive mood for the lectures, which is great.

All in all, the course is interesting and it provides a good introduction to Machine Learning using Apache Spark.

Yasser El Haddar

3.0

Reviewed on Oct 27, 2019

Really interesting content

Unclear coding explanations

Limitations with the free access in IBM Watson Studio

Abdelrahman Gamal Eldin Fadel

3.0

Reviewed on Sep 16, 2019

The instructor's accent was very hard to understand during explanations, but overall he was a good instructor.

Denis Uspenskiy

2.0

Reviewed on Jan 3, 2020

Very primitive tasks

Suresh Chaudhary

3.0

Reviewed on Nov 2, 2019

There should be more detail about Apache Spark and more examples.

Ujjwal Garg

5.0

Reviewed on Nov 11, 2019

For an introductory course, it is very good. Do not expect anything too advanced.

Gherbi Hicham

4.0

Reviewed on Nov 15, 2019

A very good course. I would recommend it to anyone who has Apache Spark experience and wants an introduction to MLlib and machine learning in Apache Spark. The assignment submissions need some work, but other than that it is a very good introductory course.

Farrukh Naveed Anjum

3.0

Reviewed on Nov 6, 2019

The course could be improved by focusing more on ML algorithms. GBT and Random Forest were used, but no explanation of them was provided.

Justin Miller

3.0

Reviewed on Dec 14, 2019

The videos/tutorials need to be updated for the new PySpark. Some still use Python 2, which reaches end of life in a month!

Jair Julio Condori Cotrina

3.0

Reviewed on Oct 19, 2019

Very good, but I think the course needs more challenging exams.