This course introduces distributed computing frameworks and big data visualization techniques. Learners will explore MapReduce, work with Apache Spark, implement transformations with PySpark, and use Spark SQL for large-scale analysis. The course concludes with building compelling dashboards and reports using Power BI for actionable business insights.
By the end of this course, you will be able to:
- Explain distributed computing and MapReduce concepts
- Process large datasets using Apache Spark and PySpark
- Apply Spark SQL for advanced queries and transformations
- Create dashboards and visualizations using Power BI
Tools & Software:
Apache Spark, PySpark, Azure Databricks, Power BI
Skills:
Distributed computing, Data analysis, PySpark, Spark SQL, Data visualization
Distributed Computing and MapReduce Concepts explores the foundational principles that enable modern organizations to process massive datasets that have outgrown the limits of single-machine computing. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll examine how data is broken into parallel tasks and executed across clusters of machines, how the Map, shuffle, and Reduce phases work together, and how common MapReduce patterns—such as counting, filtering, joining, and aggregation—solve practical big data problems efficiently and at scale.
What's included
6 videos, 3 readings, 8 assignments
6 videos • Total 36 minutes
The Scale Challenge in Modern Computing • 4 minutes
Visualizing Distributed Processing Workflows • 7 minutes
Simplifying Complex Problems with MapReduce • 5 minutes
Tracing MapReduce Execution Flow • 8 minutes
MapReduce Patterns in Production Systems • 5 minutes
Implementing MapReduce Patterns • 8 minutes
3 readings • Total 30 minutes
Distributed Computing Principles for Big Data • 10 minutes
MapReduce Programming Model Deep Dive • 10 minutes
Essential MapReduce Patterns and Algorithms • 10 minutes
8 assignments • Total 240 minutes
Distributed Computing Analysis • 30 minutes
Distributed Computing Concepts Assessment • 30 minutes
MapReduce Algorithm Design • 30 minutes
MapReduce Execution Tracing • 30 minutes
MapReduce Programming Model Assessment • 30 minutes
MapReduce Solution Design • 30 minutes
MapReduce Patterns and Applications Assessment • 30 minutes
Distributed Computing and MapReduce Mastery Graded Quiz • 30 minutes
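As a rough illustration of how the Map, shuffle, and Reduce phases described for this module fit together, here is a minimal single-process sketch of the counting pattern in plain Python. It is not taken from the course materials: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are hypothetical, and a real cluster would run the map and reduce steps in parallel on separate machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, standing in for the cluster-wide shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # Map: each line is processed independently, so this step parallelizes.
    mapped = [pair for line in lines for pair in map_phase(line)]
    # Shuffle: bring all values for the same key together.
    grouped = shuffle_phase(mapped)
    # Reduce: each key is aggregated independently of the others.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, for illustration only.
sample_lines = ["big data needs big clusters", "data moves to clusters"]
counts = word_count(sample_lines)
print(counts)
```

The same three-step shape carries over to the other patterns named in the module description: filtering changes what the map step emits, and aggregation changes what the reduce step computes.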
Apache Spark Architecture and Fundamentals
Module 2
Apache Spark Architecture and Fundamentals provides a comprehensive introduction to the distributed processing engine that revolutionized big data analytics by overcoming traditional MapReduce limitations. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll examine Spark's core components, including the driver, executors, and cluster manager, explore how in-memory processing delivers dramatic performance improvements, and learn to configure and manage Spark clusters and applications for efficient large-scale data processing.
What's included
7 videos, 3 readings, 9 assignments
7 videos • Total 40 minutes
Spark's Revolution in Big Data Processing • 4 minutes
Spark Cluster Setup and Configuration – Part 1 • 6 minutes
Spark Cluster Setup and Configuration – Part 2 • 5 minutes
Data Processing with PySpark RDDs and DataFrames focuses on practical data processing using PySpark's Python API for Apache Spark. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll implement data processing operations using both RDDs and DataFrames, develop transformation pipelines, apply common data cleaning and preparation techniques, and optimize PySpark code for better performance across enterprise-scale big data scenarios.
What's included
6 videos, 3 readings, 10 assignments
6 videos • Total 37 minutes
Python Meets Big Data with PySpark • 4 minutes
PySpark Development Workflow • 9 minutes
DataFrames: Structured Big Data Made Simple • 4 minutes
DataFrame Operations and Schema Management • 8 minutes
Advanced Analytics with PySpark Transformations • 5 minutes
Building Complex Transformation Pipelines • 7 minutes
3 readings • Total 30 minutes
PySpark Development Environment and Best Practices • 10 minutes
Advanced Data Processing with Spark SQL introduces Spark SQL as a powerful interface for structured data processing in distributed environments. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll master SQL operations at scale, from basic queries to complex analytical operations, learn to create and manage temporary views and tables, and optimize query performance for production workloads that would overwhelm traditional database systems.
What's included
6 videos, 3 readings, 10 assignments
6 videos • Total 35 minutes
SQL at Scale with Spark SQL • 4 minutes
Spark SQL Environment and Basic Queries • 7 minutes
Enterprise Analytics with Advanced Spark SQL • 5 minutes
Implementing Complex Analytical Queries • 7 minutes
Optimizing Spark SQL for Production Performance • 5 minutes
Query Performance Analysis and Tuning • 7 minutes
3 readings • Total 30 minutes
Spark SQL Architecture and Programming Model • 10 minutes
Advanced Spark SQL Operations and Optimization • 10 minutes
Spark SQL Performance Tuning and Optimization • 10 minutes
Data Visualization for Big Data with Power BI introduces comprehensive visualization techniques specifically designed for big data environments using Microsoft Power BI. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll learn to connect Power BI to various big data sources, create effective visualizations for large datasets, build interactive dashboards that enable self-service analytics, and implement best practices for handling performance challenges when visualizing massive datasets.
Our goal at Microsoft is to empower every individual and organization on the planet to achieve more.
In this next revolution of digital transformation, growth is being driven by technology. Our integrated cloud approach creates an unmatched platform for digital transformation. We address the real-world needs of customers by seamlessly integrating Microsoft 365, Dynamics 365, LinkedIn, GitHub, Microsoft Power Platform, and Azure to unlock business value for every organization—from large enterprises to family-run businesses. The backbone and foundation of this is Azure.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. It also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Certificate?
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile.