The course "Multicore and GPGPU Programming" provides a foundational understanding of parallel programming, focusing on developing high-performance, multi-threaded applications in both CPU and GPU environments. Beginning with a review of multicore processor architectures, caching mechanisms, and Non-Uniform Memory Access (NUMA) systems, students will learn the essentials of shared memory programming, synchronisation techniques, and the use of locks to ensure data integrity across threads.
The course then covers the design of shared memory data structures and introduces advanced synchronisation concepts, including lazy synchronisation, that are crucial for scalable and efficient concurrent applications. Students will also explore the architecture and programming model of General-Purpose Graphics Processing Units (GPGPUs) and learn CUDA programming to leverage GPU parallelism for compute-intensive tasks. By the end of the course, students will be adept at optimising multi-threaded and many-core applications, balancing workload across CPUs and GPUs to achieve high throughput and efficient resource utilisation. This course is essential for those aiming to develop expertise in high-performance computing and parallel programming for modern multi-core and GPU-based systems.
In this module, learners are introduced to the course and its syllabus, setting the foundation for their learning journey. The introductory video describes the skills and knowledge they can expect to gain during the course. The syllabus reading outlines the essential course components, including course values, assessment criteria, grading system, schedule, details of live sessions, and a recommended reading list that will deepen the learner’s understanding of the course concepts. The module also gives learners the opportunity to connect with one another through a discussion prompt designed to facilitate introductions within the course community.
What's included
4 videos, 1 reading, 1 discussion prompt
4 videos•Total 51 minutes
Course Introductory Video•2 minutes
Meet Your Instructor - Dr. Gargi Prabhu•1 minute
Meet Your Instructor - Dr. Kunal Korgaonkar•1 minute
Recording of Multicore and GPGPU Programming: Week 1 - Live Session on 25-05-23 18:32:50 [47:25]•47 minutes
1 reading•Total 10 minutes
Course Overview•10 minutes
1 discussion prompt•Total 10 minutes
Meet Your Peers•10 minutes
Introduction to Parallel and Multicore Programming
Module 2•hours to complete
Module details
In this module, students will gain foundational knowledge of parallel and multi-threaded programming, exploring the core principles that underlie the efficient utilisation of modern multi-core and many-core processors. Beginning with an overview of parallel programming concepts, this module covers different types of parallelism, including data parallelism, task parallelism, and pipeline parallelism. Students will also examine critical performance metrics like speedup, efficiency, and scalability, which help in evaluating the benefits and trade-offs of parallel approaches.
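The performance metrics named above reduce to a few lines of arithmetic. The sketch below is a minimal Python illustration, not course material: the helper names (`speedup`, `efficiency`, `amdahl_speedup`, `gustafson_speedup`) and the sample timings are hypothetical. It uses the standard definitions: speedup S = T_serial / T_parallel, efficiency E = S / p on p cores, Amdahl's law for a fixed problem size, and Gustafson's law for a problem that grows with p.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Efficiency E = S / p: the fraction of ideal linear speedup achieved on p cores."""
    return speedup(t_serial, t_parallel) / p


def amdahl_speedup(f: float, p: int) -> float:
    """Amdahl's law (fixed problem size).

    f is the fraction of the *serial* run time that cannot be parallelised.
    S(p) = 1 / (f + (1 - f) / p), which can never exceed 1 / f.
    """
    return 1.0 / (f + (1.0 - f) / p)


def gustafson_speedup(f: float, p: int) -> float:
    """Gustafson's law (problem size scaled with p).

    f is the fraction of the *parallel* run time spent in serial code.
    Scaled speedup S(p) = p - f * (p - 1).
    """
    return p - f * (p - 1)


# Hypothetical measurement: 20 s serially, 6.25 s on 4 cores.
print(speedup(20.0, 6.25), efficiency(20.0, 6.25, 4))
# With 10% serial work, Amdahl caps speedup at 10x however many cores are added,
# while Gustafson's scaled speedup keeps growing with p.
print(amdahl_speedup(0.1, 8), gustafson_speedup(0.1, 8))
```

The two laws answer different questions: Amdahl asks how much faster a fixed job gets, Gustafson asks how much more work fits in the same time, which is why their f values are measured against different baselines.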
What's included
12 videos, 2 readings, 12 assignments, 1 discussion prompt
12 videos•Total 73 minutes
Need for Ever-Increasing Performance•8 minutes
Parallel Systems and Parallel Programs•8 minutes
Concurrent, Parallel, Distributed Systems•5 minutes
Types of Parallelism: Data, Task and Pipeline Parallelism•8 minutes
Speedup and Efficiency•5 minutes
Amdahl’s Law•5 minutes
Gustafson’s Law•5 minutes
Scalability in Parallel Systems•5 minutes
Cost of Parallelisation•7 minutes
Sources of Overhead in Parallel Programs•5 minutes
Timing Parallel Programs: Methods and Best Practices•7 minutes
GPU Performance•5 minutes
2 readings•Total 120 minutes
Recommended Reading: Fundamentals of Parallel Computing•60 minutes
Recommended Reading: Introduction to Performance Metrics in Parallel Computing•60 minutes
12 assignments•Total 36 minutes
Need for Ever-Increasing Performance•3 minutes
Parallel Systems and Parallel Programs•3 minutes
Concurrent, Parallel, Distributed Systems•3 minutes
Types of Parallelism: Data, Task and Pipeline Parallelism•3 minutes
Speedup and Efficiency•3 minutes
Amdahl’s Law•3 minutes
Gustafson’s Law•3 minutes
Scalability in MIMD Systems•3 minutes
Cost of Parallelisation•3 minutes
Sources of Overhead in Parallel Programs•3 minutes
Taking Timings of Parallel Programs•3 minutes
GPU Performance•3 minutes
1 discussion prompt•Total 30 minutes
Why Parallelism? Revisiting the Roots of Multicore Programming•30 minutes
Multicore Processor Architectures and Caching Mechanisms
Module 3•hours to complete
Module details
This module provides an in-depth exploration of multicore processor architectures, examining the design principles, performance considerations, and challenges involved in building efficient multicore systems. Students will study how multiple cores interact within a processor, focusing on memory hierarchies, caching mechanisms, and the role of parallelism in improving computational performance.
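To make the role of caching concrete, here is a toy model that is not taken from the course: a direct-mapped cache with an assumed 64 lines of 64 bytes, fed the byte addresses produced by walking an n × n row-major array of 8-byte doubles either row by row or column by column. Real processors use multi-level, set-associative caches, so the exact numbers are illustrative only; the function names are hypothetical. The point is how spatial locality determines the hit count.

```python
def count_hits(addresses, num_lines=64, line_bytes=64):
    """Count hits in a toy direct-mapped cache (one candidate slot per memory line)."""
    tags = [None] * num_lines          # which memory line each slot currently holds
    hits = 0
    for addr in addresses:
        line = addr // line_bytes      # memory line containing this byte
        slot = line % num_lines        # direct mapping: line -> single slot
        if tags[slot] == line:
            hits += 1
        else:
            tags[slot] = line          # miss: load the line, evicting the old one
    return hits


def traversal(n, row_major, elem_bytes=8):
    """Yield byte addresses for visiting every element of an n x n row-major array."""
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if row_major else (inner, outer)
            yield (i * n + j) * elem_bytes


n = 256
print("row-wise hits:   ", count_hits(traversal(n, row_major=True)))
print("column-wise hits:", count_hits(traversal(n, row_major=False)))
```

Row-wise traversal reuses each 64-byte line for eight consecutive doubles, so seven of every eight accesses hit. The column-wise walk jumps 2048 bytes per step, and with these parameters every line is evicted before it is reused, so it never hits.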
Parallel Software: Coordinating Process and Threads•12 minutes
Distributed Memory Software•3 minutes
1 discussion prompt•Total 30 minutes
From Von Neumann to Multicore: Evolving Architectures and Memory Realities•30 minutes
GPGPU Architecture and Programming Model Overview
Module 4•hours to complete
Module details
This module introduces students to the architectural principles of General-Purpose GPU (GPGPU) systems and the CUDA programming model. It explores the hardware components, including Streaming Multiprocessors (SMs), CUDA cores, and the memory hierarchy, that form the foundation of GPU computing. The module also provides an overview of the CUDA programming model, emphasising its thread hierarchy, in which threads are grouped into blocks and blocks into a grid. By understanding these fundamental concepts, students will develop the ability to harness GPU architecture for high-performance parallel computing.
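The thread hierarchy can be illustrated by emulating, on the CPU, how a one-dimensional CUDA launch maps threads onto array elements. In a real kernel each thread computes its global index from the built-in variables `blockIdx.x`, `blockDim.x` and `threadIdx.x`; this Python sketch loops over those values explicitly instead. The function names (`grid_size`, `emulate_vector_add`) are hypothetical, and the block size of 256 is only a common choice, not a requirement.

```python
def grid_size(n: int, block_dim: int) -> int:
    """Number of blocks needed to cover n elements: ceil(n / block_dim)."""
    return (n + block_dim - 1) // block_dim


def emulate_vector_add(a, b, block_dim=256):
    """Serially emulate a 1-D kernel launch computing c[i] = a[i] + b[i].

    On a GPU the two loops below do not exist: every (block, thread) pair
    runs concurrently, and each thread handles exactly one element.
    """
    n = len(a)
    c = [0] * n
    for block_idx in range(grid_size(n, block_dim)):   # plays the role of blockIdx.x
        for thread_idx in range(block_dim):            # plays the role of threadIdx.x
            i = block_idx * block_dim + thread_idx     # global thread index
            if i < n:                                  # the last block may be partly idle
                c[i] = a[i] + b[i]
    return c


a = list(range(1000))
b = [2 * x for x in a]
print(grid_size(len(a), 256))
print(emulate_vector_add(a, b)[:5])
```

With 1000 elements and 256 threads per block, four blocks (1024 threads) are launched, so the `if i < n` guard leaves the last 24 threads idle.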
What's included
15 videos, 2 readings, 14 assignments, 1 discussion prompt
15 videos•Total 127 minutes
GPUs and GPGPU•5 minutes
GPU Architecture•5 minutes
Heterogeneous Computing•4 minutes
Paradigm of Heterogeneous Computing•5 minutes
Introduction to CUDA•5 minutes
Structure of a CUDA Program•8 minutes
Threads, Blocks, and Grid•9 minutes
Managing Memory•7 minutes
Writing and Verifying Your Kernel•6 minutes
Compiling and Running CUDA Program•4 minutes
Nvidia Compute Capabilities and Device Architecture•6 minutes
Timing Your Kernel•7 minutes
Organising Parallel Threads•5 minutes
Managing Devices•4 minutes
Recording of Multicore and GPGPU Programming: Week 3 - Live Session on 25-06-06 18:31:21 [44:50]•45 minutes
2 readings•Total 75 minutes
Recommended Reading: GPGPU Architecture and CUDA•15 minutes
Recommended Reading: Programming Model Overview•60 minutes
14 assignments•Total 48 minutes
GPUs and GPGPU•6 minutes
GPU Architecture•3 minutes
Heterogeneous Computing•3 minutes
Paradigm of Heterogeneous Computing•3 minutes
Introduction to CUDA•3 minutes
Structure of a CUDA Program•3 minutes
Threads, Blocks, and Grid•6 minutes
Managing Memory•3 minutes
Writing and Verifying Your Kernel•3 minutes
Compiling and Running CUDA Program•3 minutes
Nvidia Compute Capabilities and Device Architecture•3 minutes
Timing Your Kernel•3 minutes
Organising Parallel Threads•3 minutes
Managing Devices•3 minutes
1 discussion prompt•Total 30 minutes
Harnessing GPU Power: Exploring CUDA and the Architecture of Parallelism•30 minutes
CUDA Execution Model
Module 5•hours to complete
Module details
This module provides a comprehensive understanding of how CUDA executes programs on GPUs. It covers key concepts such as warps, warp scheduling, and resource partitioning, which are critical for understanding GPU hardware behaviour. The module delves into branch divergence and its impact on performance, offering strategies to minimise its effects. It also emphasises exposing parallelism effectively by leveraging CUDA’s hierarchical execution model. Students will learn how to design and optimise GPU programs by aligning with the underlying execution model to maximise efficiency and throughput.
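Branch divergence in a parallel reduction can be quantified with a small model. The Python sketch below emulates one thread block performing an in-place neighbored-pair sum, once with the naive test `tid % (2 * stride) == 0` and once with a rearranged index `2 * stride * tid` that packs the working threads into the lowest thread IDs. It assumes a warp size of 32 and a power-of-two block with one element per thread, and it counts a warp as divergent in a step when some but not all of its threads take the branch. The names are hypothetical, and the counts come from this model, not from a profiler such as nvprof.

```python
WARP_SIZE = 32


def reduce_block(data, compact):
    """Emulate one block's neighbored-pair reduction; return (sum, divergent warp-steps)."""
    x = list(data)
    n = len(x)                       # assumed: block size, a power of two
    divergent = 0
    stride = 1
    while stride < n:                # each pass is one step between __syncthreads() calls
        takes_branch = []
        for tid in range(n):
            if compact:
                idx = 2 * stride * tid           # active threads are tid < n / (2 * stride)
                active = idx < n
            else:
                idx = tid                        # active threads are scattered
                active = tid % (2 * stride) == 0
            takes_branch.append(active)
            if active:
                x[idx] += x[idx + stride]
        for w in range(0, n, WARP_SIZE):         # check each warp's branch outcomes
            warp = takes_branch[w:w + WARP_SIZE]
            if any(warp) and not all(warp):
                divergent += 1
        stride *= 2
    return x[0], divergent


block = range(512)
print("naive:  ", reduce_block(block, compact=False))
print("compact:", reduce_block(block, compact=True))
```

For a 512-thread block the naive version has 95 divergent warp-steps across its nine steps, while the compact indexing leaves only 5, all in the final steps when fewer than 32 threads remain active. Both produce the same sum.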
What's included
15 videos, 2 readings, 15 assignments, 1 discussion prompt
15 videos•Total 135 minutes
Introduction to CUDA Execution Model•7 minutes
Warps and Thread Blocks•4 minutes
Warp Divergence•9 minutes
Resource Partitioning•6 minutes
Latency Hiding•10 minutes
Occupancy•5 minutes
Synchronization•4 minutes
Scalability•5 minutes
Exposing Parallelism•10 minutes
Checking Active Warps with Nvprof•6 minutes
Checking Memory Operations with Nvprof•7 minutes
Avoiding Branch Divergence•3 minutes
The Parallel Reduction Problem and Thread Divergence•7 minutes
Improving Divergence in Parallel Reduction•6 minutes
Recording of Multicore and GPGPU Programming: Week 4 - Live Session on 25-06-13 18:32:39 [49:37]•45 minutes
2 readings•Total 120 minutes
Recommended Reading: Structure of a CUDA Program•60 minutes
Recommended Reading: Exposing Parallelism and Avoiding Branch Divergence•60 minutes
15 assignments•Total 105 minutes
Graded Quiz - Modules 3 and 4•60 minutes
Introduction to CUDA Execution Model•3 minutes
Warps and Thread Blocks•3 minutes
Warp Divergence•3 minutes
Resource Partitioning•6 minutes
Latency Hiding•3 minutes
Occupancy•3 minutes
Synchronization•3 minutes
Scalability•3 minutes
Exposing Parallelism•3 minutes
Checking Active Warps with Nvprof•3 minutes
Checking Memory Operations with Nvprof•3 minutes
Avoiding Branch Divergence•3 minutes
The Parallel Reduction Problem and Thread Divergence•3 minutes
Improving Divergence in Parallel Reduction•3 minutes
1 discussion prompt•Total 30 minutes
Under the Hood: Warps, Divergence, and CUDA Execution Dynamics•30 minutes
CUDA Memory Model and Streams and Concurrency
Module 6•hours to complete
Module details
The CUDA Memory Model & Streams and Concurrency module introduces students to the intricacies of the CUDA memory hierarchy, including global, shared, and local memory. It emphasises the importance of memory coalescing and efficient memory access patterns for GPU performance. The module also covers CUDA streams, explaining how concurrent kernel execution and memory operations can be managed to increase parallelism. By understanding these concepts, students will gain the ability to design GPU programs that maximise throughput and minimise latency.
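The effect of access patterns on coalescing can be estimated by counting how many aligned memory segments one warp touches in a single load. This sketch assumes 32 threads per warp, 4-byte elements and 128-byte segments (a common simplified model of cached global-memory loads; the actual transaction granularity depends on the GPU architecture and caching mode), with thread `t` reading element `offset + t * stride`. The function name is hypothetical.

```python
def segments_per_warp(stride=1, offset=0, elem_bytes=4,
                      segment_bytes=128, warp_size=32):
    """Count distinct aligned segments touched when thread t reads element offset + t*stride."""
    addresses = [(offset + t * stride) * elem_bytes for t in range(warp_size)]
    return len({addr // segment_bytes for addr in addresses})


for label, kwargs in [("aligned, unit stride", {}),
                      ("misaligned by one element", {"offset": 1}),
                      ("stride 2", {"stride": 2}),
                      ("stride 32", {"stride": 32})]:
    print(f"{label:28s} -> {segments_per_warp(**kwargs)} segment(s)")
```

An aligned, unit-stride warp load fits in a single segment; shifting it by one element spills into a second segment, and a stride of 32 elements sends every thread to its own segment, so 32 times as much data is moved for the same useful 128 bytes.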
What's included
14 videos, 2 readings, 14 assignments, 1 discussion prompt, 1 ungraded lab
14 videos•Total 126 minutes
Introduction to CUDA Memory Model•8 minutes
Memory Allocation and Deallocation•6 minutes
Zero Copy Memory•4 minutes
Unified Virtual Addressing and Unified Memory•3 minutes
Aligned and Coalesced Access•6 minutes
CUDA Shared Memory•6 minutes
Shared Memory Banks and Access Mode•7 minutes
Configuring the Amount of Shared Memory•5 minutes
Synchronisation•9 minutes
CUDA Streams•7 minutes
Stream Scheduling and Priorities•6 minutes
CUDA Events•6 minutes
Concurrent Kernel Execution•6 minutes
Recording of Multicore and GPGPU Programming: Week 5 - Live Session on 25-06-20 18:31:59 [47:36]•48 minutes
2 readings•Total 120 minutes
Recommended Reading: CUDA Memory Model•60 minutes
Recommended Reading: Streams and Concurrency•60 minutes
14 assignments•Total 342 minutes
SGA-1: CUDA Programming and Performance Optimisation•300 minutes
Introduction to CUDA Memory Model•3 minutes
Memory Allocation and Deallocation•3 minutes
Zero Copy Memory•3 minutes
Unified Virtual Addressing and Unified Memory•3 minutes
Aligned and Coalesced Access•3 minutes
CUDA Shared Memory•6 minutes
Shared Memory Banks and Access Mode•3 minutes
Configuring the Amount of Shared Memory•3 minutes
Synchronisation•3 minutes
CUDA Streams•3 minutes
Stream Scheduling and Priorities•3 minutes
CUDA Events•3 minutes
Concurrent Kernel Execution•3 minutes
1 discussion prompt•Total 30 minutes
Smart Memory and Seamless Concurrency: CUDA Memory and Streams•30 minutes
1 ungraded lab•Total 60 minutes
Hands-on Lab: Parallel Matrix Addition Using CUDA•60 minutes
Shared-Memory Programming with Pthreads
Module 7•hours to complete
Module details
This module explains in depth the difference between processes and threads and introduces multithreaded programming using the Pthreads library. Students will learn the main functions of the Pthreads library and apply them to solve real-world problems with a multithreaded approach. It also discusses the precautions to take when developing a multithreaded algorithm, such as protecting critical sections.
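The core Pthreads pattern of creating threads, protecting a critical section with a mutex, and joining the threads can be sketched with Python's `threading` module. This is an analogy, not Pthreads code: `Thread.start`, `with lock:` and `join` stand in for `pthread_create`, `pthread_mutex_lock`/`pthread_mutex_unlock` and `pthread_join` in C. The function name `count_in_parallel` and the thread and iteration counts are hypothetical.

```python
import threading


def count_in_parallel(num_threads=4, increments=50_000):
    """Have several threads increment one shared counter inside a critical section."""
    counter = 0
    lock = threading.Lock()                 # analogous to a pthread_mutex_t

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:                      # lock ... unlock around the critical section
                counter += 1                # read-modify-write on shared data

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()                           # analogous to pthread_create
    for t in threads:
        t.join()                            # analogous to pthread_join
    return counter


print(count_in_parallel())
```

Without the lock, `counter += 1` is a separate load, add and store, so two threads can read the same old value and one update is lost. The mutex serialises that critical section, which is correct but limits speedup when the critical section dominates the work.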
What's included
10 videos, 11 readings, 10 assignments, 1 discussion prompt
10 videos•Total 116 minutes
Processes, Threads and Pthreads•4 minutes
Hello World!!•9 minutes
Matrix-Vector Multiplication•13 minutes
Critical Sections•5 minutes
Busy Waiting•6 minutes
Mutexes•5 minutes
Semaphores•7 minutes
Barriers and Condition Variables•13 minutes
Caches, Cache-Coherence and False Sharing•9 minutes
Recording of Multicore and GPGPU Programming: Week 6 - Live Session on 25-06-27 18:38:36 [43:53]•44 minutes
11 readings•Total 295 minutes
Recommended Reading: Processes, Threads and Pthreads•10 minutes
Recommended Reading: Barriers and Condition Variables•30 minutes
Recommended Reading: Read-Write Locks•60 minutes
Recommended Reading: Caches, Cache-Coherence and False Sharing•15 minutes
Lab Instruction Document•10 minutes
10 assignments•Total 135 minutes
Graded Quiz - Modules 5 and 6•60 minutes
Processes, Threads and Pthreads•9 minutes
Hello World!!•9 minutes
Matrix-Vector Multiplication•9 minutes
Critical Sections•9 minutes
Busy Waiting•9 minutes
Mutexes•9 minutes
Semaphores•6 minutes
Barriers and Condition Variables•6 minutes
Caches, Cache-Coherence and False Sharing•9 minutes
1 discussion prompt•Total 10 minutes
Thread Synchronization and Shared Memory: Building Reliable Parallel Programs with Pthreads•10 minutes
Distributed Memory Programming with MPI
Module 8•hours to complete
Module details
This module introduces students to distributed-memory programming using the Message Passing Interface (MPI). Students will learn the functions provided by the MPI library and what each one does. This will enable them to develop parallel programs and to convert serial code into parallel code using MPI functions.
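As an example of turning a serial algorithm into an MPI-style program, the parallel odd-even transposition sort covered in this module can be emulated in plain Python. Each simulated "rank" owns a sorted block of `n / p` keys; in alternating phases, neighbouring ranks exchange blocks (in real MPI code, for example with `MPI_Sendrecv`), after which the lower rank keeps the smaller half and the higher rank the larger half. After `p` phases the keys are globally sorted. This sketch assumes `n` is divisible by `p` and replaces real message passing with list operations; the function name is hypothetical.

```python
def odd_even_transposition_sort(keys, p):
    """Emulate parallel odd-even transposition sort over p simulated ranks."""
    n = len(keys)
    if n % p != 0:
        raise ValueError("this sketch assumes len(keys) is divisible by p")
    local_n = n // p
    # "Scatter": rank r receives one block and sorts it locally.
    blocks = [sorted(keys[r * local_n:(r + 1) * local_n]) for r in range(p)]
    for phase in range(p):
        # Even phases pair ranks (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        first = 0 if phase % 2 == 0 else 1
        for left in range(first, p - 1, 2):
            right = left + 1
            # Both partners would exchange blocks, then each keeps its half.
            merged = sorted(blocks[left] + blocks[right])
            blocks[left], blocks[right] = merged[:local_n], merged[local_n:]
    # "Gather": concatenate the blocks in rank order.
    return [k for block in blocks for k in block]


keys = [(i * 37) % 101 for i in range(64)]
print(odd_even_transposition_sort(keys, p=8))
```

Ranks without a partner in a given phase (rank 0 in odd phases, and the last rank when it has no right neighbour) simply stay idle for that phase.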
What's included
7 videos, 9 readings, 7 assignments, 1 discussion prompt
7 videos•Total 70 minutes
Introduction to MPI•4 minutes
MPI Setup and Communicator Functions•6 minutes
SPMD and Communication•10 minutes
Potential Pitfalls•4 minutes
Simple Serial Sorting Algorithm•20 minutes
Parallel Odd-Even Transposition Sort•19 minutes
Safety in MPI Programs•7 minutes
9 readings•Total 125 minutes
Recommended Reading: Introduction to MPI•15 minutes
Recommended Reading: MPI Setup and Communicator Functions•15 minutes
Recommended Reading: SPMD and Communication•15 minutes
Recommended Reading: Potential Pitfalls•15 minutes
Recommended Reading: Simple Serial Sorting Algorithm•15 minutes
MPI in Action: Understanding Setup, Communication, and Parallel Sorting•30 minutes
Shared-Memory Programming with OpenMP
Module 9•hours to complete
Module details
This module introduces the shared memory programming model using the OpenMP library. Students will learn the functions and directives OpenMP provides and how to use them to parallelise code on shared-memory systems. Students will explore the foundational concepts of OpenMP through videos and readings, starting with the basics of the library and progressing to more advanced topics such as reduction clauses, variable scoping, and mutual exclusion. Through worked examples such as the trapezoidal rule and sorting functions, learners will understand how to parallelise loops, manage scheduling, and apply critical sections and locks for safe concurrent execution. The module also covers tasking in OpenMP and classic concurrency problems such as producers and consumers.
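The trapezoidal rule with a reduction clause shows the pattern at the heart of this module: each thread accumulates a private partial sum over its share of the loop, and the partial sums are combined once at the end. In C this is typically written as `#pragma omp parallel for reduction(+: approx)` over the interior points. The Python sketch below emulates that behaviour sequentially, assuming a contiguous block split of iterations similar to `schedule(static)`; the function names and the test integrand are hypothetical.

```python
def trap_serial(f, a, b, n):
    """Serial trapezoidal rule with n trapezoids on [a, b]."""
    h = (b - a) / n
    approx = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        approx += f(a + i * h)
    return h * approx


def trap_reduction(f, a, b, n, thread_count):
    """Emulate a `parallel for reduction(+: approx)` version of trap_serial."""
    h = (b - a) / n
    partial = [0.0] * thread_count           # one private accumulator per thread
    iterations = n - 1                       # interior points i = 1 .. n-1
    for t in range(thread_count):
        # Contiguous block of iterations for thread t (static-style split).
        lo = 1 + t * iterations // thread_count
        hi = 1 + (t + 1) * iterations // thread_count
        for i in range(lo, hi):
            partial[t] += f(a + i * h)
    # The reduction step: combine the private copies into the shared result.
    approx = (f(a) + f(b)) / 2.0 + sum(partial)
    return h * approx


def square(x):
    return x * x


print(trap_serial(square, 0.0, 3.0, 1024))
print(trap_reduction(square, 0.0, 3.0, 1024, thread_count=4))
```

Because floating-point addition is not associative, the reduced result may differ from the serial one in the last few bits, which is also true of a real OpenMP reduction.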
What's included
12 videos, 12 readings, 13 assignments, 1 discussion prompt
12 videos•Total 94 minutes
Introduction to OpenMP•5 minutes
Programming in OpenMP•10 minutes
Trapezoidal Rule•10 minutes
Scope of Variables•4 minutes
Reduction Clause•7 minutes
Parallel-For Directive and Caveats in Them•8 minutes
Sorting Functions•20 minutes
Scheduling•6 minutes
Producers and Consumers•6 minutes
Termination, Startup and Atomic Directive•7 minutes
Critical Sections and Locks•6 minutes
Tasking•5 minutes
12 readings•Total 152 minutes
Recommended Reading: Introduction to OpenMP•15 minutes
Recommended Reading: Programming in OpenMP•15 minutes
Recommended Reading: Trapezoidal Rule•15 minutes
Recommended Reading: Scope of Variables•15 minutes
Recommended Reading: Reduction Clause•15 minutes
Recommended Reading: Parallel-For Directive and Caveats in Them•15 minutes
Recommended Reading: Sorting Functions•15 minutes
Recommended Reading: Scheduling•15 minutes
Recommended Reading: Producers and Consumers•15 minutes
Recommended Reading: Termination, Startup and Atomic Directive•1 minute
Recommended Reading: Critical Sections and Locks•1 minute
Recommended Reading: Tasking•15 minutes
13 assignments•Total 168 minutes
Graded Quiz - Modules 7 and 8•60 minutes
Introduction to OpenMP•9 minutes
Programming in OpenMP•9 minutes
Trapezoidal Rule•9 minutes
Scope of Variables•9 minutes
Reduction Clause•9 minutes
Parallel-For Directive and Caveats in Them•9 minutes
Sorting Functions•9 minutes
Scheduling•9 minutes
Producers and Consumers•9 minutes
Termination, Startup and Atomic Directive•9 minutes
Critical Sections and Locks•9 minutes
Tasking•9 minutes
1 discussion prompt•Total 30 minutes
Mastering OpenMP: From Parallel Patterns to Synchronisation•30 minutes
Parallel Program Development 1
Module 10•hours to complete
Module details
This module introduces the n-body problem in physics and examines its significance in simulating gravitational interactions among multiple particles. It explores classical and modern algorithmic approaches to solving the n-body problem, followed by a discussion of their computational complexity. Emphasis is placed on identifying opportunities for parallelisation, and students will analyse and implement efficient parallel solutions using the parallel programming APIs covered in the course: OpenMP, Pthreads, MPI, and CUDA.
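The difference between a basic and a reduced n-body force computation can be sketched in a few lines. The basic solver computes every pairwise force twice, once for each particle; the reduced solver uses Newton's third law (the force of particle k on q is the negative of the force of q on k) to compute each pair once. This minimal 2-D Python sketch assumes SI units; the function names and sample particles are hypothetical. It also hints at the parallelisation trade-off: the basic solver's outer loop only writes `forces[q]`, so its iterations are independent, whereas iteration `q` of the reduced solver also updates `forces[k]`, which causes a race between threads unless those updates are coordinated.

```python
G = 6.674e-11  # gravitational constant in SI units (approximate)


def _pair_force(pos, mass, q, k):
    """Force exerted on particle q by particle k, as an (fx, fy) tuple."""
    dx = pos[q][0] - pos[k][0]
    dy = pos[q][1] - pos[k][1]
    dist3 = (dx * dx + dy * dy) ** 1.5
    scale = -G * mass[q] * mass[k] / dist3   # negative: q is pulled towards k
    return scale * dx, scale * dy


def forces_basic(pos, mass):
    """Every pair computed twice; iteration q only writes forces[q]."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if k != q:
                fx, fy = _pair_force(pos, mass, q, k)
                forces[q][0] += fx
                forces[q][1] += fy
    return forces


def forces_reduced(pos, mass):
    """Each pair computed once; Newton's third law gives the opposite force."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for q in range(n):
        for k in range(q + 1, n):
            fx, fy = _pair_force(pos, mass, q, k)
            forces[q][0] += fx
            forces[q][1] += fy
            forces[k][0] -= fx          # also writes another particle's entry
            forces[k][1] -= fy
    return forces


pos = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
mass = [5e10, 3e10, 4e10, 2e10]
print(forces_basic(pos, mass))
print(forces_reduced(pos, mass))
```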
What's included
13 videos, 13 readings, 13 assignments, 1 discussion prompt
13 videos•Total 107 minutes
Introduction to N-body Problem•8 minutes
Serial Solutions to the N-body Problem•16 minutes
Parallelising Strategy•13 minutes
Parallelising Basic Solver Using OpenMP•9 minutes
Parallelising Reduced Solver Using OpenMP•11 minutes
Evaluating OpenMP Performance•5 minutes
Parallelising Basic Solver Using Pthreads•4 minutes
Parallelising Basic Solver Using MPI•9 minutes
Parallelising Reduced Solver Using MPI•9 minutes
Evaluating MPI Performance•6 minutes
Parallelising Basic Solver Using CUDA•7 minutes
Evaluating CUDA Solver and Improving Performance•4 minutes
Using Shared Memory for Solvers•7 minutes
13 readings•Total 195 minutes
Recommended Reading: Introduction to N-body Problem•15 minutes
Recommended Reading: Serial Solutions to the N-body Problem•15 minutes
Recommended Reading: Parallelising Strategy•15 minutes
Recommended Reading: Parallelising Basic Solver Using OpenMP•15 minutes
Recommended Reading: Parallelising Reduced Solver Using OpenMP•15 minutes
Recommended Reading: Parallelising Basic Solver Using CUDA•15 minutes
Recommended Reading: Evaluating CUDA Solver and Improving Performance•15 minutes
Recommended Reading: Using Shared Memory for Solvers•15 minutes
13 assignments•Total 138 minutes
Introduction to N-body Problem•9 minutes
Serial Solutions to the N-body Problem•9 minutes
Parallelising Strategy•9 minutes
Parallelising Basic Solver Using OpenMP•9 minutes
Parallelising Reduced Solver Using OpenMP•9 minutes
Evaluating OpenMP Performance•9 minutes
Parallelising Basic Solver Using Pthreads•9 minutes
Parallelising Basic Solver Using MPI•30 minutes
Parallelising Reduced Solver Using MPI•9 minutes
Evaluating MPI Performance•9 minutes
Parallelising Basic Solver Using CUDA•9 minutes
Evaluating CUDA Solver and Improving Performance•9 minutes
Using Shared Memory for Solvers•9 minutes
1 discussion prompt•Total 30 minutes
The N-Body Solver: Exploring Parallelism Across Models•30 minutes
Parallel Program Development 2
Module 11•hours to complete
Module details
This module focuses on hands-on implementations of the Sample Sort algorithm using OpenMP, Pthreads, MPI, and CUDA. Students will explore the strengths and limitations of each parallel programming model through practical coding exercises. The module includes performance benchmarking and comparative analysis of the implementations to highlight trade-offs in scalability, efficiency, and suitability for different architectures. By the end of the module, students will have a strong grasp of each API and be equipped to make informed decisions about the most appropriate tool for a given parallel computing task.
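The idea behind sample sort can be shown serially before it is parallelised: draw a random sample of the keys, pick `p - 1` splitters from the sorted sample, route every key to the bucket between two splitters, and sort each bucket independently. Because the buckets cover increasing key ranges, concatenating the sorted buckets yields the sorted list. In a parallel version, each thread, process or CUDA block would own one bucket. This Python sketch is one common variant, not necessarily the course's exact algorithm; the function name, the oversampling factor and the fixed random seed are assumptions.

```python
import bisect
import random


def sample_sort(keys, p, oversample=4, seed=0):
    """Serial sketch of sample sort with p buckets."""
    if p <= 1 or len(keys) < p * oversample:
        return sorted(keys)                       # too small to be worth splitting
    rng = random.Random(seed)                     # fixed seed keeps runs reproducible
    sample = sorted(rng.sample(keys, p * oversample))
    # p - 1 splitters, evenly spaced through the sorted sample.
    splitters = [sample[i * oversample] for i in range(1, p)]
    buckets = [[] for _ in range(p)]
    for key in keys:
        # bisect_right finds the bucket whose range contains key.
        buckets[bisect.bisect_right(splitters, key)].append(key)
    # Buckets are independent: this is the step each worker would do in parallel.
    return [key for bucket in buckets for key in sorted(bucket)]


keys = [(i * 7919) % 1000 for i in range(500)]
print(sample_sort(keys, p=4)[:10])
```

The quality of the splitters decides the load balance: a larger oversampling factor gives more evenly sized buckets at the cost of sorting a bigger sample.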
What's included
8 videos, 9 readings, 10 assignments, 1 discussion prompt
8 videos•Total 61 minutes
Sample Sort and Bucket Sort•10 minutes
Map•17 minutes
Implementing Sample Sort Using OpenMP: First Implementation•5 minutes
Implementing Sample Sort Using OpenMP: Second Implementation•7 minutes
Implementing Sample Sort Using Pthreads•4 minutes
Implementing Sample Sort Using MPI•6 minutes
Implementing Sample Sort Using MPI: Example•5 minutes
Implementing Sample Sort Using CUDA•7 minutes
9 readings•Total 115 minutes
Recommended Reading: Sample Sort and Bucket Sort•15 minutes
Recommended Reading: Map•10 minutes
Recommended Reading: Implementing Sample Sort Using OpenMP: First Implementation•15 minutes
Recommended Reading: Implementing Sample Sort Using OpenMP: Second Implementation•15 minutes
Recommended Reading: Implementing Sample Sort Using Pthreads•10 minutes
Recommended Reading: Implementing Sample Sort Using MPI•15 minutes
Recommended Reading: Implementing Sample Sort Using MPI: Example•15 minutes
Recommended Reading: Implementing Sample Sort Using CUDA•10 minutes
Birla Institute of Technology & Science, Pilani (BITS Pilani) is one of only ten private universities in India to be recognised as an Institute of Eminence by the Ministry of Human Resource Development, Government of India. It has been consistently ranked high by both governmental and private ranking agencies for its innovative processes and capabilities that have enabled it to impart quality education and emerge as the best private science and engineering institute in India.
BITS Pilani has four campuses, in Pilani, Goa and Hyderabad in India and in Dubai, and has been offering bachelor's, master’s, and certificate programmes for over 58 years, helping to launch the careers of over 100,000 professionals.
When will I have access to the lectures and assignments?
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I purchase the Certificate?
When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.