End-to-End Multimodal AI: Fine-Tuning, Fusion, and MLOps

This course is part of the Multimodal Intelligence - Vision, Audio & Language in Action Professional Certificate

Instructor: Professionals from the Industry

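20 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

2 weeks to complete, at 10 hours a week

Flexible schedule

Learn at your own pace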

What you'll learn

Fine-tune transformer-based multimodal models using transfer learning in PyTorch and TensorFlow.
Build cross-modal retrieval systems using FAISS and attention-based fusion of visual and text embeddings.
Automate ML pipelines with drift monitoring, hyperparameter tuning, and retraining using MLflow and Ray Tune.
Design and document versioned multimodal inference APIs with FastAPI, OAuth2, and OpenAPI specifications.

Details to know

Shareable certificate

Add to your LinkedIn profile

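Build your expertise in Algorithms

When you enroll in this course, you'll also be enrolled in this Professional Certificate.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate from Coursera

There are 20 modules in this course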

Build production-ready multimodal AI systems that combine vision, language, and audio into unified intelligent applications. This course takes you through the full lifecycle of multimodal model development — from constructing and fine-tuning transformer-based architectures using PyTorch and TensorFlow, to diagnosing training failures, designing cross-modal retrieval systems, and deploying secure, monitored inference APIs.

You will work with real-world tools including CLIP, ViT, FAISS, FastAPI, MLflow, and Ray Tune to build systems that process and integrate multiple data types simultaneously. You will analyze computational complexity to optimize fusion algorithms, evaluate model errors to identify failure patterns, and translate model outputs into stakeholder-ready business insights. This course is built for intermediate practitioners in machine learning and AI who want to move beyond single-modality models and into the cutting edge of AI systems design. By the end, you will have a portfolio of deployable, optimized multimodal systems that demonstrate advanced engineering capability to employers.

You will build the foundational MLOps infrastructure for multimodal AI systems by designing modular data pipeline components and implementing your first multimodal transformer fine-tuning workflow using open source tools.

What's included

3 videos, 1 reading, 1 assignment, 1 ungraded lab

3 videos (12 minutes total)

Why Modular Data Pipelines Matter in Enterprise Environments (2 minutes)
Open Source Tools for Pipeline Development: Spark, dbt, and Airflow (6 minutes)
Fine-tuning Multimodal Transformers (3 minutes)

1 reading (12 minutes total)

Fundamentals of Modular Data Pipeline Architecture (12 minutes)

1 assignment (3 minutes total)

Modular Pipeline Foundations Knowledge Check (3 minutes)

1 ungraded lab (20 minutes total)

Building Your First Modular Pipeline Component (20 minutes)

You will accelerate multimodal model development using transfer learning techniques and implement the transformation and loading pipeline stages that deliver processed data and trained models reliably to downstream systems.

What's included

1 video, 1 reading, 3 assignments

You will identify and analyze training and validation metric patterns to diagnose overfitting and gradient stability issues using TensorBoard visualization tools.

What's included

2 videos, 1 reading, 1 assignment, 1 ungraded lab

2 videos (8 minutes total)

When Neural Networks Fail: The Hidden Cost of Training Problems (2 minutes)
Understanding Training Dynamics: Patterns, Gradients, and Warning Signs (6 minutes)

1 reading (10 minutes total)

Mathematical Foundations of Gradient Analysis (10 minutes)

1 assignment (3 minutes total)

Training Dynamics Diagnosis Assessment (3 minutes)

1 ungraded lab (20 minutes total)

Neural Network Training Diagnostics Lab (20 minutes)

You will implement targeted interventions including gradient clipping and early stopping to stabilize training processes and prevent common neural network training failures.

What's included

1 video, 1 reading, 3 assignments

1 video (12 minutes total)

Implementing Gradient Clipping in TensorFlow and PyTorch (12 minutes)

1 reading (12 minutes total)

Training Stabilization Techniques: Gradient Clipping and Early Stopping (12 minutes)

3 assignments (31 minutes total)

Final Assessment: Neural Network Training Stabilization (10 minutes)
Training Pipeline Stabilization Implementation (18 minutes)
Training Stabilization Techniques Assessment (3 minutes)
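
The two interventions named in this module can be sketched in plain Python. This is a minimal illustration, not the course's lab code: the helper names `clip_by_global_norm` and `EarlyStopping` and the loss values are assumptions made for the example. In practice the frameworks provide equivalents, such as `torch.nn.utils.clip_grad_norm_` in PyTorch and the `tf.keras.callbacks.EarlyStopping` callback in TensorFlow.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        # An epoch counts as an improvement only if it beats the best loss by min_delta.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# A gradient of norm 5.0 is rescaled to norm 1.0; its direction is preserved.
clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)

# Hypothetical validation losses: training would stop at the first epoch index
# where two consecutive epochs have failed to improve on the best loss.
stopper = EarlyStopping(patience=2)
val_losses = [0.90, 0.70, 0.71, 0.72, 0.60]
stopped_at = next((i for i, v in enumerate(val_losses) if stopper.step(v)), None)
```

Note that with these values the stop fires before the final, better epoch is seen, which is the usual trade-off when choosing `patience`.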

You will learn systematic image preprocessing techniques including normalization and color-space conversions to prepare raw visual data for computer vision applications.

What's included

3 videos, 1 reading, 1 assignment, 1 ungraded lab

3 videos (17 minutes total)

Why Image Preprocessing Matters in Computer Vision (3 minutes)
Implementing Normalization Techniques with NumPy (7 minutes)
Converting Between Color Spaces with OpenCV (7 minutes)

1 reading (10 minutes total)

Fundamentals of Image Normalization and Color Space Theory (10 minutes)

1 assignment (8 minutes total)

Image Preprocessing Fundamentals Assessment (8 minutes)

1 ungraded lab (18 minutes total)

Image Preprocessing Pipeline: Normalization & Color-Space Transformations (18 minutes)
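
As a rough illustration of what this module covers, the sketch below applies two common normalizations and two color-space conversions to raw pixel values using only the standard library. The module itself uses NumPy and OpenCV; the function names and sample values here are assumptions chosen for the example.

```python
import colorsys


def min_max_normalize(pixels):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def standardize(pixels):
    """Shift to zero mean and scale to unit variance (z-score)."""
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = var ** 0.5 or 1.0  # avoid dividing by zero for a constant image
    return [(p - mean) / std for p in pixels]


def rgb_to_gray(r, g, b):
    """Weighted luma conversion using the ITU-R BT.601 coefficients."""
    return 0.299 * r + 0.587 * g + 0.114 * b


scaled = min_max_normalize([0, 64, 128, 255])
z_scores = standardize([0, 64, 128, 255])
gray = rgb_to_gray(255, 0, 0)

# colorsys expects channels in [0, 1] and returns hue, saturation, value in [0, 1].
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # pure red
```

Libraries differ in their conventions, so value ranges and channel order should be checked before mixing tools. OpenCV, for example, loads images in BGR order.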

You will learn optical flow and frame differencing techniques to extract temporal motion features from video sequences for computer vision applications.

What's included

2 videos, 1 reading, 2 assignments
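
Frame differencing, the simpler of the two techniques named in this module, can be sketched as follows. This is a minimal example on tiny hand-written grayscale frames; the threshold of 25 and the helper names are assumptions. Optical flow, which estimates per-pixel motion vectors rather than a binary mask, is not shown.

```python
def frame_difference(prev, curr, threshold=25):
    """Binary motion mask: 1 where the absolute intensity change exceeds threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_ratio(mask):
    """Fraction of pixels flagged as moving, usable as a simple temporal feature."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


# Hypothetical 3x3 grayscale frames: one pixel changes sharply,
# another changes by less than the threshold (treated as noise).
prev_frame = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 10, 10], [10, 200, 10], [10, 10, 12]]

mask = frame_difference(prev_frame, curr_frame)
ratio = motion_ratio(mask)
```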

You will establish foundational understanding of systematic error analysis approaches and learn to evaluate computer vision model performance beyond basic accuracy metrics.

What's included

2 videos, 1 reading, 1 assignment, 1 ungraded lab

2 videos (10 minutes total)

Why Systematic Error Analysis Matters in Computer Vision (3 minutes)
Understanding Confusion Matrices and Error Categories (7 minutes)

1 reading (12 minutes total)

Foundations of Computer Vision Error Analysis (12 minutes)

1 assignment (8 minutes total)

Evaluating Error Analysis Fundamentals (8 minutes)

1 ungraded lab (20 minutes total)

Hands-On Confusion Matrix Analysis for Computer Vision Models (20 minutes)
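
To show why a confusion matrix says more than accuracy alone, the sketch below builds one from hypothetical labels and derives per-class precision and recall. The class names, predictions, and helper names are invented for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} computed from the matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # true class i, predicted as something else
        fp = sum(row[i] for row in matrix) - tp   # predicted as i, actually another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = (precision, recall)
    return metrics


labels = ["cat", "dog", "fox"]
y_true = ["cat", "cat", "dog", "dog", "fox", "fox"]
y_pred = ["cat", "dog", "dog", "dog", "dog", "fox"]

cm = confusion_matrix(y_true, y_pred, labels)
metrics = per_class_metrics(cm, labels)
```

In this toy data the model finds every dog (recall 1.0), but half of its "dog" predictions are wrong (precision 0.5). The overall accuracy figure of 4 out of 6 would hide that over-prediction pattern.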

You will apply advanced techniques to identify systematic failure patterns in computer vision models and generate comprehensive quality reports for model improvement.

What's included

1 video, 1 reading, 3 assignments

You will build foundational understanding of cross-modal retrieval systems and implement approximate nearest-neighbor search algorithms using FAISS for production-scale similarity search across multimodal embeddings.

What's included

1 video, 2 readings, 1 assignment, 1 ungraded lab

1 video (7 minutes total)

Fundamentals of Cross-Modal Retrieval Systems (7 minutes)

2 readings (18 minutes total)

FAISS Architecture and Index Types for Production Systems (10 minutes)
Implementing FAISS Indexing for Cross-Modal Search (8 minutes)

1 assignment (3 minutes total)

Cross-Modal Retrieval and FAISS Implementation Assessment (3 minutes)

1 ungraded lab (15 minutes total)

Building Production-Scale Cross-Modal Retrieval with FAISS (15 minutes)
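
The core operation behind cross-modal retrieval is a nearest-neighbor search over embeddings that share one vector space. The sketch below is an exact brute-force cosine search in plain Python, offered as a baseline to reason about; it does not use FAISS. The embeddings and identifiers are hypothetical. Approximate indexes exist to avoid this linear scan at production scale, usually at some cost in recall.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def top_k(query, index, k=2):
    """Return (id, cosine similarity) pairs for the k most similar stored vectors."""
    q = l2_normalize(query)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, l2_normalize(vec))))
        for item_id, vec in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical image embeddings, assumed to live in the same space as text embeddings.
image_index = {
    "img_dog": [0.9, 0.1, 0.0],
    "img_cat": [0.1, 0.9, 0.0],
    "img_car": [0.0, 0.1, 0.9],
}
text_query = [0.8, 0.2, 0.1]  # hypothetical embedding of a caption about a dog

results = top_k(text_query, image_index, k=2)
```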

You will design and implement sophisticated attention-based fusion algorithms that intelligently combine visual and textual embeddings, mastering the creation of multimodal neural architectures for advanced cross-modal AI applications.

What's included

2 readings, 3 assignments
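
One simple form of attention-based fusion lets a text embedding act as the query over a set of image patch embeddings, then concatenates the text vector with the attended visual summary. The sketch below assumes both modalities already share a dimension and omits the learned projection matrices a real model would have. All names and values are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(text_vec, patch_vecs):
    """Scaled dot-product attention with the text vector as the query."""
    scale = math.sqrt(len(text_vec))
    weights = softmax([dot(text_vec, p) / scale for p in patch_vecs])
    # Weighted average of patch vectors: patches aligned with the text count more.
    attended = [
        sum(w * p[i] for w, p in zip(weights, patch_vecs))
        for i in range(len(text_vec))
    ]
    # Fusion by concatenation: [text features | attended visual features].
    return text_vec + attended, weights


text_vec = [1.0, 0.0]
patch_vecs = [[2.0, 0.0], [0.0, 2.0]]  # the first patch aligns with the text vector
fused, weights = attention_fuse(text_vec, patch_vecs)
```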

You will learn the foundational concepts of computational complexity analysis, learning to systematically evaluate fusion algorithms using Big O notation and profiling tools.

What's included

3 videos, 1 reading, 1 assignment, 1 ungraded lab

3 videos (16 minutes total)

Why Algorithm Complexity Analysis Matters in Production AI (3 minutes)
Applying Big O Analysis to Fusion Algorithm Components (7 minutes)
Profiling Fusion Algorithms with cProfile (6 minutes)

1 reading (8 minutes total)

Fundamentals of Computational Complexity in Fusion Algorithms (8 minutes)

1 assignment (5 minutes total)

Complexity Analysis Fundamentals Assessment (5 minutes)

1 ungraded lab (18 minutes total)

Profile and Analyze Fusion Algorithm Performance (18 minutes)
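
Big O analysis and profiling complement each other: the first predicts how cost grows, the second measures where time actually goes. The sketch below profiles two toy fusion functions with the standard-library `cProfile` and `pstats` modules. The functions are hypothetical stand-ins for real fusion code, and the `profile` helper is an assumption made for this example.

```python
import cProfile
import io
import pstats


def pairwise_fusion(a, b):
    """O(n * m): scores every element of a against every element of b."""
    return [[x * y for y in b] for x in a]


def elementwise_fusion(a, b):
    """O(n): combines matching positions only."""
    return [x * y for x, y in zip(a, b)]


def profile(func, *args):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


a = list(range(300))
b = list(range(300))
pairwise, pairwise_report = profile(pairwise_fusion, a, b)
elementwise, elementwise_report = profile(elementwise_fusion, a, b)
```

Printing the two reports side by side shows the call counts and cumulative times for each function. Timings vary by machine, so the growth rate as input size increases is the more portable signal.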

You will apply complexity analysis skills to make strategic optimization decisions, evaluating trade-offs between performance, accuracy, and resource constraints in real-world deployment scenarios.

What's included

1 video, 3 assignments

You will learn the systematic evaluation of production ML models to identify performance degradation and implement drift detection systems that automatically trigger remediation actions.

What's included

1 video, 1 reading, 1 assignment, 1 ungraded lab
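
One widely used drift statistic is the Population Stability Index (PSI), which compares how a feature is distributed in a reference sample and in current production data. The sketch below is an assumption about what a drift check could look like, not the course's method. The bin edges, sample values, and the 0.2 trigger threshold (a common rule of thumb) are all illustrative.

```python
import math


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bins."""

    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                last = i == len(edges) - 2
                # Bins are half-open, except the last one, which includes its upper edge.
                if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                    counts[i] += 1
                    break
        # Values outside the edges are ignored; eps avoids log(0) for empty bins.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_retraining(score, threshold=0.2):
    """Hypothetical trigger: flag remediation when PSI exceeds the threshold."""
    return score > threshold


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
shifted = [0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95, 1.0]

same_score = psi(reference, reference, edges)
drift_score = psi(reference, shifted, edges)
```

In a pipeline, the boolean from `needs_retraining` would be the signal that starts a retraining or alerting job.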

You will build comprehensive automated ML pipelines with integrated hyperparameter optimization and end-to-end automation that maintains model performance in production environments.

What's included

2 videos, 1 reading, 3 assignments

2 videos (15 minutes total)

End-to-End ML Pipeline Architecture and Components (7 minutes)
Building Automated ML Pipelines with Ray Tune and MLflow (8 minutes)

1 reading (10 minutes total)

Hyperparameter Optimization Strategies and Integration Patterns (10 minutes)

3 assignments (28 minutes total)

Final Course Assessment - Automated ML Operations (10 minutes)
Enterprise ML Pipeline Implementation (15 minutes)
Automated ML Pipeline Mastery Assessment (3 minutes)
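
Random search is one of the simplest hyperparameter optimization strategies, and it is a useful mental model for what a tuning library automates. The sketch below is plain Python and does not use Ray Tune or MLflow. The `objective` function is a hypothetical stand-in for a validation loss, and the search space is invented for the example.

```python
import random


def objective(config):
    """Hypothetical validation loss with its minimum at lr=0.01, dropout=0.3."""
    return (config["lr"] - 0.01) ** 2 * 1e4 + (config["dropout"] - 0.3) ** 2


def random_search(space, num_trials, seed=0):
    """Sample configurations uniformly and keep the one with the lowest loss."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    trials = []
    for _ in range(num_trials):
        config = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        trials.append((objective(config), config))
    return min(trials, key=lambda trial: trial[0]), trials


space = {"lr": (0.001, 0.1), "dropout": (0.0, 0.5)}
(best_loss, best_config), trials = random_search(space, num_trials=50)
```

In an automated pipeline, each trial's configuration and loss would typically be logged to an experiment tracker so the best run can be promoted or retrained later.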

You will build foundational skills for systematically analyzing multimodal AI model outputs, understanding cross-modal relationships, and preparing technical findings for stakeholder communication.

What's included

2 videos, 1 reading, 1 assignment, 1 ungraded lab

2 videos (10 minutes total)

The Business Impact of Multimodal AI Interpretation (3 minutes)
Explainability Tools and Techniques for Multimodal Analysis (7 minutes)

1 reading (10 minutes total)

Understanding Multimodal AI Model Architecture and Output Patterns (10 minutes)

1 assignment (3 minutes total)

Multimodal Analysis Fundamentals Knowledge Check (3 minutes)

1 ungraded lab (20 minutes total)

Multimodal AI Model Analysis for Business Stakeholders (20 minutes)

You will learn the critical skills of translating complex multimodal AI analysis into compelling business narratives, creating executive-level presentations, and developing stakeholder communication frameworks that drive strategic decisions.

What's included

2 videos, 1 reading, 3 assignments

2 videos (11 minutes total)

When Technical Excellence Isn't Enough: The Communication Gap in AI (3 minutes)
Creating Executive Briefings from Technical AI Analysis (8 minutes)

1 reading (10 minutes total)

Business Narrative Frameworks for AI Insights (10 minutes)

3 assignments (38 minutes total)

Comprehensive Multimodal AI Analysis and Stakeholder Communication Assessment (15 minutes)
Developing Comprehensive Executive Briefing from Multimodal Analysis (20 minutes)
Stakeholder Communication Fundamentals Knowledge Check (3 minutes)

You will design and implement versioned API endpoints specifically optimized for multimodal AI inference workloads.

What's included

3 videos, 1 reading, 2 assignments

3 videos (15 minutes total)

Why API Versioning Matters for Multimodal AI Services (3 minutes)
Fundamentals of Multimodal API Endpoint Design (7 minutes)
Implementing Versioned Endpoints with FastAPI (4 minutes)

1 reading (10 minutes total)

Designing Robust Data Contracts for Multimodal Inputs (10 minutes)

2 assignments (21 minutes total)

Build a Versioned Multimodal API Prototype (18 minutes)
API Endpoint Design Knowledge Check (3 minutes)

You will implement comprehensive OAuth2 authentication systems and observability middleware for production API services.

What's included

2 videos, 1 reading, 2 assignments

2 videos (14 minutes total)

OAuth2 Authentication and API Security Fundamentals (7 minutes)
Implementing OAuth2 Security Middleware with FastAPI (7 minutes)

1 reading (12 minutes total)

Implementing Comprehensive API Monitoring and Observability (12 minutes)

2 assignments (23 minutes total)

Build Comprehensive Security and Monitoring Middleware (20 minutes)
Security and Monitoring Implementation Knowledge Check (3 minutes)

You will create comprehensive OpenAPI specifications that enable automated testing, client generation, and seamless integration.

What's included

2 videos, 1 reading, 2 assignments, 1 ungraded lab

2 videos (12 minutes total)

Why Comprehensive API Documentation Drives Developer Adoption (4 minutes)
Advanced OpenAPI Features for Multimodal APIs (8 minutes)

1 reading (11 minutes total)

OpenAPI Specification Design for Developer Integration (11 minutes)

2 assignments (18 minutes total)

Comprehensive OpenAPI Documentation Assessment (15 minutes)
OpenAPI Documentation Knowledge Check (3 minutes)

1 ungraded lab (20 minutes total)

OpenAPI Specification for Multimodal AI Services (20 minutes)

You will build a production-grade multimodal AI system that processes visual and textual data, integrating fine-tuning, cross-modal fusion, and deployment-ready inference services. This capstone synthesizes model optimization, data engineering, API design, and MLOps practices to deliver a deployable, monitored multimodal application.

What's included

4 readings, 1 assignment

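Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Professionals from the Industry

474 courses, 89,087 learners

Offered by

Coursera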

Frequently asked questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.