When will I have access to the lectures and assignments?

To access the course materials and assignments and to earn a Certificate, you need to purchase the Certificate experience when you enroll in a course. Alternatively, you can try a Free Trial or apply for Financial Aid. Some courses offer a 'Full Course, No Certificate' option instead. This option lets you see all course materials, submit required assessments, and get a final grade, but you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Certificate?

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page. From there, you can print your Certificate or add it to your LinkedIn profile.

Multimodal and cross-modal AI integrations

This course is part of the Microsoft Generative AI Engineering Professional Certificate

Instructor: Microsoft

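4 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

2 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace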

Details to know

Shareable certificate

Add to your LinkedIn profile

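Build your expertise in Software Development

This course is part of the Microsoft Generative AI Engineering Professional Certificate

When you enroll in this course, you are also enrolled in this Professional Certificate.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate from Microsoft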

There are 4 modules in this course

Learn to build AI that sees, hears, and understands the world in an integrated way. This course takes you beyond single-modality models, teaching you to architect applications that connect different data types like text, images, and speech.

Module details

This module introduces the foundational concepts of multimodal AI. You will learn the architectural patterns for combining different AI components, such as text and image models, and progress from basic integration to building complex systems that can reason across multiple data types.

Important Notice on the Azure Interface: The screencast videos and screenshots were last updated in late 2025. Please be aware that Microsoft may have updated the Azure interface since then. If the steps shown in the course materials look different from your current Azure environment, please follow the most up-to-date interface, as the underlying concepts and learning objectives remain the same.

What's included

4 videos, 9 readings, 7 assignments

4 videos, 18 minutes total

Introduction to Microsoft Generative AI engineering certification (4 minutes)
Introduction to multimodal and cross-modal integrations course (3 minutes)
Understanding multimodal AI (5 minutes)
Advanced multimodal applications (5 minutes)

9 readings, 95 minutes total

Course syllabus and recommended background (5 minutes)
Components of multimodal AI setup (15 minutes)
Visualizing a multimodal workflow (15 minutes)
Architectural choices in multimodal AI: Single model vs. chained pipelines (10 minutes)
Analyzing your first multimodal integration (10 minutes)
Advanced integration strategies and use cases (10 minutes)
Insights on advanced multimodal AI (10 minutes)
Case study: Designing a multimodal product search (10 minutes)
Module 1 summary: From architectural theory to practical integration (10 minutes)

7 assignments, 195 minutes total

First steps with a true multimodal model (15 minutes)
Building your first multimodal pipeline (30 minutes)
Multimodal integration: Practice Quiz (30 minutes)
Building a multimodal system (30 minutes)
Architecting a complex multimodal solution (30 minutes)
Advanced multimodal skills: Practice Quiz (30 minutes)
Module 1 evaluation: Graded Quiz (30 minutes)

This module provides a deep dive into the popular and creative task of generating images from text descriptions. You will explore the models that power this technology, like DALL·E, and learn both basic and advanced prompting techniques to craft and refine specific, high-quality visual outputs.

What's included

5 videos, 5 readings, 5 assignments

5 videos, 19 minutes total

Module 2 introduction: From words to worlds with text-to-image models (6 minutes)
From text to image in practice (4 minutes)
Text-to-image model comparisons (3 minutes)
Mastering text-to-image control (3 minutes)
Module 2 summary: From architecture to artistic control (3 minutes)

5 readings, 50 minutes total

Exploration of text-to-image practices (10 minutes)
Insights from text-to-image applications (10 minutes)
Advanced text-to-image techniques (10 minutes)
Advanced text-to-image insights (10 minutes)
Case study: A creative workflow for a marketing campaign (10 minutes)

5 assignments, 180 minutes total

Generating and refining images with text-to-image prompts (30 minutes)
Text-to-image skills: Practice Quiz (30 minutes)
Synthesizing advanced text-to-image workflows (60 minutes)
Solving text-to-image challenges: Practice Quiz (30 minutes)
Module 2 evaluation: Graded Quiz (30 minutes)

This module focuses on practical implementation using a powerful, specialized tool. You will leverage the features of Azure AI Vision to build and optimize cross-modal applications like image captioning and visual search. You'll learn how this single service can analyze visual content to generate rich textual descriptions and extract embedded text (OCR), providing the core components for sophisticated multimodal solutions.

What's included

7 videos, 6 readings, 7 assignments

7 videos, 28 minutes total

An overview of the Azure AI services toolkit (5 minutes)
Module 3 introduction: The multiple applications of Azure AI Vision (3 minutes)
Bringing sight to your applications with Azure AI Vision (5 minutes)
Getting started with Azure AI Vision (4 minutes)
Exploring cross-modal features in Vision Studio (4 minutes)
Refining cross-modal applications (6 minutes)
Module 3 summary: From a single feature to a complete vision solution (2 minutes)

6 readings, 60 minutes total

Prototyping vs. production: The role of Vision Studio (10 minutes)
Cross-modal AI implementation insights (10 minutes)
Interpreting OCR results with the SDK (10 minutes)
Advanced strategies for cross-modal AI (10 minutes)
Optimizing multimodal workflows (10 minutes)
Case study: Building an automated inventory checker (10 minutes)

7 assignments, 255 minutes total

Exploring cross-modal techniques (30 minutes)
Extract text from images (60 minutes)
Cross-modal techniques quiz: Practice Quiz (30 minutes)
Chaining vision skills with the Python SDK (30 minutes)
Extending a multimodal application (45 minutes)
Advanced cross-modal skills: Practice Quiz (30 minutes)
Module 3 evaluation: Graded Quiz (30 minutes)

This capstone module builds upon your deep expertise in Azure AI Vision. You will learn to integrate your vision applications with other powerful Azure AI Services, such as Language and Speech, to create comprehensive, end-to-end solutions. The focus will be on orchestrating these distinct services to develop a sophisticated application that solves a real-world business problem, demonstrating your ability to design and build a complete multimodal system from the ground up.