Learn to build AI that sees, hears, and understands the world in an integrated way. This course takes you beyond single-modality models, teaching you to architect applications that connect different data types like text, images, and speech.
Starting with text-to-image generation, you will progress to integrating various AI components and harnessing the full power of Azure AI Services to build sophisticated, cross-modal solutions. By the end, you'll be equipped to design the next generation of intelligent, multi-faceted AI applications.
This module introduces the foundational concepts of multimodal AI. You will learn the architectural patterns for combining different AI components, such as text and image models, and progress from basic integration to building complex systems that can reason across multiple data types.
Important Notice on the Azure Interface: The screencast videos and screenshots were last updated in late 2025.
Please be aware that Microsoft may have updated the Azure interface since then. If the steps shown in the course materials look different from your current Azure environment, please follow the most up-to-date interface, as the underlying concepts and learning objectives remain the same.
What's included
4 videos, 9 readings, 7 assignments
4 videos • 18 minutes total
Introduction to Microsoft Generative AI engineering certification • 4 minutes
Introduction to multimodal and cross-modal integrations course • 3 minutes
Understanding multimodal AI • 5 minutes
Advanced multimodal applications • 5 minutes
9 readings • 95 minutes total
Course syllabus and recommended background • 5 minutes
Components of multimodal AI setup • 15 minutes
Visualizing a multimodal workflow • 15 minutes
Architectural choices in multimodal AI: Single model vs. chained pipelines • 10 minutes
Analyzing your first multimodal integration • 10 minutes
Advanced integration strategies and use cases • 10 minutes
Insights on advanced multimodal AI • 10 minutes
Case study: Designing a multimodal product search • 10 minutes
Module 1 summary: From architectural theory to practical integration • 10 minutes
7 assignments • 195 minutes total
First steps with a true multimodal model • 15 minutes
Building your first multimodal pipeline • 30 minutes
Multimodal integration: Practice Quiz • 30 minutes
Building a multimodal system • 30 minutes
Architecting a complex multimodal solution • 30 minutes
Advanced multimodal skills: Practice Quiz • 30 minutes
Module 1 evaluation: Graded Quiz • 30 minutes
Text-to-image generation
Module 2 • Hours to complete
Module details
This module provides a deep dive into the popular and creative task of generating images from text descriptions. You will explore the models that power this technology, like DALL·E, and learn both basic and advanced prompting techniques to craft and refine specific, high-quality visual outputs.
What's included
5 videos, 5 readings, 5 assignments
5 videos • 19 minutes total
Module 2 introduction: From words to worlds with text-to-image models • 6 minutes
From text to image in practice • 4 minutes
Text-to-image model comparisons • 3 minutes
Mastering text-to-image control • 3 minutes
Module 2 summary: From architecture to artistic control • 3 minutes
5 readings • 50 minutes total
Exploration of text-to-image practices • 10 minutes
Insights from text-to-image applications • 10 minutes
Advanced text-to-image techniques • 10 minutes
Advanced text-to-image insights • 10 minutes
Case study: A creative workflow for a marketing campaign • 10 minutes
5 assignments • 180 minutes total
Generating and refining images with text-to-image prompts • 30 minutes
Solving text-to-image challenges: Practice Quiz • 30 minutes
Module 2 evaluation: Graded Quiz • 30 minutes
Cross-modal applications with Azure AI Vision
Module 3 • Hours to complete
Module details
This module focuses on practical implementation using a powerful, specialized tool. You will leverage the features of Azure AI Vision to build and optimize cross-modal applications like image captioning and visual search. You'll learn how this single service can analyze visual content to generate rich textual descriptions and extract embedded text (OCR), providing the core components for sophisticated multimodal solutions.
What's included
7 videos, 6 readings, 7 assignments
7 videos • 28 minutes total
An overview of the Azure AI services toolkit • 5 minutes
Module 3 introduction: The multiple applications of Azure AI Vision • 3 minutes
Bringing sight to your applications with Azure AI Vision • 5 minutes
Getting started with Azure AI Vision • 4 minutes
Exploring cross-modal features in Vision Studio • 4 minutes
Refining cross-modal applications • 6 minutes
Module 3 summary: From a single feature to a complete vision solution • 2 minutes
6 readings • 60 minutes total
Prototyping vs. production: The role of Vision Studio • 10 minutes
Cross-modal AI implementation insights • 10 minutes
Interpreting OCR results with the SDK • 10 minutes
Advanced strategies for cross-modal AI • 10 minutes
Optimizing multimodal workflows • 10 minutes
Case study: Building an automated inventory checker • 10 minutes
7 assignments • 255 minutes total
Exploring cross-modal techniques • 30 minutes
Extract text from images • 60 minutes
Cross-modal techniques quiz: Practice Quiz • 30 minutes
Chaining vision skills with the Python SDK • 30 minutes
Extending a multimodal application • 45 minutes
Advanced cross-modal skills: Practice Quiz • 30 minutes
Module 3 evaluation: Graded Quiz • 30 minutes
Advanced AI integration with Azure services
Module 4 • Hours to complete
Module details
This capstone module builds upon your deep expertise in Azure AI Vision. You will learn to integrate your vision applications with other powerful Azure AI Services, such as Language and Speech, to create comprehensive, end-to-end solutions. The focus will be on orchestrating these distinct services to develop a sophisticated application that solves a real-world business problem, demonstrating your ability to design and build a complete multimodal system from the ground up.
What's included
6 videos, 5 readings, 5 assignments
6 videos • 26 minutes total
Module 4 introduction: Building an end-to-end solution • 3 minutes
Orchestrating Azure AI services: A demonstration • 6 minutes
Setting up your environment for integration • 6 minutes
Demonstrating text-to-speech with the SDK • 6 minutes
Module 4 summary: Orchestrating a full AI solution • 2 minutes
Course summary • 3 minutes
5 readings • 60 minutes total
Integrating Azure AI services • 15 minutes
Managing multimodal workflows • 10 minutes
Adding speech to your application • 15 minutes
Analyzing your end-to-end application • 10 minutes
Production considerations for multimodal apps • 10 minutes
5 assignments • 210 minutes total
Integrating Vision with the Language service • 60 minutes
Designing multimodal workflows: Practice Quiz • 30 minutes
Building an end-to-end multimodal application • 60 minutes
Analyzing multimodal solutions: Practice Quiz • 30 minutes
Our goal at Microsoft is to empower every individual and organization on the planet to achieve more.
In this next revolution of digital transformation, growth is being driven by technology. Our integrated cloud approach creates an unmatched platform for digital transformation. We address the real-world needs of customers by seamlessly integrating Microsoft 365, Dynamics 365, LinkedIn, GitHub, Microsoft Power Platform, and Azure to unlock business value for every organization—from large enterprises to family-run businesses. The backbone and foundation of this is Azure.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. It also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Certificate?
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile.