Multi-modal AI

This course is part of the AI Tooling Specialization

Instructor: Alfredo Deza

3 modules

Gain insight into a topic and learn the fundamentals.

Beginner level

3 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Apply multi-modal AI techniques to convert screenshots into working code using prompt engineering with visual context and GitHub Copilot

Details to know

Shareable certificate

Add to your LinkedIn profile

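Build your subject-matter expertise

This course is part of the AI Tooling Specialization

When you enroll in this course, you will also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 3 modules in this course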

Learn to build production applications by combining visual and textual inputs with AI coding tools. You will explore multi-modal programming where screenshots, images, and text serve as inputs for AI-assisted code generation, and set up development environments configured for visual AI workflows. The course covers prompt engineering with visual context to improve code generation accuracy, and hands-on development with GitHub Copilot in VS Code for inline suggestions and chat-based interactions. You will build a complete project using live reload and browser developer tools for rapid feedback between AI generation and visual output.

The iterative development module teaches documentation-driven design where documentation guides AI toward desired outcomes, image-based iteration for refining generated code through visual comparison, and automated checks and validations that maintain quality through development cycles. You will learn to identify and overcome common iteration challenges including regression and context drift.

The advanced module covers Model Context Protocol for connecting AI tools with external capabilities, Playwright for browser automation and visual testing, and Playwright MCP for AI-driven browser interactions that validate web applications directly.

By completing this course, you will be able to convert screenshots into production code through iterative, automated, multi-modal AI workflows.
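As a rough illustration of what "screenshots and text as inputs" can mean in practice, the sketch below bundles an image and an instruction into a single request-like structure. The function name `build_multimodal_prompt` and the dictionary layout are hypothetical and are not taken from the course or from any specific tool's API; the image bytes are a stand-in for a real screenshot file.

```python
import base64
import json


def build_multimodal_prompt(image_bytes: bytes, instruction: str,
                            mime: str = "image/png") -> dict:
    """Bundle a screenshot and a text instruction into one request-like dict.

    The layout below is hypothetical; adapt it to whatever tool you use.
    """
    # Binary image data is base64-encoded so it can travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "parts": [
            {"type": "text", "text": instruction},
            {"type": "image", "data_url": f"data:{mime};base64,{encoded}"},
        ]
    }


# The 8-byte PNG signature stands in for the contents of a real screenshot.
fake_screenshot = b"\x89PNG\r\n\x1a\n"
prompt = build_multimodal_prompt(
    fake_screenshot,
    "Recreate this page as semantic HTML and CSS. Match spacing and colors.",
)
print(json.dumps(prompt, indent=2))
```

Keeping the text and the image in one structure makes it easy to log or replay exactly what visual context accompanied each instruction.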

Multi-Modal Development Foundations

Covers multi-modal, screenshots, overview, programming, and visual.

What's included

15 videos, 6 readings

15 videos, 52 minutes total

Course Introduction (1 minute)
What Is Multi-Modal Programming (3 minutes)
Setting Up Multi-Modal Dev Environments (6 minutes)
Your First Screenshot to Code Conversion (6 minutes)
Lesson 1.1 Conclusion (0 minutes)
Prompt Engineering Introduction (1 minute)
Prompt Engineering with Visual Context (5 minutes)
Introduction to GitHub Copilot and VS Code (5 minutes)
Developing with GitHub Copilot (5 minutes)
Lesson 1.2 Conclusion (1 minute)
Building Introduction (1 minute)
What Will We Build (4 minutes)
Live Reload and Developer Tools (7 minutes)
Setting Up the Development Environment (7 minutes)
Lesson 1.3 Conclusion (1 minute)

6 readings, 6 minutes total

Key Terms (1 minute)
Reflection (1 minute)
Key Terms (1 minute)
Reflection (1 minute)
Key Terms (1 minute)
Reflection (1 minute)

Iterative Development and Automation

Covers iterative, documentation, iteration, designing, and context.

What's included

12 videos, 4 readings

12 videos, 39 minutes total

MCP and Automation Introduction (1 minute)
Introduction to MCP (4 minutes)
Overview of Playwright (4 minutes)
Using Playwright MCP (6 minutes)
Overview of What We Built (3 minutes)
Course Conclusion (1 minute)
Iterative Development Introduction (1 minute)
Designing with Documentation (4 minutes)
Iterating Over First Changes (4 minutes)
Using Images for Iteration (6 minutes)
Challenges with Iteration (3 minutes)
Automating Checks and Validations (4 minutes)

4 readings, 40 minutes total

Key Terms (10 minutes)
Reflection: MCP and Automation (10 minutes)
Key Terms (10 minutes)
Reflection: Iterative Development (10 minutes)

Capstone Project

Build a web application using multi-modal AI development techniques, progressing from screenshot-to-code conversion through iterative refinement with visual feedback to automated browser testing with MCP and Playwright. The project demonstrates the complete multi-modal development lifecycle, including prompt engineering with visual context, GitHub Copilot integration, and documentation-driven iteration.
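One way to picture the "visual feedback" step is an automated check that compares a target screenshot with the rendered result. The sketch below is a minimal, assumed illustration in plain Python: images are modeled as nested lists of RGB tuples, and the names `mismatch_ratio` and `passes_visual_check`, along with the default thresholds, are hypothetical and not taken from the course. A real workflow would more likely capture screenshots with a browser automation tool and decode them with an imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def mismatch_ratio(expected: Image, actual: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels with any channel differing by more than tolerance."""
    same_shape = len(expected) == len(actual) and all(
        len(row_e) == len(row_a) for row_e, row_a in zip(expected, actual)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        return 0.0
    differing = 0
    for row_e, row_a in zip(expected, actual):
        for px_e, px_a in zip(row_e, row_a):
            # A pixel counts as different if any color channel is out of tolerance.
            if any(abs(c_e - c_a) > tolerance for c_e, c_a in zip(px_e, px_a)):
                differing += 1
    return differing / total


def passes_visual_check(expected: Image, actual: Image,
                        max_ratio: float = 0.01, tolerance: int = 8) -> bool:
    """Hypothetical gate: accept the render if few enough pixels differ."""
    return mismatch_ratio(expected, actual, tolerance) <= max_ratio


white, red = (255, 255, 255), (255, 0, 0)
target = [[white, white], [white, red]]    # stand-in for the design screenshot
render = [[white, white], [white, white]]  # stand-in for the generated page

print(mismatch_ratio(target, render))       # one of four pixels differs
print(passes_visual_check(target, render))
```

A check like this can run after each iteration, so that a regression in the rendered output is caught before the next round of changes is requested.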

What's included

3 readings (21 minutes total), 1 assignment (15 minutes total)

Earn a career certificate

Add this certificate to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Alfredo Deza

Pragmatic AI Labs

35 courses, 1,906 learners

Offered by

Pragmatic AI Labs

Frequently asked questions

When will I have access to the lectures and assignments?

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page. From there, you can print your Certificate or add it to your LinkedIn profile.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
