This comprehensive course on Evaluating and Applying LLM Capabilities equips you with the skills to analyze, implement, and assess large language models in real-world scenarios. Begin with core capabilities: summarization, translation, and how LLMs power industry-relevant content generation. Progress to interactive and analytical applications, exploring chatbots, virtual assistants, and sentiment analysis with hands-on demos using LangChain and ChromaDB. Conclude with benchmarking and evaluation, mastering frameworks like ROUGE, GLUE, SuperGLUE, and BIG-bench to measure model accuracy, relevance, and performance.


What you'll learn
- Analyze Core LLM Capabilities: Master summarization, translation, and content generation
- Build GenAI Applications: Create chatbots and sentiment analysis tools with LangChain
- Evaluate LLM Performance: Use benchmarks like ROUGE, GLUE, and BIG-bench
- Apply Real-World Use Cases: Understand industrial applications and limitations of LLMs
Skills you'll gain
Details to know

Add to your LinkedIn profile
July 2025
10 assignments
Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate

There are 3 modules in this course
Explore the core capabilities of large language models (LLMs) in this foundational module. Learn the four key functions that power LLM performance, including summarization and content translation. Understand their benefits, limitations, and real-world applications across industries. Gain hands-on experience with a text summarization demo and discover how LLMs transform content across languages.
What's included
5 videos, 1 reading, 4 assignments
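To preview the kind of code the module's text summarization demo involves, here is a minimal sketch using the Hugging Face transformers pipeline. The model name and input text are illustrative assumptions; the course's actual demo may use different tooling.

```python
# Minimal summarization sketch with the Hugging Face "transformers" pipeline.
# The model name is illustrative; the course's demo may use another model or API.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Large language models can condense long documents into short overviews, "
    "translate text between languages, and draft industry-specific content. "
    "This module walks through those core capabilities with hands-on demos."
)

# Generate a short, deterministic summary of the input text.
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```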
Discover how LLMs power interactive and analytical applications in this module. Learn the role of chatbots and virtual assistants in automating conversations across industries. Explore sentiment analysis to interpret user emotions and feedback. Gain hands-on experience with demos like MultiPDF QA Retriever using ChromaDB and LangChain, and real-time sentiment detection.
What's included
4 videos, 3 assignments
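For orientation, the snippet below sketches the kind of retrieval-QA flow the MultiPDF QA Retriever demo describes, using LangChain with ChromaDB. The file names, model names, prompt, and package layout (which varies across LangChain versions) are assumptions, not the course's actual demo code, and an OpenAI API key is assumed to be set in the environment.

```python
# Sketch of a multi-PDF retrieval-QA flow with LangChain + ChromaDB (illustrative only).
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Load and split several PDFs (hypothetical file names).
docs = []
for path in ["report1.pdf", "report2.pdf"]:
    docs.extend(PyPDFLoader(path).load())
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks into a Chroma vector store and expose it as a retriever.
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings())
retriever = vectordb.as_retriever(search_kwargs={"k": 4})

# Retrieve relevant chunks and ask the LLM to answer from that context only.
llm = ChatOpenAI(model="gpt-4o-mini")
question = "What are the key findings across the PDFs?"
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```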
Explore how to evaluate and benchmark large language models in this comprehensive module. Learn key benchmarking steps and widely used frameworks like ROUGE, GLUE, SuperGLUE, and BIG-bench. Understand the need for evolving benchmarks as LLMs grow more advanced. Get hands-on with demos to assess performance, accuracy, and real-world application of generative AI models.
What's included
9 videos, 3 assignments
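As a taste of the benchmarking workflow, the snippet below scores a candidate summary against a reference with ROUGE using the rouge-score package. The texts and tooling are illustrative assumptions, not material from the course.

```python
# Minimal ROUGE scoring sketch using the "rouge-score" package (pip install rouge-score).
from rouge_score import rouge_scorer

reference = "LLMs can summarize, translate, and generate content."
candidate = "Large language models summarize and translate text and generate content."

# Compute unigram, bigram, and longest-common-subsequence ROUGE with stemming.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.2f} recall={s.recall:.2f} f1={s.fmeasure:.2f}")
```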
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Instructor

Offered by
Frequently asked questions
LLM evaluation benchmarks are standardized tests used to assess the performance, reasoning, and language understanding of large language models. Examples include ROUGE, GLUE, SuperGLUE, and BIG-bench.
Creating a benchmark involves defining clear tasks (e.g., summarization, QA), collecting diverse datasets, selecting evaluation metrics (like F1 or accuracy), and validating the benchmark against multiple LLMs.
Common metrics include ROUGE for summarization, BLEU for translation, accuracy, F1-score, and exact match for QA tasks, along with emerging task-specific metrics for generative performance.
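For the QA metrics mentioned above, a minimal sketch of exact match and token-level F1 might look like the following. This is a simplified illustration, not an official benchmark implementation.

```python
# Simplified exact-match and token-level F1 for a single QA prediction/reference pair.
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    # F1 over overlapping tokens between prediction and reference.
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))                        # 1.0
print(round(token_f1("the capital is Paris", "Paris"), 2))  # 0.4
```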
More questions
Financial aid available.