Learner reviews and feedback for Reinforcement Learning from Human Feedback by DeepLearning.AI
31 ratings
Course Overview
Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences.
Reinforcement Learning from Human Feedback (RLHF) is currently the main method for aligning LLMs with human values and preferences. RLHF is also used for further tuning a base LLM to align with values and preferences that are specific to your use case.
In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will:
1. Explore the two datasets that are used in RLHF training: the “preference” and “prompt” datasets (see the sketch of their record formats after this list).
2. Use the open-source Google Cloud Pipeline Components library to fine-tune the Llama 2 model with RLHF.
3. Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.
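To make the two dataset roles concrete, here is a minimal, illustrative sketch of what one record in each dataset might look like. The field names (input_text, candidate_0, candidate_1, choice) are assumptions chosen for illustration, not taken from the course materials; the exact schema depends on the tuning library you use.

```python
# Illustrative sketch only: example records for the two datasets used in RLHF.
# Field names are assumptions for illustration; the real schema is library-specific.
import json

# Preference dataset record: a prompt plus two candidate responses and the
# human labeler's choice. These records are used to train the reward model.
preference_record = {
    "input_text": "Summarize: The meeting covered Q3 revenue and hiring plans.",
    "candidate_0": "Q3 revenue and hiring plans were discussed.",
    "candidate_1": "People talked about stuff at a meeting.",
    "choice": 0,  # index of the response the human labeler preferred
}

# Prompt dataset record: a prompt only. During the reinforcement-learning step,
# the model being tuned generates responses to these prompts, and the reward
# model scores those responses.
prompt_record = {
    "input_text": "Summarize: The report describes three risks to the rollout.",
}

# Each dataset is typically stored as one JSON object per line (JSONL).
print(json.dumps(preference_record))
print(json.dumps(prompt_record))
```

In short, the preference data teaches a reward model what humans prefer, and the prompt data drives the reinforcement-learning step in which that reward model scores the tuned model's outputs.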
Top Reviews
1 - 6 of 6 Reviews for Reinforcement Learning from Human Feedback
By Ahmad A
•Jun 19, 2025
It would be better if it were expanded a bit, but overall it is a super course.
By Neil L
•Aug 17, 2025
Very nice overview of how RLHF works.
By sajjad s
•May 14, 2025
great
By Fady A S
•Dec 12, 2024
The content is amazing, the instructor is great, and the flow is well structured. I did learn a lot; however, I wish the notebooks were structured so that I could write some of the code on my own, as opposed to everything being ready and already verified to work.
By Manideep R E
•Jan 12, 2025
Overall, worth a shot. Not in depth, but a good overview.
By Alessandro V
•Aug 28, 2024
Andrew is always a guarantee of quality.