Most real-world data isn’t clean: it’s messy, incomplete, and spread across sources like websites, APIs, and databases. In this course, you’ll learn how to collect that data, clean it, and prepare it for analysis using Python and SQL.
You’ll start by extracting data from webpages using tools like Pandas and Beautiful Soup, while also learning how to handle unstructured text and apply ethical scraping practices.
Next, you’ll access real-time data through APIs, parse JSON files, and clean numerical data using techniques like normalization and binning. You’ll also learn how to manage authentication with API keys and store them securely.
Finally, you’ll work with databases: querying and joining tables using SQL, validating results, and understanding when to use SQL versus Python for different preprocessing tasks.
By the end of the course, you’ll be able to turn raw, real-world data into reliable, analysis-ready inputs—a core skill for any data professional.
This module introduces techniques for acquiring data from a wide range of sources, with a focus on web scraping and text processing. You'll begin by exploring how data flows into analysis pipelines and gain hands-on experience using tools like Pandas and Beautiful Soup to extract, clean, and structure data. You'll apply text preprocessing methods to handle missing values and parse HTML. Plus, you’ll consider the ethical implications of scraping data from the web.
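As a rough illustration of the workflow this module describes, the sketch below parses a small HTML table with Beautiful Soup and cleans it with Pandas. The HTML snippet, the `jobs` table id, the column names, and the `parse_jobs` helper are all hypothetical and are not taken from the course materials; a real page would first be downloaded, for example with the requests library.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded webpage.
HTML = """
<table id="jobs">
  <tr><th>Company</th><th>Salary</th></tr>
  <tr><td> Acme Corp </td><td>$120,000</td></tr>
  <tr><td>Globex</td><td>N/A</td></tr>
  <tr><td>Initech </td><td>$95,500</td></tr>
</table>
"""


def parse_jobs(html: str) -> pd.DataFrame:
    """Extract the table rows, then clean the text columns."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="jobs")

    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text() for td in tr.find_all("td")]
        rows.append({"company": cells[0], "salary": cells[1]})

    df = pd.DataFrame(rows)

    # String methods: strip surrounding whitespace from company names.
    df["company"] = df["company"].str.strip()

    # String methods: remove the currency symbol and thousands separator.
    cleaned = (
        df["salary"]
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
    )

    # Casting: values that cannot be parsed (such as "N/A") become NaN,
    # so they can be handled as missing values later.
    df["salary"] = pd.to_numeric(cleaned, errors="coerce")
    return df


jobs = parse_jobs(HTML)
print(jobs)
```

Coercing unparseable values to NaN, rather than dropping the rows immediately, keeps the decision about how to handle missing data separate from the parsing step.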
What's included
22 videos, 3 readings, 4 assignments, 1 programming assignment, 3 ungraded labs
22 videos•Total 81 minutes
Welcome to this course!•5 minutes
Generative AI in this course•2 minutes
Module 1 introduction•1 minute
The many sources of data•4 minutes
Data cleaning and processing•4 minutes
ETL and ELT•4 minutes
Introduction to web scraping•3 minutes
Scraping tables with Pandas•4 minutes
String methods: replace•4 minutes
Casting•3 minutes
Handling missing values•5 minutes
String methods: contains•3 minutes
String methods: split and strip•4 minutes
Networking•3 minutes
Scraping webpages with requests•4 minutes
HTML•5 minutes
Planning HTML parsing•3 minutes
Parsing HTML with Beautiful Soup•5 minutes
DataFrame setup•5 minutes
Regular expressions•4 minutes
Writing regular expressions with LLMs•2 minutes
The ethics of web scraping•4 minutes
3 readings•Total 13 minutes
Join the DeepLearning.AI Forum to ask questions, get support, or share amazing ideas!•2 minutes
Additional Web Scraping Practice•10 minutes
Module 1 lecture notes•1 minute
4 assignments•Total 80 minutes
Lesson 1 quiz•10 minutes
Lesson 2 quiz•10 minutes
Lesson 3 quiz•30 minutes
Module 1 quiz•30 minutes
1 programming assignment•Total 90 minutes
Analyzing Tech Industry Jobs and Companies•90 minutes
3 ungraded labs•Total 90 minutes
Module 1 lecture code•30 minutes
Practice Lab: Web Scraping with Pandas•30 minutes
Practice Lab: Web Scraping with Beautiful Soup•30 minutes
APIs & numerical cleaning
Module 2
Module details
This module focuses on acquiring data using APIs, as well as applying numerical cleaning techniques. You’ll learn how to retrieve data from web-based APIs, handle authentication securely, and transform raw JSON responses into usable dataframes. The module also covers techniques for cleaning and preparing numerical data, including scaling, binning, normalization, and outlier handling.
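The steps named above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the JSON payload, the `results` key, the column names, the bin edges, and the `EXAMPLE_API_KEY` variable name are all hypothetical, and min-max scaling is used here as just one common way to rescale values to the 0 to 1 range.

```python
import json
import os

import pandas as pd

# Hypothetical JSON body, standing in for the text of an API response.
RAW = """
{"results": [
  {"county": "A", "median_income": 40000},
  {"county": "B", "median_income": 60000},
  {"county": "C", "median_income": 100000}
]}
"""


def load_api_key(var_name: str = "EXAMPLE_API_KEY") -> str:
    """Read the key from an environment variable instead of hard-coding it."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


def to_frame(raw: str) -> pd.DataFrame:
    """Parse the JSON text and turn its list of records into a DataFrame."""
    payload = json.loads(raw)
    return pd.DataFrame(payload["results"])


def min_max(series: pd.Series) -> pd.Series:
    """Rescale values so the minimum maps to 0 and the maximum maps to 1."""
    return (series - series.min()) / (series.max() - series.min())


df = to_frame(RAW)
df["income_scaled"] = min_max(df["median_income"])

# Binning: group the continuous income values into labeled ranges.
df["income_band"] = pd.cut(
    df["median_income"],
    bins=[0, 50000, 80000, float("inf")],
    labels=["low", "mid", "high"],
)
print(df)
```

Reading the key from the environment keeps it out of the notebook and out of version control; `load_api_key` is defined but not called here because the example makes no network request.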
What's included
17 videos, 2 readings, 4 assignments, 1 programming assignment, 3 ungraded labs
17 videos•Total 61 minutes
Module 2 introduction•1 minute
Introduction to APIs•4 minutes
JSON•5 minutes
API requests and responses•2 minutes
Query parameters•4 minutes
From JSON to a dataframe•4 minutes
Pagination•4 minutes
Analyzing the combined DataFrame•4 minutes
API keys•3 minutes
Using an API key•3 minutes
Environmental variables•3 minutes
Scaling•4 minutes
Binning•4 minutes
Normalization•5 minutes
Identifying outliers•2 minutes
Handling outliers•5 minutes
Data quality•4 minutes
2 readings•Total 11 minutes
Mechanics of API keys•10 minutes
Module 2 lecture notes•1 minute
4 assignments•Total 60 minutes
Lesson 1 quiz•10 minutes
Lesson 2 quiz•10 minutes
Lesson 3 quiz•10 minutes
Module 2 quiz•30 minutes
1 programming assignment•Total 90 minutes
Identifying Vulnerable Communities using the U.S. Census API•90 minutes
3 ungraded labs•Total 90 minutes
Module 2 lecture code•30 minutes
Practice Lab: Using APIs•30 minutes
Practice Lab: API keys and numerical cleaning•30 minutes
Databases
Module 3
Module details
This module introduces the fundamentals of data storage and retrieval using databases and SQL. You’ll learn how data is structured in relational systems; explore core concepts like entities, relationships, and schemas; and gain hands-on experience writing SQL queries. You’ll also explore how to query databases from a Python notebook, as well as how generative AI tools can support SQL-based tasks.
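To make the idea of querying a database from Python concrete, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The `movies` table, its columns, and its rows are invented for illustration and do not come from the course dataset.

```python
import sqlite3


def top_rated(limit: int = 2) -> list:
    """Build a tiny in-memory database and select the highest-rated rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Schema: one entity (a movie) with three attributes.
        conn.execute(
            "CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)"
        )
        conn.executemany(
            "INSERT INTO movies VALUES (?, ?, ?)",
            [
                ("Film A", 2001, 7.1),
                ("Film B", 1995, 8.4),
                ("Film C", 2010, 6.3),
            ],
        )
        # Selecting and ordering results; the ? placeholder passes the
        # limit as a parameter rather than pasting it into the SQL text.
        cursor = conn.execute(
            "SELECT title, rating FROM movies ORDER BY rating DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


top = top_rated()
print(top)
```

Each row comes back as a plain Python tuple, so the result can be passed directly to other Python code or loaded into a DataFrame.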
What's included
15 videos, 3 readings, 4 assignments, 1 programming assignment, 2 ungraded labs
15 videos•Total 55 minutes
Module 3 introduction•1 minute
Data storage systems•4 minutes
What is a database?•4 minutes
Database management systems•3 minutes
Tidy data•4 minutes
Entities and attributes•4 minutes
Relationships•4 minutes
Data models and data schemas•4 minutes
Types of tables•3 minutes
Introduction to SQL•3 minutes
SQL code•3 minutes
Selecting•5 minutes
Ordering results•5 minutes
LLMs for databases•4 minutes
SQL in Python•5 minutes
3 readings•Total 31 minutes
[Optional] Practice with selecting•20 minutes
[Optional] Practice with ordering results•10 minutes
Module 3 lecture notes•1 minute
4 assignments•Total 65 minutes
Lesson 1 quiz•20 minutes
Lesson 2 quiz•10 minutes
Lesson 3 quiz•5 minutes
Module 3 quiz•30 minutes
1 programming assignment•Total 90 minutes
Analyzing Movie Data with SQL•90 minutes
2 ungraded labs•Total 60 minutes
Module 3 lecture code•30 minutes
Practice Lab: SQLite in Python•30 minutes
Preprocessing, validation, and joins with SQL
Module 4
Module details
In this module, you’ll expand your SQL skills into data preprocessing, validation, and joins (combining tables). You’ll learn how to use SQL for filtering, conditional logic, and handling missing values, and apply validation techniques using aggregation and grouping. The module also explores different types of joins and demonstrates how to use them to combine and analyze data across multiple tables—especially in real-world scenarios like analyzing sports performance data.
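The sketch below combines several of these ideas in one place: a LEFT JOIN, NULL handling with COALESCE, conditional logic with CASE, and a validation query using GROUP BY and HAVING. It runs against an in-memory SQLite database; the `players` and `game_stats` tables, their rows, and the 20-point threshold are hypothetical and chosen only to make the behavior easy to follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE game_stats (player_id INTEGER, points INTEGER);
    INSERT INTO players VALUES (1, 'Player A'), (2, 'Player B'), (3, 'Player C');
    INSERT INTO game_stats VALUES (1, 30), (1, 22), (2, 8), (2, NULL);
    """
)

# LEFT JOIN keeps every player, including those with no recorded games.
# COUNT(g.points) ignores NULLs, and COALESCE turns a NULL sum into 0.
SUMMARY_SQL = """
SELECT p.name,
       COUNT(g.points)            AS games_with_points,
       COALESCE(SUM(g.points), 0) AS total_points,
       CASE WHEN COALESCE(SUM(g.points), 0) >= 20 THEN 'high'
            ELSE 'low'
       END                        AS tier
FROM players AS p
LEFT JOIN game_stats AS g ON g.player_id = p.player_id
GROUP BY p.player_id, p.name
ORDER BY p.player_id
"""

# Validation: list players whose stats include rows with missing points.
MISSING_SQL = """
SELECT player_id, COUNT(*) - COUNT(points) AS missing_rows
FROM game_stats
GROUP BY player_id
HAVING COUNT(*) > COUNT(points)
ORDER BY player_id
"""

summary = conn.execute(SUMMARY_SQL).fetchall()
missing = conn.execute(MISSING_SQL).fetchall()
conn.close()

print(summary)
print(missing)
```

Swapping LEFT JOIN for INNER JOIN in the first query would drop Player C, who has no rows in `game_stats`, which is a quick way to see the difference between the two join types.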
What's included
17 videos, 11 readings, 4 assignments, 2 programming assignments, 4 ungraded labs
17 videos•Total 52 minutes
Module 4 introduction•1 minute
SQL vs Python•2 minutes
Filtering•4 minutes
Filtering: Compound conditions•4 minutes
Filtering: String-based conditions•3 minutes
Conditionals: CASE•3 minutes
Handling NULL values•3 minutes
Data validation•4 minutes
Validation: COUNT and DISTINCT•3 minutes
Validation: GROUP BY•4 minutes
Validation: MIN, MAX, SUM•4 minutes
Validation: HAVING•3 minutes
Introduction to joins•3 minutes
Left joins•4 minutes
Inner joins•3 minutes
Outer joins•2 minutes
Your next steps•1 minute
11 readings•Total 101 minutes
[Optional] Practice with filtering•10 minutes
[Optional] Practice with conditionals and NULL values•10 minutes
SQL Basics: Data Creation and Modification•15 minutes
[Optional] Practice with COUNT and DISTINCT•10 minutes
[Optional] Practice with GROUP BY and aggregations•10 minutes
[Optional] Practice with HAVING•10 minutes
[Optional] Practice with LEFT JOINs•10 minutes
Master INNER JOINs•10 minutes
[Optional] Practice with INNER JOINs•10 minutes
Module 4 lecture notes•1 minute
Acknowledgments•5 minutes
4 assignments•Total 60 minutes
Lesson 1 quiz•10 minutes
Lesson 2 quiz•10 minutes
Lesson 3 quiz•10 minutes
Module 4 quiz•30 minutes
2 programming assignments•Total 180 minutes
Deeper Analysis of the Movie Data with SQL•90 minutes
NYC Restaurant Inspections•90 minutes
4 ungraded labs•Total 120 minutes
Module 4 lecture code•30 minutes
Practice Lab: Analyzing NBA games: Best Players•30 minutes
Practice Lab: Analyzing NBA games - Validating your data•30 minutes
Practice Lab: Analyzing NBA games: Performance per game•30 minutes
DeepLearning.AI is an education technology company that develops a global community of AI talent.
DeepLearning.AI's expert-led educational experiences provide AI practitioners and non-technical professionals with the necessary tools to go all the way from foundational basics to advanced application, empowering them to build an AI-powered future.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Certificate?
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page. From there, you can print your Certificate or add it to your LinkedIn profile.