Every data professional writes SQL queries — but few understand why some queries take seconds and others take minutes on the same data. The answer lies beneath the surface: in how data is stored, how query engines read that data, and how columnar formats like Parquet fundamentally change the game for analytics performance. This course gives you that understanding.
Recommended experience
Beginner level
Basic computer literacy is helpful. No prior SQL experience is required, though familiarity with basic SQL statements will help you progress faster.
What you'll learn
Explain how data is stored, distinguish row vs columnar storage, and identify performance advantages of columnar formats
Work with Parquet and ORC formats, query columnar data using DuckDB, and compare storage performance against CSV
Read SQL execution plans using EXPLAIN, diagnose bottlenecks, and compare query performance across different engines
Apply optimization techniques including column pruning, filter pushdown, partitioning, and data skipping for analytics
There are 5 modules in this course
This module introduces how data is stored and organized in computer systems using files, tables, rows, and columns. It explains how SQL is used to access and manipulate data and how databases process read operations. The module also compares row-based and column-based storage to show how different storage models affect query performance.
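As a rough illustration of the row versus column comparison, the sketch below stores the same toy table both ways and counts how many values must be touched to total one column. The table, its column names, and the helper functions are hypothetical; real engines read disk pages rather than Python lists, so treat the counts only as an analogy for I/O.

```python
# Hypothetical toy table: each record is (order_id, region, amount).
rows = [
    (1, "EU", 120.0),
    (2, "US", 80.0),
    (3, "EU", 45.5),
    (4, "APAC", 200.0),
]

# Row-based layout: the values of one record sit next to each other.
row_store = [value for record in rows for value in record]

# Column-based layout: the values of one column sit next to each other.
column_store = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount_row_store(store, width=3, amount_index=2):
    """Sum 'amount' by walking every record; returns (total, values_touched)."""
    total, touched = 0.0, 0
    for start in range(0, len(store), width):
        record = store[start:start + width]
        touched += len(record)  # the whole record is read to reach one field
        total += record[amount_index]
    return total, touched


def total_amount_column_store(store):
    """Sum 'amount' by reading only that column; returns (total, values_touched)."""
    amounts = store["amount"]
    return sum(amounts), len(amounts)


row_total, row_touched = total_amount_row_store(row_store)
col_total, col_touched = total_amount_column_store(column_store)
print(f"row store:    total={row_total}, values touched={row_touched}")
print(f"column store: total={col_total}, values touched={col_touched}")
```

Both layouts give the same total, but the column layout touches one value per row instead of three. The gap widens as tables gain more columns that the query does not need.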
What's included
13 videos, 6 readings, 5 assignments
13 videos • 56 minutes total
- Course Introduction • 4 minutes
- Core Data and Tabular Structure Overview • 4 minutes
- Foundations of Data Storage and Organization • 6 minutes
- Hands-On: Viewing CSV Data Using Spreadsheet Tools • 5 minutes
- Exploring the Purpose and Use of SQL • 5 minutes
- Internal Mechanisms of SQL Data Retrieval • 4 minutes
- Hands-On: Running Basic SQL Queries • 4 minutes
- SQL Commands for Creating, Reading, Updating and Deleting Data • 5 minutes
- Comparing Read Heavy and Write Heavy Workloads • 5 minutes
- Hands-On: SQL CRUD Operation • 4 minutes
- Understanding Row Based and Column Based Storage • 4 minutes
- Hands-On: Evaluating Row and Column Storage Approaches • 3 minutes
- Hands-On: How Storage Models Affect Query Execution • 4 minutes
6 readings • 57 minutes total
- Course Syllabus • 7 minutes
- Design Principles of Data Organization • 10 minutes
- Core SQL Concepts for Data Analysis • 10 minutes
- Managing Data Operations in SQL Systems • 10 minutes
- Row and Column Storage Structures • 10 minutes
- Module Summary: Foundations of Data Storage and SQL for Analytics • 10 minutes
5 assignments • 39 minutes total
- Foundations of Data Storage and SQL for Analytics • 15 minutes
- Data Storage and Retrieval • 6 minutes
- Introduction to SQL and Data Representation • 6 minutes
- SQL Data Processing Concepts • 6 minutes
- Row vs. Column Storage Explained • 6 minutes
This module explains how columnar storage is used in modern data systems and data warehouses for efficient analytics. It introduces common columnar formats and tools used in industry, and demonstrates how techniques like compression, data skipping, and partitioning improve query performance.
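One reason compression works well on columnar data is that similar values end up stored side by side. The sketch below shows run-length encoding, one of the simplest such encodings, on a hypothetical `region` column. It is a toy illustration only: Parquet and ORC combine several encodings, and their actual implementations are considerably more involved.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    return [[value, len(list(run))] for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand [value, run_length] pairs back into the original list."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical column in which equal values sit next to each other.
region_column = ["EU"] * 5 + ["US"] * 3 + ["APAC"] * 4

encoded = rle_encode(region_column)
print(encoded)                            # three runs instead of twelve values
print(len(region_column), len(encoded))   # 12 3
```

In a row layout the same region values would be interleaved with order IDs and amounts, so long runs like these would rarely appear.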
What's included
12 videos, 4 readings, 4 assignments
12 videos • 49 minutes total
- Columnar Storage in Modern Data Systems • 4 minutes
- Columnar Storage Formats: Parquet and ORC • 4 minutes
- Columnar Systems vs. File Based Storage Architectures • 4 minutes
- Hands-On: Exploring Columnar Parquet Files Using Query Engine • 3 minutes
- Hands-On: Comparing CSV vs. Columnar File Size • 3 minutes
- Role of Columnar Storage in Modern Data Warehouses • 4 minutes
- Modern Columnar Systems in Open Source and Cloud • 5 minutes
- Hands-On: Querying Data in Columnar Storage Systems • 4 minutes
- Impact of Compression on Query Performance • 5 minutes
- Data Skipping, Metadata, and Partitioning Techniques • 4 minutes
- Hands-On: The Role of Filters in Efficient Data Access • 4 minutes
- Hands-On: Partitioned vs. Non-Partitioned Queries • 5 minutes
4 readings • 40 minutes total
- Columnar Storage Architecture for Analytical Systems • 10 minutes
- Scalable Columnar Data Platforms and Ecosystems • 10 minutes
- Performance Optimization in Columnar Data Platforms • 10 minutes
- Module Summary: Columnar Storage in Modern Industry Systems • 10 minutes
4 assignments • 33 minutes total
- Columnar Storage in Modern Industry Systems • 15 minutes
- Data Storage Formats and Structures • 6 minutes
- Columnar Data Processing with DuckDB • 6 minutes
- Efficiency Gains with Columnar Storage • 6 minutes
This module introduces query engines and SQL tools used by analysts and engineers to process data. It explains how SQL queries are executed internally and how query plans represent the steps a system takes to run a query. The module also compares different query engines to understand why some systems perform faster for analytical workloads.
What's included
9 videos, 4 readings, 4 assignments
9 videos • 39 minutes total
- Role of Query Engine • 4 minutes
- Hands-On: Running SQL Queries in DuckDB • 4 minutes
- Execution Lifecycle of SQL Queries • 3 minutes
- Query Plans • 5 minutes
- Hands-On: Analyzing Query Execution Flow • 5 minutes
- Hands-On: Interpreting Query Execution Plans • 3 minutes
- Performance Differences Across Query Engines • 5 minutes
- Hands-On: Storage Format and Query Performance Analysis • 4 minutes
- Hands-On: Cross-Engine SQL Query Comparison • 5 minutes
4 readings • 40 minutes total
- Architectural Components of Analytical Query System • 10 minutes
- Query Processing Techniques in Analytical Databases • 10 minutes
- System Design Factors in Query Engine Performance • 10 minutes
- Module Summary: Query Engines and SQL Processing Systems • 10 minutes
4 assignments • 33 minutes total
- Query Engines and SQL Processing Systems • 15 minutes
- SQL Execution within Query Engines • 6 minutes
- Query Execution and Query Plans • 6 minutes
- Analytical Query System • 6 minutes
This module explains why query optimization is important for improving data processing efficiency and reducing slow query performance. It introduces practical SQL optimization techniques such as filtering, column pruning, and efficient aggregations. The module also demonstrates real-world optimization workflows using industry tools to compare query performance before and after optimization.
What's included
9 videos, 4 readings, 4 assignments
9 videos • 36 minutes total
- Optimizing SQL Query Performance • 5 minutes
- Impact of Query Performance on Business Operations • 4 minutes
- Hands-On: Evaluating Query Inefficiencies • 3 minutes
- Performance Optimization Strategies for Columnar SQL Databases • 4 minutes
- Hands-On: Bad SQL vs Optimized SQL Queries • 4 minutes
- Hands-On: The Role of Filters in Efficient Data Access • 4 minutes
- Query Optimization Pipeline: An End to End Perspective • 3 minutes
- Real World Application of Analytics Optimization Techniques • 6 minutes
- Hands-On: Comparing Query Performance Before and After Optimization • 3 minutes
4 readings • 40 minutes total
- Managing Query Performance in Analytic Systems • 10 minutes
- Efficient Query Design in Analytical Systems • 10 minutes
- Strategies for Sustaining Query Performance • 10 minutes
- Module Summary: Query Optimization Concepts and Best Practices • 10 minutes
4 assignments • 33 minutes total
- Query Optimization Concepts and Best Practices • 15 minutes
- Principles of Query Optimization and Its Importance • 6 minutes
- SQL Query Optimization Strategies • 6 minutes
- Real World Query Optimization in Industry System • 6 minutes
This module consolidates key concepts from data storage, SQL querying, query execution, and optimization. It evaluates understanding through structured assessments and practical query analysis scenarios. It serves as a final checkpoint to assess readiness for real-world data querying and performance optimization tasks.
What's included
1 video, 1 reading, 2 assignments
1 video • 4 minutes total
- Course Summary • 4 minutes
1 reading • 30 minutes total
- Practice Project: Designing Fast Data Systems for Analytics • 30 minutes
2 assignments • 60 minutes total
- Final Course Assessment: Columnar Storage and Query Optimization • 30 minutes
- Optimizing the Data Core: Columnar Storage and Query Performance • 30 minutes
Offered by
Edureka is an online education platform focused on delivering high-quality learning to working professionals. We have the highest course completion rate in the industry, and we strive to create an online ecosystem where our global learners can equip themselves with industry-relevant skills in today's cutting-edge technologies.
Frequently asked questions
Columnar storage organizes data by columns rather than rows. For analytical queries that typically scan a few columns across millions of rows, columnar formats like Parquet and ORC dramatically reduce the amount of data read from disk — enabling faster queries, better compression, and lower compute costs. This course teaches you exactly how and why.
The primary hands-on tool is DuckDB — a fast, in-process analytical query engine that requires minimal setup. You will also work with Parquet files, CSV data, and SQL query plan tools (EXPLAIN). No complex installation or cloud accounts are needed.
Yes. This course is built around a follow-along, demonstration-driven learning model. Each concept is taught through step-by-step video demonstrations using DuckDB and Parquet that you can replicate on your own machine. DuckDB runs locally with minimal setup, so you can pause, rewind, and practice alongside each demo at your own pace.
Basic SQL familiarity (SELECT, FROM, WHERE) is helpful but not strictly required. Module 1 begins with SQL fundamentals — what SQL is, CRUD operations, and how READ operations work internally — before progressing to advanced query optimization. The course is accessible for beginners while still valuable for experienced SQL users.
DuckDB is a modern, open-source analytical query engine designed for fast, in-process SQL analytics. It runs locally without a server, reads Parquet files natively, and provides query plan visibility through EXPLAIN — making it ideal for learning storage and optimization concepts without infrastructure overhead.
A query execution plan shows the step-by-step strategy a query engine uses to retrieve your data — including scan operations, filter application, join methods, and aggregations. In Module 3, you will use EXPLAIN to view execution plans, interpret them conceptually, and diagnose where queries slow down.
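To give a feel for what reading a plan looks like, here is a minimal sketch that asks an engine for its plan. It deliberately uses Python's built-in `sqlite3` module as a stand-in, which is an assumption made only so the snippet runs without installing anything. The course itself uses DuckDB, whose `EXPLAIN` output is formatted differently and reflects a columnar, vectorized engine. The `sales` table and the `plan_details` helper are hypothetical.

```python
import sqlite3

# In-memory database with a small hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 45.5)],
)

query = "SELECT SUM(amount) FROM sales WHERE region = 'EU'"


def plan_details(connection, sql):
    """Return the human-readable 'detail' text of each plan step."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # SQLite plan rows are (id, parent, notused, detail).
    return [row[3] for row in plan_rows]


plan = plan_details(conn, query)
result = conn.execute(query).fetchone()[0]

for step in plan:
    print(step)  # a SCAN step means every row of the table is visited
print(result)
```

A full-table scan step on a large table is usually the first thing to look for when a filtered query is slow. The same habit of reading the plan before tuning carries over to DuckDB and other engines.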
Both Parquet and ORC are columnar file formats designed for analytics. Parquet is more widely adopted in cloud and open-source ecosystems (Spark, Hive, DuckDB), while ORC is optimized for Hive-based environments. This course covers both formats and helps you understand when to use each.
You will learn column pruning (reading only needed columns), filter pushdown (applying filters early), partitioning (organizing data for faster scans), data skipping (avoiding irrelevant data blocks), and aggregation optimization. Module 4 includes before-vs-after comparisons so you can measure the real performance impact of each technique.
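Three of these techniques can be sketched together in a few lines of plain Python. The example below assumes a hypothetical file split into blocks, each carrying min/max statistics for a `year` column. The block layout and the `scan` function are invented for illustration and are not the Parquet or DuckDB API.

```python
# Hypothetical data blocks: per-block min/max stats on 'year' plus column values.
blocks = [
    {"stats": {"year": (2019, 2020)},
     "columns": {"year": [2019, 2020, 2020],
                 "amount": [10.0, 20.0, 30.0],
                 "note": ["a", "b", "c"]}},
    {"stats": {"year": (2021, 2022)},
     "columns": {"year": [2021, 2021, 2022],
                 "amount": [40.0, 50.0, 60.0],
                 "note": ["d", "e", "f"]}},
    {"stats": {"year": (2023, 2024)},
     "columns": {"year": [2023, 2023, 2024],
                 "amount": [70.0, 80.0, 90.0],
                 "note": ["g", "h", "i"]}},
]


def scan(data_blocks, needed_columns, year_min):
    """Return rows with year >= year_min, reading as little as possible."""
    blocks_read = 0
    result = {name: [] for name in needed_columns}
    for block in data_blocks:
        _, high = block["stats"]["year"]
        if high < year_min:
            continue  # data skipping: no row in this block can match
        blocks_read += 1
        years = block["columns"]["year"]
        # Filter pushdown: the predicate is applied while scanning the block.
        keep = [i for i, year in enumerate(years) if year >= year_min]
        # Column pruning: only the requested columns are ever touched.
        for name in needed_columns:
            column = block["columns"][name]
            result[name].extend(column[i] for i in keep)
    return result, blocks_read


result, blocks_read = scan(blocks, ["amount"], year_min=2022)
print(result)       # only 'amount' values for rows from 2022 onward
print(blocks_read)  # the first block is skipped using its statistics alone
```

Partitioning applies the same idea at a coarser level: data is split into separate files or directories by a column value, so whole partitions can be ignored before any block is opened.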
Data Analysts who want to understand query performance, junior Data Engineers building storage foundations, BI Professionals moving into platform or performance roles, and SQL Developers who want to go beyond syntax to understand execution internals. If slow queries frustrate you, this course is for you.
Storage and query optimization are foundational skills for Data Engineering, Analytics Engineering, BI Development, Database Engineering, and Performance Engineering roles.
Basic computer literacy is sufficient. SQL is taught from fundamentals in Module 1. No prior experience with DuckDB, Parquet, or query optimization is required. Familiarity with basic SELECT statements will help you move faster but is not mandatory.
Yes. Upon completing all graded assessments and the final course assessment, you will earn a Coursera Course Certificate from Edureka that you can add to your LinkedIn profile, resume, or CV.
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. It also means that you will not be able to purchase a Certificate experience.
When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with the Coursera Privacy Notice.



