Big data is the area of informatics focusing on datasets whose size is beyond the ability of typical database and other software tools to capture, store, analyze and manage. This course provides a rapid immersion into the area of big data and the technologies which have recently emerged to manage it.
We start with an introduction to the characteristics of big data and an overview of the associated technology landscape, and continue with an in-depth exploration of Hadoop, the leading open source framework for big data processing. Here the focus is on the most important Hadoop components, such as Hive, Pig, stream processing, and Spark, as well as architectural patterns for applying these components. We then explore the range of specialized (NoSQL) database systems architected to address the challenges of managing large volumes of data.
Overall, the objective is to develop a sense of how to make sound decisions in the adoption and use of these technologies, and how to deploy them economically on modern cloud computing infrastructure.
Welcome to Big Data Technologies! In Module 1, students will develop a foundational understanding of analytic data, its inherent value, and the methods to transform raw data into valuable insights. This module covers the challenges of handling large datasets, including their collection, processing, and analysis, while providing a comprehensive overview of Big Data's origins, properties, and real-world applications. Additionally, students will explore the economic, logistical, and ethical concerns associated with Big Data, alongside the professional advantages for data scientists proficient in Big Data analysis.
What's included
16 videos, 10 readings, 8 assignments, 1 discussion prompt
16 videos•Total 104 minutes
Course Overview•4 minutes
Instructor Introduction•2 minutes
Module 1 Introduction•2 minutes
From Data to Value - Part 1•10 minutes
From Data to Value - Part 2•7 minutes
Big Data Overview - Part 1•8 minutes
Big Data Overview - Part 2•6 minutes
Confounding Factors - Part 1•8 minutes
Confounding Factors - Part 2•7 minutes
Confounding Factors - Part 3•6 minutes
Big Data Challenges•6 minutes
Big Data Benefits - Part 1•6 minutes
Big Data Benefits - Part 2•5 minutes
Big Data Technology - Part 1•10 minutes
Big Data Technology - Part 2•8 minutes
Generic Distributed Storage Systems and Execution Engines•11 minutes
10 readings•Total 500 minutes
Syllabus•10 minutes
Module 1 Introduction Reading•60 minutes
From Data to Value•60 minutes
Big Data Overview•60 minutes
Confounding Factors•60 minutes
Big Data Challenges•60 minutes
Big Data Benefits•60 minutes
Big Data Technology•60 minutes
Generic Distributed Storage Systems and Execution Engines•60 minutes
Module 1 Summary•10 minutes
8 assignments•Total 330 minutes
Module 1 Summative Assessment•120 minutes
From Data to Value Quiz•15 minutes
Big Data Overview Quiz•15 minutes
Confounding Factors Quiz•15 minutes
Big Data Challenges Quiz•15 minutes
Big Data Benefits Quiz•15 minutes
Big Data Technology Quiz•15 minutes
Creating an AWS Account Assignment•120 minutes
1 discussion prompt•Total 10 minutes
Meet and Greet Discussion•10 minutes
Module 2: Apache Hadoop Overview
Module 2 introduces students to the challenges of building and managing distributed systems for big data storage and processing. It covers Hadoop’s origins, concepts, core components, and key characteristics, while exploring the Hadoop ecosystem's tools and services. Students will gain an understanding of distributed file systems, specifically HDFS, YARN's resource management, and various technologies for effective big data storage and organization.
What's included
13 videos, 7 readings, 6 assignments
13 videos•Total 91 minutes
Module 2 Introduction•2 minutes
Hadoop - Part 1•9 minutes
Hadoop - Part 2•6 minutes
Hadoop - Part 3•7 minutes
Hadoop Distributed File System Overview - Part 1•7 minutes
Hadoop Distributed File System Overview - Part 2•8 minutes
Hadoop Distributed File System Overview - Part 3•6 minutes
Using the Hadoop Distributed File System - Part 1•9 minutes
Using the Hadoop Distributed File System - Part 2•5 minutes
Cloud Object Storage for Big Data - Part 1•9 minutes
Cloud Object Storage for Big Data - Part 2•8 minutes
Yet Another Resource Negotiator - Part 1•9 minutes
Yet Another Resource Negotiator - Part 2•6 minutes
7 readings•Total 370 minutes
Module 2 Introduction Reading•60 minutes
Hadoop•60 minutes
Hadoop Distributed File System Overview•60 minutes
Using the Hadoop Distributed File System•60 minutes
Cloud Object Storage for Big Data•60 minutes
Yet Another Resource Negotiator•60 minutes
Module 2 Summary•10 minutes
6 assignments•Total 195 minutes
Module 2 Summative Assessment•120 minutes
Hadoop Quiz•15 minutes
Hadoop Distributed File System (HDFS) Overview Quiz•15 minutes
Using HDFS Quiz•15 minutes
Cloud Object Storage Quiz•15 minutes
Yet Another Resource Negotiator (YARN) Quiz•15 minutes
Module 3: Apache Hadoop MapReduce
In Module 3, students will explore the differences between processing small to moderate versus massive data volumes through distributed computing. This module covers the key concepts of the MapReduce framework, including how it breaks down large data processing tasks into smaller, parallel tasks for efficient execution. Students will also learn about the phases of MapReduce, the role of map and reduce functions, optimization patterns, and the benefits and limitations of various development approaches, including Java-based MapReduce and Hadoop Streaming.
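The map, shuffle, and reduce phases named above can be illustrated with a word count, the customary introductory MapReduce example. The following is only a single-process sketch in plain Python, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run many map and reduce tasks in parallel on different nodes.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence (a framework would parallelize them)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is valuable"]))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases can be split across machines, which is the property the module description refers to.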
What's included
18 videos, 8 readings, 7 assignments
18 videos•Total 120 minutes
Module 3 Introduction•2 minutes
The Path to MapReduce - Part 1•8 minutes
The Path to MapReduce - Part 2•7 minutes
MapReduce Overview - Part 1•6 minutes
MapReduce Overview - Part 2•5 minutes
MapReduce Overview - Part 3•7 minutes
MapReduce Concepts - Part 1•6 minutes
MapReduce Concepts - Part 2•5 minutes
MapReduce Concepts - Part 3•6 minutes
MapReduce Concepts - Part 4•10 minutes
MapReduce Examples - Part 1•9 minutes
MapReduce Examples - Part 2•5 minutes
MapReduce Programming - Part 1•8 minutes
MapReduce Programming - Part 2•10 minutes
MapReduce Programming - Part 3•6 minutes
MapReduce Optimization - Part 1•8 minutes
MapReduce Optimization - Part 2•4 minutes
MapReduce Optimization - Part 3•8 minutes
8 readings•Total 430 minutes
Module 3 Introduction Reading•60 minutes
The Path to MapReduce•60 minutes
MapReduce Overview•60 minutes
MapReduce Concepts•60 minutes
MapReduce Examples•60 minutes
MapReduce Programming•60 minutes
MapReduce Optimization•60 minutes
Module 3 Summary•10 minutes
7 assignments•Total 210 minutes
Module 3 Summative Assessment•120 minutes
The Path to MapReduce Quiz•15 minutes
MapReduce Overview Quiz•15 minutes
MapReduce Concepts Quiz•15 minutes
MapReduce Examples Quiz•15 minutes
MapReduce Programming•15 minutes
MapReduce Optimization•15 minutes
Module 4: Apache Spark (Part 1)
In Module 4, students will explore Apache Spark as a powerful distributed processing framework for interactive, batch, and streaming tasks. This module covers Spark's core functionalities, including machine learning, graph processing, and handling structured and unstructured data, while highlighting its in-memory processing potential and unified nature. Students will compare Spark with MapReduce, learn about Spark's primary components, execution architecture, Resilient Distributed Datasets (RDDs), DataFrames, Datasets, and the various methods for creating and optimizing DataFrames for efficient data processing.
What's included
25 videos, 7 readings, 6 assignments
25 videos•Total 143 minutes
Module 4 Introduction•2 minutes
Spark Overview - Part 1•9 minutes
Spark Overview - Part 2•9 minutes
Spark Components - Part 1•7 minutes
Spark Components - Part 2•6 minutes
Spark Components - Part 3•6 minutes
Spark Components - Part 4•7 minutes
Spark Components - Part 5•3 minutes
Spark Concepts - Part 1•7 minutes
Spark Concepts - Part 2•6 minutes
Spark Concepts - Part 3•5 minutes
Spark Concepts - Part 4•7 minutes
Spark Concepts - Part 5•6 minutes
Spark Concepts - Part 6•4 minutes
Spark Concepts - Part 7•7 minutes
Spark Concepts - Part 8•3 minutes
Spark Concepts - Part 9•5 minutes
Spark Concepts - Part 10•5 minutes
Creating Spark DataFrames - Part 1•6 minutes
Creating Spark DataFrames - Part 2•9 minutes
Creating Spark DataFrames - Part 3•6 minutes
Creating Spark DataFrames - Part 4•4 minutes
Defining Spark Schemas - Part 1•6 minutes
Defining Spark Schemas - Part 2•5 minutes
Defining Spark Schemas - Part 3•2 minutes
7 readings•Total 370 minutes
Module 4 Introduction Reading•60 minutes
Spark Overview•60 minutes
Spark Components•60 minutes
Spark Concepts•60 minutes
Creating Spark DataFrames•60 minutes
Defining Spark Schemas•60 minutes
Module 4 Summary•10 minutes
6 assignments•Total 195 minutes
Module 4 Summative Assessment•120 minutes
Spark Overview Quiz•15 minutes
Spark Components Quiz•15 minutes
Concepts Quiz•15 minutes
Creating Spark DataFrames Quiz•15 minutes
Defining Spark Schemas Quiz•15 minutes
Module 5: Apache Spark (Part 2)
In Module 5, students will delve deeper into Spark's capabilities for data manipulation and transformation. The module covers essential operations such as selecting, filtering, and sorting data, as well as joining DataFrames and performing aggregations. Students will also learn about handling null values, using Spark SQL for data queries, and optimizing performance with caching. Practical applications include creating and manipulating DataFrames, executing transformations and actions, and efficiently writing data to various formats.
What's included
19 videos, 11 readings, 10 assignments
19 videos•Total 103 minutes
Module 5 Introduction•2 minutes
Transformation - Rows - Part 1•10 minutes
Transformation - Rows - Part 2•5 minutes
Transformation - Rows - Part 3•4 minutes
Transformations Columns - Part 1•9 minutes
Transformations Columns - Part 2•4 minutes
Transformations Join - Part 1•4 minutes
Transformations Join - Part 2•4 minutes
Transformations - Aggregations - Part 1•7 minutes
Transformations - Aggregations - Part 2•5 minutes
Transformations - Working with Null Values - Part 1•5 minutes
Transformations - Working with Null Values - Part 2•5 minutes
Transformations - Spark SQL - Part 1•6 minutes
Transformations - Spark SQL - Part 2•4 minutes
Transformations - Caching - Part 1•4 minutes
Transformations - Caching - Part 2•5 minutes
Actions•10 minutes
Actions - Writing Data - Part 1•5 minutes
Actions - Writing Data - Part 2•5 minutes
11 readings•Total 610 minutes
Module 5 Introduction Reading•60 minutes
Transformation - Rows•60 minutes
Transformations - Columns•60 minutes
Transformations - Join•60 minutes
Transformations - Aggregations•60 minutes
Transformations - Working with Null Values•60 minutes
Transformations - Spark SQL•60 minutes
Transformations - Caching•60 minutes
Actions•60 minutes
Actions - Writing Data•60 minutes
Module 5 Summary•10 minutes
10 assignments•Total 255 minutes
Module 5 Summative Assessment•120 minutes
Transformations - Rows Quiz•15 minutes
Transformations - Columns Quiz•15 minutes
Transformations - Join Quiz•15 minutes
Transformations/Actions - Aggregations Quiz•15 minutes
Transformations - Working with Null Values Quiz•15 minutes
Transformations - Spark SQL Quiz•15 minutes
Transformations - Caching Quiz•15 minutes
Transformations - Actions Quiz•15 minutes
Actions - Writing Data Quiz•15 minutes
Module 6: Big Data Streaming and Design Patterns
Module 6 introduces students to the limitations of batch processing and the significance of real-time data processing. It covers essential aspects of stream processing, including data ingestion and analysis, with a focus on tools like Apache Kafka for stream ingestion and Spark Structured Streaming for scalable and fault-tolerant data processing. Students will also explore various design patterns for organizing big data clusters, the concept of data lakes, and the Lambda Architecture for unifying real-time and batch data processing in modern data environments.
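The idea of unifying batch and real-time processing in the Lambda Architecture can be sketched in a few lines. The example below is a toy illustration in plain Python under stated assumptions: page-view events are dictionaries with a hypothetical `page` field, the batch layer periodically recomputes a view over the full history, the speed layer counts only events that arrived since the last batch run, and the helper names `count_page_views` and `query` are invented for this sketch. In a real deployment these roles would be filled by systems such as Spark and Kafka rather than in-memory counters.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_page_views(events: Iterable[Mapping[str, str]]) -> Counter:
    """Aggregate events into a per-page view count.

    Used here for both layers: the batch layer applies it to the full
    historical dataset, the speed layer to recent events only.
    """
    return Counter(event["page"] for event in events)


def query(page: str, batch_view: Counter, speed_view: Counter) -> int:
    """Serving step: merge the batch view and the speed view to answer a query."""
    # Counter returns 0 for missing keys, so unseen pages need no special case.
    return batch_view[page] + speed_view[page]


if __name__ == "__main__":
    # Events already covered by the last batch run (hypothetical data).
    history = [{"page": "home"}, {"page": "home"}, {"page": "about"}]
    # Events that arrived after the last batch run (hypothetical data).
    recent = [{"page": "home"}]

    batch_view = count_page_views(history)
    speed_view = count_page_views(recent)
    print(query("home", batch_view, speed_view))
```

When the next batch run completes, the recent events become part of the history and the speed view is reset, so each event is counted in exactly one of the two views.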
What's included
16 videos, 6 readings, 6 assignments
16 videos•Total 106 minutes
Module 6 Introduction•3 minutes
Stream Ingestion and Processing I - Part 1•9 minutes
Stream Ingestion and Processing I - Part 2•8 minutes
Stream Ingestion and Processing I - Part 3•8 minutes
Stream Ingestion and Processing II - Part 1•6 minutes
Stream Ingestion and Processing II - Part 2•3 minutes
Stream Ingestion and Processing II - Part 3•5 minutes
Stream Ingestion and Processing II - Part 4•7 minutes
Analytic Cluster Pattern - Part 1•7 minutes
Analytic Cluster Pattern - Part 2•7 minutes
Data Lake Pattern - Part 1•6 minutes
Data Lake Pattern - Part 2•6 minutes
Data Lake Pattern - Part 3•6 minutes
Lambda Architecture - Part 1•10 minutes
Lambda Architecture - Part 2•8 minutes
Lambda Architecture - Part 3•8 minutes
6 readings•Total 310 minutes
Stream Ingestion and Processing (Part 1)•60 minutes
Stream Ingestion and Processing (Part 2)•60 minutes
Analytic Cluster Pattern•60 minutes
Data Lake Pattern•60 minutes
Lambda Architecture•60 minutes
Module 6 Summary•10 minutes
6 assignments•Total 195 minutes
Module 6 Summative Assessment•120 minutes
Stream Ingestion and Processing (Part 1) Quiz•15 minutes
Stream Ingestion and Processing (Part 2) Quiz•15 minutes
What is a characteristic of a transient Hadoop cluster? Quiz•15 minutes
Data Lake Pattern Quiz•15 minutes
Lambda Architecture Quiz•15 minutes
Module 7: NoSQL Database
In Module 7, students will explore the benefits and limitations of relational databases in big data contexts and the concept of distributed database systems. This module covers NoSQL databases, their diverse data models, and their scalability and flexibility advantages. Students will also learn about real-world use cases, data partitioning, consistency models, and the CAP Theorem, gaining a comprehensive understanding of how NoSQL databases manage large datasets across clusters while ensuring scalability and availability.
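Data partitioning, mentioned above, is often introduced through hash partitioning: each record key is hashed and the hash decides which node stores the record. The sketch below is a simplified illustration in plain Python, not the scheme of any particular database. The names `partition_for` and `assign_keys` are hypothetical, and MD5 is chosen only because it gives the same result on every run (Python's built-in `hash` for strings does not).

```python
import hashlib
from typing import Dict, Iterable, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign_keys(keys: Iterable[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group keys by the partition (node) that would store them."""
    partitions: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


if __name__ == "__main__":
    # Hypothetical record keys spread over three nodes.
    print(assign_keys(["user:1", "user:2", "user:3", "user:4"], 3))
```

A known weakness of this modulo scheme is that changing `num_partitions` moves most keys to a different node, which is one reason some distributed databases use consistent hashing instead.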
What's included
18 videos, 6 readings, 6 assignments
18 videos•Total 121 minutes
Module 7 Introduction•3 minutes
Using Databases for Big Data Storage - Part 1•10 minutes
In Module 8, students will explore specific NoSQL database types, namely Key-Value, Wide-Column, and Document databases. Two similar systems, HBase and Cassandra, will be studied and contrasted in the context of the CAP theorem and the associated CP/AP trade-offs. Topics such as consistency and availability will be discussed in the context of specific usage scenarios for both HBase and Cassandra, and general application domains of both systems will be highlighted. Finally, the document database MongoDB will be reviewed in the context of natural language/text processing use cases, and MongoDB usage and architecture will be analyzed relative to a traditional RDBMS.
What's included
9 videos, 4 readings, 4 assignments
9 videos•Total 71 minutes
Module 8 Introduction•0 minutes
HBase Pt. 1•10 minutes
HBase Pt. 2•7 minutes
HBase Pt. 3•6 minutes
Cassandra Pt. 1•11 minutes
Cassandra Pt. 2•11 minutes
MongoDB Pt. 1•9 minutes
MongoDB Pt. 2•7 minutes
MongoDB Pt. 3•10 minutes
4 readings•Total 190 minutes
HBase•60 minutes
Dynamo and Cassandra•60 minutes
MongoDB•60 minutes
Module 8 Summary•10 minutes
4 assignments•Total 165 minutes
Module 8 Summative Assessment•120 minutes
HBase Quiz•15 minutes
Cassandra Quiz•15 minutes
MongoDB Quiz•15 minutes
Summative Course Assessment
This module contains the summative course assessment that has been designed to evaluate your understanding of the course material and assess your ability to apply the knowledge you have acquired throughout the course.
Illinois Tech is a top-tier, nationally ranked, private research university with programs in engineering, computer science, architecture, design, science, business, human sciences, and law. The university offers bachelor of science, master of science, professional master’s, and Ph.D. degrees—as well as certificates for in-demand STEM fields and other areas of innovation. Talented students from around the world choose to study at Illinois Tech because of the access to real-world opportunities, renowned academic programs, high value, and career prospects of graduates.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. It also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.