Finding stories in data using exploratory data analysis (EDA) is all about organizing and interpreting raw data. Python can help you do this quickly and effectively. In this course, you’ll learn how to use Python to perform the EDA practices of discovering and structuring.
By the end of this course, you will be able to:
• Identify ethical issues that may come up during the data “discovering” practice of EDA
• Use Python to merge or join data based on defined criteria
• Use Python to sort and/or filter data
• Use relevant Python libraries for cleaning raw data
• Recognize opportunities for creating hypotheses based on raw data
• Recognize when and how to communicate status updates and questions to key stakeholders
• Apply Python tools to examine raw data structure and format
• Use the PACE workflow to understand whether given data is adequate and applicable to a data science project
• Differentiate between the common formats of raw data sources (JSON, tabular, etc.) and data types
Data professionals must understand data sources, file formats, and responsible parties during exploratory analysis. In this module, you will learn when to contact data owners with questions or issues, how to import data using Python, and how to perform EDA using basic Python functions.
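As a rough illustration of importing data and running basic discovery functions, here is a minimal sketch using pandas. The column names and values are hypothetical and stand in for a real file; they are not taken from the course materials.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
csv_text = """city,date,strikes
Austin,2018-08-01,12
Boston,2018-08-01,3
Austin,2018-08-02,7
Denver,2018-08-02,
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows of the dataset
print(df.shape)       # (number of rows, number of columns)
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics for numeric columns

# Count missing values per column to spot gaps worth raising with the data owner.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

With a real dataset, the `io.StringIO` object would typically be replaced by a file path passed to `pd.read_csv`.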
What's included
5 videos, 3 readings, 1 assignment, 3 ungraded labs
5 videos • Total 34 minutes
Introduction to data exploration • 3 minutes
Yaser: Understand data to drive value • 2 minutes
Where the data comes from • 9 minutes
Find stories using the six exploratory data analysis practices • 10 minutes
EDA using basic data functions with Python • 10 minutes
3 readings • Total 24 minutes
Reference guide: The EDA process • 8 minutes
Reference guide: Import datasets with Python • 8 minutes
Reference guide: Pandas methods for the discovery of a dataset • 8 minutes
1 assignment • Total 8 minutes
Test your knowledge: Discovering is the beginning of an investigation • 8 minutes
3 ungraded labs • Total 100 minutes
Annotated follow-along resource: EDA using basic functions with Python • 20 minutes
Activity: Discover what is in your dataset • 60 minutes
Exemplar: Discover what is in your dataset • 20 minutes
Understand data format
Module 2
EDA discovery uses targeted questioning to identify data gaps and missing information. In this module, you will learn how to formulate hypotheses, manipulate datetime strings and create bar graph visualizations.
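To make the datetime step concrete, the following is a minimal sketch, assuming pandas and a small hypothetical dataset whose dates arrive as plain strings. It converts the strings to datetime values, derives calendar fields, and totals a value per month, which is the kind of summary a bar graph would display.

```python
import pandas as pd

# Hypothetical records with dates stored as plain strings.
df = pd.DataFrame({
    "date": ["2018-01-03", "2018-01-15", "2018-02-07", "2018-03-21", "2018-03-30"],
    "strikes": [10, 4, 8, 6, 2],
})

# Convert the strings to datetime values so date parts can be extracted.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Derive new columns from the datetime column.
df["month"] = df["date"].dt.month                 # 1 to 12
df["month_name"] = df["date"].dt.strftime("%b")   # abbreviated month name
df["quarter"] = df["date"].dt.quarter             # 1 to 4

# Total strikes per month: one bar per month in a bar graph.
monthly = df.groupby("month")["strikes"].sum()
print(monthly)
```

If matplotlib is installed, `monthly.plot.bar()` would draw the corresponding bar graph.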
What's included
2 videos, 1 reading, 1 assignment, 1 ungraded lab
2 videos • Total 20 minutes
Discover what is missing from your dataset • 6 minutes
Date string manipulations with Python • 14 minutes
1 reading • Total 8 minutes
Reference guide: Datetime manipulation • 8 minutes
1 assignment • Total 6 minutes
Test your knowledge: Understand data format • 6 minutes
1 ungraded lab • Total 20 minutes
Annotated follow-along guide: Date string manipulations with Python • 20 minutes
Create structure from raw data
Module 3
Structuring is an EDA practice for organizing data to learn more about it. In this module, you will learn about different types of structuring methods and pandas tools for structuring datasets, and how to interpret histograms to understand data distributions.
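The sketch below shows a few common pandas structuring operations (sorting, filtering, merging, and grouping) on two small hypothetical tables. The table and column names are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical tables: individual sales and a lookup of store regions.
sales = pd.DataFrame({
    "store_id": [1, 2, 1, 3, 2],
    "amount": [250, 90, 120, 300, 60],
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})

# Sorting: order rows from the largest sale to the smallest.
sorted_sales = sales.sort_values("amount", ascending=False)

# Filtering: keep only rows that meet a condition.
large_sales = sales[sales["amount"] >= 100]

# Merging: attach each store's region to its sales using a shared key.
merged = sales.merge(stores, on="store_id", how="left")

# Grouping: summarize sales within each region.
by_region = merged.groupby("region")["amount"].agg(["sum", "mean", "count"])
print(by_region)
```

A left merge keeps every row of `sales` even when a store has no match in `stores`, which makes missing lookups visible as empty `region` values instead of silently dropping rows.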
What's included
2 videos, 2 readings, 1 assignment, 3 ungraded labs, 1 plugin
2 videos • Total 21 minutes
Use structuring methods to establish order in your dataset • 5 minutes
EDA structuring with Python • 16 minutes
2 readings • Total 10 minutes
Reference guide: Pandas tools for structuring a dataset • 8 minutes
Histograms • 2 minutes
1 assignment • Total 6 minutes
Test your knowledge: Create structure from raw data • 6 minutes
3 ungraded labs • Total 100 minutes
Annotated follow-along guide: EDA structuring with Python • 20 minutes
Activity: Structure your data • 60 minutes
Exemplar: Structure your data • 20 minutes
1 plugin • Total 10 minutes
Categorize: Structuring methods • 10 minutes
Review: Explore raw data
Module 4
Review everything you’ve learned and take the final assessment.
Grow with Google is an initiative that draws on Google's decades-long history of building products, platforms, and services that help people and businesses grow. We aim to help everyone – those who make up the workforce of today and the students who will drive the workforce of tomorrow – access the best of Google’s training and tools to grow their skills, careers, and businesses.
Organizations of all types and sizes have business processes that generate massive volumes of data. Every moment, all sorts of information gets created by computers, the internet, phones, texts, streaming video, photographs, sensors, and much more. In the global digital landscape, data is increasingly imprecise, chaotic, and unstructured. As the speed and variety of data increases exponentially, organizations are struggling to keep pace.
Data science is part of a field of study that uses raw data to create new ways of modeling and understanding the unknown. To gain insights, businesses rely on data professionals to acquire, organize, and interpret data, which helps inform internal projects and processes. Data scientists rely on a combination of critical skills, including statistics, scientific methods, data analysis, and artificial intelligence.
What do data professionals do?
Data professional is a term used to describe any individual who works with data or has data skills. At a minimum, a data professional is capable of exploring, cleaning, selecting, analyzing, and visualizing data. They may also be comfortable with writing code and have some familiarity with the techniques used by statisticians and machine learning engineers, including algorithmic thinking and building machine learning models.
Data professionals are responsible for collecting, analyzing, and interpreting large amounts of data within a variety of different organizations. The role of a data professional is defined differently across companies. Generally speaking, data professionals possess technical and strategic capabilities that require more advanced analytical skills such as data manipulation, experimental design, predictive modeling, and machine learning. They perform a variety of tasks related to gathering, structuring, interpreting, monitoring, and reporting data in accessible formats, enabling stakeholders to understand and use data effectively. Ultimately, the work of data professionals helps organizations make informed, ethical decisions.
Why start a career in data science?
Large volumes of data — and the technology needed to manage and analyze it — are becoming increasingly accessible. Because of this, there has been a surge in career opportunities for people who can tell stories using data, such as senior data analysts and data scientists. These professionals collect, analyze, and interpret large amounts of data within a variety of different organizations. Their responsibilities require advanced analytical skills such as data manipulation, experimental design, predictive modeling, and machine learning.
Do I need to take the course in a certain order?
We highly recommend taking the courses in the order presented, as the content builds on information from earlier courses. This is the fifth course in a series of six courses that make up the Google Data Analysis with Python Specialization.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page. From there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.