Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.
Objectives
In this course, participants will learn the following skills:
- Design and build data processing systems on Google Cloud.
- Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
- Derive business insights from extremely large datasets using BigQuery.
- Leverage unstructured data using Spark and ML APIs on Dataproc.
- Enable instant insights from streaming data.
- Understand ML APIs and BigQuery ML, and learn to use AutoML to create powerful models without coding.
Audience
This class is intended for developers who are responsible for:
- Extracting, loading, transforming, cleaning, and validating data.
- Designing pipelines and architectures for data processing.
- Integrating analytics and machine learning capabilities into data pipelines.
- Querying datasets, visualizing query results, and creating reports.
Prerequisites
To get the most out of this course, participants should have:
- Completed the Google Cloud Fundamentals: Big Data and Machine Learning course or have equivalent experience.
- Basic proficiency with a common query language such as SQL.
- Experience with data modeling and ETL (extract, transform, load) activities.
- Experience with developing applications using a common programming language such as Python.
- Familiarity with machine learning and/or statistics.
Duration
4 days
Investment
Check the next open public class on our enrollment page. If you are interested in a private training class for your company, contact us.
Course Outline
The course includes presentations, demonstrations, and hands-on labs.
- Explore the role of a data engineer
- Analyze data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Partner effectively with other data teams
- Manage data access and governance
- Build production-ready pipelines
- Review Google Cloud customer case study
- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Building a data lake using Cloud Storage
- Securing Cloud Storage
- Storing all sorts of data types
- Cloud SQL as a relational data lake
- The modern data warehouse
- Introduction to BigQuery
- Getting started with BigQuery
- Loading data
- Exploring schemas
- Schema design
- Nested and repeated fields
- Optimizing with partitioning and clustering
- EL, ELT, ETL
- Quality considerations
- How to carry out operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
- The Hadoop ecosystem
- Run Hadoop on Dataproc
- Cloud Storage instead of HDFS
- Optimize Dataproc
- Introduction to Dataflow
- Why customers value Dataflow
- Dataflow pipelines
- Aggregating with GroupByKey and Combine
- Side inputs and windows
- Dataflow templates
- Dataflow SQL
- Building batch data pipelines visually with Cloud Data Fusion
- Components
- UI overview
- Building a pipeline
- Exploring data using Wrangler
- Orchestrating work between Google Cloud services with Cloud Composer
- Apache Airflow environment
- DAGs and operators
- Workflow scheduling
- Monitoring and logging
- Process Streaming Data
- Introduction to Pub/Sub
- Pub/Sub push versus pull
- Publishing with Pub/Sub code
- Process Streaming Data
- Streaming data challenges
- Dataflow windowing
- Streaming into BigQuery and visualizing results
- High-throughput streaming with Cloud Bigtable
- Optimizing Cloud Bigtable performance
- Analytic window functions
- Using WITH clauses
- GIS functions
- Performance considerations
- What is AI?
- From ad-hoc data analysis to data-driven decisions
- Options for ML models on Google Cloud
- Unstructured data is hard
- ML APIs for enriching data
- What's a notebook?
- BigQuery magic and ties to Pandas
- Ways to do ML on GCP
- Kubeflow
- AI Hub
- BigQuery ML for quick model building
- Supported models
- Why AutoML?
- AutoML Vision
- AutoML NLP
- AutoML Tables