Serverless Data Processing with Dataflow

Beginning with foundations, this training explains how Apache Beam and Dataflow work together to meet your data processing needs without the risk of vendor lock-in. The section on developing pipelines covers how you convert your business logic into data processing applications that can run on Dataflow. This training culminates with a focus on operations, which reviews the most important lessons for operating a data application on Dataflow, including monitoring, troubleshooting, testing, and reliability.

Objectives

In this course, participants will learn the following skills:

  • Demonstrate how Apache Beam and Dataflow work together to fulfill your organization’s data processing needs.
  • Summarize the benefits of the Beam Portability Framework and enable it for your Dataflow pipelines.
  • Enable Dataflow Shuffle and Streaming Engine, for batch and streaming pipelines respectively, for maximum performance.
  • Enable Flexible Resource Scheduling for more cost-efficient performance.
  • Select the right combination of IAM permissions for your Dataflow job.
  • Implement best practices for a secure data processing environment.
  • Select and tune the I/O of your choice for your Dataflow pipeline.
  • Use schemas to simplify your Beam code and improve the performance of your pipeline.
  • Develop a Beam pipeline using SQL and DataFrames.
  • Perform monitoring, troubleshooting, testing and CI/CD on Dataflow pipelines.

Audience

This training is intended for big data practitioners who want to further their understanding of Dataflow in order to advance their data processing applications, including:
  • Data Engineers
  • Data Analysts and Data Scientists aspiring to develop Data Engineering skills

Prerequisites

To get the most out of this course, participants should have:

  • Completed the Google Cloud Fundamentals: Big Data and Machine Learning course, or have equivalent experience
  • Basic proficiency with a common query language such as SQL
  • Experience with data modeling and extract, transform, and load (ETL) activities
  • Experience developing applications using a common programming language such as Python
  • Completed “Building Batch Data Pipelines” and “Building Resilient Streaming Analytics Systems”, or the Data Engineering on Google Cloud course

Duration

~24 hours (~3 days)

Investment

See the current price and upcoming dates for open-enrollment classes on our registration page.
If you are interested in a private class for your company, contact us.

Course summary

Dependencies of other courses and certifications on the Serverless Data Processing with Dataflow course
The course includes presentations and hands-on labs.
  • Course Introduction
  • Beam and Dataflow Refresher
  • Beam Portability
  • Runner v2
  • Container Environments
  • Cross-Language Transforms
  • Dataflow
  • Dataflow Shuffle Service
  • Dataflow Streaming Engine
  • Flexible Resource Scheduling
  • Data Locality
  • Shared VPC
  • Private IPs
  • CMEK
  • Beam Basics
  • Utility Transforms
  • DoFn Lifecycle
  • Windows
  • Watermarks
  • Triggers
  • Sources and Sinks
  • Text IO and File IO
  • BigQuery IO
  • PubSub IO
  • Kafka IO
  • Bigtable IO
  • Avro IO
  • Splittable DoFn
  • Beam Schemas
  • Code Examples
  • State API
  • Timer API
  • Summary
  • Schemas
  • Handling Unprocessable Data
  • Error Handling
  • AutoValue Code Generator
  • JSON Data Handling
  • Utilize DoFn Lifecycle
  • Pipeline Optimizations
  • Dataflow and Beam SQL
  • Windowing in SQL
  • Beam DataFrames
  • Beam Notebooks
  • Job List
  • Job Info
  • Job Graph
  • Job Metrics
  • Metrics Explorer
  • Logging
  • Error Reporting
  • Troubleshooting Workflow
  • Types of Troubles
  • Pipeline Design
  • Data Shape
  • Sources, Sinks, and External Systems
  • Shuffle and Streaming Engine
  • Testing and CI/CD Overview
  • Unit Testing
  • Integration Testing
  • Artifact Building
  • Deployment
  • Introduction to Reliability
  • Monitoring
  • Geolocation
  • Disaster Recovery
  • High Availability
  • Classic Templates
  • Flex Templates
  • Using Flex Templates
  • Google-provided Templates
  • Summary: Quick recap of training topics