Serverless Data Processing with Dataflow

Beginning with foundations, this training explains how Apache Beam and Dataflow work together to meet your data processing needs without the risk of vendor lock-in. The section on developing pipelines covers how you convert your business logic into data processing applications that can run on Dataflow. This training culminates with a focus on operations, which reviews the most important lessons for operating a data application on Dataflow, including monitoring, troubleshooting, testing, and reliability.

Objectives

In this course, participants will learn the following skills:

  • Demonstrate how Apache Beam and Dataflow work together to fulfill your organization’s data processing needs.
  • Summarize the benefits of the Beam Portability Framework and enable it for your Dataflow pipelines.
  • Enable Shuffle and Streaming Engine, for batch and streaming pipelines respectively, for maximum performance.
  • Enable Flexible Resource Scheduling for more cost-efficient performance (both options are shown in the first sketch after this list).
  • Select the right combination of IAM permissions for your Dataflow job.
  • Implement best practices for a secure data processing environment.
  • Select and tune the I/O of your choice for your Dataflow pipeline.
  • Use schemas to simplify your Beam code and improve the performance of your pipeline (see the second sketch after this list).
  • Develop a Beam pipeline using SQL and DataFrames.
  • Perform monitoring, troubleshooting, testing, and CI/CD on Dataflow pipelines (see the third sketch after this list).
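
The Dataflow service features named above are switched on through pipeline options. Below is a minimal sketch, assuming a hypothetical project ID, bucket, and region; the options themselves (experiments, flexrs_goal, enable_streaming_engine) are standard Dataflow pipeline options in the Beam Python SDK:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical project, region, and bucket; replace with your own.
    options = PipelineOptions(
        runner='DataflowRunner',
        project='my-project',
        region='us-central1',
        temp_location='gs://my-bucket/tmp',
        # Batch: route shuffle to the Dataflow Shuffle service
        # (already the default in most regions).
        experiments=['shuffle_mode=service'],
        # Batch: let FlexRS trade startup latency for lower cost.
        flexrs_goal='COST_OPTIMIZED',
    )

    # For streaming pipelines, Streaming Engine has its own flag:
    #   PipelineOptions(..., streaming=True, enable_streaming_engine=True)

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | 'Create' >> beam.Create(['a', 'b', 'c'])
         | 'Upper' >> beam.Map(str.upper))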
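
Schemas and SQL meet in the Python SDK: beam.Row values carry an inferred schema, which lets SqlTransform query fields by name. A minimal sketch follows; note that SqlTransform is a cross-language transform, so running it requires a Java runtime for the Beam expansion service:

    import apache_beam as beam
    from apache_beam.transforms.sql import SqlTransform

    with beam.Pipeline() as pipeline:
        (pipeline
         # beam.Row gives the PCollection a schema inferred from
         # the field names and value types.
         | beam.Create([
               beam.Row(player='alice', score=12),
               beam.Row(player='bob', score=7),
           ])
         # PCOLLECTION is the default table name for the input.
         | SqlTransform(
               "SELECT player, score FROM PCOLLECTION WHERE score > 10")
         | beam.Map(print))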
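
For testing, the Beam Python SDK ships assertion utilities that run inside the pipeline itself. A minimal unit-test sketch using TestPipeline and assert_that:

    import apache_beam as beam
    from apache_beam.testing.test_pipeline import TestPipeline
    from apache_beam.testing.util import assert_that, equal_to

    def test_uppercase():
        with TestPipeline() as pipeline:
            output = (pipeline
                      | beam.Create(['a', 'b'])
                      | beam.Map(str.upper))
            # assert_that is itself a transform; the test fails if the
            # actual output does not match the expected elements.
            assert_that(output, equal_to(['A', 'B']))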

Target Audience

This training is intended for big data practitioners who want to further their understanding of Dataflow in order to advance their data processing applications, including:
  • Data Engineers
  • Data Analysts and Data Scientists aspiring to develop Data Engineering skills

Prerequisites

To get the most out of this course, participants should meet the following criteria:

  • Completed the Google Cloud Fundamentals Big Data and Machine Learning course or have equivalent experience
  • Basic proficiency with a common query language such as SQL
  • Experience with data modeling and extract, transform, load (ETL) activities
  • Experience developing applications using a common programming language such as Python
  • Completed “Building Batch Data Pipelines” and “Building Resilient Streaming Analytics Systems”, or Data Engineering on Google Cloud

Duration

24 hours (3 days)

Pricing

Check our registration page for current pricing and upcoming dates for open-enrollment classes. If you are interested in a private class for your company, please contact us.
Dependencies of other courses and certifications on the Serverless Data Processing with Dataflow course

Course Summary

The course includes presentations, demonstrations, and hands-on labs.
  • Course Introduction
  • Beam and Dataflow Refresher
  • Beam Portability
  • Runner v2
  • Container Environments
  • Cross-Language Transforms
  • Dataflow
  • Dataflow Shuffle Service
  • Dataflow Streaming Engine
  • Flexible Resource Scheduling
  • Data Locality
  • Shared VPC
  • Private IPs
  • CMEK
  • Beam Basics
  • Utility Transforms
  • DoFn Lifecycle
  • Windows
  • Watermarks
  • Triggers
  • Sources and Sinks
  • Text IO and File IO
  • BigQuery IO
  • PubSub IO
  • Kafka IO
  • Bigtable IO
  • Avro IO
  • Splittable DoFn
  • Beam Schemas
  • Code Examples
  • State API
  • Timer API
  • Summary
  • Schemas
  • Handling Unprocessable Data
  • Error Handling
  • AutoValue Code Generator
  • JSON Data Handling
  • Utilize DoFn Lifecycle
  • Pipeline Optimizations
  • Dataflow and Beam SQL
  • Windowing in SQL
  • Beam DataFrames
  • Beam Notebooks
  • Job List
  • Job Info
  • Job Graph
  • Job Metrics
  • Metrics Explorer
  • Logging
  • Error Reporting
  • Troubleshooting Workflow
  • Types of Troubles
  • Pipeline Design
  • Data Shape
  • Sources, Sinks, and External Systems
  • Shuffle and Streaming Engine
  • Testing and CI/CD Overview
  • Unit Testing
  • Integration Testing
  • Artifact Building
  • Deployment
  • Introduction to Reliability
  • Monitoring
  • Geolocation
  • Disaster Recovery
  • High Availability
  • Classic Templates
  • Flex Templates
  • Using Flex Templates
  • Google-provided Templates
  • Summary: quick recap of training topics