Training
Overview
Get hands-on experience designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, and analyze data. This course covers structured, unstructured, and streaming data.
Products:
- BigQuery
- Bigtable
- Cloud Storage
- Cloud SQL
- Spanner
- Dataproc
- Dataflow
- Cloud Data Fusion
- Cloud Composer
- Pub/Sub
Prerequisites
Participants should have:
- Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.
- Basic proficiency with a common query language such as SQL.
- Experience with data modeling and ETL (extract, transform, load) activities.
- Experience developing applications using a common programming language such as Python.
Target audience
This course is designed for:
- Data engineers
- Database administrators
- System administrators
Objectives
By the end of this course, learners will be able to:
- Design and build data processing systems on Google Cloud.
- Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
- Derive business insights from extremely large datasets using BigQuery.
- Leverage unstructured data using Spark and ML APIs on Dataproc.
- Enable instant insights from streaming data.
Outline
Module 01: Data engineering tasks and components
- The role of a data engineer
- Data sources versus data sinks
- Data formats
- Storage solution options on Google Cloud
- Metadata management options on Google Cloud
- Share datasets using Analytics Hub
Module 02: Data replication and migration
- Replication and migration architecture
- The gcloud command-line tool
- Moving datasets
- Datastream
Module 03: The extract and load data pipeline pattern
- Extract and load architecture
- The bq command-line tool
- BigQuery Data Transfer Service
- BigLake
Module 04: The extract, load, and transform data pipeline pattern
- Extract, load, and transform (ELT) architecture
- SQL scripting and scheduling with BigQuery
- Dataform
Module 05: The extract, transform, and load data pipeline pattern
- Extract, transform, and load (ETL) architecture
- Google Cloud GUI tools for ETL data pipelines
- Batch data processing using Dataproc
- Streaming data processing options
- Bigtable and data pipelines
Module 06: Automation techniques
- Automation patterns and options for pipelines
- Cloud Scheduler and Workflows
- Cloud Composer
- Cloud Run functions
- Eventarc
Module 07: Introduction to data engineering
- Data engineer’s role
- Data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Effective partnership with other data teams
- Management of data access and governance
- Building of production-ready pipelines
- Google Cloud customer case study
Module 08: Build a data lake
- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Building of a data lake using Cloud Storage
- Secure Cloud Storage
- Store all sorts of data types
- Cloud SQL as your OLTP system
Module 09: Build a data warehouse
- The modern data warehouse
- Introduction to BigQuery
- Get started with BigQuery
- Loading of data into BigQuery
- Exploration of schemas
- Schema design
- Nested and repeated fields
- Optimization with partitioning and clustering
Module 10: Introduction to building batch data pipelines
- EL, ELT, ETL
- Quality considerations
- Ways of executing operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
Module 11: Execute Spark on Dataproc
- The Hadoop ecosystem
- Run Hadoop on Dataproc
- Cloud Storage instead of HDFS
- Optimize Dataproc
Module 12: Serverless data processing with Dataflow
- Introduction to Dataflow
- Reasons why customers value Dataflow
- Dataflow pipelines
- Aggregating with GroupByKey and Combine
- Side inputs and windows
- Dataflow templates
Module 13: Manage data pipelines with Cloud Data Fusion and Cloud Composer
- Build batch data pipelines visually with Cloud Data Fusion
  - Components
  - Overview
  - Building a pipeline
  - Exploring data using Wrangler
- Orchestrate work between Google Cloud services with Cloud Composer
  - Apache Airflow environment
  - DAGs and operators
  - Workflow scheduling
  - Monitoring and logging
Module 14: Serverless messaging with Pub/Sub
- Introduction to Pub/Sub
- Pub/Sub push versus pull
- Publishing with Pub/Sub code
Module 16: Dataflow streaming features
- Streaming data challenges
- Dataflow windowing
Module 17: High-throughput BigQuery and Bigtable streaming features
- Streaming into BigQuery and visualizing results
- High-throughput streaming with Bigtable
- Optimizing Bigtable performance
Module 18: Advanced BigQuery functionality and performance
- Analytic window functions
- GIS functions
- Performance considerations
Exams and assessments
There is no specific certification related to this course.
Hands-on learning
There are practical labs in this course.