Apache Spark Programming with Databricks


Delivery format

Remote


Duration

2 days


Price

1367 €

This course provides an in-depth exploration of Apache Spark and Delta Lake on Databricks, focusing on the core architectural components of Spark, the DataFrame API, and Structured Streaming. Participants will learn how to efficiently read, transform, and aggregate data using Spark SQL and the DataFrame API. The course also covers user-defined functions (UDFs), query optimisation, partitioning strategies, and the advantages of Delta Lake for improving data pipelines. By the end of the course, learners will be able to execute streaming queries and understand how Delta Lake enhances real-time data processing.

By the end of this course, learners will be able to:

  • Describe the architecture and core components of Apache Spark.
  • Implement data transformations using the DataFrame API.
  • Optimise Spark queries for performance improvements.
  • Apply partitioning strategies to manage large datasets efficiently.
  • Use Structured Streaming to process real-time data.
  • Implement Delta Lake to enhance data reliability and performance.

Participants should have:

  • Familiarity with Python and fundamental programming concepts, including data types, lists, dictionaries, variables, functions, loops, conditional statements, exception handling, accessing classes, and using third-party libraries.
  • Basic knowledge of SQL, including writing queries using SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, and JOIN.

Target Audience

This course is designed for:

  • Data engineers and data scientists looking to enhance their Spark programming skills.
  • Developers who want to leverage Apache Spark and Delta Lake on Databricks.
  • Professionals working with large-scale data processing and real-time analytics.

Introduction to Spark and Databricks

  • Overview of Apache Spark and its role in big data processing.
  • Introduction to the Databricks platform.

Working with Spark SQL and DataFrames

  • Understanding Spark SQL and its use cases.
  • DataFrame operations: reading, writing, transformations, and aggregations.
  • Working with complex data types and datetime functions.

Optimisation and Performance Tuning

  • Introduction to Spark internals and execution plans.
  • Query optimisation techniques and best practices.
  • Implementing partitioning strategies to improve performance.

User-Defined Functions and Advanced APIs

  • Creating and using user-defined functions (UDFs).
  • Vectorized UDFs for efficient data processing.

Streaming and Real-Time Processing

  • Introduction to Spark’s Structured Streaming API.
  • Executing and managing real-time streaming queries.

Delta Lake and Data Reliability

  • Understanding the advantages of Delta Lake.
  • Implementing Delta Lake for scalable and reliable data pipelines.

Exams and Assessments

This course does not include formal assessments.

Hands-On Learning

This course includes:

  • Practical exercises using Apache Spark on Databricks.
  • Hands-on labs to implement and optimise Spark queries.
  • Guided projects focusing on real-time data processing with Structured Streaming and Delta Lake.

Price 1367 € + VAT

We reserve the right to make changes to the programme, instructors, and delivery format.