Overview
Data science is about using scientific methods, processes, algorithms, and systems to analyse and extract insight from data. We believe organisations that master AI, Cloud, and Data can turn information into a competitive advantage. This hands-on workshop demonstrates how GPU-accelerated tools can transform data science workflows, enabling faster experimentation, greater scalability, and more cost-effective outcomes.
Across the workshop, learners use RAPIDS libraries to accelerate data manipulation, machine learning, and graph analytics. Participants work with cuDF, cuML, cuGraph, and related tools to process large and larger-than-memory datasets. The course culminates in a population-scale project that applies GPU-accelerated analytics to analyse and respond to a simulated epidemic affecting the UK, reinforcing practical, real-world application of skills.
Prerequisites
Participants should have:
- Experience with Python programming and common data science libraries
- A foundational understanding of data manipulation and analysis concepts
- Familiarity with basic machine learning principles
- Awareness of pandas or similar dataframe-based workflows
Target audience
This course is designed for:
- Data scientists seeking to accelerate existing Python workflows
- Machine learning engineers working with large or complex datasets
- Developers and analysts exploring GPU-accelerated analytics
- Organisations aiming to scale data science capabilities efficiently
Objectives
By the end of this workshop, learners will be able to:
- Use cuDF to accelerate pandas, Polars, and Dask workflows for analysing datasets of varying sizes
- Ingest and prepare large and larger-than-memory datasets directly on single or multiple GPUs
- Apply GPU-accelerated supervised and unsupervised machine learning algorithms using cuML
- Use algorithms such as XGBoost to address a range of data science problems
- Create and analyse complex network data using NetworkX and cuGraph
- Deploy machine learning models to an NVIDIA Triton Inference Server for optimised performance
- Integrate multiple large datasets to perform iterative, real-world analysis tasks
Outline
Introduction and environment setup
- Meet the instructor and review workshop objectives
- Set up access to the training environment
- Overview of GPU-accelerated data science and the RAPIDS ecosystem
- Understanding the role of GPUs in scalable analytics
GPU-accelerated data manipulation
Ingest and prepare several datasets, including larger-than-memory data, for use in downstream machine learning tasks:
- Reading data directly to single and multiple GPUs using pandas, Polars, cuDF, and Dask
- Comparing CPU-based and GPU-accelerated dataframe operations
- Cleaning and transforming structured datasets on the GPU
- Preparing population, road network, and clinic datasets for analysis
- Managing memory constraints and performance considerations
- Building repeatable, scalable data preparation pipelines
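As a minimal sketch of the drop-in relationship between CPU and GPU dataframes (the clinic dataset and column names below are invented for illustration), the same preparation code can run on either: cuDF mirrors the pandas API for these operations, so swapping the import for `import cudf as pd` — or enabling the `cudf.pandas` accelerator mode — moves the pipeline to the GPU with the code otherwise unchanged:

```python
# CPU baseline with pandas; cuDF exposes the same read_csv/filter/
# groupby interface, so this sketch carries over to the GPU largely
# as-is. The clinic data is inlined so the example is self-contained.
import io
import pandas as pd

csv_data = io.StringIO(
    "clinic_id,county,capacity\n"
    "1,Kent,120\n"
    "2,Kent,80\n"
    "3,Essex,200\n"
)

df = pd.read_csv(csv_data)

# Typical preparation steps: filter, derive a column, aggregate.
df = df[df["capacity"] > 50].copy()
df["large"] = df["capacity"] >= 100
by_county = df.groupby("county")["capacity"].sum().reset_index()

print(by_county.sort_values("county").to_dict("records"))
```

For larger-than-memory data, the workshop's Dask-based variants partition the same operations across chunks or multiple GPUs rather than changing the dataframe calls themselves.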
GPU-accelerated machine learning
Apply essential machine learning techniques to prepared datasets using GPU-accelerated libraries:
- Introduction to cuML and GPU-based model training
- Using supervised learning algorithms for predictive modelling
- Applying unsupervised learning techniques for clustering and pattern discovery
- Leveraging XGBoost for classification and regression tasks
- Evaluating model performance and refining hyperparameters
- Understanding performance trade-offs between CPU and GPU workflows
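The CPU/GPU trade-off above is easiest to see through the estimator interface the two ecosystems share. A minimal sketch using scikit-learn on synthetic data — cuML provides a matching estimator (`cuml.KMeans`) with the same fit/predict calls, so the workflow transfers to the GPU with minimal change:

```python
# CPU clustering sketch with scikit-learn; cuML's KMeans mirrors this
# interface. The two well-separated clusters are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
cluster_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([cluster_a, cluster_b])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.predict(X)

# With clusters this well separated, each synthetic group should
# receive a single, distinct label.
print(len(set(labels[:50])), len(set(labels[50:])))
```

XGBoost follows a similar pattern: recent releases accept a `device="cuda"` parameter to train the same model specification on the GPU.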
Graph analytics on the GPU
Apply advanced graph analytics to complex networks:
- Introduction to graph data structures and network analysis concepts
- Creating graph data on the GPU using cuGraph
- Analysing connectivity, centrality, and path-based metrics
- Comparing NetworkX and cuGraph implementations
- Scaling graph analytics to large, population-scale datasets
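A toy version of the NetworkX/cuGraph comparison, using an invented four-node road network: the calls below are standard NetworkX, and with the nx-cugraph backend installed the same NetworkX code can dispatch to cuGraph on the GPU:

```python
# Path and centrality metrics on a small hypothetical road network.
import networkx as nx

G = nx.Graph()
edges = [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0), ("C", "D", 1.0)]
G.add_weighted_edges_from(edges)

# Path-based metric: cheapest route from A to D by edge weight.
path = nx.shortest_path(G, "A", "D", weight="weight")

# Centrality: which node lies on the most shortest paths?
centrality = nx.betweenness_centrality(G)
most_central = max(centrality, key=centrality.get)

print(path, most_central)
```

At population scale the graph structure is the same in kind — nodes for places or people, weighted edges for connections — which is where GPU acceleration of these algorithms becomes decisive.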
Project: data analysis to support the UK during a simulated epidemic
Apply new GPU-accelerated data manipulation and analysis skills to a population-scale scenario:
- Integrating multiple massive datasets using RAPIDS libraries
- Performing real-world analysis to model and respond to a simulated epidemic affecting the UK population
- Pivoting and iterating on analysis as new simulated daily data becomes available
- Identifying patterns and insights to inform intervention strategies
- Communicating findings clearly and effectively
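A hypothetical miniature of the project workflow — all names and values invented — showing the integrate-then-iterate shape: per-person records are merged with each day's new infection data, and the same aggregation is re-run as the simulation advances:

```python
# Merging population records with daily infection data, then
# re-aggregating per county as each new day's data arrives.
import pandas as pd

population = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "county": ["Kent", "Kent", "Essex", "Essex"],
})
infections_day1 = pd.DataFrame({"person_id": [2], "infected": [True]})
infections_day2 = pd.DataFrame({"person_id": [2, 3], "infected": [True, True]})

def infected_per_county(population, infections):
    """Count infected people per county for one day's data."""
    merged = population.merge(infections, on="person_id", how="left")
    merged["infected"] = merged["infected"].fillna(False).astype(bool)
    return merged.groupby("county")["infected"].sum().to_dict()

print(infected_per_county(population, infections_day1))
print(infected_per_county(population, infections_day2))
```

In the workshop the same pattern runs over far larger datasets with cuDF, where the merge and groupby steps execute on the GPU.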
Inference and deployment considerations
- Preparing trained machine learning models for inference
- Deploying models to an NVIDIA Triton Inference Server
- Validating model performance in a live inference context
- Understanding scalability and operational considerations for production environments
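As a rough sketch of what deployment preparation involves, a Triton model is typically a directory containing the trained model file plus a `config.pbtxt` describing its inputs and outputs. The fragment below is illustrative only — the model name, tensor names, and shapes are invented — and assumes a forest model (such as XGBoost) served through Triton's FIL backend:

```
# Hypothetical Triton model configuration (config.pbtxt);
# names, dimensions, and batch size are placeholders.
name: "epidemic_risk_model"
backend: "fil"
max_batch_size: 8192
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 10 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [ { kind: KIND_GPU } ]
parameters {
  key: "model_type"
  value { string_value: "xgboost" }
}
```

Validation in a live inference context then means sending known inputs to the running server and checking both the predictions and the latency/throughput behaviour under load.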
Exams and assessments
Learners complete practical, scenario-based exercises throughout the workshop, culminating in a project that integrates data manipulation, machine learning, and graph analytics techniques.
Assessment is based on applied tasks that evaluate the ability to prepare data, train GPU-accelerated models, perform graph analysis, and interpret results within the simulated epidemic scenario.
Hands-on learning
This workshop is built around applied, GPU-enabled practice:
- Guided exercises using RAPIDS libraries including cuDF, cuML, and cuGraph
- Real-world datasets reflecting population, infrastructure, and healthcare contexts
- Iterative experimentation enabled by accelerated compute performance
- Project-based learning focused on solving a complex, evolving problem