Overview
Data science is about using scientific methods, processes, algorithms, and systems to analyse and extract insight from data. We believe organisations that master AI, Cloud, and Data can turn information into a competitive advantage. This hands-on workshop demonstrates how GPU-accelerated tools can transform data science workflows, enabling faster experimentation, greater scalability, and more cost-effective outcomes.
Across the workshop, learners use RAPIDS libraries to accelerate data manipulation, machine learning, and graph analytics. Participants work with cuDF, cuML, cuGraph, and related tools to process large and larger-than-memory datasets. The course culminates in a population-scale project that applies GPU-accelerated analytics to analyse and respond to a simulated epidemic affecting the UK, reinforcing practical, real-world application of skills.
Prerequisites
Participants should have:
- Experience with Python programming and common data science libraries
- A foundational understanding of data manipulation and analysis concepts
- Familiarity with basic machine learning principles
- Awareness of pandas or similar dataframe-based workflows
Target audience
This course is designed for:
- Data scientists seeking to accelerate existing Python workflows
- Machine learning engineers working with large or complex datasets
- Developers and analysts exploring GPU-accelerated analytics
- Organisations aiming to scale data science capabilities efficiently
Objectives
By the end of this workshop, learners will be able to:
- Use cuDF to accelerate pandas, Polars, and Dask workflows for analysing datasets of varying sizes
- Ingest and prepare large and larger-than-memory datasets directly on single or multiple GPUs
- Apply GPU-accelerated supervised and unsupervised machine learning algorithms using cuML
- Use algorithms such as XGBoost to address a range of data science problems
- Create and analyse complex network data using NetworkX and cuGraph
- Deploy machine learning models to an NVIDIA Triton Inference Server for optimised performance
- Integrate multiple large datasets to perform iterative, real-world analysis tasks
Outline
Introduction and environment setup
- Meet the instructor and review workshop objectives
- Set up access to the training environment
- Overview of GPU-accelerated data science and the RAPIDS ecosystem
- Understanding the role of GPUs in scalable analytics
GPU-accelerated data manipulation
Ingest and prepare several datasets, including larger-than-memory data, for use in downstream machine learning tasks:
- Reading data directly to single and multiple GPUs using pandas, Polars, cuDF, and Dask
- Comparing CPU-based and GPU-accelerated dataframe operations
- Cleaning and transforming structured datasets on the GPU
- Preparing population, road network, and clinic datasets for analysis
- Managing memory constraints and performance considerations
- Building repeatable, scalable data preparation pipelines
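As a minimal sketch of the drop-in relationship between CPU and GPU dataframes (the clinic dataset and column names below are invented for illustration), the same preparation code can run on either: cuDF mirrors the pandas API for these operations, so swapping the import for `import cudf as pd` — or enabling the `cudf.pandas` accelerator mode — moves the pipeline to the GPU with the code otherwise unchanged:

```python
# CPU baseline with pandas; cuDF exposes the same read_csv/filter/
# groupby interface, so this sketch carries over to the GPU largely
# as-is. The clinic data is inlined so the example is self-contained.
import io
import pandas as pd

csv_data = io.StringIO(
    "clinic_id,county,capacity\n"
    "1,Kent,120\n"
    "2,Kent,80\n"
    "3,Essex,200\n"
)

df = pd.read_csv(csv_data)

# Typical preparation steps: filter, derive a column, aggregate.
df = df[df["capacity"] > 50].copy()
df["large"] = df["capacity"] >= 100
by_county = df.groupby("county")["capacity"].sum().reset_index()

print(by_county.sort_values("county").to_dict("records"))
```

For larger-than-memory data, the workshop's Dask-based variants partition the same operations across chunks or multiple GPUs rather than changing the dataframe calls themselves.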
GPU-accelerated machine learning
Apply essential machine learning techniques to prepared datasets using GPU-accelerated libraries:
- Introduction to cuML and GPU-based model training
- Using supervised learning algorithms for predictive modelling
- Applying unsupervised learning techniques for clustering and pattern discovery
- Leveraging XGBoost for classification and regression tasks
- Evaluating model performance and refining hyperparameters
- Understanding performance trade-offs between CPU and GPU workflows
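The CPU/GPU trade-off above is easiest to see through the estimator interface the two ecosystems share. A minimal sketch using scikit-learn on synthetic data — cuML provides a matching estimator (`cuml.KMeans`) with the same fit/predict calls, so the workflow transfers to the GPU with minimal change:

```python
# CPU clustering sketch with scikit-learn; cuML's KMeans mirrors this
# interface. The two well-separated clusters are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
cluster_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([cluster_a, cluster_b])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.predict(X)

# With clusters this well separated, each synthetic group should
# receive a single, distinct label.
print(len(set(labels[:50])), len(set(labels[50:])))
```

XGBoost follows a similar pattern: recent releases accept a `device="cuda"` parameter to train the same model specification on the GPU.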
Graph analytics on the GPU
Apply advanced graph analytics to complex networks:
- Introduction to graph data structures and network analysis concepts
- Creating graph data on the GPU using cuGraph
- Analysing connectivity, centrality, and path-based metrics
- Comparing NetworkX and cuGraph implementations
- Scaling graph analytics to large, population-scale datasets
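A toy version of the NetworkX/cuGraph comparison, using an invented four-node road network: the calls below are standard NetworkX, and with the nx-cugraph backend installed the same NetworkX code can dispatch to cuGraph on the GPU:

```python
# Path and centrality metrics on a small hypothetical road network.
import networkx as nx

G = nx.Graph()
edges = [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0), ("C", "D", 1.0)]
G.add_weighted_edges_from(edges)

# Path-based metric: cheapest route from A to D by edge weight.
path = nx.shortest_path(G, "A", "D", weight="weight")

# Centrality: which node lies on the most shortest paths?
centrality = nx.betweenness_centrality(G)
most_central = max(centrality, key=centrality.get)

print(path, most_central)
```

At population scale the graph structure is the same in kind — nodes for places or people, weighted edges for connections — which is where GPU acceleration of these algorithms becomes decisive.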
Project: data analysis to support the UK during a simulated epidemic
Apply new GPU-accelerated data manipulation and analysis skills to a population-scale scenario:
- Integrating multiple massive datasets using RAPIDS libraries
- Performing real-world analysis to model and respond to a simulated epidemic affecting the UK population
- Pivoting and iterating on analysis as new simulated daily data becomes available
- Identifying patterns and insights to inform intervention strategies
- Communicating findings clearly and effectively
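A hypothetical miniature of the project workflow — all names and values invented — showing the integrate-then-iterate shape: per-person records are merged with each day's new infection data, and the same aggregation is re-run as the simulation advances:

```python
# Merging population records with daily infection data, then
# re-aggregating per county as each new day's data arrives.
import pandas as pd

population = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "county": ["Kent", "Kent", "Essex", "Essex"],
})
infections_day1 = pd.DataFrame({"person_id": [2], "infected": [True]})
infections_day2 = pd.DataFrame({"person_id": [2, 3], "infected": [True, True]})

def infected_per_county(population, infections):
    """Count infected people per county for one day's data."""
    merged = population.merge(infections, on="person_id", how="left")
    merged["infected"] = merged["infected"].fillna(False).astype(bool)
    return merged.groupby("county")["infected"].sum().to_dict()

print(infected_per_county(population, infections_day1))
print(infected_per_county(population, infections_day2))
```

In the workshop the same pattern runs over far larger datasets with cuDF, where the merge and groupby steps execute on the GPU.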
Inference and deployment considerations
- Preparing trained machine learning models for inference
- Deploying models to an NVIDIA Triton Inference Server
- Validating model performance in a live inference context
- Understanding scalability and operational considerations for production environments
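As a rough sketch of what deployment preparation involves, a Triton model is typically a directory containing the trained model file plus a `config.pbtxt` describing its inputs and outputs. The fragment below is illustrative only — the model name, tensor names, and shapes are invented — and assumes a forest model (such as XGBoost) served through Triton's FIL backend:

```
# Hypothetical Triton model configuration (config.pbtxt);
# names, dimensions, and batch size are placeholders.
name: "epidemic_risk_model"
backend: "fil"
max_batch_size: 8192
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 10 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [ { kind: KIND_GPU } ]
parameters {
  key: "model_type"
  value { string_value: "xgboost" }
}
```

Validation in a live inference context then means sending known inputs to the running server and checking both the predictions and the latency/throughput behaviour under load.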
Exams and assessments
Learners complete practical, scenario-based exercises throughout the workshop, culminating in a project that integrates data manipulation, machine learning, and graph analytics techniques.
Assessment is based on applied tasks that evaluate the ability to prepare data, train GPU-accelerated models, perform graph analysis, and interpret results within the simulated epidemic scenario.
Hands-on learning
This workshop is built around applied, GPU-enabled practice:
- Guided exercises using RAPIDS libraries including cuDF, cuML, and cuGraph
- Real-world datasets reflecting population, infrastructure, and healthcare contexts
- Iterative experimentation enabled by accelerated compute performance
- Project-based learning focused on solving a complex, evolving problem