
Certified DevOps Site Reliability Engineering (SRE) Practitioner


Format

Remote


Duration

3 days


Price

1870 €

This advanced course is designed for experienced site reliability engineers (SREs) looking to deepen their knowledge and practical skills in implementing and managing reliability engineering principles at scale. Participants will explore anti-patterns, service level objectives (SLOs), observability, chaos engineering, incident response, and automation. The course includes real-world case studies, hands-on exercises, and group discussions to reinforce learning and application in professional environments.

By the end of this course, learners will be able to:

  • Identify and mitigate SRE anti-patterns to improve reliability.
  • Define and implement Service Level Objectives (SLOs) aligned with business needs.
  • Apply full-stack observability to monitor system health and detect failures.
  • Use AIOps and platform engineering to enhance automation and efficiency.
  • Implement incident response management best practices.
  • Explore chaos engineering techniques to build resilient systems.
  • Understand how SRE integrates with DevOps methodologies.

Participants should have:

  • A foundational understanding of Site Reliability Engineering (SRE) principles.
  • Prior completion of the SRE Foundation certification (mandatory).
  • Experience with DevOps practices, system administration, or software development.
  • Familiarity with incident response, monitoring, and automation.

Target Audience

This course is ideal for:

  • Site Reliability Engineers (SREs) aiming to advance their expertise.
  • DevOps engineers and software developers seeking reliability best practices.
  • IT operations professionals responsible for maintaining highly available services.
  • Engineering managers and technical leaders looking to implement SRE strategies.

SRE Anti-Patterns

  • Common reliability pitfalls and how to avoid them.
  • Case study: Monzo Bank's reliability failures and lessons learned.
  • Conducting blameless postmortems and retrospectives.

Service Level Objectives (SLOs) – The Proxy for Customer Happiness

  • Establishing SLOs, SLIs, and error budgets.
  • Case studies: Kudos Engineering and Home Depot’s SLO implementation.
  • Practical exercise: Obtaining service credits.
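To illustrate the relationship between an SLO and its error budget, here is a minimal sketch. It is not taken from the course material; the function names and the request-based SLI are assumptions chosen for the example.

```python
def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the measurement window.

    slo_target is a fraction, e.g. 0.999 for a "99.9% of requests succeed" SLO.
    """
    if not 0.0 <= slo_target <= 1.0:
        raise ValueError("slo_target must be between 0 and 1")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = error_budget(slo_target, total_events)
    if budget == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0
    return 1.0 - bad_events / budget


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests against a 99.9% availability SLO.
    print(round(error_budget(0.999, 1_000_000)))           # allowed failures
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))  # share of budget left
```

With a 99.9% target over one million requests, roughly 1,000 failures are tolerated; 250 observed failures would leave about three quarters of the budget unspent.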

Full-Stack Observability

  • Implementing end-to-end monitoring, logging, and alerting.
  • Reducing false positives and alert fatigue.

Using Platform Engineering & AIOps

  • Leveraging automation and AI-driven operations to enhance system reliability.

SRE & Incident Response Management

  • Best practices for incident response and on-call management.
  • The role of incident command systems.

Chaos Engineering

  • Designing fault injection experiments to improve system resilience.
  • Case study: How Netflix uses chaos engineering.
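As a rough illustration of a fault injection experiment, the sketch below wraps a call so that it fails at a configurable rate and then checks whether a simple retry policy absorbs the failures. It is not taken from the course material or from any specific chaos engineering tool; `InjectedFault`, `with_fault_injection`, and `call_with_retries` are hypothetical names introduced for the example.

```python
import random


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency failure."""


def with_fault_injection(func, failure_rate: float, rng: random.Random):
    """Return a version of func that fails with probability failure_rate."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts: int):
    """Call func up to `attempts` times, re-raising the last injected fault."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as exc:
            last_error = exc
    raise last_error


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the experiment is repeatable
    flaky = with_fault_injection(lambda: "ok", failure_rate=0.3, rng=rng)
    trials = 1000
    successes = 0
    for _ in range(trials):
        try:
            call_with_retries(flaky, attempts=3)
            successes += 1
        except InjectedFault:
            pass
    print(f"success rate with 3 attempts: {successes / trials:.3f}")
```

The hypothesis under test here is that three attempts are enough to keep the success rate acceptable when 30% of individual calls fail. In a real experiment the injected fault, its scope, and the abort conditions would be agreed in advance.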

SRE as a Form of DevOps

  • Bridging software engineering and system operations.
  • Implementing SRE culture and best practices in organisations.

Exams and Assessments

  • An exam voucher is included, so the exam can be taken separately from the course.
  • SRE Practitioner Exam (DevOps Institute accreditation).
    • 40 multiple-choice questions
    • 90 minutes duration
    • 65% pass mark
  • Hands-on exercises and knowledge checks throughout the course.

Price: 1870 € + VAT
