SRE Essentials


Delivery format: Remote

Duration: 3 days

Price: 3232 € + VAT

This course blends the flexibility of self-paced learning with the structure of live, instructor-led sessions. You'll learn from world-class industry experts and gain practical skills to drive meaningful results in your workplace. Our digital platform also empowers you to track your progress and manage your learning journey effectively.

In this intensive three-day workshop, engineers and technical professionals learn Site Reliability Engineering (SRE) principles through immersive, hands-on practice. Each module pairs concise theory with practical labs modeled on real-world challenges in reliability, automation, monitoring, and incident management. On the final day, a disaster recovery and postmortem exercise brings together everything learned, giving participants a safe environment to practice high-stakes incident response and build a culture of continuous improvement.

Learning outcomes

By the end of this course, learners will be able to:

  • Apply SRE principles to real-world systems and scenarios
  • Automate operational tasks and implement effective monitoring
  • Respond to incidents using industry-standard procedures
  • Conduct blameless postmortems and drive reliability improvements
  • Collaborate effectively in high-pressure, real-time situations

Prerequisites

Participants should have:

  • A basic understanding of Linux/Unix systems and networking
  • Familiarity with scripting (e.g., Bash, Python), which is beneficial
  • Prior exposure to cloud platforms or DevOps practices, which is helpful but not required
  • A willingness to collaborate and take part in practical group exercises

Target audience

This course is designed for:

  • DevOps engineers and platform engineers seeking to deepen their reliability and incident management skills
  • System administrators and operations staff responsible for service uptime and automation
  • Software engineers interested in building reliable, scalable systems and understanding operational best practices
  • Technical leads and engineering managers who want to foster a culture of reliability and continuous improvement within their teams

Introduction to SRE and Reliability Engineering

  • Overview of SRE philosophy and key concepts
  • The role of SRE in modern IT organizations
  • Practical Lab: Setting up your SRE environment and tools

Service Level Objectives (SLOs) and Error Budgets

  • Defining and measuring SLOs, SLIs, and SLAs
  • Error budgets and their impact on release velocity
  • Practical Lab: Creating and tracking SLOs for a sample service
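As a rough illustration of the arithmetic behind error budgets (not material taken from the course labs), the sketch below assumes a simple availability SLO expressed as a fraction such as 0.999. The error budget is the share of the window the SLO permits to be "bad": (1 − SLO) × window. The function names `error_budget` and `budget_remaining` are hypothetical.

```python
from datetime import timedelta


def _check_slo(slo: float) -> None:
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Time a service may be unavailable in `window` without breaching `slo`.

    Hypothetical helper: assumes a time-based availability SLO such as 0.999.
    """
    _check_slo(slo)
    # Error budget = (1 - SLO) x window length.
    return window * (1.0 - slo)


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    _check_slo(slo)
    if total_requests == 0:
        return 1.0  # nothing served yet, so nothing spent
    allowed_failures = total_requests * (1.0 - slo)
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows 43 minutes 12 seconds of downtime.
    print(error_budget(0.999, timedelta(days=30)))
    # 250 failures out of 1,000,000 requests spends a quarter of a 99.9% budget.
    print(round(budget_remaining(0.999, 1_000_000, 250), 3))
```

When the remaining budget approaches zero, teams commonly slow or pause risky releases, which is the link between error budgets and release velocity.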

Monitoring, Alerting, and Observability

  • Principles of effective monitoring and alerting
  • Building observability into systems
  • Practical Lab: Implementing monitoring dashboards and alert rules
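One common way to turn an SLO into an alert rule is to page on the error-budget burn rate: the observed error ratio divided by the ratio the SLO allows (1 − SLO), so a burn rate of 1.0 spends the budget exactly over the SLO window. The minimal sketch below assumes request counts are already available per lookback window; `WindowStats`, `burn_rate`, and `should_page` are hypothetical names, and the 14.4 threshold is an assumed example value (a 1-hour window burning at 14.4× consumes about 2% of a 30-day budget), not a course-prescribed setting.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts observed over one lookback window (hypothetical type)."""
    total: int
    errors: int

    @property
    def error_ratio(self) -> float:
        return self.errors / self.total if self.total else 0.0


def burn_rate(stats: WindowStats, slo: float) -> float:
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    return stats.error_ratio / (1.0 - slo)


def should_page(long_window: WindowStats, short_window: WindowStats,
                slo: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than `threshold`.

    Requiring the short window as well lets the alert clear soon after
    recovery. The 14.4 default is an assumed example value; tune it.
    """
    return (burn_rate(long_window, slo) >= threshold
            and burn_rate(short_window, slo) >= threshold)


if __name__ == "__main__":
    slo = 0.999
    last_hour = WindowStats(total=10_000, errors=200)  # 2% errors
    last_5_min = WindowStats(total=1_000, errors=20)   # still 2% errors
    print(should_page(last_hour, last_5_min, slo))
```

Checking two windows is a design trade-off: the long window filters out brief spikes, while the short window confirms the problem is still happening before anyone is paged.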

Automation and Toil Reduction

  • Identifying and eliminating toil through automation
  • Tools and techniques for automating operational tasks
  • Practical Lab: Writing scripts to automate common SRE workflows

Change Management and Release Engineering

  • Safe deployment strategies and change management processes
  • Balancing reliability with innovation
  • Practical Lab: Simulating blue/green and canary deployments

Incident Response Principles

  • Coordinated, well-drilled, and sustainable incident response
  • Roles, responsibilities, and communication during incidents
  • Practical Lab: Simulated incident response exercise

Incident Management Lab and Postmortem

  • Applying SRE incident response principles in a simulated scenario
  • Conducting a blameless postmortem to extract actionable lessons
  • Highly Practical Lab: Full-scale incident simulation, including:
      ◦ Raising and managing an incident
      ◦ Real-time remediation activities
      ◦ Stakeholder communication and documentation
  • Culminating Exercise: Disaster recovery and postmortem analysis, producing actionable outputs and improvement items

Exams and assessments

There is no specific certification associated with this course.

Hands-on learning

  • Practical Lab: Setting up your SRE environment and tools
  • Practical Lab: Creating and tracking SLOs for a sample service
  • Practical Lab: Implementing monitoring dashboards and alert rules
  • Practical Lab: Writing scripts to automate common SRE workflows
  • Practical Lab: Simulating blue/green and canary deployments
  • Practical Lab: Simulated incident response exercise
  • Final incident management and disaster recovery simulation

Self-paced learning

  • Up to 4 hours, completed over a 4-week period prior to the live event.
  • We recommend completing the self-paced learning before joining the live event.
  • We recommend booking at least 4 weeks before the instructor-led live event so that learners have time to complete the self-paced hours.
  • The self-paced learning is available 4 weeks prior to the live event and for 12 months following the live event.

Instructor-led live event

  • This course has a 3-day live event.

Price: 3232 € + VAT



We reserve the right to make changes to the programme, trainers, and delivery format.