SRE Essentials


Delivery format

Remote


Duration

3 days


Price

3135 €

This intensive three-day workshop is for engineers and technical professionals who want to learn Site Reliability Engineering (SRE) principles through hands-on practice. Each module pairs concise theory with a practical lab that mirrors real-world challenges in reliability, automation, monitoring, and incident management.

The final day's disaster recovery and postmortem exercise brings together all the skills learned, providing a safe environment to practice high-stakes incident response and continuous improvement.

By the end of this course, learners will be able to:

  • Apply SRE principles to real-world systems and scenarios
  • Automate operational tasks and implement effective monitoring
  • Respond to incidents using industry-standard procedures
  • Conduct blameless postmortems and drive reliability improvements
  • Collaborate effectively in high-pressure, real-time situations

Participants should have:

  • A basic understanding of Linux/Unix systems and networking
  • Some familiarity with scripting (e.g., Bash, Python), which is beneficial
  • Ideally, prior exposure to cloud platforms or DevOps practices (helpful but not required)
  • A willingness to collaborate and participate in practical group exercises

Target Audience

This course is designed for:

  • DevOps engineers and platform engineers seeking to deepen their reliability and incident management skills
  • System administrators and operations staff responsible for service uptime and automation
  • Software engineers interested in building reliable, scalable systems and understanding operational best practices
  • Technical leads and engineering managers who want to foster a culture of reliability and continuous improvement within their teams

1: Introduction to SRE and Reliability Engineering

  • Overview of SRE philosophy and key concepts
  • The role of SRE in modern IT organizations
  • Practical Lab: Setting up your SRE environment and tools

2: Service Level Objectives (SLOs) and Error Budgets

  • Defining and measuring SLOs, SLIs, and SLAs
  • Error budgets and their impact on release velocity
  • Practical Lab: Creating and tracking SLOs for a sample service
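To give a flavour of the error-budget arithmetic this module refers to, here is a minimal sketch in Python. It is not taken from the course material: the function name `error_budget` and the request-based (rather than time-based) definition of the SLI are assumptions made for illustration.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise error-budget consumption for one SLO window.

    Assumption: the SLI is the fraction of successful requests, and the
    SLO target is expressed as a fraction (0.999 means 99.9%).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # Measured SLI: share of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (values above 1.0 mean the SLO is breached).
    consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    print(error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250))
```

With a 99.9% target over 1,000,000 requests, the budget is about 1,000 failed requests, so 250 failures consume roughly a quarter of it. A team might slow down releases as the consumed fraction approaches 1.0.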

3: Monitoring, Alerting, and Observability

  • Principles of effective monitoring and alerting
  • Building observability into systems
  • Practical Lab: Implementing monitoring dashboards and alert rules
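One common way to express an alert rule against an SLO is a burn-rate check over two time windows. The sketch below is only an illustration and is not part of the course labs: the function names, the default 99.9% target, and the threshold of 14.4 (a value often quoted for a one-hour window against a 30-day budget) are all assumptions.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent.

    A burn rate of 1.0 means the budget would last exactly the SLO window;
    higher values mean it runs out sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0.0 <= error_ratio <= 1.0:
        raise ValueError("error_ratio must be between 0 and 1")
    return error_ratio / (1.0 - slo_target)


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float = 0.999,   # assumed target, for illustration only
    threshold: float = 14.4,     # assumed paging threshold
) -> bool:
    """Page only if both windows exceed the threshold.

    The long window shows the problem is significant; the short window
    shows it is still happening, which avoids paging for an incident
    that has already recovered.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


if __name__ == "__main__":
    # 2% errors over the long window and 3% over the short window.
    print(should_page(0.02, 0.03))
    # Same long-window error ratio, but the short window has recovered.
    print(should_page(0.02, 0.001))
```

In practice a rule like this would usually be written in the monitoring system's own rule language; the Python version only shows the logic.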

4: Automation and Toil Reduction

  • Identifying and eliminating toil through automation
  • Tools and techniques for automating operational tasks
  • Practical Lab: Writing scripts to automate common SRE workflows

5: Change Management and Release Engineering

  • Safe deployment strategies and change management processes
  • Balancing reliability with innovation
  • Practical Lab: Simulating blue/green and canary deployments
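A canary deployment sends a small, controlled share of traffic to the new version while the rest stays on the stable one. The following sketch shows one possible way to make that split deterministic per user. The `route` function, the use of a user ID as the routing key, and the percentage-based bucketing are assumptions for illustration, not the method used in the course lab.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a user.

    The user ID is hashed into one of 100 buckets, so the same user always
    lands on the same version for a given percentage, and raising the
    percentage only moves additional users onto the canary.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    canary_share = sum(route(u, 10) == "canary" for u in users) / len(users)
    print(f"share of users on canary: {canary_share:.1%}")
```

A blue/green switch can be seen as the special case where the percentage jumps from 0 to 100 in one step, while a canary rollout raises it gradually and watches error rates between steps.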

6: Incident Response Principles

  • Coordinated, well-drilled, and sustainable incident response
  • Roles, responsibilities, and communication during incidents
  • Practical Lab: Simulated incident response exercise

7: Incident Management Lab and Postmortem

  • Applying SRE incident response principles in a simulated scenario
  • Conducting a blameless postmortem to extract actionable lessons
  • Highly Practical Lab: Full-scale incident simulation, including:
      • Raising and managing an incident
      • Real-time remediation activities
      • Stakeholder communication and documentation
  • Culminating Exercise: Disaster recovery and postmortem analysis, producing actionable outputs and improvement items

Exams and Assessments

There is no specific certification associated with this course.

Hands-On Learning

  • Practical Lab: Setting up your SRE environment and tools
  • Practical Lab: Creating and tracking SLOs for a sample service
  • Practical Lab: Implementing monitoring dashboards and alert rules
  • Practical Lab: Writing scripts to automate common SRE workflows
  • Practical Lab: Simulating blue/green and canary deployments
  • Practical Lab: Simulated incident response exercise
  • Final incident management and disaster recovery simulation

Price: 3135 € + VAT

We reserve the right to make changes to the program, trainers, and delivery format.