SRE Essentials
Delivery format
Remote
Duration
3 days
Price
3135 €
This intensive three-day workshop is designed for engineers and technical professionals who want to learn Site Reliability Engineering (SRE) principles through hands-on practice. Each module pairs concise theory with a practical lab modeled on real-world challenges, covering reliability, automation, monitoring, and incident management.
The final day's disaster recovery and postmortem exercise brings these skills together, providing a safe environment to practice high-stakes incident response and continuous improvement.
By the end of this course, learners will be able to:
- Apply SRE principles to real-world systems and scenarios
- Automate operational tasks and implement effective monitoring
- Respond to incidents using industry-standard procedures
- Conduct blameless postmortems and drive reliability improvements
- Collaborate effectively in high-pressure, real-time situations
Participants should have:
- A basic understanding of Linux/Unix systems and networking
- Ideally, some familiarity with scripting (e.g., Bash, Python)
- Ideally, prior exposure to cloud platforms or DevOps practices (helpful but not required)
- Willingness to collaborate and participate in practical group exercises
Target Audience
This course is designed for:
- DevOps engineers and platform engineers seeking to deepen their reliability and incident management skills
- System administrators and operations staff responsible for service uptime and automation
- Software engineers interested in building reliable, scalable systems and understanding operational best practices
- Technical leads and engineering managers who want to foster a culture of reliability and continuous improvement within their teams
1: Introduction to SRE and Reliability Engineering
- Overview of SRE philosophy and key concepts
- The role of SRE in modern IT organizations
- Practical Lab: Setting up your SRE environment and tools
2: Service Level Objectives (SLOs) and Error Budgets
- Defining and measuring SLOs, SLIs, and SLAs
- Error budgets and their impact on release velocity
- Practical Lab: Creating and tracking SLOs for a sample service
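As an illustration of how an SLO relates to an error budget, here is a minimal sketch in Python. It is not taken from the course materials; the function name `error_budget`, its parameters, and the request-based availability SLI are assumptions chosen for the example.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget consumption for a request-based availability SLO.

    slo_target is a fraction, e.g. 0.999 for "99.9% of requests succeed".
    All names here are hypothetical and used only for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The error budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures

    return {
        "sli": 1.0 - failed_requests / total_requests,  # measured success ratio
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,                     # 1.0 means fully spent
        "budget_remaining": 1.0 - consumed,              # negative means SLO missed
    }


if __name__ == "__main__":
    report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"SLI: {report['sli']:.4%}")
    print(f"Budget consumed: {report['budget_consumed']:.0%}")
```

With a 99.9% target over one million requests, roughly 1,000 failures are tolerated, so 250 failures consume about a quarter of the budget. A team might slow releases as the remaining budget approaches zero, which is the link to release velocity noted in this module.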
3: Monitoring, Alerting, and Observability
- Principles of effective monitoring and alerting
- Building observability into systems
- Practical Lab: Implementing monitoring dashboards and alert rules
4: Automation and Toil Reduction
- Identifying and eliminating toil through automation
- Tools and techniques for automating operational tasks
- Practical Lab: Writing scripts to automate common SRE workflows
5: Change Management and Release Engineering
- Safe deployment strategies and change management processes
- Balancing reliability with innovation
- Practical Lab: Simulating blue/green and canary deployments
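To make the canary idea concrete, the sketch below shows one common way to send a fixed share of traffic to a new version: hash a stable request or user identifier into a bucket from 0 to 99 and compare it with the rollout percentage. It is an illustrative assumption, not the lab's actual tooling; `route_request`, `request_id`, and `canary_percent` are hypothetical names.

```python
import hashlib


def route_request(request_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a request, deterministically.

    Hashing the identifier keeps a given caller on the same version for the
    whole rollout step, which makes canary metrics easier to interpret.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    # Simulate 10,000 callers at a 10% canary step and report the observed share.
    ids = [f"user-{i}" for i in range(10_000)]
    canary_hits = sum(route_request(i, 10) == "canary" for i in ids)
    print(f"Canary share: {canary_hits / len(ids):.1%}")
```

Setting `canary_percent` to 0 or 100 sends all traffic to one version, so the same function can also model a blue/green cutover as a single switch from 0 to 100.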
6: Incident Response Principles
- Coordinated, well-drilled, and sustainable incident response
- Roles, responsibilities, and communication during incidents
- Practical Lab: Simulated incident response exercise
7: Incident Management Lab and Postmortem
- Applying SRE incident response principles in a simulated scenario
- Conducting a blameless postmortem to extract actionable lessons
- Highly Practical Lab: Full-scale incident simulation, including:
  - Raising and managing an incident
  - Real-time remediation activities
  - Stakeholder communication and documentation
- Culminating Exercise: Disaster recovery and postmortem analysis, producing actionable outputs and improvement items
Exams and Assessments
There is no specific certification associated with this course.
Hands-On Learning
- Practical Lab: Setting up your SRE environment and tools
- Practical Lab: Creating and tracking SLOs for a sample service
- Practical Lab: Implementing monitoring dashboards and alert rules
- Practical Lab: Writing scripts to automate common SRE workflows
- Practical Lab: Simulating blue/green and canary deployments
- Practical Lab: Simulated incident response exercise
- Final incident management and disaster recovery simulation
Price 3135 € + VAT
We reserve the right to make changes to the program, trainers, and delivery format.
