Lead Cloud Site Reliability Engineer/Manager

Lead Cloud Site Reliability Engineer, Leadership (Azure or GCP), SLOs, Automation

A leading financial services client is seeking a strong technical leader to drive and support a large group of SRE engineers across multiple locations. The role is split 50/50 between hands-on engineering and team management. This is an engineering role, not an operations role.

The role:
Lead and mentor a team of up to 15 SREs, championing continuous improvement and engineering excellence.
Partner with application teams as they migrate services to the cloud.
Work with Product Owners and Engineering Leads to balance feature delivery with system reliability, performance and health.
Use observability tooling, performance metrics and SRE principles to proactively identify issues and reduce operational toil.
Implement incident and problem management practices, ensuring strong root cause analysis, a longer MTTF and a shorter MTTR.
Champion SLOs, SLIs, error budgets and reliability‑first thinking.
Influence platform direction and engineering standards to help shape resilient cloud services at scale.

Technical skills required:
Strong team management experience (day-to-day management, mentoring and coaching).
Strong cloud engineering background, ideally across Azure and GCP.
Experience building or operating large‑scale, resilient cloud platforms.
Deep understanding of observability tooling (metrics, logs, traces).
Hands‑on experience with modern SRE practices: SLOs / SLIs, automation to reduce toil, production readiness and robust post‑mortems.
Solid understanding of GitHub pipelines and Terraform modules.
Proven experience leading high‑performing engineering teams.
Ability to communicate complex technical topics in a clear, accessible way.
Comfortable working with diverse stakeholder groups.

Location:
Manchester
Job Type:
Full-time