Senior Site Reliability Engineer (Application / API Focused), Greater London

24 Days Old

Location: London (Hybrid)
Salary: £100,000 per annum + 25% Bonus + Excellent Benefits

We are hiring a Senior SRE to support a large-scale digital organisation undergoing a major commercial re-platforming across web and mobile channels.

This role sits much closer to the application layer than traditional infrastructure SRE positions. You will work directly with product and engineering teams across customer-facing platforms (web, mobile, payment journeys, APIs) to improve reliability, resilience, and service behaviour in production.

This is not a ticket-driven operational role and not a pure platform engineering post. It is about embedding measurable reliability into distributed systems at service level.

What You’ll Be Doing

Embed SRE practices across API and microservices-based architectures
Define and own meaningful SLIs/SLOs aligned to customer journeys and business-critical flows
Improve service reliability through proactive observability, tracing, telemetry and alert tuning
Partner closely with backend and platform engineers to reduce systemic failure modes
Lead and contribute to incident response, post-incident reviews and resilience improvements
Move the organisation from symptom-based alerting to customer-impact-driven diagnostics
Contribute to release safety, progressive deployments and production guardrails

What We’re Looking For

Proven experience operating as an SRE within digital product environments
Strong understanding of API architectures, microservices and distributed systems behaviour
Hands-on experience defining and implementing SLIs, SLOs and error budgets
Deep observability exposure (e.g. Datadog, Splunk, Prometheus, tracing/APM platforms)
Experience working closely with application engineering teams, not just infrastructure teams
Background in high-availability, customer-facing systems where outages have commercial impact
Cloud-native exposure (AWS preferred) with practical understanding of Kubernetes environments

Important

This role is best suited to engineers who care deeply about production behaviour, customer experience in failure scenarios, and reliability as a first-class product feature, rather than engineers focused purely on infrastructure provisioning or CI/CD enablement.

Please get in touch with Benjamin Applewhaite to discuss the role in confidence.

Location: Greater London
Job Type: Full-time
