About the Role
The Senior Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, performance, and operability of production systems across our platforms. The role applies software engineering practices to operations, with a focus on automation, observability, and incident response.
Responsibilities:
Own and improve the reliability, availability, and performance of production services in Google Cloud (GCP).
Participate in incident management, including detection, triage, mitigation, escalation, and recovery.
Use and improve incident workflows and tooling (e.g., ServiceNow) to ensure clear ownership and timely communication.
Design, implement, and operate observability solutions, including monitoring, logging, tracing, synthetics, and dashboards (e.g., Splunk Observability, OpenTelemetry).
Reduce operational toil through automation and engineering-led solutions, proactively introducing a...