7+ years of experience as a System Engineer, Site Reliability Engineer (SRE), or in IT Operations
Proficiency with monitoring and observability tools such as Grafana, Datadog, and Splunk
Strong understanding of logging frameworks and best practices (e.g., Fluentd, Logstash, Loki)
Hands-on knowledge of infrastructure monitoring (CPU, memory, network, IOPS) and application performance monitoring (APM)
Proficiency in scripting and automation (Bash, PowerShell, or Python) for automating incident responses and log parsing
Familiarity with containerized environments on Docker and monitoring their performance
Experience integrating monitoring and alerting tools into CI/CD pipelines
Understanding of non-functional requirements (NFRs) such as performance, reliability, availability, and observability
Ability to work within cross-functional teams, integrating monitoring and incident management into daily workflows
About the Role
Nice to ha...