Chennai, Tamil Nadu, India
SRE - Observability Engineer

Job Title: SRE & Observability Engineer

The Security SRE & Observability platform team, EPEO, is looking for a passionate, experienced SRE & Observability Engineer who is excited to take on any technology and improve the observability and reliability of security platforms. The successful candidate will be responsible for implementing site/security reliability engineering, developing observability dashboards, and building automation. This role involves collaborating with cross-functional teams to ensure the reliability, scalability, and performance of systems while driving automation to streamline operations.

YOUR TYPICAL DAY HERE WOULD BE:

Design, automate, and manage a highly available, scalable cloud deployment that allows development teams to deploy and run their services.
Collaborate with engineering and architecture teams to evaluate and identify optimal cloud solutions that deliver scalability, high performance, and security.
Design and build observability dashboards using Dynatrace and Grafana.
Develop and implement strategies to improve system reliability, performance, and efficiency.
Analyze performance data, identify bottlenecks, and implement solutions to optimize system performance.
Develop and implement automation solutions that streamline operations, improve efficiency, and reduce manual intervention in infrastructure management and deployment processes.
Collaborate with security service teams to implement Security Reliability Engineering.
Define and implement best practices for monitoring, alerting, and incident response to proactively identify and address potential issues.
Drive continuous improvement initiatives that enhance the reliability, scalability, and performance of our systems through automation and infrastructure optimization.
Collaborate with development teams to identify and address performance bottlenecks.
Regularly review existing systems and recommend improvements.

WHAT YOUR SKILLSET LOOKS LIKE:

A relevant Bachelor's or Master's degree in computer science or engineering.
7+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
Hands-on experience with Dynatrace, Grafana, and Splunk.
Experience setting up logging and monitoring services (Dynatrace, Grafana, GCP Operations Suite) and developing dashboards.
Understanding of incident management processes and best practices.
Expertise in automation and scripting languages (Python, Go, Bash).
Proven experience designing, deploying, and operating mid- to large-scale public cloud environments.
Knowledge of GCP and of configuring infrastructure with infrastructure-as-code tools such as Terraform.
Demonstrated ability to drive continuous improvement and innovation in SRE and automation practices.
Excellent problem-solving and analytical skills.

WOULD BE GREAT IF YOU ALSO BRING:

GCP / DevOps Certification