Pune, Maharashtra
28 days ago
Senior CloudOps Engineer

Onit, Inc. is looking for a Sr. CloudOps Engineer to join our team in Pune to help manage and maintain a diverse infrastructure across numerous geographical locations. To be successful in this role, great people skills are a must, as is a passion for technology. The individual we seek is a bright, creative problem solver. You must be able to multi-task in a fast-paced environment and be a self-starter with the ability to work independently.
 

Responsibilities 

Responsible for optimizing performance, ensuring security, and driving innovation in our cloud environment while responding to infrastructure and security alerts in a 24x7x365 operation. 

Create automation, runbooks, and playbooks to help others support the infrastructure 

Troubleshoot infrastructure and application-level issues and collaborate with support specialists and Cloud Operations / SRE

Write and present a weekly report highlighting the previous week’s alerts, with detailed analysis, resolution, and any impact on the SLA.

Monitor performance and capacity of Onit systems. 

Monitor for hardware, software and environmental alerts or malfunctions. 

Monitor security alerts from multiple sources. 

Triage and troubleshoot problems as they arise, following runbooks and standard operating procedures. 

Track all issues from start to finish and document all resolutions in detail, in both the trouble ticketing system and the engineering runbooks.

Escalate issues to InfraOps/DevOps engineers and Onit management.

Be ready to work in shifts.

Requirements 

Bachelor’s degree in Computer Science or equivalent experience is required. 

4+ years’ experience with Red Hat Enterprise Linux or Amazon Linux 2023 is required.

3+ years’ hands-on experience with AWS (EC2, S3, RDS, VPC, CloudWatch, CloudTrail, IAM, EKS, ECS, Security, etc.)

A solid understanding of the components that make up production systems (memory, CPU, disk space, disk I/O, network I/O, etc.) is required.

Strong experience with monitoring, alerting, and log aggregation tools: Datadog, AWS CloudWatch, PagerDuty, Statuspage. 

Experience with SIEM/event correlation systems such as Elastic, Splunk, or ELK is required.

Strong understanding of AWS security and monitoring and experience implementing best practices. 

Ability to read and interpret application server logs, outputs, CloudTrail, and other critical logging output.

Experience working with relational databases such as Postgres or AWS RDS is a plus.

Hands-on experience with Kubernetes is a plus.

Experience with enterprise web applications in production.

Experience with a programming language such as Python is a plus.

Excellent troubleshooting skills required. 

Ensure resource availability and allocation 

Excellent written and verbal communication skills required. 

Experience using Git (GitLab a plus) and CI/CD pipelines (e.g., Jenkins).
