Romania
158 days ago
Site Reliability Engineer (SRE) - Hybrid Working

We have developed a cloud services platform, based on a microservices architecture, that hosts SaaS applications for large enterprises and internet service providers. As we scale our offerings, we are expanding our support capabilities in Europe. This position involves providing the site reliability engineering support our customers need.

Responsibilities:

The candidate will be part of the Site Reliability Engineering team, whose mission is to manage cloud operations, deploy microservices and monitor overall platform health. The engineer will also take proactive measures to exceed customer SLAs, plan capacity, and understand both the overall cloud services platform architecture and the end-to-end customer solution needs.

· Help deploy the solution for the customer during the proof of concept (POC) and work with them to ensure all their design needs are met.

· Operate, maintain and support production systems/applications; ensure that the systems are accessible and available

· Deploy, maintain and improve the availability and performance of the production environment, ensuring high quality through early detection of issues

· Develop metrics to monitor the health and security of applications and microservices running on AWS cloud infrastructure

· Deploy code and infrastructure in AWS using continuous integration/continuous delivery (CI/CD) tools for delivering features, fixes and system updates in development, integration and production environments

· Ensure system availability adheres to customer SLAs, and plan capacity

· Participate in the change management process, as appropriate

· Escalate issues to the engineering team, as appropriate, and be the point of contact for customers until the issue is resolved.

· Work with customers to deliver high-quality service and look for continual improvements and optimizations in the way things are done.

· Work with the development team to resolve the issues found.

 

Minimum Qualifications:

· Possess outstanding problem-solving skills in the diagnosis and resolution of customer issues

· Background using public cloud infrastructure such as AWS

· 4+ years of hands-on experience with AWS, Docker, Jenkins, Terraform, Kubernetes, Ansible, Salt

· 4+ years of automation using Python or any other programming/scripting language

· 4+ years of experience managing SaaS application infrastructure, including REST-based test automation using Python

· Ideally, a detailed understanding of IP routing, security and cloud services such as CGNAT, IPsec, IDP and SD-WAN/SDN for different customer use cases.

· The candidate should have a thorough understanding of networking fundamentals (TCP/IP, UDP, DHCP, DNS, ICMP, ARP, routing and switching).

 

Preferred Qualifications:

· Prior experience in public cloud operations on platforms such as GCP or Azure is desirable.

· Experience with, and a thorough understanding of, microservice development architecture and the Agile development model.

· Knowledge of build pipelines and infrastructure such as Jenkins, GitHub and CI/CD would be an added advantage.

· Bonus points for experience with the ELK Stack: Elasticsearch, Logstash and Kibana.

· Working knowledge of RDBMS and Cassandra databases

· Ability to work alongside project management teams to monitor the progress and implementation of initiatives
