Plano, TX, USA
6 days ago
Lead Site Reliability Engineer - Capture Platforms

There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.

As a Lead Site Reliability Engineer at JPMorgan Chase within the Business Collaboration team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.

Job responsibilities

Consistently models and champions site reliability culture and practices and exerts technical influence throughout your team
Leads initiatives to improve the reliability and stability of your team’s applications and platforms using data-driven analytics to improve service levels
Drives collaboration with your team to identify comprehensive service level indicators and partners with stakeholders to establish reasonable service level objectives and error budgets with your customers
Offers a high level of technical expertise within one or more technical domains and proactively identifies and solves for technology-related bottlenecks in your areas of expertise
Serves as the main point of contact during major incidents for your application and has the skills to identify and solve the issue quickly to avoid financial loss to the business
Documents and shares knowledge within your organization via internal forums and communities of practice

Required qualifications, capabilities, and skills

Formal training or certification in software engineering concepts and 5+ years of applied experience
Demonstrated proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices
Fluent in at least one programming language, such as Python, Java/Spring Boot, or .NET
Advanced knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
Proficient knowledge and experience in observability, such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Proficient with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Proficient with containers and container orchestration (ECS, Kubernetes, Docker)
Experience with troubleshooting common networking technologies and issues
Experience identifying and solving complex data structure and algorithm-related problems

Preferred qualifications, capabilities, and skills

Ability to initiate and implement ideas to solve business problems
Actively self-educates, evaluates new technology, and recommends suitable options
Knowledge of scheduling tools like AutoSys or Control-M
Good communication skills