Position Description:
Builds and operates highly resilient platforms in Amazon Web Services (AWS) cloud environments. Coordinates systems using Infrastructure as Code tools (IAM, ARM, Terraform, and Chef). Performs reliability engineering throughout the entire Software Development Lifecycle (SDLC) using Python, NodeJS, or Java. Deploys and supports distributed multi-tiered application systems using Kubernetes and Continuous Integration/Continuous Deployment (CI/CD) pipelines. Creates dashboards to capture the latency, availability, error, and saturation (performance) of applications using Splunk, Grafana, Prometheus, Catchpoint, and Datadog. Creates Service-Level Indicator/Service-Level Objective (SLI/SLO) dashboards and automated processes to update changes and create new dashboards. Identifies and resolves application issues using Datadog, Prometheus, and Splunk. Creates, maintains, and tunes monitors using ELK, OpenSearch, and OpenTelemetry. Supports applications hosted in AWS Cloud and Kubernetes. Builds, deploys, automates, and supports application services spanning multiple technology platforms, frameworks, and languages.
Primary Responsibilities:
Provides automated solutions for business and technology operational activities and manual tasks.
Analyzes the observability, resiliency, availability, and performance of applications.
Triages issues, performs deep dives, and executes root cause analysis.
Resolves business and system issues through enhancement initiatives.
Resolves issues as required during critical outages to avoid negative business impact.
Contributes to product architectural solutions, addressing high-impact system issues.
Deploys and supports distributed multi-tiered application systems.
Manages the scalability and resiliency of applications.
Ensures daily business operations are not impacted by system issues (trade processing and correction, fund and sweep translation, and cash position and reconciliation).
Consults across the enterprise to plan for and implement enhancements to systems to avoid system outages and ensure seamless implementations.
Establishes end-to-end flow of application systems to quickly identify and resolve critical business issues.
Tests the resiliency of application systems using Chaos Engineering techniques.
Mentors junior team members.
Education and Experience:
Bachelor’s degree (or foreign education equivalent) in Computer Information Systems, Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and five (5) years of experience as a Principal Site Reliability Engineer (or closely related occupation) maintaining and improving the reliability, performance, and scalability of distributed applications.
Or, alternatively, Master’s degree (or foreign education equivalent) in Computer Information Systems, Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and three (3) years of experience as a Principal Site Reliability Engineer (or closely related occupation) maintaining and improving the reliability, performance, and scalability of distributed applications.
Skills and Knowledge:
Candidate must also possess:
Demonstrated Expertise (“DE”) performing site reliability engineering to analyze the observability, resiliency, availability, instrumentation, and performance of distributed applications; creating dashboards and monitors to capture the latency, availability, error, and saturation performance of distributed applications using Splunk, Grafana, Prometheus, Catchpoint, Telemetry, and Datadog; and creating SLI/SLO dashboards, monitors, and automated processes to update changes and create new dashboards.
DE developing Kubernetes platforms and automations in public and private Cloud -- RKS (Rancher), EKS (AWS), and AKS (Azure) -- using Python, Shell Scripting, Git, Docker, and Kubernetes.
DE automating business and technology operational activities -- Kubernetes cluster rehydration, application recycling, patching, disaster recovery, and ITSM reporting -- using Jenkins Core, uDeploy, RunDeck, Ansible, and AWX.
DE performing triage and root cause analysis (RCA) in a multi-tiered fund accounting application system related to hardware, software, network, applications, and cloud service providers, on multiple platforms -- Unix, Windows, and AWS cloud environments -- using Datadog, Splunk, Grafana, and Kibana.
Certifications:
Company Overview
Fidelity Investments is a privately held company with a mission to strengthen the financial well-being of our clients. We help people invest and plan for their future. We assist companies and non-profit organizations in delivering benefits to their employees. And we provide institutions and independent advisors with investment and technology solutions to help invest their own clients’ money.
Join Us
At Fidelity, you’ll find endless opportunities to build a meaningful career that positively impacts people’s lives, including yours. You can take advantage of flexible benefits that support you through every stage of your career, empowering you to thrive at work and at home. Honored with a Glassdoor Employees’ Choice Award, we have been recognized by our employees as a top 10 Best Place to Work in 2024. And you don’t need a finance background to succeed at Fidelity: we offer a range of opportunities for learning so you can build the career you’ve always imagined.
Fidelity’s hybrid working model blends the best of both onsite and offsite work experiences. Having the majority of our associates work onsite is important for our business strategy and our culture. We also value the benefits that working offsite offers associates. Most roles listed as Hybrid will require associates to work onsite all business days of every other week in a Fidelity office. This does not apply to roles listed as Remote or Onsite.
At Fidelity, we value honesty, integrity, and the safety of our associates and customers within a heavily regulated industry. Certain roles may require candidates to go through a preliminary credit check during the screening process. Candidates who are presented with a Fidelity offer will need to go through a background investigation, detailed in this document, and may be asked to provide additional documentation as requested. This investigation includes, but is not limited to, criminal, civil litigation, and regulatory reviews, as well as employment, education, and credit reviews (role dependent). These investigations will account for 7 years or more of history, depending on the role. Where permitted by federal or state law, Fidelity will also conduct a pre-employment drug screen, which will review for the following substances: amphetamines, THC (marijuana), cocaine, opiates, and phencyclidine.
We invite you to Find Your Fidelity at fidelitycareers.com.
Fidelity Investments is an equal opportunity employer. We believe that the most effective way to attract, develop and retain a diverse workforce is to build an enduring culture of inclusion and belonging.
Fidelity will reasonably accommodate applicants with disabilities who need adjustments to participate in the application or interview process. To initiate a request for an accommodation, contact the HR Accommodation Team by sending an email to accommodations@fmr.com.