Our team delivers cross-team visibility and execution on the most challenging reliability issues impacting Oracle's SaaS customers. We engage closely with service owners and stakeholders to understand and resolve critical issues that impair service experience.
As a Senior Technical Product Manager, SRE (Site Reliability Engineering), you are one of the technical leaders within an SRE organization who partner with the development teams across various SaaS product lines. This role sits at the intersection of customer advocacy, long-term engineering solutions, and data-driven problem solving.
About the Job
A unique opportunity to join a rapidly growing world-class team to improve the cutting-edge Oracle Cloud technologies and infrastructure that make up the Oracle Cloud solutions. As part of this SRE team, you will be continually challenged and have an opportunity to contribute to Oracle Cloud's success every day, working closely with our development partners.
You will be part of a team that operates over 20,000 customer databases, in deployments of up to 12 nodes, to support our Fusion SaaS offering. Operating at this scale and volume is a unique opportunity.
As a Site Reliability Engineer, you will solve exciting technical challenges by analyzing, troubleshooting, and designing vital Oracle Cloud services, platforms, and infrastructure while always thinking about reliability, scalability, resilience, security, and performance. You will focus on engineering improvements that eliminate whole classes of issues.
What You'll Do
• Service Accountability – You will act as product owner and product manager for the reliability of a collection of services and technology areas.
• Advocacy – As a product owner, you will represent the needs of the customer and the business to our development partners.
• Ownership Scope – As an SRE, you will understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the production services you collaborate with. In partnership with your Development colleagues, you will be responsible for ensuring that services are designed and delivered to be mission critical, with a focus on security, resiliency, scale, and performance.
• Operations Engineering – You will understand and be able to communicate the scale, capacity, security, performance attributes, and requirements of the services you own. We are subject matter experts, able to understand and communicate every characteristic of our service stack, such as:
  o degradation and behavior under load of the services and their dependencies
  o end-to-end tuning needs, optimizing resource utilization as load patterns fluctuate
  o instrumentation and metrics that clearly describe the service behaviors
  o scaling requirements and patterns
  o resiliency and recoverability, ensuring that backup/restore and disaster recovery capabilities are implemented, tested, and maintained
• Automation – You will have a clear understanding of automation and orchestration principles, and will be eager to drive automation wherever and whenever the possibility arises, while simultaneously eliminating technical debt. Automation must be part of your DNA.
• Technical Experts – You will have a deep understanding of service topologies and their dependencies, as required to troubleshoot issues and define mitigations. You will bring this expertise to bear in driving reliability improvements in the services you engage with.
• Broad Interests – SREs are a rare mix of sysadmins and Development Engineers and, as such, can understand and explain the effect of product architecture decisions on the ability to run services as distributed systems. They are driven by professional curiosity and a desire to develop a deep understanding of their services and their dependencies.
• Cross-team collaboration – You will engage with and present to a wide variety of audiences, ranging from individual contributors and teams to executive leadership.
Career Level - IC5