Bellevue, WA, United States of America
VKS Cluster Management R&D Engineer 5


Job Description:


VMware by Broadcom is the leader in virtualization and cloud infrastructure solutions.

VMware Cloud Foundation (VCF) is a full-stack Infrastructure as a Service (IaaS) platform that provides a unified, self-service experience for deploying and managing virtual machines (VMs) and containers across on-premises and public cloud environments. It delivers consistent infrastructure and operations across a variety of clouds, enabling organizations to modernize their applications and optimize their cloud deployments. The VKS cluster management team in VCF focuses on managing Kubernetes clusters at scale.

The VKS cluster management team is looking for a Senior Kubernetes Engineer with deep expertise in Kubernetes internals, Velero, and cloud-native data protection strategies. In this role, you will design and implement backup and disaster recovery solutions for Kubernetes workloads, ensuring data resilience and business continuity at scale.

This position is ideal for engineers who are highly skilled in Go, familiar with controller-runtime and CRDs, and passionate about building robust, automated systems for data management in dynamic cloud environments.

What are the primary needs, technical challenges, and problems you will be responsible for?

As a Senior Engineer on the VKS cluster management team, you will be expected to:

Design and implement scalable backup and restore strategies for Kubernetes using Velero and other ecosystem tools.

Extend and customize Velero through plugins, CRDs, or controllers written in Go using controller-runtime.

Lead the development of Kubernetes-native data protection solutions, including snapshot orchestration, workload migration, and disaster recovery.

Define and maintain Custom Resource Definitions (CRDs) and Kubernetes controllers related to backup workflows.

Develop observability and alerting around backup health, coverage, and performance.

Partner with SRE, DevOps, and application teams to ensure backup strategies meet SLAs and compliance requirements.

Contribute to the architecture and security of backup pipelines across multi-cluster or multi-cloud environments.

Conduct root cause analysis and postmortems for backup failures or recovery incidents.

Collaborate with stakeholders from partner engineering, product, security, infrastructure, and operations teams.

Success in the Role: What are the performance goals over the first 6-12 months you will work toward completing?

Within your first six months

You understand the vision, architecture, and data model of the VCF cluster management backend control plane. 

You have an in-depth understanding of Kubernetes Custom Resource Definitions (CRDs) and reconciliation logic.

You have production experience with Velero (including plugins, Restic integration, and volume snapshots).

You are familiar with CSI snapshots, PVC management, and volume restore processes across cloud providers.

You are expected to produce software designs that define and extend TMC’s policy management capabilities. These interfaces must meet Broadcom and industry standards and provide a consistent programming model across multiple languages.

You will be expected to design, implement, test, and deploy microservices developed in Go.

After six months

Take ownership of VCF policy management and lead a team of exceptional software engineers to implement cloud-scale solutions.

Work with product and leadership to evaluate, prioritize, and provide solutions for customers and partner service teams.

Represent the team and product in cross-team design discussions, identifying dependencies and areas of impact.

Actively participate in and contribute to the Gatekeeper and OPA open source communities.

Lead developer efficiency efforts and take personal ownership of improving the team's culture of innovation.

Mentor fellow engineers in their roles and coach them into becoming influential voices in the department.


What type of work will you be doing? What assignments, requirements, or skills will you be performing regularly?

As part of the VCF cluster management team:

You will spend most of your time developing microservices and Kubernetes controllers using cloud-native technologies such as Kubernetes, gRPC, REST APIs, databases, message queues, distributed tracing, monitoring, and more. All of our services are written in Go.

You will write automated tests in Go to validate and secure critical customer functionalities.

You will be responsible for delivering your code changes to production and for monitoring and maintaining our CI/CD pipelines.

You will take on-call responsibilities to triage, troubleshoot, and mitigate production issues.  

You will work directly with Technical Project Managers and Product Managers to better understand requirements and define the scope of work.

You will be responsible for high-level epics and will help define requirements and tangible deliverables. You will be expected to break the work down into individual work items that can be assigned to the team, and to lead estimation and scoping.

You can expect to collaborate with partner teams to identify dependencies and align on delivering cross-team initiatives.

You'll work closely with management to understand priorities and advocate for them on the team.


Where is this role located?

This role is located in Bellevue, WA, USA.

Additional Job Description:

Compensation and Benefits 

The annual base salary range for this position is $127,000 to $225,000.

This position is also eligible for a discretionary annual bonus in accordance with relevant plan documents, and equity in accordance with equity plan documents and equity award agreements. 

Broadcom offers a competitive and comprehensive benefits package: medical, dental, and vision plans; 401(k) participation, including company matching; Employee Stock Purchase Program (ESPP); Employee Assistance Program (EAP); company-paid holidays; paid sick leave; and vacation time. The company follows all applicable laws for Paid Family Leave and other leaves of absence.

Broadcom is proud to be an equal opportunity employer.  We will consider qualified applicants without regard to race, color, creed, religion, sex, sexual orientation, national origin, citizenship, disability status, medical condition, pregnancy, protected veteran status or any other characteristic protected by federal, state, or local law.  We will also consider qualified applicants with arrest and conviction records consistent with local law.

If you are located outside the USA, please be sure to fill out a home address, as this will be used for future correspondence.