Redmond, WA, 98073, USA
17 hours ago
Software Engineer - XR Codec Interactions and Avatars Team
**Summary:** XR Codec Interactions and Avatars (XRCIA) brings together a highly interdisciplinary team of researchers and engineers to create the future of augmented and virtual reality. On the Research Oriented Cluster Foundations team, you’ll work on building and maintaining tools, libraries, and frameworks that will help researchers collaborate with each other and empower their research towards the generation of Codec Interactions and Avatars. Our team cultivates an honest and considerate environment where self-motivated individuals thrive. We encourage ownership and embrace the ambiguity that comes with working on the frontiers of research.

In this software engineer role, you will serve as the point of contact for Meta's research GPU super clusters. You are a hybrid software/systems/infrastructure engineer who ensures that Meta’s Research Super Clusters run smoothly and have the capacity for future growth. Our team is composed of people with varied levels of experience and backgrounds. Relevant industry experience is important (Software Engineer, Site Reliability Engineer (SRE), Systems Engineer, DevOps Engineer, Network Engineer, or similar role), but ultimately less so than your demonstrated attitude. We sail into uncharted waters every day at Meta’s large scale ML model training GPU clusters, and we are always learning.

**Required Skills:** Software Engineer - XR Codec Interactions and Avatars Team Responsibilities:

1. Leverage the scale and complexity of the larger Meta infrastructure to accelerate our Codec Interaction and Avatars projects
2. Influence outcomes within your immediate team, peer engineering teams, and with cross-functional stakeholders
3. Work independently, handle large projects simultaneously, and prioritize team roadmap and deliverables by balancing required effort with resulting impact
4. Own Research Super Cluster back-end services which handle fleet management, infrastructure components that drive Meta’s advances in AI, core services which are used by every team at XRCIA, networking systems, and everything in between
5. Author and review code, develop documentation and capacity plans, and debug the hardest problems, all live, on some of the largest and most complex systems in the world
6. Together with your engineering team, you will share an on-call rotation and be an escalation contact for service incidents. Provide on-call support and lead incident root cause analysis through multiple data engineering layers (compute, storage, network) for GPU clusters and act as a final escalation point

**Minimum Qualifications:**

7. Currently has, or is in the process of obtaining a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
8. 3+ years of experience in UNIX/LINUX and clear understanding of TCP/IP network fundamentals
9. 5+ years of experience coding in at least one of the following languages: C++, Python, or Rust
10. Experience with software development practices such as source control, code reviews, unit testing, debugging and profiling
11. Experience with Internet service architecture capacity planning and/or handling needs for urgent capacity augmentation
12. Knowledge of common web technologies and/or Internet service architectures (such as LAMP or MEAN stacks, CDN, Load Balancing techniques, etc.)
13. Experience configuring and running infrastructure level applications, such as Kubernetes, Terraform, MySQL, SLURM, etc.

**Preferred Qualifications:**

14. Thorough understanding of Linux operating system, including the networking subsystem
15. Experience in distributed system performance measurement, logging, and optimization
16. Experience with Python library management systems such as Conda
17. Prior experience in cluster oncall operations, including troubleshooting server/scheduler/storage errors, maintaining compute/storage environments/libraries/tools, helping onboard users to the cluster, and answering general questions from users
18. Prior experience in cluster coordination and strategy planning, including collecting/understanding needs of users, developing tools to improve user experience, providing guidance on best practices, forecasting compute/storage needs, and developing long-term user experience/compute/storage strategies
19. Prior experience building tooling for monitoring and telemetry
20. Prior experience in developing/managing distributed network file systems
21. Prior experience in network security

**Public Compensation:** $56.25/hour to $173,000/year + bonus + equity + benefits

**Industry:** Internet

**Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.