Menlo Park, CA
Performance and Capacity Engineer, AI
Facebook is seeking an Engineer to join the Capacity Engineering & Analysis team to drive AI efficiency across Meta’s entire product portfolio. This person would be required to work cross-functionally with a large number of teams to ensure high performance and scalability of our infrastructure from both a cost and technology perspective. The scale is characterized by tens of billions of user requests, exabytes of data, and thousands of Gbps of network flow. Help scale one of the largest Internet services in the world! This position is full-time.

Performance and Capacity Engineer, AI Responsibilities:
* In close collaboration with the product groups and sister teams in Infrastructure, drive the scaling of the AI infrastructure to enable the launch of next-generation products and platforms (e.g. Reels and The Metaverse)
* Drive R&D of techniques to bend the infrastructure demand curve of the next-generation large (trillions of parameters) AI models for training and inference, e.g. algorithmic optimization and hardware-software co-design
* Develop novel techniques to surface deep insights, centered around performance and utilization efficiency, to the partners in the product groups
* Address gnarly hardware performance issues: specifically, identify and debug performance bottlenecks in large-scale distributed systems and optimize (via algorithm and/or architecture redesign) product/service performance to improve user experience
* Develop tools to monitor billions of user requests and to carry out performance and capacity-related tests and analyses
* Develop long-range plans for infrastructure demand for product groups. These plans total billions of dollars of investment
* Evaluate cutting-edge technologies, and assess their cost, usability and efficiency for our product groups and sister teams

Minimum Qualifications:
* Hands-on experience with GPU technologies and performance optimization of GPU kernels for AI training and serving
* Experience with deep learning technologies and their system requirements. Proven understanding of the underlying algorithms
* Experience working with cross-functional teams, driving consensus via influencing
* 5+ years of experience in building and scaling distributed systems and performance engineering
* Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

Preferred Qualifications:
* Experience in budgeting and long-range infrastructure planning at scale

Facebook is proud to be an Equal Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law.

Facebook is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at {apply below}.
Requirements:
Facebook
Recommended Skills
- Algorithms
- Architecture
- Artificial Intelligence
- Coaching And Mentoring
- Computer Engineering
- Debugging