AIML - Sr Engineering Program Manager, ML Compute Infrastructure (apple)
apple Cupertino, United States
2024-10-27
Job posting number: #153362 (Ref:apl-200574360)
Job Description
Summary
Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other’s ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It’s the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something; you’ll add something.
The AIML team at Apple is building the next generation of Machine Learning models, and we are looking for people like you to come build Apple Intelligence with us! The Machine Learning Platform & Technology (MLPT) team in the AIML organization is seeking a Senior Engineering Program Manager for its ML Compute Platform. This platform provides services to all internal Apple developers, delivering efficient and scalable compute and processing across the machine learning lifecycle, from model experimentation to deployment, for features across the entire Apple consumer ecosystem.
We’re looking for someone with a passion for this space, experience building large-scale infrastructure for cloud platforms, and a strong knowledge of ML workflows. You will partner with engineering teams and other product/program managers across the org to drive and influence our compute roadmap, improving engineering efficiency, reducing cost, and ensuring resiliency for Apple's ML use cases. You will also have the opportunity to lead with vision, identify new technologies for adoption, and serve as the central PM voice for interactions with public cloud providers, such as AWS and GCP, as well as internal Apple Cloud. Come join us and build the new frontier technologies that support Apple Intelligence!
Description
- As a key EPM on the AIML team, you will be responsible for establishing cross-functional partnerships with all of Apple’s ML partners, understanding their use cases, and improving the ease of use of the compute services.
- You will collaborate with Apple’s AIML engineering teams to define a partnership strategy across the entire ML ecosystem, including third-party public cloud, Apple’s internal cloud, silicon vendors, and OSS providers, to enable Apple Intelligence.
- Develop and complete capacity forecasting models to ensure optimal compute resource availability for current and future ML workloads.
- Analyze and track ML compute usage to find opportunities for cost savings without compromising performance. Propose and implement cost-optimization strategies.
- Provide regular reports to senior leadership on ML compute capacity, performance trends, and cost management efforts. Communicate complex technical concepts to non-technical partners.
- We work with best-in-class engineering teams and developers across Apple to design the most efficient technologies to accelerate the ML lifecycle. You will be the trusted advisor to our customers, deeply understanding their needs and helping deliver low-cost, high-performance solutions that meet their use cases and business outcomes.
- Translate product requirements into technical use cases, anticipating risks ahead of time ("seeing around corners") and mitigating them for engineering execution by creating a partnership roadmap for delivery.
- You will define new computing modalities using resources native to the cloud and take that vision to market in collaboration with internal and external customers.
Minimum Qualifications
- 5+ years of product and/or technical program management experience covering as many of the following areas as possible: distributed computing, large-scale cloud infrastructure, GPU/TPU usage for ML training, software and computing architectures, container stack, and networking.
- Strong desire to learn, aptitude for problem solving, and the ability to make sophisticated trade-offs.
- Experience working with large-scale GPU-based AI applications, such as natural language processing and recommendation systems, for training and inference, or direct experience building or managing cloud computing infrastructure and technologies.
- Proficiency in multitasking and leading sophisticated programs with cross-functional teams, with a track record of developing outstanding platforms and bringing them to market.
- Self-motivated, independent, and proactive; demonstrated creative and critical thinking capabilities; able to triage and prioritize in real time and lead cross-functional teams under pressure.
- Highly developed drive to improve how things work, with a proven track record of driving dramatic improvements in team quality, performance, agility, or effectiveness.
Preferred Qualifications
- 5+ years of work experience in Product/Program and/or solutions architecture or developer roles
- Demonstrated ability to define product vision, strategy, and roadmap along with the ability to deliver and execute
- MS/PhD in EE, CS, Math, or Physics or equivalent work experience
- Experience driving technical partnerships with internal and external cloud and software stakeholders
- Knowledge of computer systems and cloud infrastructure architecture
- Familiarity with AI frameworks (e.g., TensorFlow, PyTorch, or MXNet) and/or GPU development experience
- Excellent interpersonal skills including ability to explain sophisticated technical topics to non-experts