AWS Neuron is looking for an experienced Technical Product Manager to define and drive product strategy for ML training software. You will be part of the AWS Neuron Product Management team, driving innovation in machine learning training acceleration. AWS Neuron is the software stack for Trainium and Inferentia, the AWS Machine Learning chips, delivering best-in-class ML training performance in the cloud. You will lead training software requirements working backward from customer needs, drive the direction of training frameworks, and collaborate with open source communities and ML ecosystem partners. Drawing on a deep understanding of distributed training, compilation systems, and hardware acceleration, you will enable customers to successfully develop and optimize ML training workloads on AWS Trainium.
The ideal candidate will have a solid understanding of AI/ML model training, distributed training architectures, and performance optimization techniques. They should be able to assess the technical implications of training software stack decisions, understand customer needs, and drive developer experience. Experience with large-scale distributed training, model parallelism strategies, and hardware acceleration is valuable.
Additionally, the ideal candidate should have:
Key job responsibilities:
About the team:
About AWS Neuron: AWS Neuron is the software stack for Trainium and Inferentia, the AWS Machine Learning chips. Inferentia delivers best-in-class ML inference performance at the lowest cost in the cloud to AWS customers. Trainium is designed to deliver best-in-class ML training performance at the lowest training cost in the cloud, and both are enabled by AWS Neuron. Neuron includes an ML compiler and native integration with popular ML frameworks. Our products are used at scale by external customers like Anthropic and Databricks as well as internal customers like Alexa, Amazon Bedrock, Amazon's Rufus AI assistant, Amazon Robotics, Amazon Ads, Amazon Rekognition, and many more.