- Meta (Menlo Park, CA)
- …platforms, all the way to mass production and deployment. **Required Skills:** Production Systems Engineer, AI Systems Responsibilities: ... **Summary:** Meta is seeking a Systems Engineer to join our Release to Production...Inference Accelerator (MTIA) program as a part of the AI/ML initiatives supporting large scale AI Training…
- Meta (Menlo Park, CA)
- …health and lifecycle of servers in production. **Required Skills:** Production Systems Engineer, Fleet AI Systems Responsibilities: 1. Interface ... **Summary:** Meta is seeking a Production Systems Engineer to...systems issues. 15. 2+ years of experience supporting AI or HPC systems and/or related …
- Meta (Menlo Park, CA)
- …health and lifecycle of servers in production. **Required Skills:** Production Systems Engineer, Fleet AI Systems Lead Responsibilities: 1. Lead ... **Summary:** Meta is seeking an experienced Production Systems Engineer to...systems issues. 18. 4+ years of experience supporting AI or HPC systems and/or related …
- Meta (Menlo Park, CA)
- **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large scale AI ... services, and data center operations teams to enable new systems that will be deployed in our production...Silicon hyperscalar bring up and validation. **Required Skills:** Hardware Systems Engineer, NPI AI Responsibilities:…
- Meta (Menlo Park, CA)
- … based approach to the new product introduction (NPI) phase. **Required Skills:** Hardware Systems Engineer, AI NPI Responsibilities: 1. Drive and execute ... services, and data center operations teams to enable new systems that will be deployed in our production...strategy (hardware and software), with a focus on various AI/HPC hardware systems in datacenter applications. 2.…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Lead ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...interconnect with minimal latency. To improve performance of these systems we constantly look for opportunities across stack: network…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across stack: network…
- Capital One (San Francisco, CA)
- Lead AI Engineer At Capital One, we are...- scalability, cost, latency, throughput - of large scale production AI systems. + Contribute to ... latest AI research and AI systems, and judiciously apply novel techniques in production...regularly worked. Cambridge, MA: $193,400 - $220,700 for Lead AI Engineer McLean, VA: $193,400 - $220,700…
- Amazon (San Francisco, CA)
- …team, you'll be instrumental in transforming cutting-edge research into high-performance production systems. You'll collaborate directly with scientists to ... Peter Chen to make breakthrough foundation models run at production scale. As a Senior Machine Learning Engineer...We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems…
- Cisco (San Francisco, CA)
- …learning technologies. The ideal candidate will help build and maintain scalable AI systems while ensuring robust deployment and operational excellence. ... part of our journey! **Role** As the Machine Learning Engineer, AI Platform in the Splunk ...Engineers and Applied Scientists to build efficient model serving systems + Monitor system performance and implement improvements for…
- Meta (Menlo Park, CA)
- …this area and implementing them towards product needs. **Required Skills:** Software Engineer (Technical Leadership) - AI Specialist Responsibilities: 1. Help ... using AI/ML approaches 13. Experience in applying research to production problems 14. Experience communicating research for public audiences of peers 15.…
- Meta (Menlo Park, CA)
- …validation, supporting customer deployment, production issue triage. **Required Skills:** Production Systems Engineer, Cooling & Power Responsibilities: ... **Summary:** Meta is seeking a Systems Engineer to join our Release to Production...scaling and deployment challenges requires us to take a systems based approach to AI system bring…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Network Engineer Responsibilities: 1. Design, develop, test and ... operate networking systems to support large scale AI training...more. 5. Be oncall to learn from real world production challenges and take the lessons to improve current…
- Meta (Menlo Park, CA)
- …and drive SOTA research across the Monetization organization. **Required Skills:** Research Engineer, Language - Monetization AI Responsibilities: 1. Develop and ... **Summary:** We are the Monetization Ranking AI Research organization, dedicated to delivering personalized ads that maximize both user utility and advertiser value.…
- Amazon (San Francisco, CA)
- …team, you'll be instrumental in transforming novel research into high-performance production systems. You'll collaborate directly with scientists to optimize ... where you'll contribute to breakthrough foundation models run at production scale. As a Software Development Engineer ...We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems…
- Cisco (San Francisco, CA)
- …software engineering, Dev/Ops or related fields. * Experience in building large scale AI models and systems using MLOps practices; including deep learning models ... and accessible for everyone. As the Lead Machine Learning Engineer for Meraki Assurance, you will drive the engineering...Jenkins, Terraform Bonus points: * Experience building scalable and production ready Generative AI solutions * Have…
- Cisco (San Francisco, CA)
- …(e.g., Docker, Kubernetes). + Experience with model deployment and serving into production environments + Knowledge of version control systems, especially Git. ... become a part of our journey! **Principal Machine Learning Engineer (MLE), Artificial Intelligence** Join us as we pursue...group, you will be responsible for developing the core AI/ML capabilities to power the entire Splunk product portfolio…
- Meta (Menlo Park, CA)
- …space of GenAI/LLM scaling reliability and performance. **Required Skills:** Software Engineer, SystemML - AI Networking Responsibilities: 1. Tech-leading the ... learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance optimizations,…
- Meta (Menlo Park, CA)
- …space of GenAI/LLM scaling reliability and performance. **Required Skills:** Software Engineer, SystemML - AI Networking Responsibilities: 1. Enabling reliable ... learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance optimizations,…
- Snap Inc. (Palo Alto, CA)
- …always execute with privacy at the forefront. We're looking for a Staff Software Engineer, Machine Learning Infrastructure to join the AI Training Platform (aka ... evaluation, and inference in the cloud such as Vertex AI, Google Kubernetes Engine (GKE) and Sagemaker + Build...industry software engineering experience + Experience building large scale production machine learning systems or data pipelines…