- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially ... a lossless fabric interconnect with minimal latency. To improve performance of these systems we constantly look…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially ... workloads that expect a lossless fabric interconnect. To improve performance of these systems we constantly look…
- Meta (Menlo Park, CA)
- …testing with a focus on automation. 22. Experience in developing or debugging AI/HPC systems, performance optimizations, including familiarity with ... and/or similar languages. **Preferred Qualifications:** 16. Proficiency in High-Performance Computing (HPC) or AI system…
- Meta (Menlo Park, CA)
- …requirements of RDMA workloads that expect a lossless fabric interconnect. To improve performance of these systems we constantly look for opportunities across ... fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Network Engineer Responsibilities: 1. Design, develop, test and…
- Meta (Menlo Park, CA)
- …on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance, new model architectures and ... the following areas: Accelerators/GPU architectures, High Performance Computing (HPC), Machine Learning Compilers, Training/Inference ML Systems, Model…
- Meta (Menlo Park, CA)
- …end-to-end system validation strategy (hardware and software), with a focus on various AI/HPC hardware systems in datacenter applications. 2. Lead the ... algorithms, and OOP). **Preferred Qualifications:** 17. Proficiency in High-Performance Computing (HPC) or AI system architecture…
- Meta (Menlo Park, CA)
- …Qualifications: 7. Experience in leading teams working on high-performance computing (HPC) and AI/ML systems, including: 8. Communication libraries (e.g., ... of Meta AI infrastructure! **Required Skills:** Software Engineering Manager - AI Systems Co-Design Responsibilities: 1. Lead and support the communications…
- Meta (San Francisco, CA)
- …AI product introductions and AI operations initiatives supporting Meta's growing AI/HPC infrastructure for our Family of Apps. They will be responsible ... deliver on shared goals. 10. The ideal candidate will have experience in AI/HPC product development and operations, demonstrated experience in the Network…
- Meta (Menlo Park, CA)
- …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high-performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
- Meta (Menlo Park, CA)
- …in high-performance computation. **Required Skills:** Engineering Manager, PyTorch - AI Acceleration Responsibilities: 1. Grow a team of domain experts within ... **Summary:** AI Acceleration is an org within PyTorch. It's ... candidate should have technical skills; GPU/ML systems knowledge is preferred, though not required. We work…
- Wells Fargo (San Francisco, CA)
- **About this role:** We are seeking a High-Performance Computing (HPC) Engineer with experience in Machine Learning to optimize and scale AI/ML workloads. ... experience with distributed training, model parallelization, GPU acceleration, and performance optimization across diverse hardware platforms. Experience or strong…
- quadric.io, Inc (Burlingame, CA)
- …wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. ... Candidates must demonstrate deep technical mastery of Quadric's product ecosystem, including HPC Hardware (IP, Chips, Boards), SDK, and various algorithms (NN, DSP,…
- Meta (Menlo Park, CA)
- …11. Full-stack experience and understanding of AI/HPC systems, from HW/infrastructure through the application layer, performance optimizations, including ... learning domains: hardware accelerators, AI Infrastructure, and/or high-performance computing (HPC), particularly pertaining to interconnect and collective.…
- Meta (Menlo Park, CA)
- …**Preferred Qualifications:** 15. Full-stack experience and understanding of AI/HPC systems, from hardware and infrastructure ... ML domains: hardware accelerators, AI Infrastructure, and/or high-performance compute (HPC), particularly pertaining to interconnect and collective.…
- Meta (Menlo Park, CA)
- …training **Preferred Qualifications:** 15. Full-stack experience and understanding of AI/HPC systems, with a focus on the ... of Meta's accelerators collective communications software library and optimizing distributed AI/ML workloads' performance. This is an opportunity to work…
- Meta (Menlo Park, CA)
- …years of experience with one subset of the following AI systems: Accelerator (GPU/ASIC), Kernel development, Performance optimization (e.g., NVIDIA, AMD, ... productizing high-performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer ... Intel, or other misc. accelerator), computer architecture, HPC communication libraries (e.g., NCCL, MPI), performance…
- Amazon (San Francisco, CA)
- …experience - 5+ years building or optimizing computational applications for large-scale HPC systems (e.g., physics-based simulations) to take advantage of high ... of Go-to-Market (GTM) at AWS using generative AI (GenAI)? AWS Sales, Marketing, and Global Services (SMGS) ... years building or optimizing computational applications for large-scale HPC systems (e.g., physics-based simulations) to…
- Amazon (San Francisco, CA)
- …modernizing customer requirements to the cloud - Practical experience in High Performance Computing (HPC) and/or distributed training, performance profiling ... responsibilities We are looking for a strong Solution Architect with a Data & AI background to enable new capabilities for our customers to deploy GenAI workloads on…