- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look… more
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look… more
- Meta (Menlo Park, CA)
- …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems , performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in… more
- NVIDIA (Santa Clara, CA)
- …to work effectively with diverse teams and individuals. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Passion for ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a...storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning… more
- NVIDIA (Santa Clara, CA)
- …designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an… more
- NVIDIA (Santa Clara, CA)
- …looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and ... and visualization to spectacularly improve efficiency, performance , and productivity of AI and HPC workloads. You will lead technical teams to develop,… more
- Meta (Menlo Park, CA)
- …requirements of RDMA workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across ... fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, test and… more
- NVIDIA (Santa Clara, CA)
- …Be Doing: + Primary responsibilities will include building and enabling robust AI / HPC infrastructure for customers + Support operational and reliability aspects ... of large-scale AI clusters, focusing on performance at scale,...in working with customers + Expertise with parallel file systems (eg Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects… more
- Amazon (Santa Clara, CA)
- …computing and its potential to overcome some of the biggest challenges in High Performance Computing ( HPC )? Do you have a unique combination of deep technical ... C++, Python, CUDA, Bash - Deep GPU knowledge in HPC and/or AI /ML frameworks. Preferred Qualifications -...life sciences or related discipline. - Working knowledge of HPC schedulers and distributed/parallel file systems , underlying… more
- NVIDIA (Santa Clara, CA)
- …vision? What you will be doing: + Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems . + Design and ... implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative solutions in HW and SW for our next generation platforms as… more
- NVIDIA (Santa Clara, CA)
- …and to power data centers. Join the team building many of the largest and fastest AI / HPC systems in the world! NVIDIA is looking for someone with the ... and internal teams to analyze, define, and implement large-scale AI / HPC projects. These efforts include a combination...they begin rolling out some of the most sophisticated systems in the world! + Provide feedback to internal… more
- NVIDIA (Santa Clara, CA)
- …long term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning… more
- NVIDIA (Santa Clara, CA)
- …experience. Ways to stand out from the crowd: + Experience analyzing and tuning performance for a variety of HPC or EDA workloads. + Solid understanding ... NVIDIA is the leader in AI , machine learning and datacenter acceleration. NVIDIA is...and operate these clusters at high reliability, efficiency, and performance and drive foundational improvements and automation to improve… more
- NVIDIA (Santa Clara, CA)
- …Protocols InfiniBand, Ethernet + Deep knowledge in computer architecture and operating systems + Experience in performance optimizations + MSc or equivalent ... We are now looking for a senior HPC software engineer. As a member of our the High Performance Computing Software development team, you will be responsible for… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer. Do you want to be part of a team that brings new Artificial Intelligence ... ( AI ) hardware and software technologies to production in customer...Demonstrate subject matter expertise in advanced GPU & network systems and be a trusted technical advisor to NVIDIA's… more
- Meta (Menlo Park, CA)
- …on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance , new model architectures and ... the following areas: Accelerators/GPU architectures, High Performance Computing ( HPC ), Machine Learning Compilers, Training/Inference ML Systems , Model… more
- NVIDIA (Santa Clara, CA)
- … infrastructure. + Passion for solving complex technical challenges and optimizing system performance . + Experience with AI / HPC advanced job schedulers, and ... support operational and reliability aspects of large scale distributed systems with focus on performance at scale,...storage systems like Lustre and GPFS for AI / HPC workloads. + Familiarity with deep learning… more
- Meta (Menlo Park, CA)
- …Qualifications: 7. Experience in leading teams working on high performance computing ( HPC ) and AI /ML systems , including: 8. Communication libraries (eg, ... of Meta AI infrastructure! **Required Skills:** Software Engineering Manager - AI Systems Co-Design Responsibilities: 1. Lead and support the communications… more
- Meta (Menlo Park, CA)
- …7+ years of experience with one subset of the following AI systems : Accelerator (GPU/ASIC), Performance characterization/optimization/tracing/debugging (eg, ... exploring, developing and productizing high- performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer in RTP work… more
- Meta (Menlo Park, CA)
- … AI product introductions and AI operations initiatives supporting Meta's growing AI / HPC infrastructure for our Family of Apps . They will be responsible ... deliver on shared goals. 10. The ideal candidate will have experience in AI / HPC product development and operations, demonstrated experience in the Network… more