- Unknown (San Jose, CA)
- …and compute infrastructure, with a preference for those with a background in AI/ML, HPC, or hyperscale environments. Deep technical knowledge of GPU clusters, ... Head of Infrastructure About the Company Innovative provider of AI infrastructure solutions Industry Information Technology and Services Type Privately Held About…
- Meta (Menlo Park, CA)
- …fabric and host networking, communications lib and scheduling infrastructure. **Required Skills:** AI/HPC System Performance Engineer Responsibilities: ... a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look ... teamwork and close collaboration 3. Responsible for the overall performance of the communication system, including …
- Meta (Menlo Park, CA)
- …These workloads expect a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look for opportunities across ... host networking, communications lib and scheduling infrastructure. **Required Skills:** AI/HPC Network Engineering Manager Responsibilities: 1. Manage engineers…
- IBM (San Jose, CA)
- …in system design * Experience with GPU Systems * Familiarity with HPC system performance evaluation. * Familiarity with system architectures * ... technical areas in the context of hybrid cloud, AI systems, networking, security, high-speed networked-storage, accelerators, and HPC principles. The…
- Meta (Menlo Park, CA)
- …on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance, new model architectures and ... RCCL, UCC and MPI. 7. Guide Meta's AI HW requirements and design focusing on performance at System and Silicon levels. Co-design and optimize our AI HW…
- Deloitte (San Jose, CA)
- …Engineer, Solutions Architect) + 2+ years of experience with GPU computing (CUDA, OpenCL) and HPC system software stack The wage range for this role takes into ... drug discovery, optimizing population health and clinical trials, autonomous systems and edge AI, and renewable energy. ... on prem + Adopt best engineering practices in automation, HPC and AI/GenAI infrastructure and design patterns…
- Deloitte (San Jose, CA)
- …Engineer, Solutions Architect) + 2+ years of experience with GPU computing (CUDA, OpenCL) and HPC system software stack The wage range for this role takes into ... in the cloud or on prem + Adopt best engineering practices in automation, HPC and AI/GenAI infrastructure and design patterns + Define and lead technology…
- Meta (Menlo Park, CA)
- …AI product introductions and AI operations initiatives supporting Meta's growing AI/HPC infrastructure for our Family of Apps. They will be responsible ... deliver on shared goals 10. The ideal candidate will have experience in AI/HPC product development and operations, demonstrated experience in the Network…
- Meta (Menlo Park, CA)
- …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
- Meta (Menlo Park, CA)
- …domains: High speed networking (RDMA), Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
- Meta (Fremont, CA)
- …Delivery Engineering/Power architecture and associated system design trade-offs, particularly for AI and High Performance Computing (HPC) systems ... board-level power, busbars, DC-DC converters, onboard power modules, and HV DC systems) to advance this mission. The TSM will collaborate with Hardware Engineering,…
- IBM (San Jose, CA)
- …Python, Rust, CUDA * Familiarity with executing HPC workloads * Familiarity with HPC system performance evaluation. At IBM, we pride ourselves on being ... technical areas in the context of hybrid cloud, AI systems, networking, security, high-speed networked-storage, accelerators, and HPC principles. The…
- Meta (Menlo Park, CA)
- …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
- Broadcom (San Jose, CA)
- …you apply.** **Job Description:** Ethernet NIC product portfolio is designed for high performance computing and networking applications including AI and ML. This ... of the next generation of Ethernet NIC solutions for AI/ML and High performance computing applications. We ... is an added advantage. 6. Experience analyzing and tuning performance for a variety of HPC workloads.…
- Cadence Design Systems, Inc. (San Jose, CA)
- …motivated 3DIC Design Flow Engineer to implement system planning and integration of complex HPC and AI systems with chiplets. You will be responsible for ... technology. Job Description Summary: Hands on experience with 2.5/3DIC system planning, including exploration. MSA - Si, Pi, Thermal ... developing and implementing advanced solutions for high-performance AI applications. This is a challenging…
- Microsoft Corporation (Mountain View, CA)
- …and external, and operate at the intersection of AI algorithmic innovation, purpose-built AI hardware, systems, and software. We are a team of highly capable ... performance at all levels of abstraction including kernel, model, algorithm and system level, monitor performance and drive efficiencies that contribute to…
- Cadence Design Systems, Inc. (San Jose, CA)
- …model reduction (POD/MOR). + Differential equations: ODE/PDE/SDE, dynamical systems. + Probability and statistics: stochastic processes, inference, uncertainty ... (SVM, kNN), ensembles (trees, random forests, boosting). + Contemporary AI (a plus): graph neural networks, transformers, reinforcement/transfer learning,…
- Microsoft Corporation (Mountain View, CA)
- …high-performance GPUs, ultra-low-latency NVLink/NVSwitch networks, and innovative liquid-cooling systems. Our team is seeking a Member of Technical Staff, ... Hardware Health, to ensure these systems deliver sustained reliability, performance, and availability ... predictive health models, failure detection frameworks, and autonomous remediation systems that keep our AI clusters operating…
- Microsoft Corporation (Mountain View, CA)
- …the boundaries of scale, performance, and deployment, creating frontier AI systems that power transformative experiences across Microsoft. The Multimodal ... (Pandas, NumPy, etc.) + OR equivalent experience. + **Experience with large-scale AI systems** - design and deployment of distributed architectures, multimodal…
- Microsoft Corporation (Mountain View, CA)
- …Microsoft AI**, created to push the boundaries of AI toward **Humanist Superintelligence-ultra-capable systems that remain controllable, safety-aligned, and ... and storage while building large-scale distributed applications on cloud platforms. + Build systems for AI research teams, with a solid understanding of training…