- Meta (Menlo Park, CA)
- …and efficiency in our global network. **Required Skills:** Network Engineer, HPC Systems Network Strategy Responsibilities: 1. Design, ... meeting our demands; you will be responsible for conceiving, developing, and deploying software, hardware and network systems and tools that improve reliability…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active ... daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like...a lossless fabric interconnect. To improve performance of these systems we constantly look for opportunities across the stack: …
- LTD Global (Berkeley, CA)
- …computing (HPC) and data analysis for the organization. Our center provides essential HPC and data systems to more than 10,000 researchers working in areas ... Position overview: We are seeking a Site Reliability Engineer to join our Operations Group. This role...part of a 24/7 operations team that ensures our systems are accessible, reliable, secure, and available to the…
- Meta (Menlo Park, CA)
- …Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on ... (e.g., large-scale GenAI/LLM training) from the trainer down to the inter-GPU and network communication layer. We are seeking engineers to work on the…
- Meta (Menlo Park, CA)
- …Meta's global data center networks. Our work covers the entire network lifecycle, including hardware development, capacity planning, distributed and centralized ... control systems, modeling/provisioning/automation, monitoring/troubleshooting/analytics, and simulation/design/failure analysis. We are actively seeking Software…
- quadric.io, Inc (Burlingame, CA)
- …battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the ... co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of...C++ DSP and control code. Role: The Corporate Applications Engineer is the key bridge between development engineering and…