- Palo Alto Networks (Santa Clara, CA)
- …Cassandra, Spanner), with an understanding of their respective architectural trade-offs for distributed systems + Demonstrated ability to design scalable data ... Alto Networks, we are redefining cybersecurity. As a Distinguished Engineer on the Enterprise DLP team, you will be...service. Your mission is to establish the standards and systems necessary to process and analyze massive volumes of… more
- Cadence Design Systems, Inc. (San Jose, CA)
- …make an impact on the world of technology. We are seeking a highly skilled and experienced AI Systems Engineer to join our team. This is a hands-on, senior ... infrastructure. You will be responsible for the entire lifecycle of our AI systems, from architecting and building high-performance GPU clusters to deploying… more
- Amazon (Cupertino, CA)
- …degree in computer science or equivalent - Preferred previous software engineer expertise with PyTorch/JAX/TensorFlow, Distributed libraries and Frameworks, ... Web Services (AWS) is looking for a Software Development Engineer II to build, deliver, and maintain complex products...customers and raise our performance bar. You'll design fault-tolerant systems that run at massive scale as we continue… more
- Amazon (Cupertino, CA)
- …Machine Learning accelerators. This role is for a Senior Machine Learning Engineer in the Distributed Training team for AWS Neuron, responsible for development, ... Diffusion, Vision Transformers (ViT) and many more. The ML Distributed Training team works side by side with chip...(design patterns, reliability and scaling) of new and existing systems experience - 5+ years of full software development… more
- NVIDIA (Santa Clara, CA)
- Join the NVIDIA Deep Learning Frameworks Infrastructure team as a Senior Systems Engineer focusing on High-Performance AI & Networking Applications, ... the crowd: + Mastery in deep learning frameworks or distributed training systems. + Familiarity with datacenter...environments. + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC… more
- Cisco (San Jose, CA)
- …platforms, such as AWS, Azure, or Google Cloud. + Understanding of distributed systems concepts, including scalability, reliability, fault tolerance, and data ... Team** Our dedicated team members are building the future of Cisco's AI-driven platforms and data infrastructure, supporting innovation across the globe. You will… more
- LinkedIn (Mountain View, CA)
- …such as Scala or other relevant coding languages + Hands-on experience developing distributed systems or other large-scale systems. Preferred Qualifications ... billions of user queries. Model Training Infrastructure: As an engineer on the AI Training Infra team,...Beam, Spark etc., feature engineering, + Experience with search systems or similar large-scale distributed systems… more
- NVIDIA (Santa Clara, CA)
- …data structures, operating systems, computer architecture, parallel programming, distributed systems, deep learning theories. + Knowledgeable and passionate ... systems and performance optimization. Our leadership includes world-renowned experts in AI systems who have received multiple academic and industry research… more
- LinkedIn (Mountain View, CA)
- …Experience building ML applications, LLM serving, GPU serving. + Experience with search systems or similar large-scale distributed systems + Experience with ... billions of user queries. Model Training Infrastructure: As an engineer on the AI Training Infra team,...PyTorch Lightning, LLMs, GNNs, MLFlow, Kubeflow and large scale distributed systems + Familiarity with containers and… more
- Google (Sunnyvale, CA)
- Systems Development Engineer II, AI Infrastructure (Google; Kirkland, WA, USA; Sunnyvale, CA, USA) **Early** Experience completing work ... this role, you will have the exposure to integrated AI infrastructure systems (GPU or TPU), from...reports and consultation. + Work with customers on defining distributed systems requirements, testing procedures, and proposing… more
- LinkedIn (Mountain View, CA)
- …leading / building deep learning systems + Hands-on experience developing distributed systems or other large-scale systems Preferred Qualifications: + ... billions of user queries. Model Training Infrastructure: As an engineer on the AI Training Infra team,...Rust, Scala + 5+ years of experience with large-scale distributed systems and client-server architectures + Experience… more
- ServiceNow, Inc. (Santa Clara, CA)
- …battle-tested in production and scalable across industries. **Who You Are:** You are a systems-minded, AI-native engineer who ships real software. You own ... context management + **Performance & observability:** Skilled in debugging distributed systems, tuning for latency, and implementing...frontier of enterprise AI - where your code powers AI transformation, your systems go live in… more
- Target (Sunnyvale, CA)
- …such as Vertex AI, Azure ML or Sagemaker + Experience using distributed training frameworks like Spark, Ray, TensorFlow Distributed + Experience with serving ... at https://corporate.target.com/careers/benefits . **JOIN TARGET AS A LEAD MACHINE LEARNING ENGINEER** **About Us:** Working at Target means helping all families… more
- LinkedIn (Mountain View, CA)
- …AUC, diversity) Preferred Qualifications: + Proficient in modern programming languages used in AI and large-scale systems, including Python, Java, C++, and Go + ... of how LinkedIn connects members to opportunity. As an AI PhD intern, you'll work with massive semi-structured text,...activity data to design and build scalable, intuitive recommender systems that power core LinkedIn experiences - including the… more
- Google (Sunnyvale, CA)
- …As a Staff Software Engineer, you will have exposure to fully integrated AI infrastructure systems (GPU or TPU), from hardware to software design and ... Staff Software Engineer, AI Infrastructure (Google)...5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with… more
- Palo Alto Networks (Santa Clara, CA)
- …& Optimization: Contribute significantly to the detailed design of large-scale, distributed AI/ML systems, ensuring performance, reliability, security, ... all win with precision. **Your Career** As a Principal AI Engineer for the Enterprise AI...Experience** + 10+ years of experience in software engineering, distributed systems, or enterprise architecture, including at… more
- NVIDIA (Santa Clara, CA)
- …ideal candidate is an experienced technical leader with deep expertise in generative AI, distributed systems, and ability to convert proof-of-concept ... on the world. NVIDIA is seeking a Senior Software Engineer to serve as a Tech Lead, driving the...and best practices for developing, testing, and deploying agentic AI applications across distributed environments. What we… more
- LinkedIn (Mountain View, CA)
- …leading / building deep learning systems + Hands-on experience developing distributed systems or other large-scale systems Preferred Qualifications + ... work for Training infrastructure. As a Principal Staff Software Engineer on the AI Training Infra team,...Rust, Scala + 5+ years of experience with large-scale distributed systems and client-server architectures + Co-author… more
- Walmart (Sunnyvale, CA)
- …build dynamic, context-aware systems. 2. **Architecture & Scalability:** + Architect scalable, distributed AI systems with a focus on performance, fault ... redefine customer experiences. We are seeking a Principal, Software Engineer with deep expertise in Generative AI,...to lead the design, development, and deployment of advanced AI systems. This role involves architecting scalable… more
- NVIDIA (Santa Clara, CA)
- …with InfiniBand with IBOP and RDMA + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads + Familiarity with deep ... We are seeking a Senior AI/ML Performance and Efficiency Engineer, GPU...to end + Debugging and optimization experience with NSight Systems and NSight Compute + Experience with debugging large-scale… more