- Bosch (Sunnyvale, CA)
- …journals such as CVPR, ICRA, IROS, RSS, NeurIPS and CoRL. **Job Description** As the Distributed Embodied AI Systems intern, you will perform research on ... that take advantage of technologies in the field of reliable distributed computing. We work with internal ... future prediction for latency mitigation in distributed embodied AI systems. A… more
- Microsoft Corporation (Redmond, WA)
- …healthcare, economics, and the environment. Are you passionate about building the future of reliable, large-scale cloud and AI systems? The ** Systems ... Interns to tackle cutting-edge challenges at the intersection of distributed systems, AI systems ... letter. **Preferred Qualifications** + Experience of building scalable and reliable systems. + Demonstrated ability to develop… more
- Microsoft Corporation (Redmond, WA)
- **Overview** **Help shape the future of reliable AI systems**. At Microsoft Research's AI and Systems Reliability Group (Redmond, WA), we push the ... the computing landscape. We are seeking **Senior Researcher - AI and Systems Reliability - Microsoft Research** ... Systems Reliability - Microsoft Research** areas such as distributed systems and reliability, formal methods and… more
- NVIDIA (Santa Clara, CA)
- …design, or enterprise platform engineering. + Deep expertise in architecting large-scale distributed systems with a focus on reliability, performance, and ... record of publishing technical papers, architecture patterns, or thought leadership in AI systems. + Knowledge of observability tools, telemetry dashboards, and… more
- Cisco (San Jose, CA)
- …platforms, such as AWS, Azure, or Google Cloud. + Understanding of distributed systems concepts, including scalability, reliability, fault tolerance, and data ... Team** Our dedicated team members are building the future of Cisco's AI-driven platforms and data infrastructure, supporting innovation across the globe. You will… more
- Oracle (Nashville, TN)
- …Work closely with a collaborative and experienced global team. - Expand your knowledge in AI, cloud computing, and distributed systems. - Contribute to one ... tools to operationalize Large Language Models (LLMs) and agentic AI systems. Our goal is to empower ... will contribute to the design and implementation of scalable, distributed systems that serve LLMs and support… more
- NVIDIA (Santa Clara, CA)
- …and inference more reliable, scalable, and efficient. If you're passionate about AI, distributed systems, and high-performance computing, we want to hear ... driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable ... detection. + Hands-On Coding & Optimization: Contribute to large-scale distributed systems with high-quality, production-level C++ and… more
- Oracle (Frankfort, KY)
- …Work closely with a collaborative and experienced global team. - Expand your knowledge in AI, cloud computing, and distributed systems. - Contribute to one ... tools to operationalize Large Language Models (LLMs) and agentic AI systems. Our goal is to empower ... will contribute to the design and implementation of scalable, distributed systems that serve LLMs and support… more
- Oracle (Columbus, OH)
- …. This is a highly technical, hands-on role where you'll build large-scale distributed systems, optimize AI/ML workflows, and collaborate with ... observability, CI/CD pipelines, and operational excellence. Troubleshoot complex issues in distributed systems and participate in on-call rotations as needed.… more
- Oracle (Raleigh, NC)
- …learning, LLM applications, and agentic AI. Our team builds real-world AI systems and deploys scalable, production-ready solutions across Oracle's enterprise ... engineer to contribute to the design and deployment of advanced AI systems, including LLM-powered agents, Retrieval-Augmented Generation (RAG) pipelines,… more
- Charles Schwab (San Francisco, CA)
- …+ Champion reliability, monitoring, observability, and operational best practices for AI systems and data pipelines. + Collaborate with cross-functional ... in the development process. You will ensure that the systems we build are robust, reliable, and ... troubleshoot complex problems with ambiguous or incomplete data in distributed systems. + Curiosity about new technologies… more
- Walmart (Sunnyvale, CA)
- …build dynamic, context-aware systems. 2. **Architecture & Scalability:** + Architect scalable, distributed AI systems with a focus on performance, fault ... to lead the design, development, and deployment of advanced AI systems. This role involves architecting scalable ... Walmart GTP, you will be building highly scalable and reliable APIs, services and applications which will drive the… more
- Amazon (Redmond, WA)
- …for a Data Engineering Manager who will design, implement, and operate globally distributed systems that enable Leo to achieve low single-digit-second query ... real-time analytics layer or lakehouse, and to support agentic AI capabilities on top. You'll build these systems ... user experience in real time. We combine expertise in distributed systems, data lakehouse architectures, and applied… more
- Amazon (Redmond, WA)
- …is for a Data Engineer who will design, implement, and operate globally distributed systems that enable Leo to achieve low single-digit-second query responses ... real-time analytics layer or lakehouse, and to support agentic AI capabilities on top. You'll build these systems ... user experience in real time. We combine expertise in distributed systems, data lakehouse architectures, and applied… more
- Meta (Menlo Park, CA)
- …leverage our large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. ... Skills:** Research Scientist, AI Networking (PhD) Responsibilities: 1. Enabling reliable and highly scalable distributed ML training on Meta's large-scale… more
- Charles Schwab (San Francisco, CA)
- …bring curiosity, creativity, and technical depth to help shape the next generation of reliable AI at Schwab. **What you have** **Required Qualifications** + 8+ ... you will play a key role in ensuring our AI solutions are reliable, scalable, and resilient, enabling ... Experience implementing monitoring, alerting, and incident response for large-scale distributed systems. + Proven track record in… more
- GE Vernova (Niskayuna, NY)
- …neural network architectures (e.g., CNNs, RNNs, Transformers). + Expertise in designing scalable, distributed architectures for AI systems. + Strong experience ... Azure, GCP) and containerization (Kubernetes, Docker). + Familiarity with large-scale distributed systems and database technologies. + Experience in creating… more
- eightfold.ai (Santa Clara, CA)
- …with opportunities. Responsibilities: + Research, design, development, and deployment of advanced AI agents and agentic systems. + Architect and implement ... About Eightfold: Eightfold is a global leader in AI-native enterprise talent platform, trusted by the world's largest ... we are defining the next era of agentic talent systems. What sets Eightfold apart is not just the… more
- Paycom Online (Oklahoma City, OK)
- …give and receive concrete feedback.** + **Experience in deploying and scaling containerized, distributed software and AI systems using tools such as ... in "traditional" NLP tools** + **Experience in SOA, Modular Monolith Architecture, and distributed systems for AI training and inference** + **Familiarity… more
- Zebra Technologies (Holtsville, NY)
- …Knowledge of NLP, computer vision, or reinforcement learning. + Familiarity with multi-agent systems and distributed AI frameworks. + Strong communication ... processing (NLP), or computer vision. + Build, test, deploy, and maintain software and AI systems, ensuring high performance, security, and scalability. + Own a… more