OpenInfer has raised $8 million in funding to redefine AI inference for edge applications.
It’s the brainchild of Behnam Bastani and Reza Nourai, who spent nearly a decade building and scaling AI systems together at Meta’s Reality Labs and Roblox.
Through their work at the forefront of AI and system design, Bastani and Nourai witnessed firsthand how deep system architecture enables continuous, large-scale AI inference. However, today’s AI inference remains locked behind cloud APIs and hosted systems—a barrier for low-latency, private, and cost-efficient edge applications. OpenInfer changes that. The company wants to be agnostic to the types of devices at the edge, Bastani said in an interview with GamesBeat.
By allowing the seamless execution of large AI models directly on devices—from SoCs to the cloud—OpenInfer removes these barriers, enabling inference without compromising performance.
The implication? Imagine a world where your phone anticipates your needs in real time — translating languages instantly, enhancing photos with studio-quality precision, or powering a voice assistant that truly understands you. With AI inference running directly on your device, users can expect faster performance, greater privacy, and uninterrupted functionality no matter where they are. This shift eliminates lag and brings intelligent, high-speed computing to the palm of your hand.
Building the OpenInfer Engine, an AI agent inference engine

Since founding the company six months ago, Bastani and Nourai have assembled a team of
seven, including former colleagues from their time at Meta. While at Meta, they had built Oculus
Link together, showcasing their expertise in low-latency, high-performance system design.
Bastani previously served as Director of Architecture at Meta’s Reality Labs and led teams at
Google focused on mobile rendering, VR, and display systems. Most recently, he was Senior
Director of Engineering for Engine AI at Roblox. Nourai has held senior engineering roles in
graphics and gaming at industry leaders including Roblox, Meta, Magic Leap, and Microsoft.
OpenInfer is building the OpenInfer Engine, which they call an “AI agent inference engine” designed for unmatched performance and seamless integration.
To accomplish the first goal of unmatched performance, the first release of the OpenInfer
Engine delivers 2-3x faster inference compared to Llama.cpp and Ollama for distilled DeepSeek
models. This boost comes from targeted optimizations, including streamlined handling of
quantized values, improved memory access through enhanced caching, and model-specific
tuning—all without requiring modifications to the models.
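Since Llama.cpp and Ollama both serve an OpenAI-compatible HTTP API, claims like this are straightforward to sanity-check. The following is a minimal sketch of a throughput comparison, assuming OpenInfer exposes the same style of endpoint; the OpenInfer URL, port, and model tag are hypothetical placeholders, not published details.

import time
from openai import OpenAI  # pip install openai

def tokens_per_second(base_url: str, model: str, prompt: str) -> float:
    # Time one chat completion; return output tokens per wall-clock second.
    client = OpenAI(base_url=base_url, api_key="unused-for-local-servers")
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    return response.usage.completion_tokens / elapsed

prompt = "Summarize the benefits of on-device AI inference."
# Ollama's default local endpoint; the model tag assumes a distilled DeepSeek build.
baseline = tokens_per_second("http://localhost:11434/v1", "deepseek-r1:8b", prompt)
# Hypothetical OpenInfer endpoint, used here only for illustration.
candidate = tokens_per_second("http://localhost:8080/v1", "deepseek-r1:8b", prompt)
print(f"Ollama: {baseline:.1f} tok/s | OpenInfer: {candidate:.1f} tok/s")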
To accomplish the second goal of seamless integration with effortless deployment, the
OpenInfer Engine is designed as a drop-in replacement, allowing users to switch endpoints
simply by updating a URL. Existing agents and frameworks continue to function seamlessly,
without any modifications.
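In code, the swap the company describes would look something like this: an agent written against Ollama’s local endpoint keeps every line unchanged except the base URL it points at. As in the sketch above, the OpenInfer address is a hypothetical placeholder.

from openai import OpenAI

# Before: an existing agent talking to Ollama's OpenAI-compatible server.
# client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

# After: the same agent, repointed at a hypothetical OpenInfer endpoint.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

reply = client.chat.completions.create(
    model="deepseek-r1:8b",
    messages=[{"role": "user", "content": "Plan my next in-game move."}],
)
print(reply.choices[0].message.content)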
“OpenInfer’s advancements mark a major leap for AI developers. By significantly boosting
inference speeds, Behnam and his team are making real-time AI applications more responsive,
accelerating development cycles, and enabling powerful models to run efficiently on edge
devices. This opens new possibilities for on-device intelligence and expands what’s possible in
AI-driven innovation,” said Ernestine Fu Mak, Managing Partner at Brave Capital and an
investor in OpenInfer.
OpenInfer is pioneering hardware-specific optimizations to drive high-performance AI inference
on large models—outperforming industry leaders on edge devices. By designing the inference engine from the ground up, they are unlocking higher throughput, lower memory usage, and seamless execution on local hardware.
Future roadmap: Seamless AI inference across all devices
OpenInfer’s launch is well-timed, especially in light of recent DeepSeek news. As AI adoption
accelerates, inference has overtaken training as the primary driver of compute demand. While
innovations like DeepSeek reduce computational requirements for both training and inference,
edge-based applications still struggle with performance and efficiency due to limited processing
power. Running large AI models on consumer devices demands new inference methods that
enable low-latency, high-throughput performance without relying on cloud infrastructure,
creating significant opportunities for companies optimizing AI for local hardware.
“Without OpenInfer, AI inference on edge devices is inefficient due to the absence of a clear
hardware abstraction layer. This challenge makes deploying large models on
compute-constrained platforms incredibly difficult, pushing AI workloads back to the
cloud—where they become costly, slow, and dependent on network conditions. OpenInfer
revolutionizes inference on the edge,” said Gokul Rajaram, an investor in OpenInfer. Rajaram is
an angel investor and currently a board member of Coinbase and Pinterest.
In particular, OpenInfer is uniquely positioned to help silicon and hardware vendors enhance AI
inference performance on devices. Enterprises needing on-device AI for privacy, cost, or
reliability can leverage OpenInfer, with key applications in robotics, defense, agentic AI, and
model development.
In mobile gaming, OpenInfer’s technology enables ultra-responsive gameplay with real-time adaptive AI. Running inference on the device reduces latency and enables smarter in-game dynamics. Players will enjoy smoother graphics, AI-powered personalized challenges, and a more immersive experience that evolves with every move.
“At OpenInfer, our vision is to seamlessly integrate AI into every surface,” said Bastani. “We aim to establish OpenInfer as the default inference engine across all devices—powering AI in self-driving cars, laptops, mobile devices, robots, and more.”
OpenInfer has raised an $8 million seed round, its first round of financing. Investors include Brave Capital, Cota Capital, Essence VC, Operator Stack, StemAI, Oculus VR co-founder and former CEO Brendan Iribe, Google DeepMind chief scientist Jeff Dean, Microsoft Experiences and Devices chief product officer Aparna Chennapragada, angel investor Gokul Rajaram, and others.
“The current AI ecosystem is dominated by a few centralized players who control access to
inference through cloud APIs and hosted services. At OpenInfer, we are changing that,” said
Bastani. “Our name reflects our mission: we are ‘opening’ access to AI inference—giving
everyone the ability to run powerful AI models locally, without being locked into expensive cloud
services. We believe in a future where AI is accessible, decentralized, and truly in the hands of
its users.”