Apple's assistant has spent more than a decade as a punchline, the feature people invoked to set a timer and little else. That changed on June 8, 2026, when the company walked onto the WWDC stage and unveiled a rebuilt assistant branded "Siri AI," a conversational, multi-turn system meant to hold a thread across a whole exchange rather than fielding one command at a time. The surprise was not the redesign. It was the machinery underneath it.
When Siri AI hits a question too hard for an iPhone or even Apple's own data centers to answer, it hands that question to Google. Specifically, it routes the query to a custom 1.2-trillion-parameter model built on Google's Gemini, running on Nvidia Blackwell B200 GPUs inside Google Cloud. Bloomberg has pegged the licensing arrangement at roughly $1 billion per year, one of the largest such deals the AI industry has seen. For a company that has spent years insisting your data never leaves its control, sending your most complex requests to a rival's cloud is a remarkable concession, and it reframes what Apple Intelligence actually is. This arrangement, the Apple Siri Nvidia Blackwell deal, is now the backbone of the assistant's most demanding answers.
Three Routing Tiers and the One That Leaves the Building
Siri AI does not send everything to Google. Apple built a three-tier routing system that decides, query by query, where the work should happen. Simple tasks stay on the device, handled locally by Apple Silicon: setting reminders, launching apps, quick factual lookups. Moderately complex requests travel to Apple's Private Cloud Compute servers, the privacy-hardened data center tier Apple introduced with the first wave of Apple Intelligence.
The third tier is the new one. The heaviest reasoning tasks, the multi-step questions that demand a frontier-class model, leave Apple's infrastructure entirely and land in Google Cloud. That is where the custom Gemini model lives, and that is the leg of the pipeline the industry is scrutinizing. Most users will never know which tier answered them; the routing is invisible by design. But the architecture means a meaningful slice of Siri traffic now depends on hardware and a data center Apple does not own.
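To make the three-tier idea concrete, here is a minimal sketch in Python. It is an illustration only, not Apple's routing logic, which has not been published: the `complexity` score, the two cut-off values, and every name in the block are hypothetical. It shows just the shape of the decision described above, where each query goes to the least remote tier that can handle it.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on-device (Apple Silicon)"
    PRIVATE_CLOUD = "Apple Private Cloud Compute"
    GOOGLE_CLOUD = "custom Gemini model on Google Cloud"


@dataclass
class Query:
    text: str
    complexity: float  # hypothetical score in [0, 1]; not an Apple metric


# Hypothetical cut-offs, chosen only for illustration.
LOCAL_MAX = 0.3
PRIVATE_CLOUD_MAX = 0.7


def route(query: Query) -> Tier:
    """Return the least remote tier whose cut-off covers the query."""
    if not 0.0 <= query.complexity <= 1.0:
        raise ValueError("complexity must be between 0 and 1")
    if query.complexity <= LOCAL_MAX:
        return Tier.ON_DEVICE
    if query.complexity <= PRIVATE_CLOUD_MAX:
        return Tier.PRIVATE_CLOUD
    return Tier.GOOGLE_CLOUD


if __name__ == "__main__":
    for q in (
        Query("Set a timer for ten minutes", 0.1),
        Query("Summarize this long email thread", 0.5),
        Query("Plan a three-city trip around my calendar", 0.9),
    ):
        print(f"{q.text!r} -> {route(q).value}")
```

In a real system the hard part is the scoring step this sketch takes as given: deciding, before answering, how difficult a question is.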
The scale jump is the reason. Apple's prior cloud-based Apple Intelligence model had roughly 150 billion parameters. The new flagship, called AFM Cloud Pro, carries roughly 1.2 trillion total parameters, about eight times as many. It uses a mixture-of-experts architecture, which activates only a subset of those parameters for any given query, keeping inference costs manageable while giving the model a far deeper well of capability to draw on.
Apple Siri Nvidia Blackwell Deal
The Apple Siri Nvidia Blackwell deal is, at its core, a decision to rent capability rather than build it. Apple could have spent years and enormous sums training a 1.2-trillion-parameter frontier model from scratch and standing up the GPU clusters to serve it. Instead, according to TechCrunch's November 2025 reporting on the near-final agreement, Apple evaluated models from Google, OpenAI, and Anthropic before settling on Gemini as the foundation for Siri AI's top tier.
Choosing Google is loaded with history. The two companies are courtroom adversaries and search-deal partners at the same time, and handing Google the intelligence layer of the iPhone assistant deepens a dependency Apple has publicly tried to reduce. Yet the logic is hard to argue with. Google already operates the data centers, already runs Gemini at scale, and could deliver a frontier model faster than Apple could build one on its own. The roughly $1 billion annual price tag, in that light, reads less like a licensing fee and more like a shortcut past a multi-year engineering gap.
The hardware choice matters as much as the model. Running on Nvidia's Blackwell B200 GPUs is what makes the arrangement technically feasible at the scale Apple needs, and it is Nvidia's newest silicon that unlocks the privacy guarantees Apple was unwilling to give up.
Confidential Compute Made Google Palatable
Apple's whole brand rests on the promise that your data is yours. Sending queries to Google Cloud looks, on its face, like a betrayal of that promise. Apple's answer is a hardware feature baked into the Blackwell chips: confidential compute.
Nvidia's hardware-based confidential compute keeps the model weights, the user's input, and the inference results protected while they are being processed, not just at rest or in transit. The practical effect is that even Google, the operator of the cloud where the computation runs, cannot read the data flowing through it. The GPU does its work inside a hardware-protected boundary and returns an encrypted answer. Google sees ciphertext, not your question.
Sebastian Marineau-Mes, an Apple software VP, framed the move as an extension rather than an abandonment of the company's privacy posture. Apple "wanted to avail ourselves of the latest technology from Nvidia," he said, and the company extended its Private Cloud Compute privacy framework to a third-party cloud for the first time. That "first time" phrasing is the tell: Apple built its entire cloud privacy story around infrastructure it controlled end to end, and this is the moment it stretched that story to cover someone else's servers.
The Parameter Count Behind the Upgrade
Numbers like 1.2 trillion parameters are easy to skim past, but the jump from 150 billion is the reason this deal exists at all. Parameter count is a rough proxy for how much a model can learn and how nuanced its reasoning can be. An eightfold increase is not an incremental upgrade; it is a different class of system, and Apple's own data centers were not built to serve it.
The mixture-of-experts design is what keeps that scale economical. Rather than firing all 1.2 trillion parameters for every query, the model routes each request to the relevant "experts," activating a fraction of the network. That is how a model this large can respond quickly enough for a live assistant without demanding an impossible amount of compute per answer. It also explains why the heaviest tier needs Blackwell-class GPUs: even activating a subset of a trillion-parameter model is a serious computational load.
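For readers who want to see the mechanism, the toy sketch below shows one common way expert routing is done, top-k gating. It is a generic illustration, not a description of AFM Cloud Pro: the source gives neither the number of experts nor how many fire per query, so the four toy experts, the gate scores, and `k=2` are all assumptions. In a real model the gate scores come from a learned router; here they are simply passed in.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    if not 1 <= k <= len(gate_scores):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs by weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))


if __name__ == "__main__":
    # Four toy "experts"; real experts are neural sub-networks.
    experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
    gate_scores = [0.1, 2.0, 2.0, -1.0]  # hypothetical router output
    print(top_k_route(gate_scores, k=2))
    print(moe_forward(3.0, experts, gate_scores, k=2))
```

With these made-up scores only experts 1 and 2 run; the other two are never evaluated. That skipped work is the saving the paragraph above describes, since compute per query scales with the experts that fire rather than with the full parameter count.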
For users, the payoff should be an assistant that can actually follow a conversation, reason through multi-part requests, and handle the kind of open-ended questions that made older Siri fall back on a web search. Whether Apple's routing correctly identifies which questions deserve that firepower is the practical question the beta will answer.
Rollout Timeline Across Apple's Operating Systems
Siri AI is set to launch in beta as part of iOS 27, iPadOS 27, macOS 27, watchOS 27, visionOS 27, and tvOS 27 later in 2026. A full rollout is expected around the September iPhone launch, the annual event where Apple's new hardware and its flagship software features usually arrive together. That timing puts Siri AI in front of the widest possible audience right as new iPhones ship.
Two markets are conspicuously absent at launch: the European Union and China, both held back by regulatory constraints. In the EU, the Digital Markets Act and data-governance rules complicate a system that routes user queries to a third-party cloud, even an encrypted one. China brings its own set of data-localization and approval hurdles. For hundreds of millions of users in those regions, the rebuilt assistant will not arrive in the first wave, a gap that undercuts the "everywhere, for everyone" framing Apple prefers.
Tim Cook has since held what late June and early July 2026 reporting described as "constructive" talks with EU regulators over the Siri AI launch, a sign Apple is working to shorten that exclusion rather than accept it. The outcome of those conversations will determine how quickly the European gap closes.
Cook's Final Keynote as Chief Executive
WWDC 2026 carried a second headline beyond the assistant. It marked Tim Cook's final keynote as Apple CEO before he transitions to executive chairman on September 1, 2026. That Cook chose to close out his run as chief executive by unveiling an assistant whose smartest answers come from Google's model and Nvidia's chips is a fitting bookend for a tenure defined as much by supply chains and partnerships as by invention.
The strategic bet embedded here is that owning the experience matters more than owning every layer beneath it. Apple designs the routing, the privacy framework, the on-device tier, and the user-facing product. It licenses the frontier model and rents the GPUs. If confidential compute holds up and users cannot tell the difference between an answer from Cupertino and one from a Google data center, Apple gets a frontier assistant without a frontier lab, and the dependency stays invisible.
Stakes for Apple's Next Chapter
The risk is the mirror image. Apple has tied a core, brand-defining feature to a competitor's model and a third party's hardware, at a reported cost near $1 billion a year, and any stumble in that pipeline, whether technical, contractual, or regulatory, now touches the iPhone's headline assistant. The Apple Siri Nvidia Blackwell deal is a wager that renting the hardest part of intelligence is smarter than building it. The beta later this year, and the incoming CEO who inherits the arrangement, will start settling whether that wager pays off.