Agentic AI Takes the Throne: Microsoft's 7 New Models, Open-Source Fights Back, and Robots Get Real
Microsoft Build 2026 just wrapped in San Francisco, and the message is unambiguous: the agent-first era is no longer a roadmap item — it’s the product. While the big players stack in-house models and enterprise sandboxes, the open-source world isn’t sitting still. MiniMax dropped an open-weight model that genuinely threatens the frontier, NVIDIA open-sourced the most ambitious physical AI world model yet, and Huawei is rewriting what an operating system looks like when agents live in the kernel.
Let’s untangle the spaghetti.
Microsoft Goes Full Vertical: 7 Models, Zero OpenAI Dependency
The headline from Build is MAI-Thinking-1 — Microsoft’s first advanced reasoning model, trained from scratch on clean data without distilling from anyone else’s models. That last part matters. After renegotiating the OpenAI deal to loosen ties, Microsoft is serious about owning the full stack.
MAI-Thinking-1 is described as a “medium-sized model” that matches leading models on key software engineering benchmarks. It handles complex multi-step instructions, long-context reasoning, and code generation. It’s available now on Microsoft Foundry.
But the real play for developers is MAI-Code-1-Flash — an inference-efficient coding model deeply integrated into GitHub Copilot and VS Code. Early testing claims it outperforms other small models in its class while using 60% fewer tokens. If that holds up in practice, it’s a direct shot at the cost structure of running coding agents at scale.
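To make that 60% figure concrete, here is a back-of-the-envelope sketch in Python. The task volume, tokens per task, and price per million tokens are made-up placeholders, not Microsoft or GitHub pricing; only the 60% reduction comes from the claim above.

```python
def monthly_token_cost(tasks_per_month: int, tokens_per_task: int,
                       usd_per_million_tokens: float) -> float:
    """Return the monthly spend in USD for a fleet of coding-agent tasks."""
    total_tokens = tasks_per_month * tokens_per_task
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: every number here is an assumption for illustration.
TASKS_PER_MONTH = 100_000
BASELINE_TOKENS_PER_TASK = 50_000
USD_PER_MILLION_TOKENS = 1.00

# The claimed saving from early testing: 60% fewer tokens per task.
TOKEN_REDUCTION = 0.60
reduced_tokens_per_task = round(BASELINE_TOKENS_PER_TASK * (1 - TOKEN_REDUCTION))

baseline_cost = monthly_token_cost(
    TASKS_PER_MONTH, BASELINE_TOKENS_PER_TASK, USD_PER_MILLION_TOKENS)
reduced_cost = monthly_token_cost(
    TASKS_PER_MONTH, reduced_tokens_per_task, USD_PER_MILLION_TOKENS)

print(f"baseline: ${baseline_cost:,.2f}  reduced: ${reduced_cost:,.2f}")
```

At equal per-token pricing the bill shrinks one-to-one with the token count, so the open question is whether output quality holds at the lower budget.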
The full lineup from Build:
- MAI-Thinking-1 — reasoning model, multi-step instructions
- MAI-Code-1-Flash — lightweight coding model in Copilot/VS Code
- MAI-Image 2.5 + Flash — text-to-image and image editing
- MAI-Transcribe-1.5 — 5x faster than competing transcription models
- MAI-Voice-2 + Flash — 15 new languages, more voice options
Microsoft also announced Microsoft Execution Containers (MXC) in preview — OS-level sandboxing for agents with policy-based isolation. This is the infrastructure play: if agents are going to run enterprise workloads, they need to be contained at the OS layer, not just the app layer. Web IQ launched as a real-time web grounding layer across GitHub Copilot, Foundry, and Copilot Studio.
The strategic read: Microsoft is building the full agent stack from silicon to sandbox, and they’re done being a reseller.

NVIDIA Cosmos 3: The Open Model That Thinks in the Physical World
While Microsoft was stacking models at Build, NVIDIA quietly dropped what might be the most technically ambitious open model of the year. Cosmos 3 is the world’s first fully open “omnimodel” — it natively understands and generates text, images, video, ambient sound, and actions with physics accuracy.
The architecture is the story here: a mixture-of-transformers design that pairs a reasoning transformer with an expert generation transformer. It understands object interactions, motion, and spatial-temporal relationships before generating video and action trajectories. That’s not just a video model — it’s a world simulator that can predict what happens next.
The Cosmos 3 lineup:
- Cosmos 3 Super — highest physics accuracy for post-training robotics and AV models
- Cosmos 3 Nano — video and action reasoning in fractions of a second
- Cosmos 3 Edge — real-time inference at the edge (coming soon)
It ranks first among open models on the Physics-IQ, PAI-Bench, R-Bench, RoboLab, RoboArena, VANTAGE-Bench, and TAR leaderboards. NVIDIA claims it reduces physical AI training and evaluation cycles from months to days.
The Cosmos Coalition launched alongside it — a collaboration with Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI to advance open world models. Jensen Huang’s quote: “The big bang of physical AI is just around the corner.”
For anyone building robots, autonomous vehicles, or vision agents, this is the model to evaluate this week.
MiniMax M3: Open-Weight Frontier Performance, No Gatekeeping
Here’s where it gets spicy. MiniMax M3, released June 1, is the first open-weight model to combine three frontier capabilities in one system: frontier-level coding, 1M-token context, and native multimodality (text, image, video input with text output).
Those three capabilities are table stakes for closed-source frontier models. If you’re running repo-aware coding agents, 1M context isn’t a marketing bullet; it’s the difference between an agent that understands your codebase and one that hallucinates about it.
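If you want a rough sense of whether your own repo would fit in a 1M-token window, a sketch like the one below gets you in the ballpark. It assumes about four characters per token, which is a common rule of thumb rather than M3’s actual tokenizer, and the file extensions and output reserve are arbitrary choices for illustration.

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by language and content
CONTEXT_WINDOW_TOKENS = 1_000_000  # the 1M-token window cited above


def estimate_repo_tokens(root: str,
                         extensions: tuple[str, ...] = (".py", ".ts", ".go", ".md")) -> int:
    """Walk a directory tree and estimate the token count of matching files."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as handle:
                        total_chars += len(handle.read())
                except OSError:
                    continue  # skip files that cannot be read
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(token_count: int, reserve_for_output: int = 50_000) -> bool:
    """True if the estimate leaves room for the model's reply within the window."""
    return token_count + reserve_for_output <= CONTEXT_WINDOW_TOKENS
```

Calling `fits_in_context(estimate_repo_tokens("."))` from a repo root gives a quick yes or no. Treat the answer as an estimate: code-heavy files often tokenize less efficiently than prose.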
M3 is available on OpenRouter now. The pricing is aggressive, and the benchmarks suggest it’s genuinely competitive with models that cost 5-10x more.
The open-source vs. closed-frontier gap just narrowed significantly.

Huawei’s Agent-First OS and the New Geography of AI
Huawei is transforming HarmonyOS into an AI-native operating system where intelligent agents become part of the system layer rather than standalone applications. This isn’t Copilot bolted onto Windows — it’s an OS designed from the ground up where agents are first-class citizens in the kernel.
The strategic implication: if you believe the future of computing is agent-mediated (and Build 2026 suggests Microsoft does), then the OS needs to be architected for agents at the lowest level. Huawei is betting billions of users will interact with Chinese AI through this layer.
Meanwhile, Anthropic confidentially submitted its draft S-1 to the SEC on June 1, signaling IPO preparations. Their Project Glasswing initiative — focused on securing critical software for the AI era — expanded access to Claude Mythos, a cybersecurity-focused model. The company is reportedly near $30B ARR with valuations around $380B.
The Coding Agent Wars: Who’s Winning?
Let’s ground this in what actually matters for builders. The coding agent landscape right now:
- Claude Code — 135K GitHub commits per day, rate limits doubled, moved off Pro plan
- GitHub Copilot with MAI-Code-1-Flash — 60% fewer tokens, deeply integrated
- Codex on Cerebras — 1,000 tok/sec inference
- MiniMax M3 — open-weight, 1M context, multimodal, frontier coding
The trend is clear: inference efficiency is the new benchmark. Raw capability matters less than cost-per-token when you’re running agents that burn through context windows like a GPU burns through watts.
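Here is why token efficiency compounds for agents. The sketch below assumes an agent that re-sends its entire history as input on every step, with no prompt caching or context compaction; that is a simplification, and the step counts and token sizes are hypothetical.

```python
def cumulative_input_tokens(steps: int, initial_context: int,
                            tokens_added_per_step: int) -> int:
    """Total input tokens across an agent run that re-sends its whole history each step.

    Assumes no prompt caching or context compaction (a simplification).
    """
    total = 0
    context = initial_context
    for _ in range(steps):
        total += context                  # the full context is sent as input this step
        context += tokens_added_per_step  # tool output and model reply join the history
    return total


# Hypothetical run: 20k-token starting prompt, 5k tokens appended per step.
for steps in (10, 20, 40):
    print(steps, cumulative_input_tokens(steps, 20_000, 5_000))
```

Under these assumptions, going from 10 steps to 20 takes the input total from 425,000 tokens to 1,350,000, more than triple for double the work. Trimming what gets appended per step pays off faster than the headline percentage suggests.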
Microsoft’s bet is that owning the model + the IDE + the OS sandbox is an unbeatable combination for enterprise. The open-source counter-bet is that open-weight models with frontier performance make the platform lock-in irrelevant.
Both bets are rational. The market is big enough for both — for now.
What I’m Watching
Three things from this week that I think will matter in 6 months:
- Microsoft Execution Containers — if OS-level agent sandboxing becomes the enterprise standard, it changes the security model for every agent deployment
- Cosmos 3 adoption — if robotics teams actually use this to cut training cycles from months to days, it accelerates the entire physical AI timeline
- MiniMax M3 in production — the first real test of whether open-weight models can handle frontier coding workloads at scale without the safety net of a big lab
The agentic era isn’t future tense. It’s shipping now, and the spaghetti is only going to get messier.
This post contains affiliate links.