NVIDIA Cosmos 3: Open Physical AI Model | TechPillow

NVIDIA Cosmos 3: The Open Omnimodel Powering Physical AI

The First Open Omnimodel for Physical AI

On 1 June 2026, NVIDIA launched Cosmos 3, the first fully open omnimodel for physical AI, at GTC Taipei during Computex. The model weights appeared on Hugging Face the same day under the OpenMDW-1.1 licence. Two sizes are available: Cosmos 3 Nano at 8 billion parameters, designed for local deployment on RTX hardware, and Cosmos 3 Super at 32 billion parameters, targeting data centre inference. The defining capability is native generation, within a single architecture, of text, images, and video as well as ambient sound and robot action data such as joint angles, gripper positions, and trajectory waypoints. Cosmos 3 ranks first across more than seven robotics benchmarks, including RoboLab and RoboArena, which makes it the strongest open physical AI foundation model released to date by those measures.

What Physical AI Requires

Physical AI must do something language models do not: understand space, object relationships, motion, and physical constraints, then produce specific actions a robot or vehicle can execute. Cosmos 3 addresses this by training on 20 trillion tokens of multimodal data — real-world video, simulation environments, and teleoperation recordings of humans controlling robotic systems. The result is a model that can watch a scene, predict what will happen next, generate a plausible video of that outcome, and produce the action trajectory a robot should follow to achieve a specified goal — all within a single inference call.
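To make "action trajectory" concrete, here is a minimal sketch of how a downstream team might represent and sanity-check such output before sending it to a robot. The `ActionStep` class, its field names, and `validate_trajectory` are hypothetical illustrations built from the quantities named in this article (joint angles, gripper positions, trajectory waypoints); they are not Cosmos 3's actual output format, which the technical report would define.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ActionStep:
    """One timestep of a hypothetical robot action trajectory."""
    joint_angles_rad: Tuple[float, ...]       # one angle per joint, in radians
    gripper_opening: float                    # 0.0 = closed, 1.0 = fully open
    waypoint_xyz: Tuple[float, float, float]  # end-effector target, in metres


def validate_trajectory(steps: List[ActionStep],
                        joint_limits_rad: List[Tuple[float, float]]) -> List[str]:
    """Return human-readable problems; an empty list means every step
    passed these illustrative checks."""
    problems = []
    for i, step in enumerate(steps):
        # A step with the wrong joint count cannot be checked joint by joint.
        if len(step.joint_angles_rad) != len(joint_limits_rad):
            problems.append(f"step {i}: expected {len(joint_limits_rad)} joints")
            continue
        for j, (angle, (low, high)) in enumerate(
                zip(step.joint_angles_rad, joint_limits_rad)):
            if not low <= angle <= high:
                problems.append(
                    f"step {i}: joint {j} angle {angle} outside [{low}, {high}]")
        if not 0.0 <= step.gripper_opening <= 1.0:
            problems.append(
                f"step {i}: gripper opening {step.gripper_opening} outside [0, 1]")
    return problems
```

A check of this kind sits between the model and the robot controller: whatever the model produces, the robot's own joint limits decide whether a step is executable.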

The MoT Architecture: Reasoning Before Acting

Cosmos 3 uses a Mixture-of-Transformers (MoT) architecture, which is deliberately distinct from Mixture-of-Experts (MoE). Where an MoE model routes individual tokens to specialist parameter sets during inference, MoT pairs a reasoning transformer with a generation transformer in a two-stage sequential pipeline. The reasoning transformer first builds a structured internal representation of the input scene, including spatial relationships and predicted dynamics. The generation transformer then uses that representation to produce outputs. Because generation starts only from that representation, planning always precedes generation. The design targets a known failure mode of earlier physical AI world models: generative output that is physically implausible.
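The reason-then-generate ordering can be illustrated with a toy sketch. Nothing below is NVIDIA's implementation: the two "stages" are plain functions rather than transformers, constant-velocity extrapolation stands in for learned dynamics, and every name (`TrackedObject`, `SceneRepresentation`, `reason`, `generate`) is hypothetical. The only point it shows is structural: the generation step reads nothing but the representation built by the reasoning step.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec2 = Tuple[float, float]


@dataclass(frozen=True)
class TrackedObject:
    name: str
    position: Vec2
    velocity: Vec2  # units per second


@dataclass(frozen=True)
class SceneRepresentation:
    """Output of the toy reasoning stage."""
    predicted_positions: Dict[str, Vec2]


def reason(objects: List[TrackedObject], horizon_s: float) -> SceneRepresentation:
    """Stage 1: predict where each object will be after horizon_s seconds.
    Constant-velocity extrapolation is a stand-in for learned dynamics."""
    predicted = {
        o.name: (o.position[0] + o.velocity[0] * horizon_s,
                 o.position[1] + o.velocity[1] * horizon_s)
        for o in objects
    }
    return SceneRepresentation(predicted_positions=predicted)


def generate(rep: SceneRepresentation, start: Vec2,
             target: str, n_waypoints: int) -> List[Vec2]:
    """Stage 2: evenly spaced waypoints from start to the target's
    predicted position, using only the stage-1 representation."""
    goal = rep.predicted_positions[target]
    return [
        (start[0] + (goal[0] - start[0]) * k / n_waypoints,
         start[1] + (goal[1] - start[1]) * k / n_waypoints)
        for k in range(1, n_waypoints + 1)
    ]
```

Because `generate` takes a `SceneRepresentation` rather than raw observations, it cannot skip the prediction step and aim at where the object used to be. That is the property the two-stage design is described as enforcing.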

RoboLab and RoboArena Performance

RoboLab tests physical AI policy models in simulation across language-guided manipulation tasks, assessing whether a model can interpret a natural-language instruction and translate it into a valid physical action plan. RoboArena compares policy models on DROID robots in real-world environments, testing performance against physical constraints rather than simulated ones. Cosmos 3 Nano leads both evaluations, a result that reflects the model's ability to generalise across task types rather than optimise for a single evaluation setup.

The Cosmos Coalition

NVIDIA launched Cosmos 3 alongside the NVIDIA Cosmos Coalition — a group of AI labs and robotics companies committed to using Cosmos 3 and DGX Cloud infrastructure for production model training while contributing their own research and evaluations to the ecosystem. Founding members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. The composition matters: Runway and Black Forest Labs bring video generation expertise, Agile Robots brings humanoid deployment data, and Skild AI contributes generalised robot learning research. The coalition provides a commercial production ecosystem around Cosmos 3 from day one, rather than the typical pattern where open model releases accumulate research use cases slowly over months.

Access and Licensing

Cosmos 3 Nano and Super are available immediately on Hugging Face and through build.nvidia.com. NVIDIA has published training recipes, a detailed technical report, and Hugging Face Diffusers integrations through the NVIDIA-NeMo GitHub repository. The OpenMDW-1.1 licence permits commercial use, modification, and redistribution, with attribution requirements and specific terms teams should review before deploying in production physical AI pipelines.
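For teams that want to pull the weights locally, a minimal sketch using the `huggingface_hub` library's `snapshot_download` function might look like the following. The wrapper name `download_weights` is hypothetical, and the repository ID is deliberately left as a required argument: this article does not state the exact Hugging Face repository names, so take them from the official model card rather than guessing.

```python
def download_weights(repo_id: str, local_dir: str) -> str:
    """Download a model snapshot from the Hugging Face Hub and return
    the local folder path. repo_id must be 'namespace/model-name'."""
    if repo_id.count("/") != 1 or not all(repo_id.split("/")):
        raise ValueError("repo_id must look like 'namespace/model-name'")
    # Imported lazily so this module still loads where huggingface_hub
    # is not installed (pip install huggingface_hub).
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Review the OpenMDW-1.1 licence text that ships with the repository before wiring a download step like this into a production pipeline.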

What This Means for Indian Technology Teams

India's manufacturing, logistics, and automotive sectors are accelerating robotics adoption as organised warehousing expands and domestic industrial production grows. Physical AI is the layer that would allow a robot in an assembly line or fulfilment centre to understand natural-language task descriptions, adapt to novel objects, and generate its own action plans rather than following rigid pre-programmed sequences.

Cosmos 3's open-weight release changes the economics for Indian AI teams working in this space. Before June 2026, building a capable physical AI system required either expensive proprietary API access or years of in-house research. An 8-billion-parameter open model that leads on established robotics benchmarks, available at no per-token cost and fine-tunable on custom robot demonstration data, substantially lowers that barrier. For Indian teams in manufacturing, logistics, and autonomous systems, the evaluation process can start today — the weights, training recipes, and technical report are all publicly available.

The Bottom Line

NVIDIA Cosmos 3, launched 1 June 2026 at GTC Taipei, is the first fully open omnimodel for physical AI, available in 8B Nano and 32B Super configurations on Hugging Face under OpenMDW-1.1. Trained on 20 trillion tokens using a Mixture-of-Transformers architecture, it ranks first across more than seven robotics benchmarks, including RoboLab and RoboArena. The Cosmos Coalition (Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI) provides a commercial production ecosystem from launch. For Indian teams in manufacturing, logistics, and autonomous systems, Cosmos 3 is the first open physical AI foundation model with enough benchmark performance and industry backing to merit evaluation for production deployment.

Frequently Asked Questions

What is NVIDIA Cosmos 3 and when was it released?

NVIDIA Cosmos 3 is the world's first fully open omnimodel for physical AI, released on 1 June 2026 at GTC Taipei during Computex. It is designed for robotics, autonomous vehicles, and physical AI applications, and can natively understand and generate text, images, video, ambient sound, and robot action data — including joint angles, gripper positions, and trajectory waypoints — within a single architecture. The model is available in two sizes on Hugging Face under the OpenMDW-1.1 licence: Cosmos 3 Nano at 8 billion parameters and Cosmos 3 Super at 32 billion parameters.

What benchmarks did Cosmos 3 achieve and what do they measure?

Cosmos 3 ranks first across more than seven robotics evaluation frameworks, including first position on RoboLab and RoboArena. RoboLab tests physical AI policy models in simulation across language-guided manipulation tasks, measuring whether a model can interpret a natural-language task description and produce a valid physical action plan. RoboArena compares policy models on DROID robots in real-world environments, testing performance against physical constraints. Cosmos 3 Nano leads both evaluations, reflecting its ability to generalise across task types rather than optimise for a single benchmark.

What is the Mixture-of-Transformers architecture used in Cosmos 3?

Cosmos 3 uses a Mixture-of-Transformers architecture, abbreviated MoT. Unlike a Mixture-of-Experts model, which routes individual tokens to specialist parameter sets during inference, MoT operates as a two-stage sequential pipeline: a reasoning transformer first builds a structured representation of the input scene, including spatial relationships and predicted dynamics, and a generation transformer then uses that representation to produce outputs such as video frames, ambient audio, or action trajectories. The sequential structure puts planning before generation, which is intended to prevent the failure mode where generative models produce physically implausible output.

Who are the Cosmos Coalition members and how can developers access Cosmos 3?

The NVIDIA Cosmos Coalition founding members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI — companies spanning humanoid robotics, video generation, and generalised robot learning. Coalition members use Cosmos 3 technologies and DGX Cloud for production training while contributing models, research, and evaluations to the ecosystem. Developers can access Cosmos 3 weights on Hugging Face, test the model through NVIDIA's API at build.nvidia.com, and find training recipes and documentation through the NVIDIA-NeMo GitHub organisation. The OpenMDW-1.1 licence permits commercial use with attribution requirements.

Written by

TechPillow Team

Sharing insights on technology, product development, and the Indian tech ecosystem.
