The Empire State Building went from first sketches to opening in eighteen months. Seventy years later, the Salesforce Tower took twelve years from concept to completion. The buildings became more complex; the planning processes that precede them became proportionately slower. Construction labour productivity fell roughly 40 percent between 1970 and 2020 while the broader economy more than doubled. Every other major industry found ways to systematise. Construction largely did not.

The more revealing question isn’t whether technology can change this. It’s why the industry that most needs the change is the last to make it, even when the tools are already working in every adjacent sector. Sean Young sits at an unusual vantage point on this: he’s spent 25 years moving from practitioner-facing software at Autodesk to enterprise hardware at HP to platform infrastructure at NVIDIA, which means he’s watched the same adoption curve from every altitude. What follows is a conversation about what it looks like when the technical barriers genuinely fall, and what structural barriers remain standing.

About Sean Young

Sean Young is Director of AECO, Geospatial & AI Solutions Industry Marketing at NVIDIA, where he shapes the company’s go-to-market strategy for construction and geospatial. Over 25 years, his career has moved from 3ds Max product management at Autodesk, through enterprise hardware at HP, to Omniverse business development and now industry GTM at NVIDIA, a trajectory that spans every layer of the AEC technology stack from the tool a practitioner touches daily to the infrastructure that powers the tools themselves. He speaks regularly at Digital Built World, Geo Week, and AEC Innovate, and contributes to the AI Directory published by AEC Magazine.

“As a human, I can use one application at a time. Imagine your agent using four applications at the same time. You go home to sleep — your agents are still working.” Sean Young

The pipeline nobody believed was possible

The demo Sean’s team is currently running starts with an RFP. A single prompt extracts the key data into Excel: square footage, geolocation and usage requirements. That tabular data, alongside a conceptual pencil sketch, feeds into Rhino with Grasshopper. Geometry options emerge. Those options route into ArchiCAD to produce a full BIM model: windows, doors, floors, facade. Construction document sheets are generated. They land in Bluebeam. One prompt, one building’s worth of documentation.
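The hand-offs in that chain can be sketched as a sequence of stages that each add an artefact to a shared job record. This is a minimal illustration only: every function name below is hypothetical, the stage bodies are stand-ins rather than calls into Excel, Rhino, Grasshopper, ArchiCAD or Bluebeam, and the RFP text and floor counts are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Carries the RFP text and every artefact produced along the way."""
    rfp_text: str
    artefacts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def extract_brief(job: Job) -> Job:
    # Stand-in for the prompt-driven extraction: parse "key: value" lines.
    brief = {}
    for line in job.rfp_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            brief[key.strip().lower()] = value.strip()
    job.artefacts["brief"] = brief
    return job


def generate_options(job: Job) -> Job:
    # Stand-in for the geometry step: one option per assumed floor count.
    area = float(job.artefacts["brief"]["square_footage"])
    job.artefacts["options"] = [
        {"name": f"option_{floors}_floors", "floors": floors,
         "floor_plate": area / floors}
        for floors in (4, 8, 16)
    ]
    return job


def select_option(job: Job) -> Job:
    # A human reviewer could choose here; the sketch takes the first option.
    job.artefacts["selected"] = job.artefacts["options"][0]
    return job


def build_model(job: Job) -> Job:
    # Stand-in for the BIM step: record which element types were modelled.
    job.artefacts["bim"] = {
        "source": job.artefacts["selected"]["name"],
        "elements": ["windows", "doors", "floors", "facade"],
    }
    return job


def generate_sheets(job: Job) -> Job:
    # Stand-in for documentation: one plan sheet per floor.
    floors = job.artefacts["selected"]["floors"]
    job.artefacts["sheets"] = [f"A-{n:03d}" for n in range(1, floors + 1)]
    return job


PIPELINE = [extract_brief, generate_options, select_option,
            build_model, generate_sheets]


def run(job: Job) -> Job:
    # Run each stage in order and log its name for later review.
    for stage in PIPELINE:
        job = stage(job)
        job.log.append(stage.__name__)
    return job


rfp = "square_footage: 80000\ngeolocation: 37.79,-122.40\nusage: office"
result = run(Job(rfp_text=rfp))
print(result.log)
print(result.artefacts["sheets"])
```

Keeping each stage as a plain function over one record makes it easy to see where a review checkpoint could be inserted, which is the role `select_option` plays in this sketch.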

“You do the prompt and you walk away from your computer. You go do something else. You work on the next project.”

The phrase “work on the next project” contains the challenge hidden inside the opportunity. An agent generating construction documents overnight solves a throughput problem and immediately creates a review problem. McKinsey research captures the inversion precisely: when an agentic system can generate 5,000 reports overnight, organisations “will come up against a new bottleneck because the human capacity to review those reports won’t exist.” Moving work from human hands to agents doesn’t eliminate the bottleneck; it relocates it. The question isn’t whether to walk away from the computer. It’s whether the organisation has rebuilt its review and decision processes to meet what comes back.
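A rough way to see the relocation is to track the review backlog night by night. The 5,000 reports per night comes from the McKinsey example above; the number of reviewers and the reports each can review per day are purely illustrative assumptions, and the function name is hypothetical.

```python
def review_backlog(generated_per_night, reviewers, reviews_per_reviewer_per_day, nights):
    """Return the unreviewed backlog at the end of each day."""
    daily_capacity = reviewers * reviews_per_reviewer_per_day
    backlog = 0
    history = []
    for _ in range(nights):
        backlog += generated_per_night           # agents deliver overnight
        backlog -= min(backlog, daily_capacity)  # humans review what they can
        history.append(backlog)
    return history


# Assumed: 10 reviewers, each able to review 25 reports a day.
history = review_backlog(5000, reviewers=10, reviews_per_reviewer_per_day=25, nights=5)
print(history)
```

Under these assumptions the team clears 250 reports a day, so the backlog grows by 4,750 every night the agents run. Adding agents raises the first number only; the review process determines the second.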

This is not a research paper. It’s a working experiment. And the framing matters: the goal isn’t to demonstrate that agents can replace architects. It’s to understand what becomes possible when the ceiling on individual throughput is lifted. One person, one prompt, one building’s worth of construction documents. The human-in-the-loop sits at design selection, if they want to be there at all.

The honest qualification Sean adds is that the current generation of LLMs doesn’t have a reliable spatial understanding of 3D relationships. You can convert a 3D space into words, and an agent can process those words, but the relational logic that makes MEP coordination work, where a duct must clear a beam that must not conflict with a column, is not yet reliable enough to be trusted at production quality. The AEC technology stack is responding to this from multiple directions: fine-tuned VLMs and node-graph approaches that define element relationships explicitly; purpose-built geometry engines that reason in three dimensions rather than translating space into text; and spatial world models, a research-stage direction Autodesk has backed with a $200 million investment in World Labs, designed to understand three-dimensional environments natively rather than through language proxies. Each direction has made measurable progress. None has closed the gap to production-quality MEP coordination. The path is visible. The gap is real.

Cosmos and the difference between seeing and simulating

“What it’s going to look like is not the problem we’re trying to solve. It’s how it’s going to work.”

Most AEC digital twins stop right there, at what they look like. A photorealistic model of a building or site, navigable in real time and updated with sensor data, tells you what you’d see. It cannot tell you how the building behaves under wind load, crowd pressure, or a concrete pour schedule gone wrong. The debate over the term has become so entangled that this gap rarely surfaces. Sean’s point is sharper: a visualisation shows what something looks like; a simulation shows how it works.

The distinction is not cosmetic. A visualisation built on a game engine renders surface geometry optimised for frame rate. It cannot tell you whether a cladding system fails under wind load, whether a crowd evacuates safely, or whether a concrete pour schedule creates thermal stress. Those answers require physical data: mass, friction, material properties and breaking behaviour under load. Game engines were built for the former; physics simulators are built for the latter. Most AEC “digital twins” are the former, presented as the latter.

NVIDIA’s Cosmos is the company’s world model: a diffusion model trained on physics, including computational fluid dynamics. Where spatial intelligence companies like World Labs are building world models that understand three-dimensional space geometrically, Cosmos understands the physical implications of that space. A VLM understands what it sees. Cosmos understands how what it sees behaves. A construction safety system built on Cosmos can assess that a worker is about to walk off a ledge, not because it recognises the image, but because it understands the physics of what follows.

The primary application of Cosmos is the generation of synthetic training data for robotics and autonomous systems. The workflow is: scan a real environment, build a digital twin, feed it into Cosmos and generate millions of domain-randomised permutations, varying the weather and lighting and adding unexpected objects, animals, and people in unexpected places. The key technical requirement is that the sensor simulation must match the exact sensors of the target machine: not just RGB cameras, but lidar and radar, because the training data needs to reflect what the machine will actually perceive in the field.
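The idea of domain randomisation can be shown with a short sampler that varies scene conditions while holding the sensor suite fixed. This is not the Cosmos API: the parameter lists, field names and value ranges are assumptions chosen for illustration, and a real system would render sensor data from each scenario rather than stop at a description of it.

```python
import itertools
import random

# Hypothetical condition lists; a real pipeline would have many more axes.
WEATHER = ("clear", "rain", "fog", "snow")
LIGHTING = ("dawn", "noon", "dusk", "night")
INTRUDERS = ("none", "pedestrian", "animal", "debris")

# The sensor suite is fixed to match the target machine, never randomised.
TARGET_SENSORS = ("rgb", "lidar", "radar")


def enumerate_scenarios():
    """Every combination of the discrete conditions."""
    return [
        {"weather": w, "lighting": l, "intruder": i, "sensors": TARGET_SENSORS}
        for w, l, i in itertools.product(WEATHER, LIGHTING, INTRUDERS)
    ]


def sample_scenarios(count, seed=0):
    """Randomly sampled scenarios, reproducible for a given seed."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(count):
        scenarios.append({
            "weather": rng.choice(WEATHER),
            "lighting": rng.choice(LIGHTING),
            "intruder": rng.choice(INTRUDERS),
            # Continuous parameter: lateral position of the intruder in metres.
            "intruder_offset_m": round(rng.uniform(-5.0, 5.0), 2),
            "sensors": TARGET_SENSORS,
        })
    return scenarios


grid = enumerate_scenarios()
batch = sample_scenarios(1000, seed=42)
print(len(grid), len(batch))
```

Three axes of four values already give 64 discrete combinations. Adding continuous parameters such as position makes the space effectively unbounded, which is why sampling rather than enumeration is the usual approach.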

Sean used an autonomous vehicle example to make this concrete: the car drives a road once, capturing a point cloud with segmentation and classification. That road will never look identical again. So you build the digital twin as a source of truth for the permanent elements, lane markers, building positions, tree locations, and then use Cosmos to generate every possible variant of what else might be on that road. The training data that comes out reflects not just the physics of the road but the physics of the sensors.

“Construction site is the most complex environment to simulate. A factory doesn’t change. You set it up, and then it runs the same way for months. A construction site changes every day. New people, new trades, new subcontractors coming in. Not everybody’s on the same technology. People are working off paper, PDFs.”

This is the core engineering challenge NVIDIA is working through. Factories are tractable: the geometry is fixed, the sensor positions are permanent, the asset is its own revenue engine and therefore justifies the modelling investment. Construction sites have none of those properties. The same challenge applies wherever physical build-outs happen at scale: mining operations, data centre commissioning and large-scale electrification projects. Any environment that is temporarily in a state of becoming, rather than a state of being, strains the assumptions that make persistent digital twins tractable. Worse, the digital twin premise of persistent sensor placement breaks down on sites where sensors must be moved weekly because the structure is still being built.

The two practical uses Sean identifies for a physics-based construction simulation are distinct and both significant. The first is to test and compare construction and operational strategies before committing: simulate the concrete phasing this way, then that way; compare the outcomes before steel is placed. The second, which is more forward-looking, is training AI agents in the pre-built environment across construction phases. An agent trained to recognise phase A, phase B and phase C doesn’t need to encounter a new site from scratch; it already knows what the site should look like at each stage.

The token economy and the end of billing by the hour

Jensen Huang’s framing at GTC 2026, that engineers should spend AI tokens worth roughly half their annual salary to stay fully productive, is operating inside NVIDIA as a live experiment. A member of Sean’s team exceeded the company’s token allocation. Sean’s response: “You win the prize, dude. Jensen would be so proud of you.” The productivity logic is straightforward: if an agent is running four applications simultaneously while you sleep, and the token cost is $100 per day, that cost should be measured against the output of what is effectively a 24-hour working day.
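The arithmetic behind that comparison is simple enough to write down. The $100 per day and the half-of-salary guideline come from the passage above; the $200,000 salary is a hypothetical figure used only to make the numbers concrete, and the function names are illustrative.

```python
def cost_per_agent_hour(daily_token_cost, hours_per_day=24.0):
    """Token spend divided by the hours the agent actually runs."""
    return daily_token_cost / hours_per_day


def annual_token_budget(salary, fraction=0.5):
    """Token budget under the 'half of annual salary' guideline."""
    return salary * fraction


around_the_clock = cost_per_agent_hour(100.0)     # agent runs 24 hours
office_hours_only = cost_per_agent_hour(100.0, 8)  # same spend over 8 hours
budget = annual_token_budget(200_000)              # assumed salary
days_covered = budget / 100.0                      # days of $100 spend

print(round(around_the_clock, 2), office_hours_only, budget, days_covered)
```

Spread over 24 hours, $100 a day works out to a little over $4 per agent-hour, against $12.50 if the same spend only covered an eight-hour day. On the assumed salary, the half-salary budget would cover 1,000 days at that rate, so $100 a day sits well inside the guideline.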

Dylan Patel’s data from SemiAnalysis confirms the empirical version of this from outside NVIDIA: the firm went from tens of thousands in AI spend to $7 million annualised in under a year. One economist built in three weeks what would have taken a team of 200 a year. Patel’s structural insight, as documented in Dylan Patel: The Infinite Demand for Tokens on Invest Like the Best, is that execution has become cheap; competitive advantage concentrates entirely in choosing the right ideas and pointing tokens at them.

That framing is right for the phase most teams are in right now. The immediate priority is exactly what Jensen describes: spend tokens freely, experiment at scale, discover firsthand what agents can do. The ROI measurement comes after the learning phase, not before it. Technology companies at the forefront of AI adoption are still working out how to distinguish agent activity that moves work forward from agent activity that simply runs, but for teams still discovering what agents are capable of, freely burning tokens is the correct starting point. As early adopters are finding, high token burn is a necessary but insufficient signal of output; once you’ve built genuine hands-on competency, the question of which tokens produce value becomes worth asking.

For AEC, Sean’s formulation of the consequence is blunt:

“If you can do what used to take a thousand people and nine months in a couple of days with ten people, time and materials billing no longer applies. How do you monetise that value? We have to think about a different way to price AEC services if they’re not performed by humans anymore.”

This isn’t a projection. It’s a description of a transition already underway, at least at the information-intensive end of the AEC workflow: estimating, specification writing, RFI drafting, submittal review, safety plan generation. These are execution tasks, where AI pulls together fragmented data, decisions, and outputs into a coherent system, even when the underlying systems are not standardised. The AEC firm that used to charge for the 1,000 people doing that work will need a different model for the ten people directing agents that do it.

The measurement challenge is a second-order problem. In the learning phase, the priority is accumulating hands-on experience: more experimentation, more agent runs, more discovery of what works. That is the value Jensen’s framing is pointing to. Once a team has built genuine competency, the next step is applying that learning to concentrate token spend on decisions that matter, distinguishing purposeful agent activity from agents running without clear direction and inflating usage without producing decisions. The firm that earns the long-term productivity gain is the one that commits fully to the learning phase first, then develops the judgment to refine for value.

The practical version of this surfaces in how experienced practitioners describe working at the frontier. Running several agents in parallel, the bottleneck shifts from doing the work to reviewing what the agents return, fast enough to keep them moving. The next structural step isn’t more agents; it’s an agent that predicts what other agents should do and autonomously assigns work. The human role shifts from doing to directing to orchestrating: each transition requires a different kind of attention, not less of it.

Why AEC lags manufacturing: the services-for-hire problem

The token economy argument carries an implicit business model implication that most AEC conversations skip past. If token cost replaces labour cost as the unit of execution, the firm’s cost structure changes; but so does its revenue structure. A service business that charges by the hour can’t simply absorb a 10x productivity gain without renegotiating its rates. The question of who pays for the digital infrastructure, and why, connects directly to why AEC has always lagged the industries it is most often compared to.

The manufacturing comparison comes up repeatedly in AEC conversations, and it usually frustrates practitioners because the structural differences are obvious. Sean’s version of the explanation is the cleanest articulation of the actual mechanism:

“In manufacturing, the customer owns the factory. They’re investing $5 billion in it. They want a digital twin because what happens in the factory is their source of revenue. If they can make it 10 percent better, they make 10 percent more money. The AEC industry is services for hire. If the customer isn’t paying for the digital twin, we’re not going to make it.”

That refusal is a rational economic response to a misaligned incentive structure. The AEC firm does not own the asset. The client owns the asset but rarely funds the digital twin, because the client is not an engineering firm with compounding returns from simulation data; it is buying a building. The firm that designed and built it has no incentive to invest in knowledge infrastructure that belongs to the client.

The forcing function Sean identifies is not technical persuasion. It’s a business model disruption. When ten people with agents can do what 1,000 people used to do in nine months, the business model built around selling those 1,000 hours of labour collapses regardless of whether anyone chooses to disrupt it. The disruption is automatic.

As Dustin Schafer observes, most AEC firms are either too busy or too constrained by their current business model to invest in capabilities that pay off over the long term. Construction firms are structured to manage pain points and respond to external pressure; they recruit and promote the firefighter-type manager, not the strategist. The AEC firm that wins isn’t the one that develops the best software tools; it’s the one that changes its business model most effectively to apply those tools. That transition is not a technology decision. It’s an operating model decision.

When an organisation uses AI as a technology of coordination, greater autonomy drives better coordination, and better coordination enhances autonomy, creating a self-reinforcing flywheel. Reaching this goal may initially require navigating some fragmentation. In the early stages of AI adoption, agile teams within individual departments will likely develop their own workflows and gain institutional knowledge faster on their own than they would by waiting for cross-departmental coordination to produce a unified rollout. In a 2025 Microsoft field experiment involving 388 employees at Gap Inc., researchers examined the impact of an AI-first collaborative workflow. The results showed that participants who were required to use this workflow produced lower-quality work and experienced greater friction than those who were allowed to create their own methods. Interestingly, participants who received mindset training that encouraged them to view AI as a thought partner rather than just a tool doubled their chances of producing high-quality documents.

The real risk lies not in starting off in separate departments, but in remaining isolated within those silos. Companies that automate estimating in one department, RFI management in another, and scheduling in a third can quickly develop true competency, as long as guardrails are in place to eventually connect these silos into a cohesive system that enhances overall performance. The flywheel requires capabilities to be established before coordination among departments can be implemented effectively.

The knowledge-sharing culture that isn’t

What NVIDIA’s internal AI adoption reveals is less about tools and more about the operating model. The company isn’t managing AI rollout through a programme; it’s watching a culture self-organise around shared learning at a speed no formal programme could produce.

“We have recurring meetings where hundreds of people join to share what they’ve learned about how to use agents and how to be more productive. There’s all sorts of internal websites to support this. It’s just popping up everywhere.”

The spontaneity is the signal. Knowledge-sharing at NVIDIA isn’t mandated from the top as a policy; it emerges because the operating model makes it the rational choice. This is Jensen Huang’s operating model in practice. The T5T mechanism documented in The Nvidia Way (Tae Kim, 2025), where every employee sends Jensen a weekly email detailing the five things they’re working on and what they’ve observed in their markets, is an engineered solution to the information-hoarding problem. Jensen reads 100 of these a day. “Strategy isn’t what I say, it’s what they do,” he has said. “I want information from the edge.” The architecture of NVIDIA’s knowledge-sharing is not a cultural aspiration; it’s an operating system.

The attempt to replicate this in AEC firms runs into the same dynamic. A shared channel is created for AI tool learnings, explicitly designed to build an open knowledge pool. The usual behaviour is near silence. The channel isn’t ignored; it’s just empty. The same Microsoft research names the mechanism: without intentional design, people default to what the researchers call “parallel play” — each person running their own AI session independently, never building on what colleagues have learned. The shared channel is the enterprise version of parallel play. Everyone is using the tool; nobody is collaborating with it.

The HBR research on high-performing teams, documented in the What Sets Superteams Apart from the Rest episode of HBR IdeaCast, identifies proactive help-seeking and sharing work before it’s finished as the most differentiating behaviours of top-performing teams. The number-one source of meaning on those teams is being part of the team itself. In enterprise AEC, the incentive structure is the precise inverse: the person who shares a productivity technique has made themselves slightly more replaceable and their colleagues slightly more competitive. At NVIDIA, the incentive to share exceeds the incentive to hoard because the operating model explicitly rewards collective learning. That condition doesn’t exist in most AEC firms, and no amount of channel creation will install it from below.

Sean’s advice on how to start has evolved significantly over 12 months, and the shift reflects how quickly the entry cost has dropped:

“A year ago my advice was: go hire data scientists, AI developers, MLOps specialists. Now I’m saying: just start with agentic tools. Claude, Codex, Cursor. You don’t even need to know about MCP anymore because you get connectors inside Claude. Just go try that and be amazed at what is possible.”

The shift from “build a data lake with a normalised schema before you touch AI” to “point agents at your heterogeneous repositories and let them figure it out” happened in under a year. The entry cost has dropped to near zero. The residual barrier isn’t technical literacy; it’s organisational permission.

Where that permission is most readily granted, Sean finds, is at the edges: with AEC startups whose product roadmap is built entirely around one specific workflow problem. They’re use-case-focused, the ROI is measurable within the first 3 months, and the team is invested in the customer’s success in a way a large software vendor cannot be. What Sean points to, customers who started as users and became investors, is a signal that the value exchange has changed. The startup isn’t just selling software; it’s selling accelerated learning, and the organisations willing to commit to that exchange are the ones building competency faster than anyone deploying a platform rollout.

When the compressions compound into a structural shift

The arc of this conversation traces a consistent line from the technical to the organisational to the economic. The fully agentic pipeline is operational in controlled experiments; Cosmos represents a physics layer that visualisation-only digital twins cannot replicate. The token economy makes execution cheap and concentrates advantage in the quality of ideas. The services-for-hire model is the structural reason AEC lags behind manufacturing, and business-model disruption is the forcing function that will eventually override it. The knowledge-sharing culture gap among AEC professionals is not a character flaw; it’s a rational response to an incentive structure that doesn’t reward openness.

What connects these is compression. The agentic pipeline compresses design-to-documentation time. Cosmos compresses the gap between a physical environment and a trainable simulation of it. The token economy compresses the cost of execution. Each compression, taken individually, appears to be a productivity gain. Taken together, they compress the organisational model that AEC has operated on for a century.

The billing model, the staffing model, the knowledge-hoarding model: each of these was rational when execution was expensive and expertise was scarce. When execution becomes cheap and agents become capable, the only thing left to defend is the quality of the idea being executed and the trust of the person authorising the work. Those two things, insight and trust, are not compressible. Everything that surrounds them is.

The question for AEC firms isn’t whether this compression will arrive. Sean’s observation is the evidence: a year ago, people were just dipping their toes into ChatGPT and Copilot; this year, they’re deploying MCP integrations and building GenAI workflows with open-source tools. The compression is already in progress. The question is whether the organisational model is being rebuilt in parallel with the technical capability, or whether firms are installing faster engines in a vehicle whose chassis was designed for a different road.