Annie Liang came to construction from a discipline built entirely around accumulation: portfolio advisory at UBS, where the output was always an evidence-based position compounded over time, never a single trade. That distance from practice sharpened her diagnosis. The barrier to AI in construction isn’t culture or conservatism; it’s the absence of something more fundamental. The expert-verified data that would make AI useful in construction was never deliberately created, and what wasn’t captured in the moment cannot be recovered retroactively. What follows is a conversation about data as a strategic asset, why the AEC industry has been running in trading mode for decades, and what it would actually take to start building the portfolio.
—
About Annie Liang
Annie Liang is co-founder and CEO of Billie Onsite, a platform focused on multimodal on-site data capture for the AEC industry. Before founding Billie Onsite, she worked at UBS in Active Portfolio Advisory, managing portfolios of $10M to $300M USD across family offices and corporate clients. That background in structured, evidence-based portfolio thinking shapes how she approaches both product development and competitive strategy in construction tech.
—
“Our thesis has always been: there’s a lot of data from existing projects, in documents and emails, but it wasn’t captured, processed, and verified by experts at the time with enough quality to train AI.” — Annie Liang
The conversation around AI in AEC has settled into a familiar rhythm: adoption is slow, the industry is conservative and culture is the obstacle. Annie pushes past that explanation quickly. The real constraint isn’t resistance; it’s the absence of something more fundamental. The data needed to make AI useful in construction has never been created to sufficient quality or with sufficient expert verification. It isn’t locked away somewhere, waiting to be unlocked. It largely doesn’t exist in a usable form.
That diagnosis reframes the entire challenge; it shapes everything Billie Onsite does in the market.
When portfolio thinking meets product strategy
Annie’s first job out of university was at UBS, on the Active Portfolio Advisory team. The work involved designing long-term portfolio strategies for wealthy clients, making disciplined asset allocation decisions across global equities, bonds, and alternatives. The output was always an opinionated, tested, evidence-based view: not a hot take.
She describes two distinct modes of serving clients in investment: the trading-oriented mode and the portfolio advisory mode. In the first, an advisor calls the client with a tactical idea. They execute a trade. The success rate is roughly 50 percent. Accountability is diffuse: “I gave you advice; you made the decision.”
Portfolio advisory works differently. You build something over time, with a clear view of risk-adjusted return, tested across similar clients. The output is consistent, repeatable, and verifiably valuable.
“The portfolio advisory model is comparable to a product mindset. You’re building something longer term, putting together a very opinionated piece of work, invested in your opinion about the industry.”
The parallel to AEC tech is direct. Consultancy in construction tech resembles the trading mode: bespoke solutions, ad hoc delivery, variable accountability. Product development resembles portfolio advisory: standardised, repeatable, with a demonstrable view on return. The question Annie raises, and doesn’t leave unanswered, is whether AI’s effect on software development blurs this distinction enough to collapse the product model into consultancy again.
Her answer is that productised thinking isn’t dead; it just needs to become more vertical. The product world isn’t disappearing; it’s specialising.
Procore as infrastructure, not destination
The verticalisation argument deserves more precision than it usually gets. Annie uses defect management as a concrete case study: Procore includes a defect tool that site managers primarily use to log observations. But the stakeholder chain around defect data is considerably longer.
A consultant compiling a defect report wants comprehensive technical details, photographic evidence, and references to specifications. A contractor receiving that report has to intellectualise every item: tracing each back to drawings, building codes, and contractual obligations, then potentially raising RFIs. That layer of technical processing was handled by QA teams and technical directors, not by site teams or a generalised defect tool.
“Products will get more and more verticalised to specific niches. There’s a whole chain of stakeholders involved at different project stages: a consultant wants one thing; a contractor receiving that report needs something entirely different.”
The implication is that platforms like Procore are becoming more like infrastructure: foundational data environments that specialist tools are built on top of, rather than comprehensive end-to-end solutions. The analogy to foundational model providers is instructive. Just as developers build specialised applications on top of an API without expecting the API provider to serve every use case, construction startups are building vertical products on top of platforms that provide the data substrate.
The risk Annie identifies is that startups drifting into this vertical space without a productised foundation become, effectively, consultancies: custom-building whatever the client wants in the moment. For startups to survive this era, there still needs to be a layer of productised thinking to fall back on. Without it, the product becomes the founder, and that’s not a scalable position.
Where the AI data problem begins
The dominant narrative around AEC and AI focuses on adoption speed: the industry is slow, culture is conservative and change management is hard. Annie’s analysis goes to a different layer. The adoption challenge is downstream of a more fundamental structural condition.
“People don’t focus on data because that’s not the core output. The only times data management becomes necessary are from three factors: compliance requirements, safety regulations, and quality control.”
Construction firms are organised around delivering physical buildings. The complexity of coordinating a single project across dozens of subcontractors and managing distributed risk already consumes most organisations’ operating capacity. Data management becomes a priority only when external pressure forces it: regulatory compliance documentation, safety reporting requirements, and defect tracking linked to contractual liability. BIM made significant progress as a common data environment, but it succeeded mainly where those external pressures were strongest.
The consequence for AI is structural. Training a useful construction AI model requires expert-verified, high-quality data about real project decisions, observations, and outcomes. That data was never deliberately created; the gap compounds across every project, every year, every retirement.
Matt Goldsberry at HDR put it precisely: “No firm has all project data in a single data lake. Each new project starts from the same baseline rather than building on past work.” A firm that completes 50 projects should find project 51 dramatically easier. Instead, it’s almost as hard as project one. Each project generates knowledge; almost none of that knowledge transfers. The industry has been running its data strategy in trading mode: project-to-project, no accumulation, no compounding.
Sarah Buchner, founder of TrunkTools, quantifies what that means at the individual level: “99 percent of the data we’re getting has zero IP in it… and 40 percent of our workforce goes out of the window in the next five years.” The expert who understood why a particular structural decision was made on the 2021 office development has moved on. Their calibrated judgment, built from fifteen years of project failures, design decisions, and field corrections, walked out the door with them. Reconstructing the reasoning behind each decision retroactively is a level of manual effort no organisation will actually undertake.
Billie Onsite’s thesis is that the only viable path to this data is to create it in real time. The name isn’t coincidental: Billie stands for “Build-It-Like-Learning-Is-Endless,” a direct expression of the company’s philosophy of knowledge accumulation. The implication is structural: quality data cannot be retrieved retroactively; it can only be created in the moment, embedded in the workflow that generates the decisions themselves.
The mechanism is specific: when an on-site expert captures an observation, the platform’s AI processes the input and produces a structured output. When the expert corrects that output, those corrections are the quality data. The expert verification isn’t an additional task; it’s embedded in the normal flow of reviewing what the AI has produced. The data accumulates as a byproduct of doing the actual work.
“If you ask a construction expert to deliberately go label data and contribute to the company knowledge base, it’s unnatural. But if you design a workflow that fits seamlessly into what’s driving project outcomes right now, that verification feedback is the quality data you need for your AI future.”
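The feedback loop described above can be sketched in a few lines. This is an illustrative model only, not Billie Onsite’s implementation; every name here (`DefectRecord`, `verify`, the example fields) is invented for the sketch. The point it demonstrates is the one Annie makes: the expert’s corrections to the AI’s draft are themselves the labelled, verified data.

```python
from dataclasses import dataclass, field


@dataclass
class DefectRecord:
    """Structured output the AI drafts from a raw site observation (hypothetical schema)."""
    location: str
    trade: str
    description: str
    severity: str


@dataclass
class TrainingPair:
    """One unit of expert-verified data: raw input, AI draft, expert-approved output."""
    raw_input: str
    ai_draft: DefectRecord
    verified: DefectRecord
    corrected_fields: list = field(default_factory=list)


def verify(raw_input: str, ai_draft: DefectRecord, expert_edits: dict) -> TrainingPair:
    """Apply the expert's corrections to the AI draft.

    The corrections are the quality data: each edited field is a labelled
    example of where the model's output diverged from expert judgment.
    """
    verified = DefectRecord(**{**ai_draft.__dict__, **expert_edits})
    return TrainingPair(
        raw_input=raw_input,
        ai_draft=ai_draft,
        verified=verified,
        corrected_fields=sorted(expert_edits),
    )


# A site manager dictates an observation; the AI drafts a structured record.
raw = "crack above door frame, level 3 corridor, looks structural"
draft = DefectRecord(location="Level 3 corridor", trade="Finishes",
                     description="Crack above door frame", severity="Low")

# Reviewing the draft, the expert reclassifies the trade and severity.
# That review is normal site work, but its byproduct is a verified training pair.
pair = verify(raw, draft, {"trade": "Structural", "severity": "High"})
print(pair.corrected_fields)  # the fields that carried expert signal
```

Verification here is not an extra labelling task; it is the review the expert would do anyway, with the delta between draft and approved record retained as the organisation’s proprietary data.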
This reframes what “AI adoption” means in construction. The question isn’t whether to use AI on projects; it’s whether to treat the data generated during AI use as a strategic asset.
The stakes aren’t symmetrical. Andrew Stevens, founder of Sakura Sky and an advisor who has spent two decades building data-driven products, put it plainly: “AI is a great amplifier. If you’re really good at what you do, AI will make it better. If you’re not so good at what you do, AI is really going to expose that.” The construction firms with strong data foundations will compound. The ones without will find that every AI tool they adopt makes the gap more visible, not less.
Stevens’s second observation cuts deeper: “Software comes and goes, but it’s data that persists.” The organisations that deliberately start building that foundation now are creating something that becomes harder for competitors to replicate with each passing project. The organisations that don’t are starting project 51 from the same baseline as project one.

The convergence that tech alone can’t deliver
The broader AI industry tends to treat data as a procurement problem: hire experts, label datasets, train models. Annie’s assessment is that this approach hits a ceiling in construction because the valuable data is inseparable from the context in which it was generated.
Tech companies can hire construction experts to label synthetic data. They cannot replicate the contextual richness of an expert making a real-time decision on a live project, under actual constraints, with direct accountability for the outcome. That signal quality is structurally unavailable to a tech company working in isolation.
Skanska’s experience confirms this from the industry side. Mike Zeppieri, Skanska’s VP of Data and Technology, described how Skanska made a deliberate investment in data strategy before the AI wave: it set up data lakes, established nomenclature standards, and had hard conversations around simple things like naming. When generative AI arrived, Skanska could treat it as an evolution of its data journey rather than a disruption to manage. Companies without that foundation are still having the nomenclature conversations, years behind.
The convergence will eventually come from both directions. Tech providers need real-world expert data at scale; they can’t get it without industry cooperation. And the industry, at some point, will recognise that its accumulated project knowledge is a competitive asset worth protecting.
“The industry will turn around and say: there are foundational models doing something for our industry, but as a multinational player we also have proprietary data we don’t want to share. Building faster, higher quality, more compliant, less rework: that’s our competitive advantage. Why share it with competitors?”
Startups occupy a specific position in this convergence. They’re neither tech giants nor established industry players; they’re the connective tissue. Their role isn’t simply to build tools, but to bring the right workflow to organisations that haven’t yet thought deliberately about data capture, and to design that workflow so the data stays proprietary to the organisation that generates it.
The 300-site detour that became a weapon
The go-to-market challenge in construction tech follows one of the most consistent patterns founders report in this space: the people who need the solution are not the people who buy it. Site managers, engineers, and field workers experience the pain directly; procurement decisions get made by project directors, innovation teams, and finance.
Billie Onsite didn’t sidestep this tension; it worked through it in a specific sequence.
In the first year of product development, Annie and her team went directly to end users: more than 300 construction sites. The goal wasn’t to sell. It was to build end-user intelligence about how capturing actually works in the field.
“Capturing is not about filling in a form. You will never fill out a form in the moment — you wait until end of day or end of week. That’s why you lose data. Capturing is in the moment, when the expert thinks something is important — just capture it then.”
That insight, accumulated across hundreds of conversations with site managers, shaped the product’s core capturing capability. It also created something that proved valuable in a completely different context: evidence.
When the product was mature enough to pitch at the enterprise level, the team faced the standard construction tech challenge: no access to the decision-makers, no proof points, no enterprise references. But they did have data points from end users across the industry. They could walk into a conversation with a project director and, with specificity, say what field workers wanted and why existing form-filling tools were failing them.
The end-user intelligence gathered outside procurement conversations became the sales argument inside them. Senior management isn’t primarily interested in usability; it cares about data quality, reporting speed, compliance documentation, and rework reduction. But demonstrating that those outcomes are blocked by how data is currently captured in the field, with specific evidence from those who do the work, reframes the conversation from product evaluation to operational problem-solving.
The value proposition that landed at the management level was direct: embrace unstructured data on site, let AI structure it, and management receives clean data in the format it needs. The insight into in-the-moment capture, gathered from 300 site visits, made that pitch credible rather than theoretical.
Annie’s 300-site research phase itself demonstrates the principle she would later build into the product. She captured expert knowledge in the moment, directly from practitioners in the field, rather than reconstructing it retroactively from documents or surveys. That intelligence became defensible product knowledge. The method was the message.
Why the first drop of intelligence still has to be human
One thread in the conversation that doesn’t fit neatly into the AI-in-AEC narrative, but matters for how Billie Onsite is built, concerns technical debt and the limits of AI-assisted development.
Annie’s framing is precise: AI responds to a human prompt. The smallest unit of software development still begins with a human providing the first drop of intelligence. If the person providing that prompt lacks technical expertise, the AI’s choices about libraries, architecture, and scalability are never verified against an experienced foundation. The product works until it doesn’t; when iteration compounds the gaps, AI patches them, and the structural instability accumulates.
“It’s quite dangerous if no technical expertise was involved in iterating a product. There always has to be a human foundation, and everything gets verified against that layer.”
This is not a new tension in software. Grady Booch, one of the original architects of object-oriented design and UML, argues that every major shift in software productivity, from machine code to assembly language, from compilers to high-level languages, followed the same pattern: each new layer raises the level of abstraction, removing tedium from work that is already understood, while the locus of judgment moves upward. “The fundamentals are not going to go away,” Booch argues. “The tools we apply will change.” What AI automates well is patterns it has been trained on, repeated many times. What it cannot substitute is systems-level judgment: the decisions about how components connect, where things break at scale, and what consequences cascade when the architecture is wrong from the first iteration. Annie’s “first drop of intelligence” is precisely that judgment.
The counterbalance is that AI has genuinely absorbed one of the most time-consuming costs of rigorous development: documentation. Billie Onsite does documentation 100 percent live, and uses AI to make it visual rather than purely textual, turning technical documentation into diagrams and interactive representations that people can read without parsing prose. The boring, high-discipline work that previously created its own form of technical debt now happens continuously, with minimal overhead.
The principle isn’t that AI replaces the need for technical rigour; it’s that AI handles the parts of rigour that require repetition and consistency, freeing the human layer to focus on decisions that require judgement. The same logic that applies to project data (the expert provides the first drop, AI processes and structures, the expert verifies) applies to how the product itself is built.
When the data becomes the building
The structural insight that connects Annie’s investment background, Billie Onsite’s product thesis, and the broader AI moment in construction is this: the AEC industry has never treated information as a primary output. Its entire operating model is organised around delivering a physical asset, and data management has always been a secondary obligation, driven by compliance and quality pressures rather than strategic intent.
The AI era doesn’t change the industry’s primary output. Buildings still get built. But it creates, for the first time, a mechanism to capture, at the moment it’s generated, the expert knowledge embedded in every site decision, every defect observation, and every RFI resolution: verified, structured, and retained as something the organisation owns.
The organisations that recognise this shift early and design their workflows around it aren’t just adopting AI. They’re creating the proprietary foundation that makes their AI more capable than anyone else’s. The portfolio, in this context, isn’t a collection of tools; it’s the accumulated expert knowledge of every project they’ve ever built.
That’s the convergence Annie is positioning for: not AI adoption as a feature decision, but data capture as the long-term asset strategy of the construction firm itself. The firms that understand this now are creating a proprietary knowledge base that compounds with each completed project. The firms that don’t are starting project 51 from the same baseline as project one, and will keep doing so until someone else’s accumulated knowledge makes theirs irrelevant.
