Starting With One Site: The Tier 1 Pilot Playbook

A Tier 1 contractor in Australia runs anywhere from 50 to 200 active projects at once. Each one has its own site compound, its own concrete supplier, its own diesel cards, its own AdBlue account, its own hired plant fleet, its own subcontractor panel, and its own waste contractor. When the head office decides it's time to move carbon reporting off spreadsheets and onto an AI-agent platform, the temptation is to flip everything at the same time. We'd push back on that hard.

Flipping 200 sites in one quarter is how projects fail. Not because the technology can't handle the volume, but because every site has quirks that never show up in a demo. A concrete supplier that sends dockets as faxed PDFs. A subcontractor whose tax invoice bundles fuel, labour, and plant hire into one line. A JV partner who owns 40% of the emissions but 50% of the spend. You can't learn your way through 200 of those simultaneously.

Start with one site. Prove the agents work against the real documents. Then scale.

Why One Site First Is the Right Call

Tier 1 sustainability leads are under pressure from two directions at once. AASB S2 mandatory climate disclosures are live for Group 2 entities from financial years starting 1 July 2026, which means most Tier 1 builders are either reporting now or preparing to. NGER obligations have been running for years and the Clean Energy Regulator has been increasingly direct about data quality expectations. Both frameworks want the same thing at their core, which is activity-based emissions data with a traceable source document for every record.

The risk in a portfolio-wide switchover isn't the maths. Diesel times 2.7 kg CO2-e per litre is diesel times 2.7 kg CO2-e per litre whether you're running it on one site or two hundred. The risk is that the material diversity on a single Tier 1 project (concrete mixes, AdBlue, hired plant, subcontractor claims, waste manifests) is already complex enough to break anything that hasn't been tested against it. Multiply that by 200 sites and you're debugging across an entire portfolio while still trying to file an NGER return by 31 October.

A pilot gives you a contained sandbox. One site, one project license, one set of supplier documents, one reporting scope. The agents learn the material aliases for your specific suppliers. The team learns the workflow. By the time you scale, the patterns are baked in.

Choosing the Pilot Site

Not every site makes a good pilot. The wrong choice will either hide problems or amplify them past what's useful for learning.

Look for representative material diversity. The pilot site should touch the material types you'll see across the rest of the portfolio. Bulk diesel and fuel card fuel. Multiple concrete grades with at least one Environmental Product Declaration (EPD) available. AdBlue. Hired plant and equipment. A mix of wholly owned and subcontracted works. Temporary site power from either grid connection or generator, ideally both. Waste disposal through a licensed contractor. Freight movements for material deliveries.

Look for document volume you can actually manage. A pilot site doesn't need to be your biggest project. It needs to be one where the site team has time to engage with early extractions and flag corrections. Somewhere between 200 and 600 emission-relevant documents per quarter is usually workable. Much less and you won't stress-test the agents. Much more and the team gets swamped before the Trust Graduation agent has had time to lift auto-confirm thresholds.

Ownership structure matters. A wholly owned project is simpler from a reporting perspective. But if your portfolio includes JVs, which most Tier 1 portfolios do, running the pilot on a JV project is worth considering. It surfaces equity-share allocation logic early and tests whether your ownership percentages flow correctly through to the consolidated emissions view.
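To make "equity-share allocation" concrete, here's a minimal Python sketch of the arithmetic. The names (`JVPartner`, `allocate_by_equity`) and the 60/40 split in the usage lines are ours for illustration only; they aren't Carbonly's data model, and your JV agreement may call for a different allocation basis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JVPartner:
    name: str
    equity_share: float  # fraction between 0 and 1 (hypothetical field)


def allocate_by_equity(total_tco2e: float, partners: list[JVPartner]) -> dict[str, float]:
    """Split a project's total emissions across JV partners by equity share."""
    total_share = sum(p.equity_share for p in partners)
    # Refuse to allocate if the shares don't add up to 100%.
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"Equity shares sum to {total_share}, expected 1.0")
    return {p.name: total_tco2e * p.equity_share for p in partners}


# Illustrative usage with made-up partners and a made-up project total.
partners = [JVPartner("Contractor A", 0.6), JVPartner("Contractor B", 0.4)]
allocation = allocate_by_equity(1000.0, partners)
```

The guard on the share total is the useful part: a JV whose percentages were keyed in wrong should fail loudly at the pilot site, not quietly skew the consolidated view later.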

Finally, think about access. The pilot team needs to be able to get supplier documents, talk to the site engineer, and pull batching plant dockets without going through three layers of approval for every request. Pick a site where the project manager is on board with being the test case.

What the First 30 Days Look Like

The opening month is about getting the document flow connected and letting the agents see real data for the first time.

OneDrive or SharePoint folder sync typically goes live first. Site administrators are already saving supplier PDFs, batching plant dockets, and fuel card exports into project folders somewhere. Pointing Carbonly's native OneDrive and SharePoint sync at those folders means new documents get picked up automatically as they land, without anyone having to email or upload them. Per-project email ingestion is usually configured at the same time, so invoices that arrive directly from suppliers can be forwarded to a project-specific address and land in the same workflow.

Within days, the Data Health agent starts running completeness checks. It flags missing periods, duplicate invoices, and documents that parsed with low confidence. The team's job in month one isn't to hit 100% accuracy. It's to confirm or correct the early extractions so the system learns what "right" looks like for this site's specific suppliers.
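The three checks described here (missing periods, duplicate invoices, low-confidence parses) are simple to reason about once you see them as code. This is a sketch under our own assumptions, not the Data Health agent's implementation: the `ExtractedDoc` fields, the `health_report` function, and the 0.8 confidence cut-off are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedDoc:
    supplier: str
    invoice_number: str
    period: str        # reporting period as "YYYY-MM"
    confidence: float  # parser confidence between 0 and 1


def health_report(docs: list[ExtractedDoc], expected_periods: list[str],
                  min_confidence: float = 0.8) -> dict:
    """Flag missing periods, duplicate invoices, and low-confidence parses."""
    seen_periods = {d.period for d in docs}
    missing = [p for p in expected_periods if p not in seen_periods]

    # Treat the same supplier + invoice number appearing twice as a duplicate.
    counts = Counter((d.supplier, d.invoice_number) for d in docs)
    duplicates = sorted(key for key, n in counts.items() if n > 1)

    low_confidence = [d for d in docs if d.confidence < min_confidence]
    return {
        "missing_periods": missing,
        "duplicates": duplicates,
        "low_confidence": low_confidence,
    }
```

In month one, the `low_confidence` list is effectively the team's review queue, and the other two lists are the questions to take back to the site administrator.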

Concrete dockets are usually the first interesting test. A Tier 1 pilot site might receive 20 grades of concrete across the project (N25 through N50, self-compacting, shotcrete, flowable fill), each with a different emission intensity and ideally its own EPD. Early extractions will need confirmation. Mix codes get learned against material library entries. EPDs for the suppliers you actually use get imported alongside the NGA 2025 emission factors for materials without product-specific data. By the end of month one, concrete is typically one of the most reliable categories.

Days 30 to 60: Trust Graduation Starts Doing Work

The Trust Graduation agent is the one that makes a single-site pilot pay off at scale. It tracks per-material confirmation patterns. As the team confirms or corrects extractions over time, the agent lifts the auto-confirm threshold for materials where accuracy is consistent and keeps it low for materials that still need eyes on them.

What that means in practice is that by around day 45, the agents are auto-confirming the diesel deliveries from your bulk fuel supplier because the last 30 invoices were extracted cleanly. But they're still flagging the hired plant invoices for review because the line-item formatting varies between hire companies and the team has corrected a few. Trust is not a global setting. It's earned per material, per supplier, per extraction pattern.
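One way to picture "trust earned per material, per supplier" is a rolling window of review outcomes kept separately for each pair. The sketch below is our own simplification, not how the Trust Graduation agent is built: `TrustTracker`, the 30-review window, and the all-clean requirement are assumptions chosen to mirror the diesel example above.

```python
from collections import defaultdict, deque


class TrustTracker:
    """Track review outcomes per (material, supplier) and decide auto-confirm."""

    def __init__(self, window: int = 30, required_accuracy: float = 1.0):
        self.window = window
        self.required_accuracy = required_accuracy
        # Each key keeps only its most recent `window` review outcomes.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def record_review(self, material: str, supplier: str, was_correct: bool) -> None:
        """Store whether a human confirmed (True) or corrected (False) an extraction."""
        self._history[(material, supplier)].append(was_correct)

    def can_auto_confirm(self, material: str, supplier: str) -> bool:
        history = self._history.get((material, supplier))
        # No trust until a full window of reviews has been collected.
        if not history or len(history) < self.window:
            return False
        return sum(history) / len(history) >= self.required_accuracy
```

Because the history is keyed by the pair rather than held globally, a run of clean bulk diesel invoices never raises trust in hire company invoices, which is the behaviour the paragraph above describes.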

The Cross-Document Correlation agent also comes into its own in this window. It connects related documents that a spreadsheet workflow would treat as independent. Fuel card CSV exports reconcile against bulk diesel deliveries at the site tank. Concrete dockets cross-reference against batching plant summary statements. AdBlue deliveries match against supplier invoices. When the two sides disagree, the agent flags the variance rather than silently letting both records sit in the ledger.
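At its simplest, the reconciliation idea is to compare two totals that should describe the same quantity and flag the gap when it exceeds a tolerance. Here's a minimal sketch; the `reconcile` function, the `Variance` record, and the 2% tolerance are hypothetical, and the real agent will be doing considerably more matching work than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variance:
    label: str
    side_a: float
    side_b: float
    relative_gap: float  # gap as a fraction of the larger total


def reconcile(label: str, side_a: float, side_b: float,
              tolerance: float = 0.02) -> Optional[Variance]:
    """Return a Variance if the two totals disagree by more than the tolerance."""
    baseline = max(abs(side_a), abs(side_b))
    if baseline == 0:
        return None  # both sides are zero, so there is nothing to flag
    gap = abs(side_a - side_b) / baseline
    if gap > tolerance:
        return Variance(label, side_a, side_b, gap)
    return None


# Illustrative figures only: docket volume vs. batching plant statement volume.
flag = reconcile("concrete m3, March", 412.0, 380.0)
```

Returning a record instead of a boolean means the flagged variance carries both source totals into the review queue, so the reviewer can see which side is out.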

This is also usually when the first subcontractor Scope 3 data starts flowing. A civil subcontractor's claim that bundles plant, fuel, and labour into a single lump-sum line is notoriously hard to break down. The pilot site is where you figure out which of your subcontractors can supply activity data (litres of diesel, tonne-kilometres of haulage) and which ones you'll need to handle with a spend-based fallback while you work on getting better data later. Both pathways are supported. Getting to activity-based coverage everywhere takes time, and a pilot is where you build that supplier-by-supplier map.
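The two pathways can be sketched as a single function that prefers activity data and falls back to spend when there isn't any. Treat this as an illustration only: the function name is ours, the diesel factor is the rounded 2.7 kg CO2-e per litre used earlier in this article, and the spend factor is deliberately left as an input because the right value depends on the spend category and the factor set you adopt.

```python
from typing import Optional

# Rounded diesel factor quoted earlier in this article (kg CO2-e per litre).
DIESEL_KG_CO2E_PER_LITRE = 2.7


def subcontractor_emissions_kg(diesel_litres: Optional[float],
                               spend_aud: float,
                               spend_factor_kg_per_aud: float) -> tuple[float, str]:
    """Return (kg CO2-e, method), preferring activity data over spend."""
    if diesel_litres is not None:
        # Activity-based: the subcontractor supplied litres of diesel.
        return diesel_litres * DIESEL_KG_CO2E_PER_LITRE, "activity-based"
    # Spend-based fallback: only the claim value is known.
    return spend_aud * spend_factor_kg_per_aud, "spend-based"
```

Keeping the method label alongside the number is what lets you build the supplier-by-supplier map: filter on "spend-based" and you have the list of subcontractors to chase for better data.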

Days 60 to 90: Audit-Ready Records and Variance Narratives

By the third month, the volume of confirmed records is enough to start pressure-testing the audit trail. Every record should link back to its source document with an immutable trail of who extracted it, what factor was applied, which unit conversion was used, and when the record was last changed. This is the single biggest difference between a spreadsheet emissions ledger and a platform-based one, and it's what your AASB S2 assurance provider will ask to see.
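If "immutable trail" feels abstract, one common way to get tamper evidence is a hash chain, where each entry's hash covers its own fields plus the previous entry's hash. We're not claiming this is how Carbonly stores records; it's a sketch of the general technique, and every field name below is a hypothetical stand-in for the items listed above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    source_document: str
    extracted_by: str
    emission_factor: float
    unit_conversion: str
    changed_at: str   # ISO 8601 timestamp
    prev_hash: str    # hash of the previous entry ("" for the first)
    entry_hash: str   # hash over this entry's fields plus prev_hash


def _digest(fields: dict, prev_hash: str) -> str:
    body = json.dumps(fields, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(trail: list[AuditEntry], **fields) -> list[AuditEntry]:
    """Return a new trail with one entry appended; existing entries are untouched."""
    prev_hash = trail[-1].entry_hash if trail else ""
    entry = AuditEntry(prev_hash=prev_hash,
                       entry_hash=_digest(fields, prev_hash), **fields)
    return trail + [entry]


def verify(trail: list[AuditEntry]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = ""
    for entry in trail:
        fields = asdict(entry)
        fields.pop("prev_hash")
        stored = fields.pop("entry_hash")
        if entry.prev_hash != prev or _digest(fields, prev) != stored:
            return False
        prev = stored
    return True
```

The property an assurance provider cares about is the one `verify` checks: a record can be superseded by a new entry, but it can't be quietly rewritten after the fact.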

The Narrative Intelligence agent starts producing variance commentary. Why did diesel consumption spike in March compared to February? Because earthworks moved into a new phase and two additional excavators were on site. The commentary is pulled from the data patterns themselves rather than written by hand, which means it's ready in hours rather than weeks when the reporting cycle opens.

The Report Readiness agent runs completeness checks against both NGER and AASB S2 disclosure requirements. It tells you which categories are ready to report, which have gaps, and which have data quality flags that need resolving before a submission. NGER requires AR5 global warming potentials. AASB S2 requires AR6. The platform switches between the two automatically depending on which output you're generating, so the same underlying activity data produces the right answer for each framework without a recalculation exercise.
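The reason one set of activity data can serve both frameworks is that the GWP set is applied at the last step, to per-gas masses. The sketch below shows that idea and nothing more. The function and dictionary names are ours, the framework-to-GWP mapping follows the paragraph above, and the 100-year GWP values are the commonly cited AR5 and AR6 figures (AR6 methane shown for fossil sources), which you should check against the published IPCC tables before relying on them.

```python
# 100-year GWPs as commonly cited; verify against the IPCC reports before use.
GWP_100 = {
    "AR5": {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0},
    "AR6": {"CO2": 1.0, "CH4": 29.8, "N2O": 273.0},  # CH4 value for fossil sources
}

# Mapping taken from the text above: NGER output uses AR5, AASB S2 output uses AR6.
FRAMEWORK_GWP_SET = {"NGER": "AR5", "AASB S2": "AR6"}


def co2e_kg(gas_masses_kg: dict[str, float], framework: str) -> float:
    """Convert per-gas masses (kg) to kg CO2-e using the framework's GWP set."""
    gwp = GWP_100[FRAMEWORK_GWP_SET[framework]]
    return sum(mass * gwp[gas] for gas, mass in gas_masses_kg.items())


# Made-up gas masses for one record, reported two ways from the same data.
record = {"CO2": 1000.0, "CH4": 1.0, "N2O": 1.0}
nger_total = co2e_kg(record, "NGER")
aasb_total = co2e_kg(record, "AASB S2")
```

This only works if the ledger stores gas-level quantities, not a single pre-multiplied CO2-e figure, which is worth confirming for any factor source you import.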

By day 90, a well-run pilot site should have audit-ready records for Scope 1 fuel combustion, Scope 2 electricity, Scope 3 purchased goods (with EPDs for concrete and steel where available), Scope 3 waste, and Scope 3 freight. Subcontractor Scope 3 will typically be a mix of activity-based and spend-based by this point, which is honest and normal.

Construction Edge Cases the Pilot Should Surface

A pilot that doesn't hit these cases isn't doing its job. Make sure the site choice exposes them.

Concrete mix variations. A 32 MPa mix with 30% fly ash replacement has a materially different embodied carbon profile to the same strength mix without any supplementary cementitious material (SCM). If the supplier has an EPD, it should be imported and used. If not, the fallback is a grade-based estimate from the material library. The pilot is where you figure out which suppliers have EPDs and which don't.
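The EPD-first, grade-fallback rule can be sketched as a two-step lookup. Everything here is illustrative: the function name, the key shapes, and especially the factor values in the usage lines, which are placeholders and not real EPD or material library numbers.

```python
def concrete_factor(supplier: str, mix_code: str, grade: str,
                    epd_factors: dict[tuple[str, str], float],
                    grade_defaults: dict[str, float]) -> tuple[float, str]:
    """Return (kg CO2-e per m3, source), preferring a supplier EPD."""
    epd = epd_factors.get((supplier, mix_code))
    if epd is not None:
        return epd, "EPD"
    if grade in grade_defaults:
        return grade_defaults[grade], "grade default"
    # Neither source knows this mix, so surface it for review, don't guess.
    raise KeyError(f"No factor for {supplier} {mix_code} (grade {grade})")


# PLACEHOLDER values for illustration only, not published factors.
epd_factors = {("ConcreteCo", "N32-FA30"): 250.0}
grade_defaults = {"N32": 330.0}

with_epd = concrete_factor("ConcreteCo", "N32-FA30", "N32", epd_factors, grade_defaults)
without_epd = concrete_factor("OtherCo", "N32-STD", "N32", epd_factors, grade_defaults)
```

Returning the source with the factor gives you the supplier EPD map for free: count how many dockets resolved to "grade default" and you know who to ask for an EPD.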

AdBlue (DEF). It's easy to miss because volumes are small and it's not technically a fuel. But it's part of diesel engine operation, the urea in it releases CO2 as it's consumed in the exhaust system, and it should be tracked. Supplier invoices typically arrive separately from diesel.

Hired equipment fuel. The fuel consumed by hired excavators, cranes, generators, and compactors is yours to account for under operational control, but the invoices come from the hire company and often bundle mobilisation, hire rate, and fuel charges into one document. The extraction needs to pull the fuel component out specifically.

Subcontractor invoices with embedded Scope 3. A concreter's claim includes their own fuel, their own plant, and the concrete itself. Which parts are already captured elsewhere in your ledger and which parts need to be added as Scope 3 is a boundary question every Tier 1 wrestles with. The pilot is where you draw the lines and document them.

Hired generator fuel on sites without grid power. Common on early earthworks phases. Often invoiced as a combined hire-plus-fuel package.

Waste disposal tickets. Mixed construction waste, clean fill, contaminated soil, scrap steel. Each has a different emission profile and often a different disposal pathway. Waste tickets tend to arrive as weighbridge dockets rather than tax invoices, which means extraction has to handle a different document shape.

Carbonly Is Currently Running Exactly This Pattern With a Tier 1 Contractor

We're working with a Tier 1 Australian construction company on this approach right now. One site first, real documents, real edge cases, proving out the AI before expanding across the portfolio. That's the engagement and that's as much as we'll say about it.

Deciding When to Scale

A pilot ends when the signals line up, not when the calendar says so. The ones worth watching:

Audit-ready records are accumulating across every major scope category for the site, linked to source documents, with an immutable trail.

Trust Graduation thresholds have lifted for the materials that dominate site emissions. Diesel, electricity, concrete, steel, and waste are typically the first to graduate. Subcontractor Scope 3 and hired plant often stay under manual review longer, which is fine.

Variance narratives from the Narrative Intelligence agent match what the site team would have written. If the commentary says "diesel up 18% due to additional earthworks plant in the period" and that's what actually happened, the agent is doing its job.

Folder sync is running without intervention. Documents are landing, getting extracted, getting matched to materials, and surfacing for review without anyone manually pulling files. This is the single biggest indicator that the pattern is ready to scale, because it means the clerical work has genuinely moved off the sustainability team.

When those four signals are stable, scaling to the next five or ten sites in a second wave is a different exercise to scaling to them cold. The patterns the agents learned on the pilot site (concrete supplier aliases, hire company invoice layouts, subcontractor claim structures) carry over. The team has a workflow that works. The audit trail approach is documented.

Pilot Logistics

A single-site pilot sits at one project license in Carbonly's pricing model, typically a Small or Medium tier depending on document volume and material diversity. Pricing is per-project, by size, with a $100 per month minimum. Carbonly allocates the tier based on the actual profile of the site, not a self-selected dropdown.

If you're a Tier 1 sustainability lead or ESG manager considering this approach, the next step is a conversation about which site in your portfolio makes the best pilot candidate. We'd rather talk through it with you than write a generic checklist. Email hello@carbonly.ai or join the waitlist and we'll get a call in the diary.