You Don't Have a Carbon Problem. You Have a Document...

A GHG Protocol survey found that 83% of companies say their biggest barrier to emissions disclosure is difficulty accessing relevant data. Not difficulty calculating emissions. Not confusion about which framework to use. Accessing the data.

That tracks with everything we've seen building carbon accounting systems. The calculation part of carbon reporting is, frankly, boring arithmetic. Diesel litres multiplied by 2.71 kg CO2-e per litre. Electricity kWh multiplied by your state's grid factor. Gas gigajoules multiplied by 51.53 kg CO2-e per GJ. The NGA Factors workbook publishes every number you need, and NGER prescribes the methods, down to which division of the Measurement Determination applies.
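To show how little there is to the calculation itself, here is a minimal sketch. The diesel and gas factors are the two quoted above; the electricity factor is a made-up placeholder (grid factors differ by state and year), and the function and dictionary names are ours, purely for illustration.

```python
# Emission factors in kg CO2-e per unit of activity data.
# Diesel and gas are the values quoted in the text. The grid factor is a
# PLACEHOLDER for illustration only: look up your state's factor in the
# current NGA Factors workbook.
FACTORS_KG_CO2E = {
    "diesel_l": 2.71,          # per litre
    "natural_gas_gj": 51.53,   # per gigajoule
    "electricity_kwh": 0.70,   # per kWh (placeholder, not a published value)
}


def emissions_tonnes(activity: str, quantity: float) -> float:
    """Return tonnes CO2-e for a quantity of one activity type."""
    return quantity * FACTORS_KG_CO2E[activity] / 1000.0


if __name__ == "__main__":
    print(emissions_tonnes("diesel_l", 10_000))     # 10,000 L of diesel
    print(emissions_tonnes("natural_gas_gj", 200))  # 200 GJ of gas
```

One multiplication and one unit conversion per data point. Everything difficult happens before `quantity` exists.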

The hard part is getting those litres, kilowatt-hours, and gigajoules out of 8,000 documents sitting in six different systems.

The math is easy. The documents are not.

Consider a mid-to-large construction contractor operating across 20 active projects in three states. Every quarter, their sustainability team needs to collect and process fuel dockets from 15 different diesel suppliers (each with their own invoice layout), electricity bills from sites that switch between temporary generators and grid connections, subcontractor invoices that bundle materials and labour into a single line, fuel card statements with 3,000 individual transactions in CSV, delivery receipts photographed on a site manager's phone and sitting in a WhatsApp group, purchase orders with quantity but no unit (or unit but no quantity), and multi-page supplier statements covering 15 different material grades across 8 delivery dates.

That's north of 8,000 documents per quarter. Each document contains somewhere between 1 and 50 emission-relevant data points buried in layouts the sustainability team has never seen before.

And here's the thing that makes this worse than regular document processing: you can't just extract totals. An invoice that says "materials and labour - $180,000" is useless for carbon accounting. You need the breakdown. How many cubic metres of concrete? What grade? How many litres of diesel? Was the other line on the docket regular unleaded, or AdBlue (an exhaust additive, not a fuel)? The level of granularity carbon reporting demands is higher than what most procurement or finance systems capture.

PwC's 2025 Global Sustainability Reporting Survey found that 90% of organisations still rely on spreadsheet-based sustainability data collection. Ninety percent. And when asked what would have most improved their reporting, 47% said more effective use of technology and 46% said earlier confirmation of data availability. Not better emission factors. Not clearer frameworks. Better data plumbing.

Six sub-problems hiding inside one

When someone says "carbon data collection is hard," they're usually mashing together at least six distinct problems. It's worth pulling them apart because they each need different solutions.

Format chaos. PDFs from energy retailers. CSVs from fuel card providers. Excel spreadsheets from project administrators (each with their own column layout, naturally). Word documents with consumption data in tables. Scanned paper from subcontractors who still fax things. Photos of delivery dockets taken at 6am on a building site. Every format requires different processing logic to get data out.

Zero standardisation across suppliers. Your electricity retailer puts consumption on page 3, column 2. A different retailer puts it on page 1 in the summary box. A third puts kWh in one place and MWh in another. No two Australian energy retailers use the same bill layout, which is why rigid template-based document processing breaks constantly - every new supplier or format change means manual reconfiguration.
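The kWh-versus-MWh case is the easiest of these to make concrete. A minimal sketch, assuming the value and unit have already been pulled off the bill as strings; the table and function name are illustrative, not taken from any particular product.

```python
from decimal import Decimal

# Multipliers that convert common electricity units to kWh.
TO_KWH = {
    "wh": Decimal("0.001"),
    "kwh": Decimal("1"),
    "mwh": Decimal("1000"),
    "gwh": Decimal("1000000"),
}


def to_kwh(value: str, unit: str) -> Decimal:
    """Convert an extracted reading such as ('1,250.5', 'kWh') to kWh."""
    key = unit.strip().lower()
    if key not in TO_KWH:
        # Fail loudly: a silently mis-scaled reading is off by a factor of 1,000.
        raise ValueError(f"Unrecognised electricity unit: {unit!r}")
    # Strip thousands separators before parsing.
    return Decimal(value.replace(",", "").strip()) * TO_KWH[key]
```

Using `Decimal` rather than `float` keeps the figure exactly as printed on the bill, which helps when an auditor compares the two.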

Missing and incomplete fields. A fuel docket with litres but no date. A delivery receipt with a material description ("20MPa concrete") but no volume. A purchase order that lists 8 items with quantities and 3 items with just dollar values. The data you need for an emission calculation is rarely all present in the same document. Sometimes you're stitching together a purchase order, a delivery receipt, and an invoice to reconstruct a single activity data point.

Scattered sources. One project manager emails bills directly. Another saves them to a shared OneDrive folder. A third uploads them to a project management platform. Fuel card data lives in a supplier portal. Waste manifests come from a different supplier portal. Some documents are still physical paper sitting in a site office 400 kilometres away. Before you can process anything, you have to find it.

Volume. The ANAO examined 545 NGER reports and found that 72% contained errors, with 17% containing significant errors. That audit is from earlier NGER reporting years, but the root cause hasn't changed: when humans manually transcribe thousands of data points from documents into spreadsheets, mistakes are inevitable. At 8,000+ documents per quarter, even a 2% error rate means 160 documents with wrong numbers flowing into your NGER submission.

Timing. Documents don't arrive in neat batches at quarter-end. They trickle in. A January electricity bill might arrive in February, get scanned in March, and sit in someone's inbox until the sustainability team chases it in April. By then, the billing period overlaps with the next quarter. Some invoices are estimates that get corrected later. Some are final bills for sites that closed months ago. Matching documents to reporting periods is a problem in itself.
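For a bill that straddles a quarter boundary, one common simplification is to split the consumption pro rata by day. The sketch below assumes that approach. It is not a method prescribed by NGER, and real usage is rarely flat across a billing period. The function names are ours.

```python
from datetime import date, timedelta


def quarter_of(d: date) -> tuple[int, int]:
    """Return (year, calendar quarter) for a date."""
    return d.year, (d.month - 1) // 3 + 1


def apportion_by_day(start: date, end: date, quantity: float) -> dict[tuple[int, int], float]:
    """Split a bill's quantity across calendar quarters, pro rata by day.

    `start` and `end` are the first and last billing days, both inclusive.
    """
    if end < start:
        raise ValueError("billing period ends before it starts")
    total_days = (end - start).days + 1
    days_per_quarter: dict[tuple[int, int], int] = {}
    current = start
    while current <= end:
        key = quarter_of(current)
        days_per_quarter[key] = days_per_quarter.get(key, 0) + 1
        current += timedelta(days=1)
    return {q: quantity * n / total_days for q, n in days_per_quarter.items()}
```

A bill covering 15 March to 14 April has 17 days in the first quarter and 14 in the second, so 3,100 kWh splits into 1,700 and 1,400. Estimated bills that are corrected later still need separate handling.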

Why most carbon software solves the wrong half

We've looked at a lot of carbon accounting platforms over the years - before building our own and while keeping an eye on the market. The pattern is striking: most of them are really good at the part that's already easy.

Beautiful dashboards showing your emissions by scope. Charts comparing this quarter to last. Target-tracking against your SBTi commitment. PDF report generation aligned to GRI or TCFD formatting. Some even have scenario modelling.

All of that is useful. None of it helps you get 3,000 rows of fuel card transactions, 200 electricity bills, and 47 scanned delivery dockets into the system in the first place.

The typical onboarding workflow for most platforms looks like this: log in, navigate to a data entry screen, manually type in your electricity consumption for each site and billing period. Or, if you're lucky, download a CSV template, manually populate it from your bills, and upload it.

You've shifted the data entry from one spreadsheet to another. The document problem remains untouched.

This is why sustainability teams still spend the majority of their reporting time on data entry rather than analysis. We've broken down where the 400 hours in a typical NGER report actually go, and 65% of them go to document handling, not carbon accounting. It's also why the NGER deadline on 31 October generates so much stress every year - not because the NGER calculation methodology is confusing (it's well documented in the Measurement Determination), but because nobody can find the June gas bill for the Brisbane warehouse.

What a document-first approach actually means

When we built Carbonly, we started from the document problem and worked backwards to the calculations. Not the other way around.

That sounds obvious. But it meant making design decisions that most carbon accounting platforms don't make. The system had to accept any file format a sustainability team might encounter - PDF invoices, Excel exports, CSV fuel card dumps, Word documents, scanned images, even photos. Eight file formats, because that's what the real world throws at you.

It also meant the extraction engine couldn't rely on rigid templates. We tried that approach early on and abandoned it within weeks. A template that works for one energy retailer's invoice breaks when they update their layout. A template built for one fuel card provider's CSV doesn't work for the next one. Every new supplier or format change required manual template reconfiguration - which defeats the purpose.

Instead, the system reads documents the way a human would: understanding the layout, identifying what the document is (fuel docket vs electricity bill vs supplier statement), finding the relevant numbers, and knowing which ones matter for emissions. When a multi-page supplier statement lists 15 material grades across 8 delivery dates, it needs to extract each line, not just the summary total.

But extraction is only half the document problem. The other half is matching.

The matching problem nobody talks about

Say the system correctly extracts "32.5N GP cement - 45 m3" from a delivery docket. Now what?

That material description needs to map to an emission factor. But the NGA Factors workbook doesn't have an entry for "32.5N GP cement." It has categories for "Portland cement" and "blended cement" with different factors depending on clinker content. A fuel docket might say "B20 diesel" - that's a biodiesel blend, and the emission factor calculation is different from regular automotive diesel because you need to account for the biogenic fraction.

This is where carbon reporting data extraction gets genuinely tricky. The material descriptions on invoices and delivery dockets don't use the same language as emission factor databases. "Unleaded petrol," "ULP," "regular 91," and "E10" might all appear on different suppliers' documents, and they map to different factors depending on the ethanol content.

We built a 5-tier matching system for this. The system tries exact matches first, then normalised aliases, then category-level matches, then AI-assisted matching, and finally flags items it can't confidently match for human review. And when someone corrects a match - telling the system that "B20" from a particular supplier means 20% biodiesel blend - it remembers that correction for every future document from that supplier.
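To make the tier order concrete, here is a simplified sketch of a tiered matcher with remembered corrections. It illustrates the idea only and does not describe how Carbonly implements it: the factor keys are invented, the category tier is a bare keyword lookup, and `ai_match` is a hypothetical hook for whatever model you plug in.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


def normalise(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


@dataclass
class Matcher:
    exact: dict[str, str]        # raw description -> factor key
    aliases: dict[str, str]      # normalised alias -> factor key
    categories: dict[str, str]   # keyword -> category-level factor key
    ai_match: Optional[Callable[[str], Optional[str]]] = None  # hypothetical hook
    corrections: dict[tuple[str, str], str] = field(default_factory=dict)

    def match(self, supplier: str, description: str) -> tuple[Optional[str], str]:
        """Return (factor key or None, name of the tier that decided)."""
        norm = normalise(description)
        # A remembered human correction for this supplier wins over every tier.
        if (supplier, norm) in self.corrections:
            return self.corrections[(supplier, norm)], "correction"
        if description in self.exact:
            return self.exact[description], "exact"
        if norm in self.aliases:
            return self.aliases[norm], "alias"
        for keyword, key in self.categories.items():
            if keyword in norm.split():
                return key, "category"
        if self.ai_match is not None:
            guess = self.ai_match(description)
            if guess is not None:
                return guess, "ai"
        # Nothing matched confidently: flag for a person instead of guessing.
        return None, "needs_review"

    def correct(self, supplier: str, description: str, factor_key: str) -> None:
        """Store a reviewer's decision so the same item is not asked twice."""
        self.corrections[(supplier, normalise(description))] = factor_key
```

Corrections are keyed by supplier as well as description, so "B20" from one supplier does not silently redefine "B20" from another.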

That learning loop matters at scale. A construction contractor working with 50+ material types across 15 suppliers shouldn't need to manually classify the same material twice. The first correction should be the last one.

We're honest about the limits here, though. Some materials are genuinely ambiguous. A delivery docket that says "aggregate - 12 tonnes" could be several different products with different emission factors depending on source and transport distance. The system flags these rather than guessing. Scope 3 embodied carbon for construction materials remains one of the hardest matching problems in the industry, and we won't pretend it's fully solved.

Provenance: why the audit trail starts at the document

Here's something that AASB S2 and NGER compliance both demand that most people underestimate: traceability.

Under NGER, corporations must keep records for 5 years from the end of each reporting year, in a format that can be easily accessed by external auditors. Under AASB S2, your climate disclosures face assurance - limited assurance from year one, progressing to reasonable assurance over subsequent years. The auditor isn't just checking your total. They're checking that you can trace every number back to a source.

The Clean Energy Regulator made this painfully clear in July 2025 when it accepted an enforceable undertaking from an ASX-listed energy producer. The company had "inadvertently misstated components of its NGER reports" across multiple periods. The regulator's statement was pointed: while the NGER Act doesn't explicitly require internal control systems, their absence "can lead to persistent and significant reporting inaccuracies."

The fix? Three years of mandatory reasonable assurance audits at the company's own cost, plus an external consultant to rebuild their data collection and control systems from scratch. That's hundreds of thousands of dollars in remediation costs, all because the chain from source document to reported number had gaps.

When your carbon accounting system is document-first, every emission record carries its provenance with it. This electricity bill, this page, this line, this consumption figure, multiplied by this emission factor (NGA Factors 2025, Table 1), equals this Scope 2 figure. When an auditor asks "where did this number come from?" the answer is one click, not a three-day scavenger hunt through shared drives and email attachments.
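One way to picture a record that carries its provenance is an immutable structure that stores the source location next to the number. This is a sketch under our own assumptions, not a required schema: the field names are illustrative, and hashing the file bytes with SHA-256 is simply one way to pin a record to an exact document.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact source file."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class EmissionRecord:
    document_sha256: str            # fingerprint of the source file's bytes
    document_name: str
    page: int
    line: int
    quantity: float
    unit: str
    factor_kg_co2e_per_unit: float
    factor_source: str              # workbook edition and table the factor came from
    scope: int

    @property
    def kg_co2e(self) -> float:
        return self.quantity * self.factor_kg_co2e_per_unit

    def trace(self) -> str:
        """One-line answer to 'where did this number come from?'"""
        return (
            f"{self.document_name} p.{self.page} line {self.line}: "
            f"{self.quantity} {self.unit} x {self.factor_kg_co2e_per_unit} "
            f"({self.factor_source}) = {self.kg_co2e:.1f} kg CO2-e, Scope {self.scope}"
        )
```

Because the record is frozen, a corrected figure means a new record rather than an overwritten one, which keeps the history an auditor wants to see.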

The volume test

Processing documents accurately is one thing. Processing thousands of documents accurately is a different thing entirely.

A mid-market company with 30 facilities generates roughly 360-720 utility documents per quarter (electricity, gas, water, waste across all sites). A large construction contractor with 20+ active projects - including fuel dockets, material deliveries, equipment hire, and subcontractor invoices - can generate 8,000+ documents per quarter. We've had prospects describe scenarios north of 10,000 documents in a single quarter.

At that volume, manual processing isn't slow. It's impossible. A skilled analyst can process roughly 15-20 utility bills per hour - finding the consumption figure, checking the billing period, verifying the unit, entering it into a spreadsheet. At 8,000 documents per quarter, that's 400-530 hours of data entry. Per quarter. At roughly 160 working hours a month, that's 2.5 to 3.3 person-months of just typing numbers - about one full-time analyst doing nothing else for the entire quarter.
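The arithmetic behind those figures, as a short sketch. The document count and processing rates come from the paragraph above. The 160 working hours per month is our assumption (40 hours a week for four weeks), chosen because it reproduces the 2.5 and 3.3 figures.

```python
DOCS_PER_QUARTER = 8_000
BILLS_PER_HOUR = (15, 20)   # slow and fast ends of the range in the text
HOURS_PER_MONTH = 160       # ASSUMPTION: 40 h/week x 4 weeks


def entry_hours(docs: int, per_hour: float) -> float:
    """Hours of pure data entry for a number of documents."""
    return docs / per_hour


def person_months(hours: float, hours_per_month: float = HOURS_PER_MONTH) -> float:
    """Convert hours of work into months of one person's time."""
    return hours / hours_per_month


if __name__ == "__main__":
    for rate in BILLS_PER_HOUR:
        hours = entry_hours(DOCS_PER_QUARTER, rate)
        print(f"{rate}/hour: {hours:.0f} hours, {person_months(hours):.1f} person-months")
```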

And those numbers assume no errors, no re-work, and no time spent chasing missing documents or reconciling discrepancies. In practice, the real cost of manual carbon data entry is closer to double the raw entry time once you account for quality checking, error correction, and document management overhead.

A document-first system processes the same 8,000 documents in hours instead of weeks. Not because it's magic - because it's doing the same thing a human does (read the document, find the numbers, match them to emission factors), just faster and without getting tired at document 4,000.

What this means for your NGER and ASRS preparation

If you're a Group 2 ASRS entity facing your first mandatory reporting period from 1 July 2026, Australian Treasury has estimated preparation costs between $750,000 and $1.6 million. A significant chunk of that cost is data infrastructure - figuring out where your emission-relevant data lives, how to collect it, and how to get it into a system that produces auditable numbers.

If you're an existing NGER reporter, you already know the October deadline stress. The 2025-26 reporting year introduces new legislative amendments including market-based reporting for biomethane and hydrogen consumption, plus updated emission factors for gas flaring. These aren't hard to implement if your data is clean. They're a nightmare if you're still reconciling last quarter's fuel dockets.

The point isn't that carbon accounting is easy. Scope 3 is genuinely difficult - we won't pretend otherwise. Supplier data collection involves chasing third parties who may not track their own emissions. Scope 3 Category 1 (purchased goods) often relies on spend-based estimates with 30-40% uncertainty.

But Scope 1 and 2? The scopes that NGER mandates and that AASB S2 requires with no safe harbour protection? Those are document problems dressed up as carbon problems. Fix the document problem and the carbon calculations fall out the other end.

Stop looking at your emissions dashboard and wondering why the numbers aren't right. Start looking at the 4,000 documents sitting in your shared drive, your email, your fuel card portal, and that filing cabinet in the site office. That's where the problem is. That's where the fix starts.

Related Reading:

How Our AI Pipeline Actually Processes Utility Bills - the 7-phase extraction process for getting data out of messy documents
File Hashing Stops Double-Counting in Carbon Accounting - why duplicate detection matters at scale
Spreadsheets vs Carbon Accounting Software: The Real Costs - a three-year cost comparison including hidden data entry time
ASRS Assurance Requirements: What Auditors Actually Check - why traceability from source document to emission record matters under ASSA 5010