Your AI Should Earn the Right to Touch Your Carbon Data
NGER penalties reach $660,000. AASB S2 disclosures go in your annual report. You can't just let AI loose on compliance data. But you can't keep processing it manually either. Here's how graduated trust - watch, then assist, then act - makes AI auditable.
Every sustainability manager we've spoken to in the past twelve months has said some version of the same thing: "I want AI to help with carbon reporting, but I can't afford to get a number wrong."
That fear isn't irrational. Under Section 19 of the NGER Act, a non-compliant report attracts civil penalties of up to 2,000 penalty units - that's $660,000 at the current Commonwealth rate. AASB S2 disclosures don't sit in a sustainability report that nobody reads. They go in your annual report, right next to the financial statements, subject to the same director liability provisions. Scope 1 and 2 emissions carry full legal exposure from day one - no safe harbour, no modified liability protection.
So the instinct to keep AI away from compliance data makes sense. The problem is that the alternative - manual processing - isn't actually safer.
The Manual Fallacy
The ANAO's performance audit of the NGER scheme found that 72% of the 545 reports examined contained errors, and 17% contained significant errors. These weren't companies using AI. They were using spreadsheets, manual data entry, and - in many cases - consultants.
The Clean Energy Regulator's enforceable undertaking against a major energy producer in July 2025 made this concrete. The company inadvertently misstated components of its NGER reports across multiple reporting periods. The regulator's assessment: potential weaknesses in internal control systems. The remedy: three years of reasonable assurance audits at the company's own cost, plus an external consultant to rebuild their data control systems entirely.
They didn't manipulate anything. They just had bad processes. And their NGER data was wrong for years before anyone caught it.
So here's the actual question. It's not whether AI might get a number wrong. It's whether AI, with the right controls, gets numbers wrong less often than the process you're using now. For most organisations processing hundreds or thousands of source documents per year, the answer is yes - but only if you don't skip the trust-building part.
AI Shouldn't Start With the Keys
When most people imagine AI in carbon accounting going wrong, they picture an AI that reads a fuel docket, decides it's diesel, applies an emission factor, creates a Scope 1 record, and commits it to your NGER submission. All by itself. No human involved.
That would be reckless. And it's not how we built Carbonly.
We think about AI trust the way you'd think about a new hire in your sustainability team. You don't hand a graduate analyst the NGER submission on their first day and say "handle it." You let them work alongside someone experienced. You review their output. You correct their mistakes. Over time, as they prove they understand the NGA emission factors, the facility boundaries, the scope classifications, you give them more responsibility. This is the agentic AI workflow approach - specialised agents handling discrete tasks with human oversight at decision points.
AI should earn trust the same way. Through demonstrated accuracy, over time, with a human watching.
We built three distinct phases into Carbonly's automation. Each one increases the level of AI autonomy - but only when the data proves it's earned. And every phase keeps a human in a position of control.
Phase One: AI Watches and Suggests
In the first phase, AI processes every document that enters the system. It reads the invoice, identifies the material type, matches it to an emission factor from the NGA Factors workbook, calculates the emissions, and generates a complete suggestion.
But it changes nothing.
The sustainability manager does their normal work - classifying documents, entering data, applying emission factors manually. Alongside their work, the system shows what the AI would have done. "For this document, I would have classified the material as diesel, matched it to the NGA emission factor of 2.71 kg CO2-e per litre, and created a Scope 1 record of 8,230 kg CO2-e for this facility."
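To make that concrete, here's a rough sketch of the shape one of those suggestions might take. The Suggestion structure, its field names, and the 3,037-litre quantity are our illustration (the quantity is just back-calculated from the 8,230 kg figure above), not Carbonly's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    """One observation-mode suggestion: what the AI would do, committed nowhere."""
    document_id: str
    material: str               # classified material type, e.g. "diesel"
    quantity: float             # extracted consumption figure
    unit: str
    factor_kg_per_unit: float   # matched NGA emission factor
    scope: int
    confidence: float

    @property
    def emissions_kg(self) -> float:
        # Emissions = activity data x emission factor
        return self.quantity * self.factor_kg_per_unit

# The diesel example above: ~3,037 L x 2.71 kg CO2-e/L ~= 8,230 kg CO2-e
s = Suggestion("INV-0042", "diesel", 3037.0, "L", 2.71, scope=1, confidence=0.97)
print(f"Would create Scope {s.scope} record: {s.emissions_kg:,.0f} kg CO2-e")
```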
This is where trust begins. Not with a sales pitch. With evidence.
Over weeks, the sustainability manager compares their own work against the AI's suggestions. They notice patterns. The AI gets the material classification right on 97% of documents. It correctly identifies the consumption quantity. It applies the right state-based electricity factor - 0.78 kg CO2-e per kWh for Victoria, 0.64 for NSW, 0.20 for Tasmania. It catches that a gas bill is in megajoules, not gigajoules.
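The Scope 2 arithmetic behind that electricity check is deliberately simple: metered consumption times the published state factor. A minimal sketch using the factors quoted above - the function name, dictionary, and the 120,000 kWh example are ours:

```python
# State-based Scope 2 factors from the text, in kg CO2-e per kWh
STATE_FACTORS = {"VIC": 0.78, "NSW": 0.64, "TAS": 0.20}

def scope2_emissions_kg(kwh: float, state: str) -> float:
    """Location-based Scope 2: metered consumption x state grid factor."""
    return kwh * STATE_FACTORS[state]

# A hypothetical Victorian site using 120,000 kWh in the period:
print(scope2_emissions_kg(120_000, "VIC"))  # 93600.0 kg CO2-e
```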
And occasionally, they catch the AI getting something wrong. A line item that bundles two materials together. A supplier invoice format it hasn't seen before. An unusual unit of measurement. Those edge cases matter. They're what separates a system you can trust from one you can't.
The whole point of this phase is that nobody's compliance data is at risk. The AI is practising. The human is evaluating. Both are learning. We're not sure exactly how long most organisations will want to stay in this phase - it depends on volume and complexity. Some move on in three weeks. Some take three months. Both are fine.
Phase Two: AI Does the Work, Humans Approve
Once the AI has demonstrated consistent accuracy - and the sustainability manager has seen enough evidence to believe it - the system moves to a second phase.
Now, AI processes documents and creates emission records automatically. But every record lands in a draft state. Nothing is confirmed until a human reviews it and clicks approve.
This is where the real time savings begin. Instead of opening each PDF, finding the consumption figure, looking up the emission factor, typing numbers into fields, and double-checking the scope classification - the sustainability manager sees a queue of pre-populated records. Each one shows the source document, the extracted data, the matched emission factor, the calculated emissions, and a confidence score.
High-confidence records (the ones where the AI has seen this exact document format dozens of times) take two seconds to approve. A quick glance at the numbers, confirm, move on.
Lower-confidence records get more scrutiny. Maybe the AI found two possible emission factors. Maybe the consumption figure was ambiguous. Maybe the document format was new. The system flags these explicitly: "Confidence 78% - consumption quantity unclear, two possible values found on page 2."
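Conceptually, that split is a single threshold check. Here's a sketch - the record fields, queue names, and the 0.90 cut-off are invented for illustration:

```python
from types import SimpleNamespace

def route(record, quick_queue, scrutiny_queue, high_confidence=0.90):
    """Phase two: every draft still ends at a human; the split only
    decides how much scrutiny it gets before approval."""
    if record.confidence >= high_confidence and not record.flags:
        quick_queue.append(record)     # glance at the numbers, confirm, move on
    else:
        scrutiny_queue.append(record)  # ambiguous quantity, new format, etc.

quick, scrutiny = [], []
route(SimpleNamespace(confidence=0.97, flags=[]), quick, scrutiny)
route(SimpleNamespace(confidence=0.78, flags=["two values on page 2"]), quick, scrutiny)
print(len(quick), len(scrutiny))  # 1 1
```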
The human decides. Always.
AI also investigates anomalies during this phase. It spots a consumption figure that's three times higher than the same facility's previous quarter. It identifies what looks like a duplicate invoice - same supplier, same date, same amount, different file names. It flags a facility that hasn't submitted any electricity data for the current period. For each anomaly, it presents the evidence and suggests a resolution. "This appears to be a duplicate of Invoice #4471 processed on 12 March. Recommend deletion."
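The duplicate check in that example reduces to matching on a handful of fields. A simplified sketch - real matching would need to tolerate fuzzier variations, and the field names are ours:

```python
def looks_like_duplicate(a, b) -> bool:
    """Same supplier, same date, same amount, different file names -> flag it.

    This only produces a suggestion; a human approves or rejects the
    deletion, which is the whole point of phase two.
    """
    return (
        a.supplier == b.supplier
        and a.invoice_date == b.invoice_date
        and a.amount == b.amount
        and a.filename != b.filename
    )
```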
The sustainability manager reviews the evidence and approves or rejects the suggestion. The system tracks everything: out of 150 AI suggestions this month, 148 were approved without modification. Two were corrected. That's a 98.7% approval rate - and that number becomes part of the audit trail.
This matters more than it sounds. When the assurance practitioner asks how your emissions data was generated, you don't wave your hands. You show them a documented approval rate across hundreds of records.
Phase Three: AI Acts Within Boundaries
For organisations that have run the second phase long enough to trust the numbers - and we mean actually trust, based on tracked accuracy, not based on hope - there's a third phase.
In this phase, AI auto-confirms emission records that meet a high-confidence threshold. The threshold is configurable. Most organisations we've spoken to set it at 95% or above. Below that threshold, records still go to the human review queue.
The result looks something like this in a weekly summary: "AI processed 47 documents this week. Created 142 emission records. 139 were auto-confirmed (all above 95% confidence). 3 require your review. Resolved 12 anomalies - 11 auto-resolved below your materiality threshold, 1 needs your decision."
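Per record, the phase-three decision is small. A sketch, with the threshold and names as stand-ins for the real configuration:

```python
AUTO_CONFIRM_THRESHOLD = 0.95  # configurable; most organisations set it at 95%+

def disposition(record) -> str:
    """Phase three: high-confidence records confirm themselves."""
    # Compliance-critical records bypass this check entirely and always
    # go to a human - see the guardrails below.
    if record.confidence >= AUTO_CONFIRM_THRESHOLD:
        return "auto_confirm"
    return "needs_human_review"
```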
The sustainability manager spends 15 minutes reviewing exceptions. Not 15 hours processing everything.
But here's where the guardrails are critical. And we want to be specific about what never gets automated, regardless of confidence score.
Compliance-critical actions stay manual. If the AI detects a gap in a mandatory NGER reporting field - a missing facility identifier, an unassigned scope classification, an emission factor that doesn't match any NGA category - it doesn't guess. It stops and asks. Emission factor mismatches between what the document suggests and what the system expects always get flagged for human review. Scope misclassifications - something that should be Scope 1 appearing as Scope 2, or vice versa - always require human confirmation.
Materiality thresholds are configurable. The AI only auto-resolves anomalies below a CO2-e threshold that the organisation sets. If a duplicate invoice represents 50 kg CO2-e, auto-resolution might be appropriate. If it represents 5,000 kg CO2-e, it goes to a human. The organisation decides where that line sits.
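Taken together, those two guardrails collapse into a short decision function. The category names and the 500 kg line are illustrative - the threshold is whatever the organisation sets:

```python
MATERIALITY_KG = 500.0  # org-configurable CO2-e line for auto-resolution

COMPLIANCE_CRITICAL = {
    "missing_nger_field",       # e.g. facility identifier, scope classification
    "factor_mismatch",          # document disagrees with the expected NGA factor
    "scope_misclassification",  # Scope 1 appearing as Scope 2, or vice versa
}

def resolve_anomaly(anomaly) -> str:
    """Never guess on compliance-critical issues; gate the rest on materiality."""
    if anomaly.kind in COMPLIANCE_CRITICAL:
        return "escalate_to_human"   # the AI stops and asks
    if anomaly.co2e_kg < MATERIALITY_KG:
        return "auto_resolve"        # e.g. a 50 kg duplicate invoice
    return "escalate_to_human"       # e.g. a 5,000 kg duplicate invoice
```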
The kill switch is instant. An administrator can revert to any previous phase at any time. Not next week. Not after a support ticket. Immediately. If something feels wrong - a new document format the AI hasn't seen, a change in emission factors mid-year, a regulatory update - the organisation can pull back to full human review in one click and stay there until they're comfortable again.
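Architecturally, the kill switch is nothing exotic: a phase flag that every processing step checks before it acts, writable by an administrator in one step. A sketch with invented names:

```python
class AutomationState:
    PHASES = ("observe", "approve", "act")

    def __init__(self, phase: str = "act"):
        self.phase = phase

    def revert(self, to_phase: str = "observe") -> None:
        """Admin-only and effective immediately: the very next document
        is handled under the reverted phase. No support ticket."""
        if to_phase not in self.PHASES:
            raise ValueError(f"unknown phase: {to_phase}")
        self.phase = to_phase
```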
Every AI Action Gets an Audit Trail
This is the part that most AI tools get wrong. They focus on speed and skip documentation.
Every action the AI takes in Carbonly - every classification, every emission factor match, every calculation, every anomaly resolution - is logged with four things: what the AI did, why it did it (the reasoning chain), the confidence score, and whether a human approved, modified, or overrode it.
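Those four things map naturally onto an append-only log entry. A sketch of the shape - JSON Lines is our choice here for illustration, not necessarily Carbonly's storage format:

```python
import datetime
import json

def log_ai_action(action, reasoning, confidence, human_decision, path="audit.jsonl"):
    """Append one audit entry: what the AI did, why, how sure, who signed off."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "action": action,                 # e.g. "matched NGA factor 2.71 kg/L"
        "reasoning": reasoning,           # the reasoning chain
        "confidence": confidence,         # the confidence score
        "human_decision": human_decision, # approved / modified / overridden / pending
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```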
That's not just good practice. Under ASSA 5000, the standard governing sustainability assurance engagements from January 2025, the practitioner must obtain an understanding of the entity's processes for identifying sustainability information, including internal controls. If your process is "AI did stuff," that's not an auditable answer. If your process is "AI processed 2,400 documents over 12 months with a tracked 98.3% approval rate, all decisions logged with reasoning and confidence scores, and compliance-critical items always reviewed by a qualified human" - that's a control system.
The difference between an AI you can defend under assurance and one you can't isn't accuracy. It's documentation.
We wrote in detail about what auditors actually test under ASSA 5010, the standard that sets the assurance phasing timeline. Year 1 limited assurance covers Scope 1 and 2 emissions, governance, and selected strategy disclosures. By Year 4 (financial years from 1 July 2030), every mandatory climate disclosure requires reasonable assurance - the same standard as a financial audit. The progression from limited to reasonable assurance is, in itself, a graduated trust model. The regulatory framework expects your data processes to mature over time. Your AI should mature with them.
Why Graduated Trust Matters for AASB S2
ASRS mandatory reporting is already live for Group 1 entities. Group 2 starts from financial years beginning 1 July 2026. Group 3 follows from 1 July 2027.
The reporting workload is about to multiply. And for most mid-market companies, the team isn't growing. The budget isn't doubling. The same two or three people who were already struggling with NGER are now expected to produce AASB S2 disclosures that include climate scenario analysis, transition plans, and Scope 3 emissions from their second reporting period onward.
AI isn't optional for these teams. It's the only way the maths works on headcount. But adopting AI without a trust framework is how you end up explaining to an assurance practitioner why you can't trace a specific emission record back to its source document.
An AI system that spent six months in observation mode, graduated to human-approved processing with a documented 98% approval rate, and now auto-confirms high-confidence items within defined materiality thresholds - that tells a story an auditor can follow. It shows a deliberate control environment. It shows a rational basis for the level of automation. And it shows an organisation that takes data quality seriously enough to build trust incrementally rather than flipping a switch and hoping.
That last part matters. The CER's 2025-26 compliance priorities emphasise that "accurate and timely reporting is critical to maintaining scheme data integrity." They're using advanced data analysis to identify high-risk reporters. Your response to that scrutiny can't be "we're pretty sure the AI is right." It has to be "here's the evidence."
What This Looks Like in Practice
Consider a property manager with 40 commercial buildings across three states. Each building generates electricity bills, gas bills, water invoices, and waste collection records. That's roughly 640 source documents per quarter. Under NGER, they're reporting to the Clean Energy Regulator. Under AASB S2, they're disclosing in their annual report with limited assurance from Year 1.
Month one: AI processes a full quarter's worth of documents - all 640 - in observation mode. The property manager does their normal manual work. At the end of the month, they compare. The AI correctly classified 621 of 640 documents. Nineteen needed correction - mostly unusual waste manifests from a new contractor.
Month two: the AI has learned from those corrections. Classification accuracy hits 98.4%. The property manager starts trusting the Scope 2 electricity calculations because the state-based emission factors are straightforward - the AI is just multiplying kWh by the published NGA factor. Hard to get wrong.
Month three: they move electricity and gas processing to the approval phase. The property manager reviews draft records in batches of 50, approving most in seconds. Waste and water stay in observation mode because the document formats are more variable.
Month six: electricity and gas are auto-confirmed above 95% confidence. Waste has graduated to approval mode. The property manager spends about 20 minutes per week on exceptions instead of three days per month on data entry.
The whole time, every AI decision is logged. Every human approval is timestamped. Every override is recorded with a reason. When the assurance practitioner arrives, there's a complete trail from source document to emission record.
That's not a hypothetical utopia. That's what graduated trust looks like when you build it into the system architecture from the start.
The Three Questions to Ask Any AI Carbon Tool
If you're evaluating AI for carbon accounting - whether it's Carbonly or anything else - ask these three questions:
Can I run the AI in observation mode first? If the answer is no, and the AI immediately starts creating or modifying compliance data, walk away. You have no way to validate accuracy before it affects your numbers.
Does the AI log its reasoning, not just its output? Knowing that the AI created a 4,200 kg CO2-e Scope 1 record isn't enough. You need to know why. What document did it read? What material did it classify? What emission factor did it apply? What was the confidence score? Without this, you can't defend the number under assurance.
Can I pull back instantly? Graduated trust only works if it goes both ways. If the AI starts producing unexpected results - because of a format change, a new supplier, an emission factor update - you need to be able to revert to full human control immediately. Not after a phone call to support. Now.
If a tool can't answer yes to all three, it hasn't earned the right to touch your carbon data.
Carbonly.ai is built for Australian mandatory reporting - NGER, AASB S2, and everything in between. Our graduated trust model means AI earns your confidence through demonstrated accuracy, not promises. Every AI action is logged, every human decision is recorded, and the audit trail is there when the assurance practitioner arrives.