Carbon AI: What's Real, What's Hype, and What Actually Works

74% of Big Tech's AI climate claims lack evidence. In carbon accounting, AI washing is even worse. Here's what AI actually does for emissions tracking — from document extraction to material matching — and how to tell the difference between real capability and a marketing checkbox.

Denis Kargl February 23, 2026 11 min read
Carbon AI · Artificial Intelligence · Emissions Tracking

A February 2026 report examined 154 AI climate claims from Big Tech and found zero verified examples of AI actually reducing emissions. Not a few. Zero. Meanwhile, 74% of those claims lacked any substantiation at all.

That's the backdrop to every carbon software vendor putting "AI-powered" on their homepage. Including us. And that bothers us enough to write this.

Carbon AI is the fastest-growing marketing term in sustainability software right now. The global carbon accounting software market is projected to hit US$96 billion by 2032, growing at 25.7% annually. Every player in the space has slapped "AI" on their product page, whether they're running multi-agent LLM pipelines or a glorified lookup table with an IF statement. The term has become so diluted that it tells you almost nothing about what a product actually does.

So here's what we think carbon AI should mean — and doesn't, most of the time.

AI Washing Has Hit Carbon Accounting

In March 2024, the US SEC charged two investment advisers — Delphia and Global Predictions — with making false and misleading statements about their AI capabilities. The combined penalty was $400,000. Delphia claimed its AI could "predict which companies and trends are about to make it big." It couldn't. The AI capabilities they marketed didn't exist.

Australia isn't far behind. ASIC has flagged AI washing as a regulatory concern, noting that companies exaggerate or falsely claim AI use "to make the company appear more innovative or technologically advanced than it actually is." Under Australian Consumer Law, those claims carry potential penalties of up to $50 million or 30% of adjusted turnover — whichever is greater.

Now apply that to carbon accounting. When a vendor says their platform uses "AI-powered emissions calculation," what does that mean? Is the AI reading your utility bills and extracting consumption data? Is it matching line items to emission factors from the NGA Factors workbook? Or is it a static calculation engine that multiplies kWh by a fixed number — the same thing an Excel formula does?

Most of the time, it's closer to Excel.

We build carbon AI for a living. We spent 18 years building data platforms for BHP, Rio Tinto, and Senex Energy before starting Carbonly. And we'll be the first to tell you that most of what gets sold as "AI" in carbon software isn't artificial intelligence in any meaningful sense. It's automation. Automation is great — it's just not AI.

The Four Places Where AI Genuinely Helps

After processing tens of thousands of utility bills, invoices, and waste manifests through our system, we've identified four specific areas where AI — actual AI, not if-then rules — makes a material difference in carbon accounting. Everything else is automation dressed up.

1. Document Processing and Data Extraction

This is where AI earns its keep. A sustainability analyst at a 30-site company processes roughly 180 utility bills per quarter. Each one has a different format. AGL's electricity bills look nothing like Origin's. A gas invoice from Alinta uses megajoules; one from ActewAGL uses cubic metres. Some arrive as scanned PDFs. Others as photos taken at an angle on someone's phone.

Traditional OCR — the technology that's been around for decades — needs a template for each document format. Someone defines where on the page each field sits. When AGL changes their bill layout (and they do), the template breaks. Someone builds a new one.

Large language models changed this. A multimodal AI model reads a document the way a human does — understanding layout, context, and which numbers actually matter for an emissions calculation. Our 7-phase AI pipeline uses specialised agents for classification, extraction, validation, normalisation, emission factor matching, calculation, and audit trail generation. Each agent does one job. That's not a marketing slide — it's an architecture decision we made because a single model trying to do everything gets it wrong about 20% of the time.

And a 20% error rate will wreck your NGER submission.

The real-world result: we automate 70-80% of the data collection that previously meant someone squinting at PDFs for three weeks every quarter. The remaining 20-30% still needs human review. Anyone claiming 100% automation is selling you something.

2. Material Matching and Emission Factor Lookup

This one's harder to explain but arguably more important. When you upload a fuel receipt that says "Premium Unleaded 95 — 847.3L", our system needs to match that description to the correct NGA emission factor. For common fuels, that's straightforward.

But what about a waste manifest that says "Mixed C&D — predominantly concrete and timber"? Or a procurement record for "40MPa structural concrete, fly ash blend"? Or a chemical purchase described as "Solvent #4, industrial grade, 200L drum"?

These need to be matched to the right emission factor from the right database. Our AI Document Engine uses a 5-tier material matching system that checks exact matches first, then tries normalised names, then falls back to AI-powered semantic matching against our material library — which includes the full NGA database, EPD data, and a global emission factor cache.

Here's the part we're honest about: the AI generates a confidence score for every match. High confidence matches flow through automatically. Low confidence matches get flagged for human review. And every time a user corrects a match, the system learns. We maintain a Material Learning table where AI-generated context descriptions improve future matching accuracy. It genuinely gets better with use.
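The tiered fallback plus confidence gate described above can be sketched in a few lines. This is a toy version — the factor IDs are made up, and the "semantic" tier here is crude word overlap standing in for a real embedding-similarity call:

```python
def match_material(description: str, library: dict, review_threshold: float = 0.8):
    """Tiered matching: exact -> normalised -> semantic fallback.
    Returns (matched_factor, confidence, needs_human_review)."""
    # Tier 1: exact match against the material library
    if description in library:
        return library[description], 1.0, False

    # Tier 2: normalised name (lowercase, collapse whitespace)
    norm = " ".join(description.lower().split())
    normalised = {" ".join(k.lower().split()): v for k, v in library.items()}
    if norm in normalised:
        return normalised[norm], 0.95, False

    # Tier 3: semantic fallback. Toy similarity: shared-word overlap.
    # A real system would score embeddings against the whole library.
    def semantic_score(a: str, b: str) -> float:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(len(wa | wb), 1)

    best = max(library, key=lambda k: semantic_score(description, k))
    score = semantic_score(description, best)
    needs_review = score < review_threshold  # low confidence -> flag for a human
    return library[best], score, needs_review
```

An exact hit flows through at full confidence; a fuzzy hit like "unleaded fuel 95" gets matched to the nearest library entry but flagged for review because its score falls below the threshold.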

But it's not magic. New materials still trip it up. Ambiguous descriptions still produce uncertain matches. We haven't solved the "random abbreviation on a handwritten fuel docket" problem, and we're not sure anyone has.

3. Anomaly Detection and Data Quality Flags

Here's where we need to be especially honest. Our anomaly detection uses threshold rules and statistical z-score analysis. It flags when a site's electricity consumption is three standard deviations above its rolling average, or when a fuel receipt shows a volume that would exceed a truck's tank capacity. That's useful. It catches data entry errors that would otherwise flow through to your NGER submission unchecked.

But it's not a machine learning model. It's statistics. Good, well-configured statistics — but statistics, not deep learning. Some vendors in this space imply they're using neural networks to detect anomalies when they're doing exactly what we're doing: z-scores and thresholds.

We think that's fine. Z-scores work well for this problem. You don't need a neural network to spot that a 5-site company suddenly has electricity consumption at one site that's ten times the quarterly average. You need a system that checks, flags, and forces someone to confirm before the number flows into a regulatory report.
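For the curious, the entire technique fits in a dozen lines. This sketch compares a new reading against a site's historical baseline, exactly the z-score-against-rolling-average check described above (the threshold of three standard deviations follows the article; the numbers in the comment are illustrative):

```python
import statistics

def zscore_flag(history: list[float], new_value: float, threshold: float = 3.0):
    """Flag a new reading that sits more than `threshold` standard
    deviations from the site's historical baseline.
    Returns (is_anomalous, z_score)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return False, 0.0  # flat history: nothing to compare against
    z = (new_value - mean) / stdev
    return abs(z) > threshold, z

# A site averaging ~100 units/quarter that suddenly reports 1000
# gets flagged immediately -- no neural network required.
```

That's the whole trick. The value, as the article says, is that it runs on every data point, not that the maths is clever.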

The value isn't the sophistication of the maths. It's that it happens automatically, on every data point, every time. A human reviewer gets tired by bill number 40. The system doesn't.

4. Natural Language Queries Against Carbon Data

This is the newest application and, honestly, the one we're least certain about long-term. Our Natural Language Assistant lets users ask questions like "What were our total Scope 2 emissions across Victorian sites last quarter?" instead of building spreadsheet filters.

It works well for straightforward queries. It's genuinely useful for a CFO who needs a number for a board pack and doesn't want to learn pivot tables. But it struggles with nuanced questions — "Compare our Scope 1 transport emissions excluding refrigerant leakage, year on year, adjusted for the NGA factor change between 2024-25 and 2025-26." That kind of multi-layered analytical question still needs someone who understands both the data model and the methodology.

We think natural language interfaces for carbon data will get much better. We're not sure they'll replace a skilled analyst anytime soon. They're a faster front door, not a replacement for the room behind it.

What's Not AI (But Gets Called AI Anyway)

A few things that carbon software vendors routinely label as "AI" that aren't.

Emission factor calculations. Multiplying consumption by an emission factor is arithmetic. It's a formula: kWh x state grid factor = kg CO2-e. When someone says their platform uses "AI-powered emissions calculations," they probably mean they automated the lookup. That's useful. It's not AI.
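To make the point concrete, here is the entire "AI-powered emissions calculation" in code. The factor value below is a placeholder for illustration only — always use the current NGA Factors value for your state and reporting year:

```python
def scope2_emissions_kg(kwh: float, grid_factor: float) -> float:
    """Location-based Scope 2: consumption x state grid factor.
    The factor is looked up from a published table, not learned. No AI."""
    return kwh * grid_factor

ILLUSTRATIVE_FACTOR = 0.85  # kg CO2-e per kWh -- placeholder, NOT a current NGA value
```

One multiplication. Automating the factor lookup around it is genuinely useful; calling the multiplication itself "AI" is not.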

Report generation. Populating a template with your emissions data and producing a PDF is mail merge. If the system is using an LLM to write narrative sections of your sustainability report, that's a different (and frankly concerning) story — because ASRS disclosures carry director liability, and you really don't want GPT hallucinating your transition plan.

Dashboard visualisations. Charting your emissions over time with filters by scope, site, and period is business intelligence. It's been around since the 1990s. Calling it AI because the chart updates when you click a filter is like calling a light switch AI because it responds to input.

API integrations. Pulling data from an accounting system via API is integration engineering. It's plumbing. Good plumbing — the kind that saves you from manual CSV exports — but not intelligence.

None of this is criticism. We use all of these capabilities in our own platform. Automated emission factor lookup from NGA Factors is one of the most valuable features we ship. We just don't think calling it AI is accurate or honest.

How to Evaluate AI Claims From Carbon Software Vendors

With ASRS Group 2 reporting starting from July 2026 and Group 3 from July 2027, a lot of Australian businesses are buying carbon accounting software for the first time. Here's what to ask.

"Show me how it handles a document it's never seen before." Upload an invoice from an obscure regional retailer. If the system extracts it correctly without someone building a template first, that's likely real AI document processing. If it says "unsupported format" or requires manual entry, the AI claim is thin.

"What happens when the AI gets a match wrong?" A good system produces a confidence score and flags low-confidence matches for review. If everything comes back at 100% confidence, the system is either lying about its certainty or not using AI at all — just a static lookup.

"Where's the audit trail?" ASRS assurance requirements under ASSA 5010 mean auditors need to trace any reported figure back to a source document. If the AI extracted a number, the audit trail should show which document it came from, what the AI read, what confidence level it assigned, and whether a human confirmed it. No audit trail, no point.
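A minimal record carrying those four things might look like the sketch below. The field names and sample values are hypothetical, not Carbonly's actual schema — the point is what an auditor needs to be able to see:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ExtractionAuditRecord:
    source_document: str     # which document the figure came from
    raw_extracted_text: str  # what the AI actually read off the page
    parsed_value: float      # the number that entered the calculation
    confidence: float        # model-assigned confidence, 0..1
    human_confirmed: bool    # whether a reviewer signed off
    extracted_at: str        # ISO 8601 timestamp

record = ExtractionAuditRecord(
    source_document="agl_invoice_2026Q1_site04.pdf",  # hypothetical filename
    raw_extracted_text="Total usage: 1,240 kWh",
    parsed_value=1240.0,
    confidence=0.97,
    human_confirmed=True,
    extracted_at=datetime.now(timezone.utc).isoformat(),
)
```

If a vendor's platform can't produce something equivalent for every reported figure, the AI claim is beside the point — the assurance process will fail regardless.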

"What specific AI models or techniques do you use?" This isn't a gotcha. A vendor using GPT-4o for vision extraction, BERT-based models for text classification, or a retrieval-augmented generation system for factor matching should be able to tell you that. A vendor who says "proprietary AI" and changes the subject is hiding something — usually the fact that there's no AI.

At Carbonly, we support multiple LLM providers — Gemini, OpenAI (GPT-4o), Anthropic (Claude), and custom endpoints — because different models have different strengths for different parts of the pipeline. Classification performance differs from extraction performance differs from factor-matching performance. We're not religious about which model is best. We're religious about which model is most accurate for each specific task.

The Honest Limitations

Here's what carbon AI can't do yet — at least not reliably.

It can't fix bad source data. If your facilities aren't metering properly, or your waste contractor sends you an annual estimate instead of weighed disposal records, AI won't invent accuracy from nothing. Garbage in, calculated garbage out.

It can't solve Scope 3. Not really. AI can help match procurement spend to emission factor categories. It can process supplier questionnaires faster than a human. But Scope 3 data quality remains fundamentally limited by what your suppliers can or will give you. That's a relationship problem, not a technology problem.

It can't replace professional judgement on methodology choices. Should you use location-based or market-based Scope 2? Should a refrigerant top-up be reported as a leak or maintenance? Is that joint venture within your operational control boundary? These are judgement calls that require understanding of the NGER Act, GHG Protocol, and your specific operating arrangements. AI can surface the relevant guidance. It can't make the call for you.

And it can't guarantee compliance. No software can. The ANAO found that 72% of NGER reports contained errors, with 17% being significant. Better tools reduce that rate — we believe dramatically. But carbon reporting involves judgement, estimation, and organisational boundary decisions that no AI system fully automates. Anyone guaranteeing zero-error compliance through AI is making the kind of claim the ACCC would find interesting.

What Carbon AI Actually Looks Like in Practice

For an Australian business facing mandatory reporting under ASRS, here's what a well-built carbon AI system does in practice.

Your sustainability team uploads 200 utility bills from last quarter. The AI classifies each one — electricity, gas, water, waste, fuel — without templates. It extracts consumption data, billing periods, meter numbers, and site identifiers. It matches each data point to the correct NGA emission factor for the right state and the right reporting year. It flags anomalies — that site in Queensland that apparently used 400% more gas than usual, which turns out to be an estimated read that needs correction. It generates emission calculations with a full audit trail linking every tonne of CO2-e back to its source document.

That process used to take a team three weeks. With AI that actually works, it takes hours. The team reviews flagged items, confirms uncertain matches, and moves on to the work that actually requires human intelligence — reduction strategy, target-setting, scenario analysis for transition planning.

That's what carbon AI should be. Not a marketing term. A genuine reduction in the drudge work that stops sustainability professionals from doing their actual job.

If you're evaluating carbon accounting software for ASRS compliance, ask the hard questions. Upload real documents. Check the audit trail. And don't confuse automation with intelligence — even though both have value.
