Statistical Anomaly Detection for Emissions Data: Catching Errors Before Your Auditor Does
72% of NGER reports contain errors. Most aren't deliberate — they're invisible until an auditor finds them. Statistical anomaly detection using z-scores, rolling averages, and missing data rules catches the gas bill entered as GJ instead of MJ, the facility that went to zero in Q3, and the electricity spike that means someone processed the same invoice twice.
A natural gas bill for 45,000 GJ sat in a spreadsheet for three months before anyone noticed it should have been 45,000 MJ. That's a factor of 1,000. At the NGA Factors 2025 emission factor of 51.53 kg CO2-e per GJ for natural gas, the error inflated that single facility's Scope 1 figure by about 2,300 tonnes of CO2-e. Nobody caught it because the spreadsheet didn't know the difference, and the person who entered it had already moved on to the next bill.
This is the kind of error that makes the ANAO's finding — that 72% of 545 NGER reports contained errors, with 17% containing significant ones — completely unsurprising. Carbon accounting errors aren't usually fraud. They're typos, unit confusion, missing documents, and copy-paste mistakes that slip through because nobody has a system watching for them.
Statistical anomaly detection is that system. Not machine learning. Not a black box. Basic statistics — z-scores, rolling averages, threshold checks, gap detection — applied to emissions data in a way that flags problems before they reach your NGER report or ASRS disclosure. We built it into Carbonly because we'd spent years watching these exact errors compound inside mining and energy companies, and we were tired of hearing about them only after the Clean Energy Regulator had already found them.
The errors that humans miss
Let's be specific about what goes wrong. These are patterns we've built detection rules around because they show up constantly in real utility data.
Unit confusion. Natural gas gets billed in megajoules in some states and gigajoules in others. A bill showing 45,000 MJ is a normal quarterly gas bill for a medium commercial premises. Entered as 45,000 GJ, it becomes a gas consumption figure that would rival a small industrial plant. In a spreadsheet, both numbers sit in the same column. Both look like reasonable numbers unless you're comparing them against historical patterns for that specific site — which nobody does manually across 50 or 200 facilities.
Transcription errors. An electricity bill reads 847,293 kWh. Someone types 874,293 kWh. That's a 27,000 kWh difference, roughly 17 tonnes CO2-e in NSW at the current state factor of 0.64 kg CO2-e/kWh. Not large enough to look obviously wrong. Not small enough to be irrelevant under assurance. These single-digit transposition errors hide inside large datasets because no human is comparing each entry against the source document twice.
Missing data. A facility's September gas bill didn't arrive. Nobody chased it. The quarterly figure shows as zero, and the annual total drops by 25% for that site. Or worse — the missing quarter gets averaged with the others and the gap disappears into a plausible-looking annual number. Under NGER, you need to keep records for seven years from the end of the reporting year. A missing bill isn't just a data gap. It's a compliance gap.
Duplicate processing. The same electricity bill enters the system twice — once from email, once from a shared drive. Your emissions double for that billing period. We've written separately about how SHA-256 file hashing catches exact duplicates, but anomaly detection catches the downstream effect: a facility that suddenly shows twice its normal consumption in a given month.
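The exact-duplicate side of that check is short enough to sketch. This is an illustrative Python version using the standard library, not Carbonly's actual implementation — the point is that two copies of the same bill hash identically no matter which channel they arrived through:

```python
import hashlib

# Every ingested file's SHA-256 digest, keyed by content, not filename.
_seen: set[str] = set()

def is_duplicate(file_bytes: bytes) -> bool:
    """True if this exact file content has been ingested before."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    if digest in _seen:
        return True
    _seen.add(digest)
    return False
```

Hashing catches byte-identical copies; the anomaly layer described above catches the case the hash misses, such as the same bill re-scanned or re-exported with different bytes.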
Wrong emission factors. Using the national average electricity factor (0.62 kg CO2-e/kWh) instead of Victoria's state factor (0.78) understates emissions by about 21%. Using it instead of Tasmania's (0.20) overstates by 210%. The CER's 2025-26 compliance priorities mention using "advanced data analysis tools" to identify high-risk reporters. A facility whose year-on-year emissions trend doesn't match the state grid factor changes is exactly the kind of signal those tools would catch.
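The size of those distortions is easy to verify. A quick sketch using the factors quoted above (fractional error expressed relative to the correct state factor):

```python
# Grid factors as quoted in the text, kg CO2-e per kWh.
NATIONAL, VIC, TAS = 0.62, 0.78, 0.20

def factor_error(used: float, correct: float) -> float:
    """Fractional error in the emissions figure relative to the correct factor."""
    return (used - correct) / correct

vic_error = factor_error(NATIONAL, VIC)  # negative: emissions understated
tas_error = factor_error(NATIONAL, TAS)  # positive: emissions overstated
```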
Every one of these errors carries financial weight. Under Section 19 of the NGER Act, a non-compliant report can attract civil penalties of up to 2,000 penalty units — $660,000 at the current Commonwealth rate of $330 per unit. Beach Energy's enforceable undertaking in July 2025, for "inadvertent misstatements" across multiple reporting periods, required them to fund three years of reasonable assurance audits plus an external consultant to rebuild their control systems. They didn't cook the books. They just had bad data processes.
Five rule types that catch different problems
We're not going to pretend this is sophisticated AI. It isn't. Carbonly's anomaly detection runs five types of statistical rules against incoming emissions data. Each one catches a different failure mode.
Threshold rules flag when a facility's emissions or consumption exceeds a limit you set. This is the simplest check — if a small office building suddenly reports 500,000 kWh in a quarter, something's wrong. You configure the ceiling based on what's physically reasonable for each facility. A warehouse in Geelong consumes a different amount of electricity than a data centre in Homebush. The thresholds need to reflect that, which means someone has to set them. We can't auto-generate thresholds for a facility we've never seen data from, and we won't pretend otherwise.
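As a minimal sketch, a threshold rule is just a per-facility lookup and a comparison. The facility IDs and ceiling values here are hypothetical examples, not defaults from any real configuration:

```python
# Hypothetical per-facility quarterly electricity ceilings, in kWh.
THRESHOLDS_KWH = {
    "geelong-warehouse": 80_000,
    "homebush-datacentre": 2_500_000,
}

def exceeds_threshold(facility_id: str, kwh: float) -> bool:
    """True if the reading exceeds the configured ceiling for this facility."""
    ceiling = THRESHOLDS_KWH.get(facility_id)
    if ceiling is None:
        raise ValueError(f"No threshold configured for {facility_id}")
    return kwh > ceiling
```

The simplicity is the point: a small office suddenly reporting 500,000 kWh trips the ceiling immediately, but only if someone has set a ceiling that reflects that site.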
Outlier detection uses z-score analysis against a 90-day rolling window. A z-score measures how far a data point sits from the mean, expressed in standard deviations. If your facility's electricity consumption has averaged 120,000 kWh per month with a standard deviation of 8,000 kWh, and this month's bill comes in at 180,000 kWh, that's a z-score above 7. We flag anything above 3 standard deviations as an anomaly and above 5 as critical. The maths is straightforward — mean, standard deviation, distance from mean — and it works because utility data for a given facility tends to follow a roughly normal distribution with seasonal variation.
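The z-score check described above can be sketched in a few lines. This simplified version takes the rolling window as a plain list of past readings rather than a dated 90-day query, and uses the 3/5 standard deviation bands from the text:

```python
from statistics import mean, stdev

def zscore_flag(history: list[float], latest: float,
                warn: float = 3.0, critical: float = 5.0) -> str:
    """Classify the latest reading against the rolling window."""
    if len(history) < 5:
        return "insufficient-data"   # too little history for a meaningful z-score
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return "ok"                  # perfectly flat history: nothing to compare
    z = abs(latest - mu) / sigma
    if z >= critical:
        return "critical"
    if z >= warn:
        return "anomaly"
    return "ok"
```

Using the numbers from the text — a window averaging 120,000 kWh with a standard deviation of 8,000 — a 180,000 kWh bill scores z = 7.5 and comes back "critical".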
Trend analysis compares each facility's 30-day rolling average against a 90-day baseline. This catches the slow drift that z-scores miss. A facility whose gas consumption creeps up 5% every month for six months won't trigger an outlier alert on any single bill. But the trend line diverges from the baseline, and that divergence means either something real changed (new equipment, extended operating hours) or something's wrong with the data (wrong meter, wrong allocation, wrong unit).
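A rough sketch of that divergence check, assuming daily readings: compare the mean of the most recent 30 days against the mean of the full 90-day baseline and report the fractional drift. The daily granularity is an illustrative assumption — billing-period data works the same way with fewer points:

```python
def trend_divergence(daily_kwh: list[float], recent_days: int = 30) -> float:
    """Fractional divergence of the recent average from the baseline average."""
    if len(daily_kwh) < recent_days * 2:
        raise ValueError("Not enough history to establish a baseline")
    baseline = sum(daily_kwh) / len(daily_kwh)
    recent = sum(daily_kwh[-recent_days:]) / recent_days
    return (recent - baseline) / baseline

# A 10% step up in the last 30 of 90 days shows as ~6.5% drift
# against the blended baseline -- small, but visible before any
# single bill would trip a z-score.
drift = trend_divergence([1000.0] * 60 + [1100.0] * 30)
```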
Missing data rules detect gaps in expected periodic data. If a facility has been submitting monthly electricity bills for 18 months and then nothing arrives in October, we flag it. This sounds trivial, but it's one of the most common sources of understated emissions. A gas bill that never arrives means a quarter of Scope 1 gas combustion goes unreported. Nobody notices because the annual total just looks like a reduction — which is exactly what the sustainability manager wanted to see, so they don't question it.
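The gap check itself is trivial, which is rather the point. A sketch for monthly billing, with the expected range passed in as (year, month) tuples:

```python
def missing_months(submitted: set, start: tuple, end: tuple) -> list:
    """List every (year, month) in [start, end] with no bill submitted."""
    gaps = []
    y, m = start
    while (y, m) <= end:
        if (y, m) not in submitted:
            gaps.append((y, m))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return gaps
```

Run monthly against each facility's expected billing cadence, this is the rule that catches the October bill that never arrived — before it becomes a quarter of unreported Scope 1 gas.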
Pattern rules identify deviations from expected recurring patterns. If a facility's electricity consumption follows a clear seasonal pattern — higher in summer for cooling, lower in autumn — and a January bill comes in at winter levels, something deserves investigation. Maybe the cooling system was offline. Maybe the bill is for the wrong period. Either way, it's worth checking before it flows through to an emission calculation.
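One simple way to sketch a pattern rule is to compare a new bill against the same calendar month in prior years, once at least one full annual cycle exists. The 30% tolerance here is an illustrative assumption, not a recommended setting:

```python
from statistics import mean

def seasonal_flag(history: dict, month: int, latest: float,
                  tolerance: float = 0.30) -> bool:
    """True if the reading deviates beyond tolerance from the
    same-month mean. history maps year -> {month: kwh}."""
    same_month = [months[month] for months in history.values() if month in months]
    if not same_month:
        return False   # no seasonal baseline yet: pattern rules are blind
    expected = mean(same_month)
    return abs(latest - expected) / expected > tolerance
```

A January bill at winter levels against two summers of history would trip this; a site with under a year of data silently passes, which is exactly the first-year limitation discussed later.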
What this looks like on a real dataset
Consider a property manager running 40 commercial sites across three states. That's roughly 480 electricity bills per year, plus gas, water, and waste. Let's walk through what anomaly detection catches that a spreadsheet doesn't.
Site 14 in Melbourne submits a March electricity bill for 340,000 kWh. The 90-day average for that site is 155,000 kWh, with a standard deviation of 12,000 kWh. The z-score is 15.4. Critical anomaly. Someone investigates and finds the bill was for two meters combined — the tenant's and the landlord's — but only the landlord's consumption should count under the operational control boundary.
Site 27 in Brisbane hasn't submitted a gas bill since July. Missing data rule triggers in September. The property coordinator emails the gas retailer. Turns out the bills were being sent to a previous contact who'd left the company. Four months of gas consumption was sitting in an inbox nobody checked.
Site 3 in Adelaide shows a steady upward trend in electricity consumption — 8% above the 90-day baseline and climbing. Trend analysis flags it. Investigation reveals the site installed new LED lighting (which should have reduced consumption), but the old halogen circuits were never disconnected. Both systems running simultaneously. Not a data error — an operational finding that also happens to affect the emissions number.
None of these would have been caught in a quarterly spreadsheet review. The spreadsheet has no memory. It doesn't know that site 14 normally uses 155,000 kWh. It doesn't know that site 27 submits gas bills monthly. It doesn't flag trends.
Why this matters more now than it did two years ago
Two things changed. ASRS and assurance.
ASRS Group 2 entities begin reporting for financial years starting from 1 July 2026. Under ASSA 5010, limited assurance over Scope 1 and Scope 2 emissions is required from Year 1. That means an external auditor will be testing your data. Not just your final number — your data pipeline. Source documents to extraction to emission factor to calculation. An auditor performing limited assurance will select a sample of facilities and trace the numbers back to the utility bill.
If that trace reveals a facility where the electricity consumption doubled one month and nobody investigated, that's an assurance finding. If it reveals a quarter with zero gas consumption and no explanation, that's another finding. If it reveals that the same bill was processed twice and no system caught it, the auditor has reason to question whether your entire dataset is reliable.
The Beach Energy case is instructive. The CER's statement noted that while the NGER Act "does not explicitly require corporations to implement such controls," the absence of internal control systems "can lead to persistent and significant reporting inaccuracies." They weren't penalised for lacking anomaly detection specifically. But the remedy — three years of external assurance plus a rebuilt control system — is exactly what anomaly detection is designed to prevent. Catch the error before it reaches the report. Don't wait for the CER's analytics tools to catch it first.
For the 961 NGER registered controlling corporations, and the many more entities entering ASRS reporting over the next two years, data quality controls are shifting from "nice to have" to "the thing your auditor asks about in the first meeting."
Honest limitations
We're not going to oversell this. Statistical anomaly detection is effective for what it does. But it has clear boundaries.
Minimum data requirement. Z-score analysis needs at least 5 data points in the 90-day rolling window to produce meaningful results. A new facility with two months of bills doesn't have enough history. Until the baseline builds up, you're relying on threshold rules and manual review. There's no shortcut around this — statistics need data.
Errors within the normal range. If someone enters 125,000 kWh instead of 120,000 kWh, that 4% error is well within normal variation. The z-score won't flag it. Threshold rules won't catch it unless you've set very tight bounds, which then generates too many false positives. Small errors that stay within a facility's normal consumption range require document-level validation — checking the extracted number against the source document — not statistical outlier detection.
Threshold configuration. Threshold rules need someone to set them. For a company with 200 facilities, that's 200 thresholds to configure, ideally by utility type. We provide defaults based on facility category, but they're approximations. A cold storage warehouse consumes five times the electricity of a standard warehouse. If you don't configure the threshold accordingly, you'll get either false positives or missed anomalies.
Emission factor errors. Anomaly detection looks at consumption data — kWh, GJ, litres. If the consumption number is right but the wrong emission factor is applied, the emissions figure is wrong without triggering any anomaly in the consumption data. Factor validation is a separate check in the pipeline. Anomaly detection and factor validation work together, but they're not the same thing.
Seasonal patterns need history. Pattern rules require at least a full annual cycle to establish seasonal baselines. In the first year of data collection, they're essentially blind. We're honest about this because pretending otherwise would mean missing exactly the kind of errors we're trying to catch.
The real point
Carbon accounting errors don't announce themselves. A spreadsheet with 4,800 cells of utility data looks the same whether it's right or 15% wrong. The sustainability manager who assembled it probably did their best. But "did their best" doesn't satisfy the Clean Energy Regulator, and it won't satisfy the auditor who shows up to test your ASRS Scope 2 numbers.
Statistical anomaly detection isn't glamorous. It's z-scores and rolling averages and gap checks. But it sits between your raw data and your compliance report, catching the unit confusion and the missing bills and the duplicate entries that would otherwise flow straight through to your NGER submission. In a regulatory environment where the CER uses its own data analytics to identify high-risk reporters, having your own detection layer isn't optional anymore. It's the minimum.
If you're processing more than a few dozen utility bills per reporting period, start by running your last 12 months of data through anomaly detection and see what it finds. The number that comes back will tell you whether your current process is working — or whether you've been submitting errors you didn't know existed.
Related reading:
- Your NGER Report Is Harder Than It Looks — what Section 19 actually demands and where the errors originate
- How SHA-256 File Hashing Stops Double-Counting — catching duplicate documents before they inflate your numbers
- NGER Compliance Automation — building a system vs filing a report
- Why Most Carbon Calculators Give You a False Sense of Accuracy — the emission factor accuracy problem
- ASRS Assurance Requirements — what auditors actually test and how to prepare