Inside the 5-Tier Material Matching System That Connects Fuel Receipts to NGA Factors

A fuel receipt says "Premium Unleaded 95." The NGA Factors database calls it "Gasoline (petrol) for use as fuel in an engine." That gap is where most carbon accounting automation breaks. Here's how our 5-tier matching system — with preprocessing, regex patterns, AI context generation, and a learning loop — resolves the disconnect.

Denis Kargl · February 27, 2026 · 11 min read
Tags: AI Technology, Emission Factors, Material Matching, NGA Factors, Carbon Accounting, NGER

"B20 Biodiesel Blend" appeared on a fuel docket from a construction site in Queensland last month. Our extraction pipeline pulled it out cleanly — 312 litres, $623.40, date, site address, the lot. Perfect extraction. Then the material matching engine had to decide: is that a 20% biodiesel blend with an 80% petroleum diesel base? Is it automotive or stationary? The NGA Factors 2025 workbook has separate entries for "Diesel oil" (Table 8 for stationary, Table 9 for transport), "Biodiesel" (a standalone fuel type with different biogenic fractions), and no entry at all for "B20" as a named material. Getting the match wrong doesn't shift your number by a rounding error. It shifts it by 20-40%, depending on whether you apply the biogenic CO2 adjustment correctly.

We wrote a higher-level overview of emission factor matching recently. That post explained what each tier does and why. This one goes deeper — into the preprocessing that happens before any matching starts, the 18 regex patterns that parse material names, the 7-step AI context generator, and the specific mechanics of how the system learns from human corrections. If you're evaluating carbon accounting software and someone tells you they "automatically match emission factors," this is the level of detail you should be asking for.

The gap between real documents and the NGA database

The NGA Factors 2025 workbook contains 139+ emission factors covering fuels, electricity, industrial processes, waste, and land use. Each factor has a precise, official name. "Diesel oil" in Table 8. "Gasoline (petrol) for use as fuel in an engine" in Table 9. "Natural gas distributed in a pipeline" in Table 4.

Nobody writes those names on an invoice.

Australian energy retailers use whatever description fits their billing system. Origin Energy bills might say "Gas Supply." AGL calls it "Natural Gas Usage." A smaller retailer might print "Envestra Gas" or just "Gas." A fleet fuel card statement from BP lists "DIESEL" in caps. The servo down the road prints "ULP 91" on the receipt. A mining company's bulk fuel delivery note says "ADO" (Automotive Diesel Oil) — a term the NGA Factors workbook doesn't use at all.

This naming inconsistency isn't a minor annoyance. Under NGER, diesel used in transport falls under Division 2.3 of the Measurement Determination and references Table 9 factors. Diesel used for stationary energy (generators, boilers) falls under Division 2.2 and references Table 8. The emission factors differ because the methodologies account for different combustion conditions. An NGER reporter who applies a transport factor to generator fuel — or vice versa — has a methodological error. The kind the ANAO found in 72% of NGER reports it audited.

So the challenge isn't just "match diesel to diesel." It's "match 'ADO 142.6L' from a blurry fuel docket to the correct diesel entry in the correct NGA table, for the correct combustion context, with a documented reason for the choice."

What happens before any matching starts

Before a material name hits Tier 1, we run it through preprocessing. This matters more than most people realise.

The preprocessing pipeline strips parenthetical content first. "Natural Gas (reticulated)" becomes "Natural Gas." "Electricity (peak + off-peak)" becomes "Electricity." Parentheticals on invoices are usually retailer-specific qualifiers that confuse matching without adding useful information for factor selection.

Then it removes equipment-type suffixes. The system recognises patterns like "-Grid," "-Mobile," "-Stationary," and strips them from the material name while preserving the equipment context as metadata. So "Diesel-Mobile" becomes "Diesel" with an equipment flag set to "mobile" — which later tells the matching engine to prefer transport factors over stationary ones.

Whitespace normalisation sounds trivial. It isn't. We've seen material names with tab characters, non-breaking spaces, trailing whitespace, and double spaces mid-word from OCR errors on damaged documents. "Natural  Gas" (with a doubled space) won't match "Natural Gas" in a direct lookup unless you normalise first.
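Condensed into code, the cleaning steps look roughly like this — a simplified sketch in which the function name, suffix list, and regex patterns are illustrative, not our production implementation:

```python
from __future__ import annotations
import re

# Illustrative equipment-suffix vocabulary; the real system's list differs.
EQUIPMENT_SUFFIXES = {"grid", "mobile", "stationary"}

def preprocess(name: str) -> tuple[str, str | None]:
    """Clean a raw material name; return (clean_name, equipment_flag)."""
    # 1. Strip parenthetical qualifiers: "Natural Gas (reticulated)" -> "Natural Gas"
    name = re.sub(r"\([^)]*\)", "", name)
    # 2. Detect and strip equipment-type suffixes, keeping the flag as metadata
    equipment = None
    m = re.search(r"[-\s](grid|mobile|stationary)\s*$", name, re.IGNORECASE)
    if m:
        equipment = m.group(1).lower()
        name = name[: m.start()]
    # 3. Normalise whitespace: tabs, non-breaking spaces, doubles, trailing runs
    name = re.sub(r"\s+", " ", name.replace("\xa0", " ")).strip()
    return name, equipment

print(preprocess("Diesel-Mobile"))               # ('Diesel', 'mobile')
print(preprocess("Natural Gas (reticulated)"))   # ('Natural Gas', None)
```

The point of returning the equipment flag separately is that "Diesel-Mobile" and "Diesel" must hit the same library entry, while the flag steers transport-versus-stationary factor selection later.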

After cleaning, the system runs the material through 18 regex patterns organised into four categories — Energy/Fuels, Transport, Materials, and Waste. These patterns don't do the matching themselves. They classify the material into a domain, which narrows which emission factors are candidates. If the regex engine identifies "ULP 95" as a transport fuel, the matching engine doesn't waste time comparing it against waste disposal factors or industrial process emissions.

The equipment pattern detection runs in parallel. Eight patterns cover the main equipment types: mobile/vehicle, stationary/generator/boiler, grid/mains, aviation, marine, rail, construction, and manufacturing. When a line item says "Generator Diesel — Site 4" the system catches "Generator" and flags stationary use before any tier gets involved. That context is critical for the diesel transport-versus-stationary split.
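The two pattern sets can be sketched as parallel classifiers. The patterns below are a small illustrative subset, not the actual 18 material and 8 equipment patterns:

```python
from __future__ import annotations
import re

# Illustrative subsets of the two pattern sets described above.
DOMAIN_PATTERNS = {
    "energy_fuels": re.compile(r"\b(diesel|petrol|ulp|gas|lpg|electricity)\b", re.I),
    "transport":    re.compile(r"\b(fleet|vehicle|fuel card|avgas)\b", re.I),
    "waste":        re.compile(r"\b(landfill|waste|recycl)\w*", re.I),
}
EQUIPMENT_PATTERNS = {
    "stationary": re.compile(r"\b(generator|boiler|stationary)\b", re.I),
    "mobile":     re.compile(r"\b(vehicle|truck|fleet|mobile)\b", re.I),
    "grid":       re.compile(r"\b(grid|mains)\b", re.I),
}

def classify(line_item: str) -> tuple[list[str], list[str]]:
    """Return (candidate domains, detected equipment contexts) for a line item."""
    domains = [d for d, p in DOMAIN_PATTERNS.items() if p.search(line_item)]
    equipment = [e for e, p in EQUIPMENT_PATTERNS.items() if p.search(line_item)]
    return domains, equipment

print(classify("Generator Diesel — Site 4"))  # (['energy_fuels'], ['stationary'])
```

The domain list prunes the candidate factor set before any tier runs; the equipment list survives as metadata that the tiers consult when a factor exists in both a transport and a stationary table.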

The 7-step AI context generator

This is the piece most carbon accounting platforms don't have, and it's the piece that makes Tier 3 work.

Each material in our library carries more than a name and an emission factor. It has a block of AI-generated context — a structured description of what that material looks like in the wild, on real documents, with real naming variations. The context generator builds this in seven steps.

Step 1: Parse the material name into keywords, equipment types, and aliases. For "Diesel oil — transport" this produces keywords ["diesel", "oil", "fuel", "petroleum"], equipment type "transport/vehicle", and aliases ["ADO", "automotive diesel", "diesel fuel", "ULSD"].

Step 2: Add category-based context. Because this material is in the "Liquid Fuels — Transport" category, the generator adds context about NGA Table 9, Division 2.3 of the NGER Measurement Determination, typical units (litres, kilolitres), and the energy content value (38.6 GJ per kilolitre) that's needed for certain calculation methods.

Step 3: Add scope-based hints. Transport diesel is Scope 1. The context includes that classification, so downstream processes know this isn't a Scope 2 or Scope 3 item without having to re-derive it.

Step 4: Learn from similar materials already confirmed by users in the organisation. The generator queries the database for the five closest confirmed matches. If three previous users matched "DIESEL" from BP fuel cards to this factor, those confirmed names get added to the context. This is where organisational knowledge feeds back into the system at the library level, not just the alias level.

Step 5: Add spend-based context for materials where the unit might be currency. If someone uploads an invoice that says "$14,500 — fuel supply" without a quantity in litres, the system needs to know that a spend-based factor exists for this category and what its units are (kg CO2-e per AUD). Step 5 preloads that fallback path.

Step 6: Generate a human-readable description. Not for display — for the matching engine. Something like: "Diesel oil consumed as transport fuel in road-registered vehicles. Common invoice names include DIESEL, ADO, automotive diesel, diesel fuel. NGA Table 9, Division 2.3. Scope 1 emission. Energy content 38.6 GJ/kL. Do not confuse with stationary diesel (Table 8, generators/boilers) which uses a different emission factor."

Step 7: Deduplicate and clean. Remove redundant aliases, normalise formatting, ensure no contradictory hints made it in from the database query in Step 4.
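Put together, the seven steps assemble something like the following record. The field names and dataclass shape are illustrative, not our actual schema:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class MaterialContext:
    # Field names are illustrative stand-ins for the real context schema.
    keywords: list[str]                 # Step 1
    equipment_type: str                 # Step 1
    aliases: list[str]                  # Step 1
    category_hints: dict                # Step 2: NGA table, division, units, GJ/kL
    scope: int                          # Step 3
    confirmed_names: list[str] = field(default_factory=list)  # Step 4
    spend_factor_units: str | None = None                     # Step 5
    description: str = ""                                      # Step 6

    def finalise(self) -> "MaterialContext":
        """Step 7: deduplicate aliases while preserving insertion order."""
        self.aliases = list(dict.fromkeys(a.strip().lower() for a in self.aliases))
        return self

ctx = MaterialContext(
    keywords=["diesel", "oil", "fuel", "petroleum"],
    equipment_type="transport/vehicle",
    aliases=["ADO", "automotive diesel", "diesel fuel", "ULSD", "ado"],
    category_hints={"nga_table": 9, "nger_division": "2.3",
                    "units": ["L", "kL"], "energy_content_gj_per_kl": 38.6},
    scope=1,
).finalise()
print(ctx.aliases)  # ['ado', 'automotive diesel', 'diesel fuel', 'ulsd']
```

Note that `confirmed_names` is the Step 4 feedback channel: it grows as users in the organisation confirm matches, which is why the same context block gets richer over time.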

The result is a rich, searchable context block for every material in the library. When Tier 3 tries to match "Premium Unleaded 95" against the library, it's not doing string comparison against "Gasoline (petrol) for use as fuel in an engine." It's searching against a context that includes "premium," "unleaded," "95 octane," "ULP," "petrol pump," "service station fuel," "91/95/98 octane grades" — because the context generator put all of those in there.

How the tiers cascade — and when they stop

We've covered the tier overview before. What matters here is the confidence thresholds and what triggers a handoff versus a flag.

Tier 1 (direct match, confidence 1.0) and Tier 2 (alias match, using the stored alias confidence) are fast lookups. The key design decision in Tier 2: any alias stored below 0.85 confidence gets flagged for review rather than auto-applied. One person's uncertain correction doesn't become gospel for the whole organisation.

Tier 3 (AI context match) is where the 7-step generator earns its keep. The matching searches against the context blocks — keyword overlap and semantic similarity — not against bare material names. "Premium Unleaded 95" matches because the context for the petrol factor contains "premium," "unleaded," "95 octane," "ULP," and "service station fuel." Confidence threshold for auto-apply: 0.8.

Tier 4 (fuzzy/vector match) uses Levenshtein distance, token overlap, and embeddings. It catches OCR typos — "Diesl" to "Diesel" — and structural naming variations. But it always requires review, no matter the confidence score, because fuzzy matches generate too many plausible-but-wrong candidates.
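The edit-distance half of Tier 4 is simple to sketch — a minimal Levenshtein implementation plus a typo-tolerant lookup. The `max_dist` cutoff of 2 is an illustrative choice, not our tuned threshold:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; a minimal dynamic-programming version."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            # Cost of deletion, insertion, or substitution respectively
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def fuzzy_candidates(query: str, names: list[str], max_dist: int = 2) -> list[str]:
    """Tier 4 sketch: OCR-typo tolerant lookup; results always go to human review."""
    q = query.lower()
    return [n for n in names if levenshtein(q, n.lower()) <= max_dist]

print(fuzzy_candidates("Diesl", ["Diesel", "Petrol", "LPG"]))  # ['Diesel']
```

Even with a tight cutoff, short material names sit close together in edit distance — which is exactly why Tier 4 output is never auto-applied.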

Tier 5 (LLM fallback) sends the full line item with all document context to a language model. Confidence must exceed 0.6 to even produce a suggestion. Always requires review. And there's a specific behaviour we built in: when the extracted quantity is monetary rather than physical, the LLM prefers spend-based emission factors. "$45,000 — catering services" gets matched to a kgCO2-e/AUD factor, not a per-meal factor that would require data the invoice doesn't contain.

Items that don't auto-apply land in one of three review buckets: not_found (nothing in the library is even close), low_confidence (a plausible match the system doesn't trust enough), or multiple_matches (two or more factors scoring similarly). That last bucket is the dangerous one. "Gas" matching both "Natural gas distributed in a pipeline" and "Liquefied petroleum gas" at nearly equal confidence — both plausible, significantly different emission factors. Without a human picking, the system would have to guess. It doesn't.
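The thresholds and review buckets above reduce to a small decision function. This is a sketch: the 0.05 "similar scores" margin is invented for illustration, while the per-tier thresholds come from the descriptions above:

```python
from __future__ import annotations

def resolve(scored: list[tuple[str, float, int]]) -> dict:
    """Decide auto-apply vs review. scored = (factor_name, confidence, tier), best first."""
    if not scored:
        return {"status": "review", "bucket": "not_found"}
    name, conf, tier = scored[0]
    # Tiers 4 and 5 always require review, regardless of confidence.
    if tier >= 4:
        return {"status": "review", "bucket": "low_confidence", "suggestion": name}
    # Two candidates scoring similarly -> a human must pick.
    # (The 0.05 margin here is an illustrative value, not the tuned one.)
    if len(scored) > 1 and scored[0][1] - scored[1][1] < 0.05:
        return {"status": "review", "bucket": "multiple_matches"}
    threshold = {1: 1.0, 2: 0.85, 3: 0.8}[tier]
    if conf >= threshold:
        return {"status": "auto", "factor": name, "tier": tier, "confidence": conf}
    return {"status": "review", "bucket": "low_confidence", "suggestion": name}

print(resolve([("Gasoline (petrol) for use as fuel in an engine", 0.9, 3)]))
```

The "Gas" ambiguity from above lands in the near-tie branch: natural gas and LPG both score, neither wins by enough, and the item goes to `multiple_matches` instead of a guess.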

How the learning loop closes

When a human confirms a match — correct suggestion or manual correction — three things happen simultaneously.

The confirmation gets stored in the materialLearning table as an alias, with the original text, matched factor, confirming user, and a confidence score. Corrections to wrong Tier 5 suggestions get stored with slightly lower initial confidence than confirmations of correct Tier 2 matches — the system treats LLM guess corrections as more tentative.

The AI context generator re-evaluates the matched material's context block. If "ADO" was just confirmed as an alias for transport diesel, Step 4 of the generator now includes that data point for future rebuilds.

And the next time any document in that organisation contains "ADO," Tier 2 catches it instantly. Faster, more confident, no review needed (assuming confidence exceeds 0.85).
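The write-back can be sketched with an in-memory table. The confidence values 0.9 and 0.8 are illustrative stand-ins for the actual initial scores, and the dict stands in for the materialLearning table:

```python
from __future__ import annotations

ALIAS_TABLE: dict[str, dict] = {}  # stand-in for the materialLearning table

def confirm_match(raw_text: str, factor: str, user: str, source_tier: int) -> None:
    """Store a human confirmation; corrections of Tier 5 guesses start lower."""
    confidence = 0.9 if source_tier <= 2 else 0.8  # illustrative initial scores
    ALIAS_TABLE[raw_text.lower()] = {
        "factor": factor, "confirmed_by": user, "confidence": confidence,
    }

def tier2_lookup(raw_text: str) -> dict | None:
    """Next document: an alias auto-applies only above the 0.85 threshold."""
    entry = ALIAS_TABLE.get(raw_text.lower())
    if entry and entry["confidence"] >= 0.85:
        return entry
    return None

confirm_match("ADO", "Diesel oil — transport (NGA Table 9)", "j.smith", source_tier=2)
print(tier2_lookup("ado"))  # the stored alias record, auto-applied
```

The asymmetry is deliberate: a confirmation of a high-tier match clears the 0.85 auto-apply bar immediately, while a correction of an LLM guess starts below it and stays flagged until further confirmations accumulate.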

We've seen organisations go from roughly 40% of materials being matched at Tier 1 to over 70% combined Tier 1+2 within their first few months of use. That's not because the NGA Factors database changed. It's because the alias table filled in the specific naming gaps between that organisation's documents and the official factor names.

But here's the honest limitation: the learning loop depends entirely on humans actually reviewing flagged items carefully. If someone clicks "accept" on every suggestion without checking, bad aliases propagate. We've built safeguards — first-time matches get highlighted differently, low-confidence items show warning indicators, and there's an alias audit view where admins can review and revoke stored matches. But we can't force diligence. The system is only as good as the attention its users put into the first few dozen review rounds.

Where this still breaks

We're not going to oversell this.

Completely novel materials need manual factor entry. When a client first uploaded an invoice for HVO100 (hydrotreated vegetable oil — a renewable diesel), the library had no factor for it. The NGA Factors workbook doesn't have a specific entry either. Tier 5 suggested a biodiesel factor, which wasn't quite right — HVO has different lifecycle emissions characteristics. A human had to research the correct factor, create a new library entry, and only then could the learning loop take over. First encounters with genuinely new materials still require expertise.

Tier 5 LLM matching can be confidently wrong. Language models don't say "I don't know" naturally. When presented with an ambiguous material and asked to pick the best factor, the LLM sometimes constructs a plausible-sounding rationale for a match that's actually incorrect. That's why Tier 5 is the last resort, and every Tier 5 match requires review. We're not sure there's a way to fully solve this without a human in the loop. We've tried confidence calibration, chain-of-thought prompting, and asking the model to express uncertainty — it helps at the margins but doesn't eliminate the problem.

Spend-based matching is inherently less precise. When an invoice line says "$12,500 — professional services" and the system matches it to a spend-based factor, that factor is a sector average derived from input-output models. It doesn't know if those services came from a one-person home office or a data centre. The BCG GAMMA survey estimated 30-40% error rates in corporate emissions calculations, and spend-based factor selection is a big contributor to that range. We flag spend-based matches distinctly in the audit trail so users and auditors know the precision tier of every number.

The diesel transport-versus-stationary problem isn't always solvable from the document alone. If a fuel docket just says "DIESEL — 200L" with no site context and no equipment information, the system can't know whether it went into a truck or a generator. Preprocessing catches explicit clues ("Generator Diesel," "Fleet Fuel"), and site-level defaults can fill gaps if the organisation has classified their sites. But ambiguous cases still exist. We flag them as multiple_matches and ask a human. Not glamorous, but honest.

Why this matters for NGER and ASRS audits

Under NGER, the Clean Energy Regulator checks methodology, not just numbers. Applying Table 9 factors when Table 8 applies (or vice versa) is a methodological error — the kind that can trigger an enforceable undertaking, like the one Beach Energy faced in July 2025. Records must be kept for five years from the end of the reporting year. That means every factor selection decision needs to be traceable back to its source for half a decade.

Under ASRS assurance requirements, auditors applying ASSA 5010 will trace emission factors back to the NGA Factors workbook and verify not just that you used the right number, but that you used the right number for the right application. A material matching system that logs which tier produced the match, what the confidence score was, which NGA table and division the factor came from, and who confirmed it — that's what turns a calculation into an auditable disclosure.

The 5-tier system isn't clever engineering for its own sake. It exists because the gap between what appears on an Australian fuel receipt and what the NGA Factors database calls that material is real, it's messy, and getting it wrong has compliance consequences that start at penalty notices and scale up from there.

If you're evaluating carbon accounting tools, ask one question: when the system matches "B20 Biodiesel Blend" to an emission factor, can it show you why it picked that factor, at what confidence, from which tier, and who confirmed it? If the answer is "it just does it automatically," that's not confidence. That's a black box with a compliance risk inside.

