Natural Language Queries for Emissions Data

It's a question that comes up constantly inside companies preparing for their first ASRS Group 2 disclosure: "Can I just ask the system what our Victorian Scope 1 emissions were last financial year, or do I need to get someone from sustainability to build me a report?" CFOs ask it. Operations directors ask it. Boards ask it in slightly more polite phrasing.

That's the exact problem. Carbon accounting platforms are built for sustainability specialists. Dashboards assume you know what filters to apply, which scope categories matter, and how to export the right slice of data. Ask any sustainability manager what they actually spend their time on, and a surprising chunk of it is just answering questions from other people in the organisation. The CFO wants last quarter's total. The operations director wants facility-level breakdowns. The board wants a year-on-year comparison. Each request means someone drops what they're doing, opens the platform, clicks through filters, exports to Excel, and emails a spreadsheet.

According to PwC's 2025 Global Sustainability Reporting Survey, 90% of organisations still rely on spreadsheet-based sustainability data collection. And a BCG GAMMA survey found that 86% of companies still manage emissions data manually. Even companies that have moved beyond spreadsheets into dedicated carbon accounting software still face this problem - the data lives inside the platform, and getting it out requires someone who knows how the platform works.

We built a natural language query assistant into Carbonly because we got tired of watching this play out. Not a chatbot. Not generative AI that invents plausible-sounding numbers. A structured query engine that parses plain English questions into database queries and returns actual data from your actual emissions records.

The bottleneck nobody talks about

There's a stat that gets thrown around in our industry: sustainability teams spend 60-80% of their time on data collection and reporting. Watershed's 2026 State of Corporate Sustainability report found that half of sustainability teams are just one to five people, spending what amounts to half a year on data collection, standardisation, and cleaning, then another two to three months on mapping and calculations.

But there's a sub-problem inside that statistic. It's not just collecting data - it's redistributing it. Once emissions figures are in the system, people across the organisation need access to specific slices of that data. And those people aren't going to learn to use a carbon accounting platform. They shouldn't have to.

A Deloitte study found that 32% of companies now designate the CFO as the primary person responsible for ESG reporting, with another 16% sharing responsibility between CFO and CSO. That means in nearly half of companies, accountability for emissions data sits wholly or partly with a CFO who doesn't spend their days inside a sustainability platform. They need answers, not training on how to build custom reports.

With ASRS Group 2 entities now required to prepare climate-related financial disclosures for financial years starting from 1 July 2026, this isn't theoretical. CFOs at mid-sized Australian companies - broadly, entities that meet at least two of three thresholds: consolidated revenue of $200 million or more, consolidated gross assets of $500 million or more, or 250 or more employees - are going to need quick access to emissions figures. Not next week, after someone builds a report. Now.

What a structured query engine actually does

Let's be precise about what we're talking about, because the distinction matters.

A chatbot takes your question, sends it to a large language model, and gets back a generated response. Sometimes that response is right. Sometimes it's confidently wrong. The model doesn't query your database - it predicts what a helpful answer would look like based on pattern matching across its training data. That's fine for drafting emails. It's dangerous for emissions numbers that end up in NGER submissions or ASRS disclosures.

A structured query engine does something fundamentally different. It takes your natural language input - "What were our Scope 2 emissions in NSW last quarter?" - and breaks it into components it can understand. Action: sum. Entity: emissions. Filter: Scope 2, state NSW, date range last quarter. It then runs a real database query against your actual emissions records and returns a precise number with a source trail.
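
To make that last step concrete, here is a minimal sketch of how parsed components could become a parameterised database query. The `emissions` table, its column names, and the `ParsedQuery` shape are assumptions made for illustration, not Carbonly's actual schema, and the figures are invented.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass
class ParsedQuery:
    """Components extracted from the question (hypothetical shape)."""
    action: str   # only "sum" is handled in this sketch
    scope: int
    state: str
    start: date   # inclusive
    end: date     # inclusive


def run_sum(conn: sqlite3.Connection, q: ParsedQuery) -> float:
    # Values are bound as parameters, never interpolated into the SQL text.
    row = conn.execute(
        "SELECT COALESCE(SUM(tonnes_co2e), 0) FROM emissions "
        "WHERE scope = ? AND state = ? AND period_start BETWEEN ? AND ?",
        (q.scope, q.state, q.start.isoformat(), q.end.isoformat()),
    ).fetchone()
    return row[0]


# Illustrative in-memory data; the figures are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emissions "
    "(scope INTEGER, state TEXT, period_start TEXT, tonnes_co2e REAL)"
)
conn.executemany(
    "INSERT INTO emissions VALUES (?, ?, ?, ?)",
    [
        (2, "NSW", "2025-01-01", 120.5),
        (2, "NSW", "2025-02-01", 98.0),
        (2, "VIC", "2025-01-01", 300.0),
        (1, "NSW", "2025-01-01", 45.0),
    ],
)

query = ParsedQuery("sum", scope=2, state="NSW",
                    start=date(2025, 1, 1), end=date(2025, 3, 31))
total = run_sum(conn, query)  # sums only the two matching NSW Scope 2 rows
```

The point of the sketch is that the answer is whatever the stored rows add up to. Nothing in the path from question to number generates text.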

In Carbonly, this works through a fast keyword parser with 40+ entity mappings. It's not waiting for an LLM to think about your question. For common queries, it recognises what you're asking in milliseconds. The actions it supports - count, sum, list, compare - cover what people actually ask. The entities it understands - emissions, incidents, targets, projects, materials - map to the data structures that matter for carbon accounting.
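
A keyword parser of this kind can be sketched in a few dozen lines. The lookup tables and the `Parsed` structure below are hypothetical and far smaller than a production set of mappings; they only show the general technique of matching known keywords rather than asking a model to interpret the sentence.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keyword tables; a real parser would carry many more mappings.
ACTIONS = {"how many": "count", "count": "count", "total": "sum", "sum": "sum",
           "list": "list", "which": "list", "compare": "compare"}
ENTITIES = {"emissions": "emissions", "incidents": "incidents",
            "targets": "targets", "projects": "projects",
            "materials": "materials"}
STATES = {"victoria": "VIC", "vic": "VIC", "new south wales": "NSW",
          "nsw": "NSW", "queensland": "QLD", "qld": "QLD"}


@dataclass
class Parsed:
    action: Optional[str] = None
    entity: Optional[str] = None
    group_by: Optional[str] = None
    filters: dict = field(default_factory=dict)


def _find(text: str, table: dict) -> Optional[str]:
    # Try longer keywords first so multi-word phrases win over short ones.
    for keyword in sorted(table, key=len, reverse=True):
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return table[keyword]
    return None


def parse(question: str) -> Parsed:
    text = question.lower()
    parsed = Parsed(action=_find(text, ACTIONS), entity=_find(text, ENTITIES))
    state = _find(text, STATES)
    if state is not None:
        parsed.filters["state"] = state
    scope = re.search(r"\bscope\s*([123])\b", text)
    if scope is not None:
        parsed.filters["scope"] = int(scope.group(1))
    group = re.search(r"\bby (state|facility|business unit)\b", text)
    if group is not None:
        parsed.group_by = group.group(1)
    return parsed


example = parse("How many incidents were reported in Queensland this year?")
# example.action is "count", example.entity is "incidents",
# and example.filters is {"state": "QLD"}
```

Because every step is a dictionary lookup or a regular expression, there is no model call in the path, which is consistent with the millisecond response times described above. The cost is that a phrase with no known keyword simply produces no match.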

Here's what that looks like in practice.

You type: "Total Scope 1 emissions by state FY2025"

The parser breaks that down: action = sum, entity = emissions, filter = Scope 1, groupBy = state, dateRange = FY2025 (which it knows means July 2024 to June 2025, because it was built for Australia, not retrofitted). It queries the database and returns a breakdown - Victoria: 4,230 t CO2-e, Queensland: 2,870 t CO2-e, NSW: 1,540 t CO2-e. Real numbers from your records.
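
The financial-year step is simple enough to show directly. This is a sketch of the convention rather than the production code, and `fy_range` is a name we've made up for the example.

```python
import re
from datetime import date


def fy_range(label: str) -> tuple[date, date]:
    """Map 'FY2025' to 1 July 2024 through 30 June 2025 (inclusive)."""
    match = re.fullmatch(r"FY\s*(\d{4})", label.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a financial-year label: {label!r}")
    end_year = int(match.group(1))
    # An Australian financial year is named after the year in which it ends.
    return date(end_year - 1, 7, 1), date(end_year, 6, 30)


start, end = fy_range("FY2025")
```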

You can then follow up: "Compare that to FY2024"

And get a year-on-year comparison. Within a session-based conversation, the system holds context for up to 50 messages, so you're not starting from scratch with each question.
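
One way to picture the follow-up behaviour: the session remembers the last resolved query, and a follow-up only overrides the parts it mentions. The sketch below assumes that design. Only the 50-message cap comes from the product; the `Session` class, its fields, and the dictionary keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_MESSAGES = 50  # cap taken from the text; everything else here is assumed


@dataclass
class Session:
    messages: list = field(default_factory=list)
    last_query: Optional[dict] = None

    def ask(self, question: str, parsed: dict) -> dict:
        """Record a question and resolve it against the previous query.

        `parsed` holds only what the new question states explicitly; any
        missing key is inherited from the previous query in this session.
        """
        if len(self.messages) >= MAX_MESSAGES:
            raise RuntimeError("session is full; start a new one")
        self.messages.append(question)
        resolved = {**(self.last_query or {}), **parsed}
        self.last_query = resolved
        return resolved


session = Session()
session.ask(
    "Total Scope 1 emissions by state FY2025",
    {"action": "sum", "entity": "emissions", "scope": 1,
     "group_by": "state", "date_range": "FY2025"},
)
follow_up = session.ask(
    "Compare that to FY2024",
    {"action": "compare", "compare_to": "FY2024"},
)
# follow_up keeps scope, group_by and date_range from the first question
```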

The queries people actually ask

We designed the system around the questions that sustainability managers, CFOs, and operations directors actually ask. Not the questions you'd build a dashboard for - the ad-hoc ones that come up in meetings, board prep, and email threads.

"Which facility had the highest emissions last month?" Action: list, sorted by emissions descending, dateRange last month. This is the question an operations director asks before a quarterly review, and one that currently requires exporting all facility data and sorting it in Excel.

"How many incidents were reported in Queensland this year?" Action: count, entity: incidents, filter: state QLD, dateRange this year. This query crosses from emissions into compliance - linking to our environmental incident tracking module, which captures spills, leaks, and equipment failures alongside their emissions impact. That's one of the advantages of having the assistant cover incidents, targets, and materials alongside emissions data.

"List all projects with status active" - a project management query, not an emissions one. But sustainability managers track emission reduction projects alongside measurement, and being able to query across both saves switching between screens.

"Compare Scope 2 emissions this quarter vs last quarter by business unit." This is the kind of multi-dimensional query that in a traditional platform requires selecting a scope filter, a date range, a comparison period, and a group-by dimension. In the assistant, it's one sentence.

And critically, the system understands Australian date conventions natively. "FY2025" means July 2024 to June 2025. "Last quarter" calculates relative to today's date. "Last 90 days" does what it says. You can type "yesterday" and it works. This seems trivial until you've used a platform built for the US market that thinks your financial year starts in January and refuses to accept "FY" as a valid date filter.
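
Relative phrases resolve the same way: take today's date and compute an explicit range. The sketch below covers the three phrases mentioned above. Whether a 90-day window includes today, and whether quarters are calendar-aligned, are assumptions we've made for the example.

```python
import re
from datetime import date, timedelta


def resolve_relative(phrase: str, today: date) -> tuple[date, date]:
    """Resolve a relative date phrase to an inclusive (start, end) range."""
    text = phrase.strip().lower()
    if text == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    window = re.fullmatch(r"last (\d+) days", text)
    if window is not None:
        # Assumption: the window ends today and includes it.
        return today - timedelta(days=int(window.group(1)) - 1), today
    if text == "last quarter":
        # Assumption: quarters are Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec, which
        # also line up with the quarters of a July-to-June financial year.
        quarter_start_month = 3 * ((today.month - 1) // 3) + 1
        this_quarter_start = date(today.year, quarter_start_month, 1)
        end = this_quarter_start - timedelta(days=1)
        return date(end.year, end.month - 2, 1), end
    raise ValueError(f"unrecognised date phrase: {phrase!r}")


previous_quarter = resolve_relative("last quarter", date(2025, 8, 15))
```

Passing `today` in as an argument, rather than reading the clock inside the function, keeps the behaviour easy to test around quarter and year boundaries.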

Role-aware - because not everyone should see everything

One thing we were deliberate about: the assistant respects user permissions. A viewer-level user asking about emissions gets emissions data. They don't get financial figures, cost projections, or pricing information tied to reduction projects.

This matters more than it might seem. When a board member gets access to query emissions data directly, you don't want them accidentally pulling up commercially sensitive project costs or supplier pricing. When an external auditor is reviewing data, they should see what they need and nothing more.

The role-aware design means the CFO can query emissions, costs, and targets. The facility manager can query their own site data. The board observer can see total emissions and trends. Nobody has to build separate dashboards or restrict access through clunky workarounds. The data boundaries are built into the query engine itself.
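
As an illustration of what "built into the query engine" can mean, here is a sketch that filters both rows and columns by role before anything is returned. The role names, column names, and site names are hypothetical; the actual permission model isn't described in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Role:
    fields: frozenset                    # columns this role may see
    sites: Optional[frozenset] = None    # None means every site


# Hypothetical roles and column names, for illustration only.
ROLES = {
    "cfo": Role(frozenset({"site", "scope", "tonnes_co2e", "project_cost"})),
    "board_observer": Role(frozenset({"scope", "tonnes_co2e"})),
    "geelong_manager": Role(frozenset({"site", "scope", "tonnes_co2e"}),
                            sites=frozenset({"Geelong"})),
}


def apply_permissions(rows: list, role_name: str) -> list:
    """Drop rows and columns the role may not see; unknown roles get nothing."""
    role = ROLES.get(role_name)
    if role is None:
        return []
    return [
        {key: value for key, value in row.items() if key in role.fields}
        for row in rows
        if role.sites is None or row.get("site") in role.sites
    ]


rows = [
    {"site": "Geelong", "scope": 1, "tonnes_co2e": 10.0, "project_cost": 5000},
    {"site": "Dandenong", "scope": 1, "tonnes_co2e": 7.0, "project_cost": 900},
]
observer_view = apply_permissions(rows, "board_observer")
```

Filtering at this layer means the same question returns a different shape of answer depending on who asks it, without a separate dashboard per audience.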

What it can't do - and why that's a feature

Let's be honest about the limitations, because overpromising is the easiest way to erode trust in any tool.

The assistant can't answer questions outside the data in your system. If you haven't uploaded your gas bills for Q3, and you ask "What were our gas emissions in Q3?", you'll get whatever's there - possibly zero, possibly incomplete. It won't estimate. It won't fill gaps. It gives you what the data says. We think this is the right design choice for a system that feeds into NGER compliance and ASRS disclosures, where making up numbers isn't an option.

It won't generate insights or recommendations. Ask "How should we reduce our Scope 2 emissions?" and it doesn't know what to do with that. It's a query engine, not a strategy consultant. We have a separate carbon reduction planning module for that kind of analysis, but the natural language assistant isn't trying to be everything.

It has a 50-message cap per session. We set this deliberately - long conversations accumulate context that makes the parser slower and less accurate. Start a new session, get a fresh conversation. For the types of questions people actually ask (quick data lookups, comparisons, facility breakdowns), 50 messages is more than enough.

Complex multi-step analysis still needs a human. "What's our emissions trajectory if we switch all Victorian sites to renewable energy by 2028, accounting for the grid decarbonisation trend?" - that's a scenario modelling question, not a query. The assistant might get you the baseline Victorian emissions figure you need to start that analysis. But the analysis itself requires human judgement, assumptions, and a proper modelling tool.

And because it's a keyword parser, not a free-form AI, it sometimes won't understand creative phrasing. "How dirty is our Victorian operation?" isn't going to return Scope 1 emissions for your Melbourne facility. "Scope 1 emissions Victoria" will. We've optimised for clarity and speed over conversational flexibility. That tradeoff means faster responses, more reliable results, and no hallucination risk - but it also means you occasionally need to rephrase.

We're still iterating on the parser. Some queries that feel obvious to a human - "emissions from last July" - can be ambiguous to a system that needs to decide if you mean July this year, July last year, or the financial year starting July. We handle these through Australian date conventions by default, but edge cases exist. We're not going to pretend otherwise.
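
To show why "last July" is harder than it looks, the sketch below enumerates the readings mentioned above for a given date. It is an illustration of the ambiguity only; the function name and labels are made up, and it does not show which reading the parser picks by default.

```python
from datetime import date


def last_july_candidates(today: date) -> dict:
    """Return the plausible readings of 'last July' as inclusive date ranges."""
    # Most recent July that has fully finished.
    recent_year = today.year if today.month > 7 else today.year - 1
    # Financial year that began on the most recent 1 July on or before today.
    fy_start_year = today.year if today.month >= 7 else today.year - 1
    return {
        "most_recent_july": (date(recent_year, 7, 1),
                             date(recent_year, 7, 31)),
        "july_previous_calendar_year": (date(today.year - 1, 7, 1),
                                        date(today.year - 1, 7, 31)),
        "financial_year_from_july": (date(fy_start_year, 7, 1),
                                     date(fy_start_year + 1, 6, 30)),
    }


readings = last_july_candidates(date(2025, 9, 1))
```

Asked on 1 September 2025, the three readings are July 2025, July 2024, and the financial year running from 1 July 2025 to 30 June 2026. Asked in March, the first two collapse into the same month, which is part of why the edge cases are easy to miss in testing.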

Why this matters now, not in two years

The timing is specific to Australia. ASRS Group 2 entities start reporting for financial years beginning 1 July 2026. That means mid-2026 through mid-2027 will see a wave of mid-sized companies working through their first mandatory reporting period. Many of these companies have one sustainability person, or none - the CFO is doing it alongside everything else.

Under AASB S2, entities need to disclose Scope 1, 2, and 3 greenhouse gas emissions, plus detailed information about climate-related risks and opportunities that could affect cash flows, access to finance, or cost of capital. That requires the CFO to be able to interrogate the emissions data directly, not just receive a static report once a quarter from the sustainability team.

PwC's 2025 survey found that 28% of companies now use AI for sustainability reporting - nearly triple from 11% the year before. But most of that AI usage is in drafting disclosures and identifying risks, not in data access. The data access layer - the ability for non-specialists to query emissions figures directly - is still barely addressed.

And it's not just about reporting. When a customer asks for your emissions data (and they will), the sustainability manager shouldn't need three days to compile the answer. When the auditor asks for facility-level Scope 2 by state for the last two years, that shouldn't trigger a panic. These are database queries. They should take seconds.

The spreadsheet comparison nobody's making

Every organisation that moved from spreadsheets to carbon accounting software celebrated getting their data into a proper system. Better structure. Better audit trails. Fewer formula errors. All true.

But they traded one access bottleneck for another. With spreadsheets, at least anyone could open the file (assuming they could find it). With software, only trained users can extract data. The sustainability platform becomes another silo - better organised than the spreadsheet, but equally opaque to the CFO who just needs a number for a board paper.

Natural language query doesn't replace the platform. It sits on top of it. The data still lives in a structured database with full audit trails and emission factor documentation. The dashboards still exist for detailed analysis. But for the 80% of interactions that are "just tell me a number" - a query interface that speaks English is the difference between a two-minute answer and a two-day turnaround.

We built this for a specific reason. After 18 years working in enterprise data platforms at mining and resources companies, we'd seen what happens when critical operational data lives behind specialist interfaces. People stop asking questions. They make decisions based on outdated information or gut feel. Or they build shadow spreadsheets - extracting data from the official system into their own Excel files that immediately start diverging from the source of truth.

Emissions data shouldn't work that way. Not when it's going into mandatory disclosures reviewed by auditors and regulators.

What to do with this

If you're evaluating carbon accounting platforms, ask vendors one question: "If my CFO wanted to know our Scope 2 emissions in Queensland for the last financial year, how would they get that number?" If the answer involves exporting CSVs, building custom reports, or "we can train them on the dashboard" - that's the access bottleneck you'll live with for the next five years.

The data is the hard part. Getting it into the system, validated, and calculated correctly - that's where 90% of the effort should go. But once it's there, accessing it shouldn't require a specialist. It should require a question.
