Generative AI models mostly inaccurate when sourcing statistical data, finds trial
UK – Most current large language models failed to accurately answer a question focused on UK statistics, in an experiment published by the MRS Census and GeoDemographics Group (CGG).
According to the report, most of the AI models got the answer wrong or refused to answer the question. Only one system returned the correct answers (according to the ONS) the first time, while another got it right on the second attempt at prompting (with no changes to the prompt).
The report found that while the outputs looked coherent in terms of their vocabulary and grammar, the quality of the numbers provided was poor. Additionally, running the same question again was likely to result in a different answer.
Report authors Jaan Nellis and Peter Furness conducted the trials to examine how AI tools perform against a specific query about specific data, following a discussion at a CGG meeting earlier this year.
Speaking to Research Live, Furness explained: “It’s no longer the preserve of experts to get access to public datasets – anybody can type something into Google and get an AI summary, and it goes away and finds data and comes back with answers. That’s wonderful, but on the other hand, in the wrong hands – i.e. in the hands of perhaps less skilled people, people with axes to grind – it could be quite dangerous if the numbers being put out are not accurate. If there was a warning flag to be raised, we wanted to raise it.”
The report authors’ expectation, borne out in the test results, was that if you repeat a question, you get different results.
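As a rough illustration of that kind of repeatability check, the sketch below sends an identical prompt to an LLM several times and tallies the answers. It assumes the OpenAI Python client is available; the model name, prompt wording and run count are illustrative, not those used in the CGG trial.

```python
# Minimal sketch of a repeat-prompt consistency check (illustrative only;
# not the CGG experiment's actual setup).
from collections import Counter

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "What was UK GDP in 2023, in current prices? Answer with a single figure."
RUNS = 5  # number of identical attempts

answers = []
for _ in range(RUNS):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": PROMPT}],
    )
    answers.append(resp.choices[0].message.content.strip())

# A consistent system would return the same answer every time; the tally
# below makes any run-to-run drift visible.
print(Counter(answers))
```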
Nellis said: “We tested Google’s chatbot and Google’s search agent in June/July. They should be one and the same, but they weren’t, back then – we got different results. Then we came back in September because we had some queries we wanted to resolve, and lo and behold Gemini was consistent across the Google platform – they were at least reporting the same things. [AI models] are refining all the time and you would hope that that refinement would lead to a better situation.
“We thought that asking for GDP was a simple thing to ask for, but it’s quite complicated because there are lots of different GDPs, so you need to be relatively cognisant of which GDP you want. The one we asked for was what we considered to be the most common, standard metric.”
The errors in the experiment results, according to Nellis, were primarily due to the search algorithm selecting old webpages with out-of-date figures.
There are two core ways an AI model can operate. In the first, you ask an LLM a question and receive a response drawn not from any external data but from what the model has already learned. Most systems, however, use the second: a technique called RAG (retrieval-augmented generation), which retrieves external sources to ground the answer.
Nellis explained: “RAG runs a search algorithm to get what it considers to be a tight sample of pages that will have the answer in there somewhere. That’s the way we would do it if we were doing it by hand, but you get the issues with the search algorithm – is the search algorithm accurate? It’s slightly arbitrary which pages get pushed to the front and which don’t.”
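To make the pattern Nellis describes concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop: a naive keyword search ranks a toy set of pages, and the top hits are pasted into a prompt that would then be sent to the LLM. The corpus, scoring rule and prompt wording are illustrative assumptions, not the internals of any of the systems tested.

```python
# Toy sketch of the RAG pattern: retrieve candidate pages, then build a prompt
# asking the model to answer only from those pages. The corpus and the scoring
# rule are deliberately naive stand-ins for a real search engine and index.

TOY_CORPUS = {
    "https://example.org/gdp-2023": "UK GDP in 2023 was ... (current prices).",
    "https://example.org/gdp-2019": "UK GDP in 2019 was ... (older release).",
    "https://example.org/weather":  "Rain is expected across the UK this week.",
}

def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank pages by crude keyword overlap with the question and keep the top k."""
    q_terms = set(question.lower().split())
    scored = [
        (len(q_terms & set(text.lower().split())), url, text)
        for url, text in TOY_CORPUS.items()
    ]
    # Which pages end up on top is the "slightly arbitrary" step Nellis warns
    # about: a stale page can outrank a current one.
    scored.sort(reverse=True)
    return [(url, text) for _, url, text in scored[:k]]

def build_prompt(question: str) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    sources = "\n".join(f"[{url}] {text}" for url, text in retrieve(question))
    return (
        "Answer the question using only the sources below and cite the URL used.\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    print(build_prompt("What was UK GDP in 2023?"))
```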
The report also looked at whether there are tools in development that may be able to select statistical data more precisely. One tool, StatGPT, released by the IMF in partnership with EPAM Systems, reports only on high-quality sources by using SDMX-compliant data queries to source its RAG data. SDMX (Statistical Data and Metadata eXchange) is an ISO standard for describing statistical data and metadata and for standardising queries across data providers.
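For readers unfamiliar with what an “SDMX-compliant data query” looks like, the sketch below issues a data request in the SDMX 2.1 REST style: a dataflow reference plus a dimension key and a time filter. The endpoint, dataflow identifier and dimension codes are placeholders; real values depend on the provider publishing the API.

```python
# Illustrative SDMX 2.1 REST-style data request. The base URL, dataflow id and
# key are placeholders, not a real provider's endpoint or codes.
import requests

BASE = "https://sdmx.example-provider.org/rest"  # placeholder endpoint
FLOW = "AGENCY,GDP_DATAFLOW,1.0"                 # placeholder dataflow reference
KEY = "UK.CP_MEUR.A"                             # placeholder dimension key

resp = requests.get(
    f"{BASE}/data/{FLOW}/{KEY}",
    params={"startPeriod": "2019", "endPeriod": "2023"},
    # Many providers also honour an Accept header to choose SDMX-ML, SDMX-JSON or CSV.
    headers={"Accept": "application/vnd.sdmx.data+json"},
    timeout=30,
)
resp.raise_for_status()
print(resp.text[:500])  # structured, versioned series rather than scraped web pages
```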
Furness likened this to data producers having “nice little hooks sticking up in their data which the tools can then find and plug into”, but said this is not in place in the UK at the moment. While ONS, through its partner NOMIS, and the UK Data Service both provide an SDMX API, StatGPT is not currently connected to either. The CGG report recommends that the tool – initially for testing, and if shown to be useful, for customer-facing deployment – should be connected to these UK sources.