Audit AI search tools now, before they distort your research


Search tools powered by large language models (LLMs) are changing how researchers access scholarly information. One tool, scite Assistant, uses GPT-3.5 to generate responses drawing on millions of scientific papers. Another, Elicit, uses an LLM to write answers on the basis of searches of scholarly databases. Consensus finds and synthesizes research claims in papers, whereas SciSpace bills itself as an ‘AI research assistant’ that can explain mathematics or text in scientific papers. All these tools provide natural-language answers to natural-language questions.

Search tools tailored to academic databases use LLMs to provide alternative ways of identifying, ranking and accessing papers. In addition, researchers can use general-purpose artificial-intelligence (AI) search systems, such as Bing, and narrow their queries to target academic databases such as CORE, PubMed and Crossref.

All search systems shape scientists’ access to knowledge and influence how research is conducted. Each has unique capabilities and limitations. I know this from my own experience building Search Smart, a tool that lets users compare the functionality of 93 commonly used search systems, including Google Scholar and PubMed. AI-assisted natural-language search tools will undoubtedly have an impact on research. The question is: how?

Before LLMs are adopted en masse in academic work, the time that remains must be used to understand their opportunities and limitations. Independent audits of these tools are crucial to future-proof access to knowledge.

All LLM-assisted search tools have limitations. LLMs can ‘hallucinate’: make up papers that don’t exist, or summarize content incorrectly by inventing information. Although hallucination is presumably less likely when LLM-assisted search systems query dedicated scientific databases, the extent of their limitations remains unclear. And because AI-assisted search systems, even open-source ones, are ‘black boxes’ (their mechanisms for matching queries to results are opaque), systematic analysis is needed to determine whether important results are missed or certain types of paper are subtly favoured. Indeed, I have found that Bing, scite Assistant and SciSpace tend to produce different results when searches are repeated, undermining reproducibility. This lack of transparency means that many limitations probably remain to be found.

Already, Twitter threads and viral YouTube videos promise that AI-assisted search can accelerate systematic reviews or facilitate brainstorming and knowledge summarization. If researchers are unaware of the limitations and biases of these systems, research outcomes will suffer.

Some rules for LLMs in general do exist, a few of them specific to the research community. For example, publishers and universities have enacted policies to prevent LLM-enabled research misconduct such as misappropriation, plagiarism or fake peer review. Institutions such as the US Food and Drug Administration approve AI tools for specific uses, and the European Commission has proposed its own legal framework on AI. But more focused policies are needed, particularly for LLM-assisted search.

Working on Search Smart, I developed a way to systematically and transparently assess the functionality of databases and the search systems that access them. I found capabilities and limitations that are often omitted from, or incorrectly described in, the FAQs of the search tools themselves. At the time of our study, Google Scholar was the search engine most commonly used by researchers. Yet we found that its ability to interpret Boolean search queries, such as those using OR and AND, was inadequate and underreported. On the basis of these findings, we recommended against relying on Google Scholar as a primary search tool for systematic reviews and meta-analyses (M. Gusenbauer & N. R. Haddaway Res. Synth. Methods 11, 181–217; 2020).
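A Boolean-semantics check of the kind described above can be sketched as a short audit script. This is a minimal, hypothetical illustration: `search_fn` stands in for whatever function queries the engine under test and returns a list of result identifiers, and the toy corpus exists only to make the example runnable.

```python
def audit_and_semantics(search_fn, term_a, term_b):
    """Check whether a search engine honours Boolean AND.

    Every result for 'A AND B' should also appear among the results
    for A alone and for B alone; anything else is a violation.
    Returns the set of violating result identifiers (empty = pass).
    """
    combined = set(search_fn(f"{term_a} AND {term_b}"))
    results_a = set(search_fn(term_a))
    results_b = set(search_fn(term_b))
    return combined - (results_a & results_b)


# Toy corpus and engine, used only to illustrate the audit:
corpus = {
    "p1": "cancer screening trial",
    "p2": "cancer genomics",
    "p3": "screening policy",
}

def toy_search(query):
    # Naive engine: every non-operator term must appear in the text.
    terms = [t for t in query.split() if t != "AND"]
    return [pid for pid, text in corpus.items()
            if all(t in text for t in terms)]

violations = audit_and_semantics(toy_search, "cancer", "screening")
```

For the toy engine the violation set is empty; run against a real system, a non-empty set would document exactly which results contradict the advertised AND semantics.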

Although search AIs are black boxes, their performance can still be evaluated using ‘metamorphic testing’. It is a bit like car crash testing: it asks only how well the occupants survive various crash scenarios, without needing to know how the car works inside. Similarly, AI testing should prioritize performance on specific tasks.
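As a minimal sketch of such black-box testing, one simple metamorphic relation is reproducibility: submitting the identical query repeatedly should return the same results. The helper below is illustrative only; `search_fn` is a placeholder for whatever function queries the system under test and returns a list of result identifiers.

```python
def repeat_consistency(search_fn, query, n_trials=5):
    """Mean Jaccard overlap between the first run and each repeat.

    1.0 means the engine is fully reproducible for this query;
    lower values quantify how nondeterministic its results are.
    """
    baseline = set(search_fn(query))
    scores = []
    for _ in range(n_trials - 1):
        repeat = set(search_fn(query))
        union = baseline | repeat
        scores.append(len(baseline & repeat) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

The same harness extends to other metamorphic relations, such as checking that trivially rephrased queries return substantially overlapping results, all without any access to the system's internals.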

LLM developers should not be relied on to perform these tests. Instead, third parties should systematically audit the functionality of these systems. Organizations that compile evidence and advocate for evidence-based practices, such as the Cochrane or Campbell collaborations, would be good candidates. They could conduct audits themselves or jointly with other bodies. Third-party auditors might also partner with librarians, who can play an important part in teaching information literacy around AI-assisted search.

The purpose of these independent audits is not to decide whether LLMs should be used, but to provide clear, practical guidance on which tasks AI-assisted search can and cannot perform. For example, an audit might find that a tool is useful for searches that help to define the scope of a project, but cannot reliably identify all papers on a topic because of hallucination.

Researchers need to test AI-assisted search systems before mass adoption propagates unintentionally biased results. A clear understanding of what these systems can and cannot do can only improve scientific rigour.

Competing interests

M.G. is the founder of Search Smart, which tests academic search systems.
