Large language models continue to struggle with hallucinations, presenting a major roadblock for real-world enterprise applications. Reducing these errors is a messy business, forcing model developers to navigate a strict tradeoff where eliminating factual errors often suppresses valid answers.
In a new paper, Google researchers introduce the concept of “faithful uncertainty,” a metacognitive technique that aligns a model’s response with its internal confidence. This alignment allows the model to offer appropriately hedged hypotheses, such as “My best guess is,” instead of defaulting to an unhelpful “answer-or-abstain” binary.
In real-world agentic AI applications, this metacognitive awareness acts as an essential control layer. It empowers autonomous systems to accurately determine when their internal knowledge is sufficient and when they must dynamically trigger external tools or search APIs to resolve deficits.
The utility tax of current mitigation strategies
Understanding why LLMs hallucinate hinges on separating two capabilities: knowing facts, and knowing which facts the model does and does not know. Historically, most factuality gains in AI have come from expanding the knowledge boundary, meaning developers simply pack more facts into the model’s parameters through larger scale and more training data.
However, expanding a model’s knowledge does not automatically improve its boundary awareness, which is its ability to distinguish the known from the unknown and recognize its own limitations.
“There are broadly two ways to improve LLM factuality,” Gal Yona, Research Scientist at Google and co-author of the paper, told VentureBeat. The first is continuing to teach the model more facts. But, Yona notes, “model capacity is finite, and the long tail of knowledge is effectively infinite.”
Once models hit this limit, the hope is they know what they don’t know and simply abstain from answering. However, this is inherently difficult for LLMs.
“This is why most practical attempts to reduce hallucinations through various interventions don’t actually make it to deployment,” Yona explains. “They do reduce hallucinations, but they also hurt utility, because the model ends up refusing to answer questions it actually does know.”
This inability to distinguish between knowns and unknowns creates what the paper’s authors call the “utility tax.” Enforcing a zero-hallucination standard requires the model to abstain whenever it is even slightly uncertain, discarding massive volumes of completely valid information. For example, the authors demonstrate that reducing an underlying 25% error rate down to a strict 5% target forces developers to discard 52% of the model’s correct answers.
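The arithmetic behind the utility tax can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the function name `utility_tax` and the toy confidence scores are invented here, and the toy data is not meant to reproduce the paper's 52% figure. It assumes each answer comes with a confidence score and that the system abstains on everything below a single threshold.

```python
def utility_tax(confidences, correct, target_error):
    """Return (threshold, share of correct answers discarded).

    Answers are ranked by confidence and the largest top-ranked set whose
    error rate is at or below target_error is kept; the rest are abstained on.
    """
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    total_correct = sum(1 for _, ok in ranked if ok)
    threshold, kept_correct = None, 0
    errors = 0
    for kept, (conf, ok) in enumerate(ranked, start=1):
        if not ok:
            errors += 1
        # Only cut between distinct scores so the threshold is well defined.
        at_boundary = kept == len(ranked) or ranked[kept][0] != conf
        if at_boundary and errors / kept <= target_error:
            threshold, kept_correct = conf, kept - errors
    discarded = 1 - kept_correct / total_correct if total_correct else 0.0
    return threshold, discarded


# Hypothetical toy data: 20 answers, 5 of them wrong (a 25% error rate).
TOY_CONFIDENCES = [0.99, 0.97, 0.95, 0.93, 0.90, 0.88, 0.85, 0.80, 0.75, 0.70,
                   0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20]
TOY_CORRECT = [True, True, True, True, True, True, True, False, True, True,
               True, False, True, True, False, True, True, False, True, False]

threshold, discarded = utility_tax(TOY_CONFIDENCES, TOY_CORRECT, target_error=0.05)
print(f"abstain below {threshold}; correct answers discarded: {discarded:.0%}")
```

On this toy set, hitting the 5% target means keeping only the seven most confident answers, so 8 of the 15 correct answers are thrown away. The exact share depends entirely on how well confidence separates right answers from wrong ones.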
Treating all errors as hallucinations forces enterprise systems to choose between trustworthiness and helpfulness. Application developers are generally unwilling to pay this massive utility tax and render their models unhelpful.
Consequently, they optimize systems to prioritize coverage, forcing models to operate in a state where they continue to generate confident hallucinations.
Reframing hallucinations as confident errors
To move past the utility tax, the researchers propose to stop treating every factual error as a hallucination. Instead, they reframe hallucinations as “confident errors”: incorrect information delivered authoritatively without appropriate qualification.
This subtle reframing dissolves the strict “answer-or-abstain” dichotomy and allows the model to express its uncertainty.
In this new framework, if a model makes a factual mistake but appropriately hedges its response (e.g., by stating, “I am not completely sure, but I think…”), it isn’t a hallucination. It is simply a hypothesis offered to the user for consideration. By expressing uncertainty, the AI preserves its utility—sharing whatever partial or likely knowledge it has—without violating the user’s trust.
However, if an AI assistant hedges all its responses with a disclaimer, the user is forced to double-check everything, defeating the purpose of the tool entirely.
The solution the researchers propose is “faithful uncertainty.” This approach requires aligning a model’s linguistic uncertainty, or the words it uses to express doubt, with its intrinsic uncertainty, which is its actual, internal statistical confidence in that specific answer. This ensures the model only hedges when its internal state genuinely reflects conflicting or low-probability information.
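One way to picture this alignment is a small Python sketch that picks the wording of an answer from a measured confidence value. Everything here is an assumption made for illustration: the article does not say how intrinsic confidence is measured, so the sketch uses agreement among repeated samples as a stand-in, and the thresholds and phrasings in `HEDGES` are hypothetical.

```python
from collections import Counter

# (minimum confidence, template). Thresholds and wording are illustrative only.
HEDGES = [
    (0.9, "{answer}."),
    (0.6, "I think it's {answer}, but I'm not completely sure."),
    (0.3, "My best guess is {answer}, though I could easily be wrong."),
]
FALLBACK = "I don't know; one possibility is {answer}."


def intrinsic_confidence(samples):
    """Return (most common answer, share of samples that agree with it).

    Sample agreement is used here as a stand-in for the model's internal
    confidence; it is an assumption, not the paper's definition.
    """
    counts = Counter(s.strip() for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def verbalize(samples):
    """Hedge the answer only as much as the measured confidence warrants."""
    answer, confidence = intrinsic_confidence(samples)
    for floor, template in HEDGES:
        if confidence >= floor:
            return template.format(answer=answer)
    return FALLBACK.format(answer=answer)


print(verbalize(["Canberra"] * 10))
print(verbalize(["Canberra"] * 7 + ["Sydney"] * 3))
```

The point of the design is that the hedge is a function of the measured signal rather than a blanket disclaimer: ten agreeing samples produce a plain answer, while a 7-to-3 split produces a qualified one.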
Faithful uncertainty forms a core component of “metacognition,” the AI’s ability to be aware of its own uncertainty and act on it. To understand this practically, consider the intuitive example of consulting a doctor. We do not trust doctors because they are all-knowing. We trust them because they reliably distinguish between a confident diagnosis (“You have a fracture”) and an educated hypothesis (“It might be a sprain, but let’s run some tests”).
Practical implications for enterprise AI
Under the new framing, errors where a model is genuinely confident but factually incorrect are categorized as “honest mistakes.” This casts knowledge expansion (training the model on more data) and faithful uncertainty as completely complementary efforts. Knowledge expansion pushes the absolute knowledge boundary outward to minimize honest mistakes, while faithful uncertainty honestly communicates wherever that boundary currently lies.
This new framing has important implications for agentic applications. The shift to agentic AI might make it seem like knowing what the model doesn’t know is redundant, since models can just search external databases. However, access to external tools actually amplifies the need for faithful uncertainty. In agentic systems, metacognition becomes the central control layer that governs the entire system.
External tools solve the storage problem because the model no longer needs to encode every fact into its parameters. However, this introduces a new control problem: managing when to retrieve information, verify facts, and orchestrate these external tools. Without faithful uncertainty, an agent is essentially flying blind and must rely on external, static heuristics or over-engineered scaffolds.
“The model might search for something it already knows confidently—wasting latency and cost for no gain. Or the opposite: it confidently answers from memory when it should have searched, producing a plausible but wrong output,” Yona said. Today’s agent harnesses try to solve this externally with query classifiers or always-search rules, but Yona notes that these are “static and brittle.” By using its intrinsic uncertainty to regulate its own behavior, the agent dynamically optimizes its tool use, choosing to invoke a search tool only when its internal confidence is genuinely low.
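The gating behavior described here can be sketched as a small Python function. This is a hypothetical illustration, not code from the paper or from any agent framework: `answer_with_gated_search`, the `min_confidence` value of 0.8, and the two stub callables are all assumptions, and a real system would need a trustworthy confidence signal for the gate to mean anything.

```python
from typing import Callable, Tuple


def answer_with_gated_search(
    question: str,
    answer_from_memory: Callable[[str], Tuple[str, float]],
    search: Callable[[str], str],
    min_confidence: float = 0.8,  # illustrative cutoff, not a recommended value
) -> Tuple[str, bool]:
    """Return (answer, used_search).

    The search tool is called only when the model's own confidence in its
    parametric answer falls below min_confidence.
    """
    answer, confidence = answer_from_memory(question)
    if confidence >= min_confidence:
        return answer, False  # confident: skip the tool, save latency and cost
    return search(question), True  # unsure: fall back to retrieval


# Stub components standing in for a real model and a real search API.
def toy_memory(question: str) -> Tuple[str, float]:
    known = {"capital of Australia": ("Canberra", 0.97)}
    return known.get(question, ("unknown", 0.1))


def toy_search(question: str) -> str:
    return f"[search result for: {question}]"


print(answer_with_gated_search("capital of Australia", toy_memory, toy_search))
print(answer_with_gated_search("yesterday's closing price", toy_memory, toy_search))
```

Unlike an always-search rule or a separate query classifier, the decision here is driven by the confidence value returned alongside the answer, which is the property Yona contrasts with "static and brittle" external heuristics.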
Beyond deciding when to search, faithful uncertainty is critical for evaluating the results of a search. If a tool returns low-quality or unexpected information, a metacognitive agent does not blindly accept whatever appears in its context window. Instead, it uses its uncertainty awareness to weigh the retrieved external signals against its own internal priors. This prevents sycophantic behavior in which the system defers to external sources that conflict with what it actually knows.
The bootstrapping paradox: The catch to teaching uncertainty
For enterprise builders, achieving this faithful uncertainty is trickier than it sounds. It requires teaching models the syntax of uncertainty through supervised fine-tuning (SFT). Because pre-trained models are mostly fed authoritative text, they must be explicitly taught to say things like, “I’m not entirely sure, but I think VentureBeat was founded in…”
But SFT introduces a “bootstrapping paradox.” In standard training datasets, the “right answer” is the same regardless of the model. For uncertainty, the ground truth is what the model being trained actually knows, and that changes as training proceeds.
“Here’s the catch: the ‘correct’ expression of uncertainty is inherently dynamic, because it depends on what this particular model knows or doesn’t know at this particular point in training,” Yona said. “If you train on a label that says ‘I don’t know X’ but the model actually does know X, you’ve taught it to hallucinate uncertainty… The training data is static, but the target is a moving one, and that’s the fundamental tension teams need to grapple with.”
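The moving-target problem can be made concrete with a short Python sketch. It is a deliberately simplified assumption, not a training recipe from the paper: `uncertainty_label`, the accuracy cutoffs, and the three stub "checkpoints" are hypothetical, and exact string match stands in for a real correctness check. The idea is only that the target text for the same question changes depending on which version of the model is being asked.

```python
def uncertainty_label(question, gold_answer, sample_answers, n_samples=10):
    """Build a fine-tuning target whose hedging reflects the model's current accuracy.

    sample_answers(question, n) is a hypothetical callable that returns n
    answers sampled from the checkpoint being trained.
    """
    samples = sample_answers(question, n_samples)
    accuracy = sum(s == gold_answer for s in samples) / len(samples)
    if accuracy >= 0.9:  # cutoffs are illustrative assumptions
        return f"{gold_answer}."
    if accuracy >= 0.4:
        return f"I'm not entirely sure, but I think {gold_answer}."
    return "I don't know."


# Three stub checkpoints that disagree about the same question.
def early_checkpoint(question, n):
    return ["Sydney"] * n  # consistently wrong


def middle_checkpoint(question, n):
    right = n * 6 // 10
    return ["Canberra"] * right + ["Sydney"] * (n - right)


def late_checkpoint(question, n):
    return ["Canberra"] * n  # consistently right


for checkpoint in (early_checkpoint, middle_checkpoint, late_checkpoint):
    print(checkpoint.__name__, "->",
          uncertainty_label("capital of Australia", "Canberra", checkpoint))
```

A dataset frozen with the early checkpoint's label would teach the late checkpoint to say "I don't know" about something it answers correctly every time, which is the failure Yona describes as teaching the model to hallucinate uncertainty.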
The road to self-aware AI
For enterprises looking to implement these capabilities without expensive retraining, prompting serves as the most accessible entry point. “Prompt engineering is already something most engineers do today, this provides the lowest-friction path to improving metacognitive behavior today,” Yona said. Enterprise developers can explore frameworks like MetaFaith, an open-source project previously co-authored by Yona, to begin applying metacognitive prompting to off-the-shelf models.
However, Yona cautions that “there is still substantial headroom that prompting alone doesn’t solve,” meaning the industry will eventually need to rely on advanced reinforcement learning (RL) to bake metacognition deeply into model training.
Ultimately, as enterprises transition from isolated chat applications to complex, multi-agent workflows, self-awareness will become a defining prerequisite for reliable autonomy. But evaluating whether a model truly possesses this awareness remains a profound technical challenge.
“How do you actually evaluate whether a model can sense its internal states?” Yona asks. “Even in humans, it’s hard to define or separate ‘true’ self-monitoring abilities from a capable reliance on proxies. We face exactly the same challenges with LLMs: a model might learn to mimic the style of uncertainty without truly sensing its internal state. Developing evaluation frameworks that can tell the difference is one of the most important open problems in this space.”