Why AI Needs Better Data to Deliver Real Results-Dun & Bradstreet

D&B Editors

2025-10-17

In a recent survey of U.S. knowledge workers, more than a third of executives described generative artificial intelligence (GenAI) adoption as “a massive disappointment.” Almost two-thirds said GenAI adoption has led to tension and division within their company.

So what’s happening? Why aren’t AI applications living up to their potential to accelerate business productivity, efficiency, and performance?

Often, an organization’s data is to blame.

Because artificial intelligence systems require accurate and trustworthy data, companies should focus on strong data management to keep their data current, accessible, and actionable.

“Large language models, which power GenAI, are so good that they create the illusion that you can just throw in any data and gold starts flying out. That's obviously not the case,” explains Ilya Meyzin, Dun & Bradstreet’s Head of Data Science. “The old adage of ‘garbage in, garbage out’ still applies. For data to be consumable by artificial intelligence, it needs to be clean, current, contextualized, and compliant.”

According to Meyzin, enterprises don't always understand the impact of data quality on AI performance and output.

“One common misconception is that during implementations, AI is the hardest part. Often, it’s actually the easiest part of complex projects,” Meyzin explains. “The hardest part is getting all the data right — extracting data, moving it across multiple systems, and so on. When AI implementations fail, data is frequently the cause.”

What Makes Data AI-Ready?

Artificial intelligence applications thrive on structured, reliable, and meaningful data. Master data management (MDM) helps ensure that the data used by AI systems is comprehensive and reliable.

Evaluating and improving data quality are crucial steps for master data management strategies. Organizational teams should focus on five core pillars to help them effectively assess the state of their data.

Use These Core Pillars to Help Gauge Data Quality

1. Accuracy: Inaccurate data can lead to flawed AI output and weakens business decisions that use it. MDM helps ensure that data is validated against trusted sources — reducing errors and enhancing reliability.

2. Completeness: Missing data can prevent AI models from performing optimally. This pillar supports MDM frameworks by identifying and filling data gaps. That strengthens data so AI can use it effectively.

3. Consistency: Variable data can confuse AI algorithms and lead to inefficiencies. This pillar helps promote uniformity within MDM by encouraging and reinforcing data governance policies.

4. Timeliness: AI models require current information to generate meaningful, relevant insights. MDM strategies often include mechanisms for real-time data updates and synchronization.

5. Provenance: Understanding where data comes from and how it was changed increases transparency and trust in AI results. MDM processes usually include audit trails and metadata management to support data lineage.

PIDs and Interoperability Matter for MDM and AI

The five core pillars of data quality also impact persistent identifiers and interoperability, especially in the context of AI applications and master data management.

Persistent identifiers (PIDs), such as unique customer IDs or product SKUs, help link data to the correct business entity. PIDs reduce duplication and classification errors. They also help departments and systems focus on the same entity, even if some attributes vary. PIDs also make it easier to trace the origin and history of a data record.

Interoperability is the ability of AI systems and technologies to use data in real time or near-real time. It depends heavily on strong data standards and definitions. Sharing these among teams can reduce an enterprise’s risk of data siloes and data errors.

Improve AI Performance With Master Data Management

Master data management focuses on creating a single source of truth for organizations. This single version of truth is a “golden record” — a unified view that helps ensure teams and systems are all working from the same reliable information.

MDM helps eliminate data silos and conflicting, duplicative entries. Because stakeholders use the same up-to-date information, organizational teams stay coordinated and aligned.

The adoption of generative AI or machine learning technology by an organization makes master data management even more critical. To help prevent AI models from being trained incorrectly or developing biases, it’s crucial to always keep data clean, accurate, and organized.

AI can itself help enterprises improve MDM by performing these functions:

Data integration, cleaning, and validation (to reduce errors and improve consistency)
Data enrichment (to improve the value of data)
Developing actionable insights for decision-making
Identifying subtle or broad trends invisible to human analysts
Reducing B2B fraud through entity resolution, flagging synthetic identities, and spotting material misrepresentations

How Can Poor Data Mislead AI? 

There are so many ways for data to be incorrect. Here are some common problems with “dirty data” and how it can damage the performance of AI applications.

AI hallucinations and misinformation: Missing information or data gaps can cause generative AI to "hallucinate." This means it can produce false and bizarre content without any indication that it has done so.
Connection to data: Hallucinations often stem from poor-quality training data or lack of access to accurate, contextual information during inference.
Why it matters: If your AI generates results based on incomplete or disorganized data, it may make mistakes. This puts decisions at risk, especially those intended to support customers or compliance.
Bias in AI and ethical use: When data contains inaccurate or biased information, AI tends to replicate and expand on the errors. AI may even amplify historical inaccuracies or gaps. 
Connection to data: Bias in AI often originates from biased or unrepresentative training data.
Why it matters: Especially in hiring, lending, or healthcare, biased outputs can lead to significant legal and reputational risks.
Shadow AI: When employees use AI tools outside of official IT oversight, it creates hidden risks. These tools may not comply with company policies, data governance standards, or security protocols.
Connection to data: Shadow AI often involves unvetted tools accessing sensitive or proprietary data without proper safeguards.
Why it matters: It can lead to data leaks, compliance violations, and inconsistent outputs. These may undermine trust and increase organizational risk.
Injection attacks: When malicious inputs are inserted into an AI system, they can manipulate its behavior or outputs. These attacks exploit vulnerabilities in how the system processes user-provided data, often leading to unintended or harmful results.
Connection to data: Injection attacks typically occur when input validation is weak or absent, allowing attackers to feed crafted data into prompts, queries, or models.
Why it matters: These attacks can expose sensitive information or cause reputational damage, particularly in customer-facing applications or decision-making systems.

From Smarter Models to Global Data Moves: Three Trends to Watch

From nimble new models to autonomous agents to global data flows, three key trends are redefining what’s possible for (and what’s challenging) teams to more successfully implement master data management for optimal AI performance.

1. Agentic AI: Autonomous Helpers for Data Management

Agentic AI refers to systems that can act independently to automate and complete complex tasks — without constant human input. Agentic AI can identify and resolve duplicate records across systems, enrich master data by pulling in verified external sources, and monitor data flows and flag inconsistencies before they cause downstream issues.

These agents learn from patterns and adapt over time. That means less manual cleanup, fewer bottlenecks, and more time for teams to focus on strategic work.

2. Small Language Models: Lightweight, Targeted, and Smart

Small language models (SLMs) are designed for specific tasks. They are faster and cheaper to deploy than LLMs. They can be easier to adjust for industry-specific language and data.

SLMs also can be easier to secure, since they can run on-premises or in protected environments. They can support master data management by classifying data, validating entries, and even translating between different data schemas.

3. Cross-Border Data Transfers: A Growing Challenge for AI and MDM

Enterprises increasingly rely on cross-border data exchanges to operate efficiently and innovate at scale. As companies share customer records, supply chain data, and AI training sets more frequently, international regulations have multiplied, and they often include different requirements for security, privacy, etc.

A higher volume of cross-border data transfers can complicate how companies track data lineage and provenance. This can pose a challenge for master data management. Enterprises may need to build region-specific governance frameworks to comply with regulatory requirements.

With MDM, organizations can better prepare themselves to navigate this terrain effectively and scale AI responsibly.

Humans Still Matter for AI and MDM: Why Oversight Is Essential

Even as AI becomes more capable and autonomous, human oversight remains critical, especially when data-driven decisions affect customers, compliance, or brand reputation.

Ultimately, the goal of AI systems is to empower humans. “AI or GenAI will follow the same trends as we’ve seen over the last century with any technology,” adds Meyzin. “Technology allows us to free up our time that we spend on very manual, repetitive tasks and spend more of our energy and focus on really interesting, higher-value, more challenging aspects of whatever it is that we want to accomplish.”

By designing systems where AI and people collaborate, organizations can build data ecosystems that are efficient, trustworthy, and resilient. Disciplined data practices that blend automation with human review help organizations scale responsibly and manage data ethically. In the context of master data management, this balance is key.

To unlock the full potential of artificial intelligence systems, organizations must shift their focus from flashy tools to quality data. Clean, current, and usable data can mean the difference between AI that disappoints and AI that delivers. Companies that invest in artificial intelligence and prioritize enterprise-wide data quality will be best positioned to drive real productivity, performance, and competitive advantage.

Why AI Needs Better Data to Deliver Real Results

Stay up-to-date with Dun & Bradstreet Hong Kong