Chris Sherley, Kosmo Karantonis, and Catherine Price-Ackers, EY Sweeney.
In 2024, Mark Ritson wrote that ‘The era of synthetic data is upon us… it’s bad news for market research companies.’ He cautioned that synthetic consumers could predict 95% of survey responses at a fraction of the cost, leading to what he termed ‘partial obsolescence’ for market researchers.
For those unfamiliar with the concept, synthetic data refers to artificially generated information that mimics real-world data but does not originate from actual consumer interactions. In market research, this usually includes artificially generated quantitative data as well as synthetic personas or qualitative representations of target audiences.
One of the main benefits of synthetic data is cost-effectiveness: it leverages deep learning, expert models, and retrieval augmentation to mimic human decision-making at a fraction of the cost and time that traditional research requires.
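To make the mechanism concrete, below is a minimal sketch of how a synthetic consumer might work in practice: a persona description conditions a large language model, which then answers a survey question in character. The persona text, the question, and the model name are illustrative assumptions only, not a description of any specific vendor's platform.

```python
# Minimal sketch of a "synthetic consumer": an LLM conditioned on a persona
# answers a survey question in character. Persona, question, and model name
# are illustrative assumptions, not any vendor's actual implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

persona = (
    "You are a 34-year-old renter in suburban Melbourne, price-sensitive, "
    "sceptical of brand advertising, and shopping mainly at discount grocers."
)
question = (
    "How likely are you to switch supermarkets for a 5% loyalty discount? "
    "Answer on a 1-5 scale and explain briefly."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical choice; any chat model would do
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": question},
    ],
    temperature=0.9,  # higher temperature to introduce response variety
)
print(response.choices[0].message.content)
```

Running a sketch like this across hundreds of personas is what makes synthetic panels so cheap and fast relative to fieldwork, which is precisely the appeal Ritson describes.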
However, despite this technology being widely discussed for several years, the global research industry has flourished over the past four years, with Statista forecasting continued growth in 2025. This trend seems counterintuitive for an industry facing impending obsolescence.
So, why hasn’t synthetic data had a bigger impact to date? Are insights managers simply conservative and unwilling to adopt a technology that could save them millions? That seems unlikely. Instead, we believe synthetic data is emerging as a tool to support, rather than replace, traditional research programs. In this article, we explain why we believe this is the case and how researchers can use synthetic data to augment their own programs.
Although synthetic data has many potential applications, we believe some of the more ardent advocates like Ritson have overlooked an essential aspect of this discussion. Research is about more than collecting data points; it serves as a bridge between consumers and organisations, fostering dialogues that are crucial for healthy market relationships.
Consumers want to feel connected with the brands they engage with, and research helps facilitate this. Even if artificial intelligence (AI) can predict human decision-making and attitudes, the process can feel dehumanising for everyday consumers.
A new study of more than 15,000 everyday consumers around the world, including 1,000 people in Australia, indicates that 70% of Australian consumers are concerned about the use of synthetic data, up three percentage points in just six months. Among those concerned, 79% worry about false information being generated by AI and used to make decisions, while 68% fear AI outputs could negatively impact vulnerable or at-risk communities.
These concerns are valid, given the growing evidence of cultural and gender biases in generative AI. Hallucinations, where generative platforms create outputs that appear authentic but are entirely fabricated, pose additional risks by potentially perpetuating social biases or alienating marginalised communities.
We recognise that most synthetic platforms aim to reduce these issues by employing a mixture of expert models and retrieval augmentation to enhance accuracy. However, these measures only mitigate, rather than eliminate, the issues of bias and hallucinations. Furthermore, the underlying models that power them are often time-bound and require ongoing maintenance.
To be clear, we’re not arguing against the use of synthetic data. It has a place and can serve as a valuable tool alongside primary research, big data, and industry data. Furthermore, synthetic data can fill knowledge gaps where real-world data may be limited, providing organisations with a more comprehensive view of their target audience.
But it should not replace genuine conversations with real-world customers. Even if synthetic data can predict with 95% accuracy, it’s often the subtle 5% of insights, derived from the unpredictable nature of human decision-making, that yield the most significant impact for a business. In our experience, these outliers often deliver real breakthroughs in the research process because the nuances of consumer choices cannot be fully captured by large language models (LLMs) trained solely on historical internet data; these insights need to come directly from the source.
This is particularly pertinent for government departments, which bear an ethical responsibility to engage with communities affected by their decisions, especially marginalised groups. Their lived experiences are often misaligned with the patterns most LLMs learn from internet data.
AI researchers have a term, augmentation, which refers to the use of technology to assist and support human processes. This can involve any form of AI technology, such as machine learning models, and is designed to support individuals in performing tasks more efficiently, accurately, or effectively. For example, in a workplace setting, augmentation might include using AI-driven software to analyse data and provide insights, allowing employees to make informed decisions faster. In healthcare, augmented reality can assist surgeons by overlaying critical information during procedures. The goal of augmentation is to complement human abilities, reduce cognitive load, and improve overall productivity, enabling individuals to focus on more complex and creative aspects of their work.
We believe this perspective is the most appropriate way to view synthetic data when it comes to research. The current generation of narrow AI cannot replace human researchers and insights managers; the complexity of human decision-making is too intricate. By integrating synthetic data into traditional research practices and centring our efforts around consumers, we can develop a more holistic understanding of consumer behaviour. Ultimately, the goal of market research should be to foster genuine connections between organisations and consumers, helping their voices be heard and valued in the decision-making process.
An example of how this can be done is to run a traditional research program (e.g., a qualitative study) and use its outputs to calibrate synthetic data generation or agents, in a process known as retrieval augmentation. Alternatively, synthetic consumers can be used to test and refine questioning approaches before engaging with real-world respondents, ensuring a study is as relevant and refined as possible.
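The sketch below illustrates the first approach under simplified assumptions: verbatims from a (hypothetical) qualitative study are retrieved for a given question and injected into a synthetic agent's prompt, grounding it in what real respondents actually said. The verbatims, the keyword-overlap retriever, and the prompt wording are all toy assumptions; production platforms would typically use embedding-based retrieval over a vector index.

```python
# Minimal sketch of calibrating a synthetic consumer with real research
# outputs (retrieval augmentation): qualitative verbatims are retrieved by
# keyword overlap and injected into the persona prompt. The verbatims and
# the scoring function are toy assumptions for illustration only.

# Excerpts from (hypothetical) real qualitative interviews
verbatims = [
    "I only switch supermarkets when petrol prices make the longer drive pointless.",
    "Loyalty points feel like a gimmick; I'd rather see lower shelf prices.",
    "I stick with the store where staff know me, even if it costs a bit more.",
]

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank corpus entries by word overlap with the question (toy retriever;
    a production system would use embeddings and a vector index)."""
    q_words = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda v: len(q_words & set(v.lower().split())),
        reverse=True,
    )
    return scored[:k]

question = "Would a loyalty discount make you switch supermarkets?"
grounding = retrieve(question, verbatims)

# The retrieved verbatims become part of the system prompt, calibrating the
# synthetic persona against real-world responses before it answers.
system_prompt = (
    "You are a synthetic consumer. Ground your answers in these real "
    "respondent quotes:\n- " + "\n- ".join(grounding)
)
print(system_prompt)
```

A grounded prompt like this can then be passed to a persona model of the kind sketched earlier, keeping the synthetic agent anchored to genuine consumer voices rather than to internet-derived priors alone.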
In this evolving landscape, the challenge lies in striking the right balance between innovation and authenticity, paving the way for a future where synthetic and traditional data coexist. While synthetic data can be a useful tool in our industry, it is critical to find ways for it to support, rather than replace, high-quality research.
The views expressed in this article are the views of the author, not Ernst & Young. This article provides general information, does not constitute advice and should not be relied on as such. Professional advice should be sought prior to any action being taken in reliance on any of the information. Liability limited by a scheme approved under Professional Standards Legislation.