Applause released the results of its third annual State of Digital Quality in AI Survey, highlighting a significant disconnect between substantial investments in generative AI (Gen AI) and the adoption of essential quality assurance (QA) practices within the software development lifecycle (SDLC). Given the rapid, global rise of Gen AI apps and agentic AI, which enables autonomous decision-making and execution without human intervention, rigorous crowdtesting throughout the SDLC is critical to mitigating the technology's expanding risks. More than 4,400 independent software developers, QA professionals and consumers worldwide participated in the survey, which explored common AI use cases, tools and challenges, as well as user experiences and preferences.
"The
results of our annual AI survey underscore the need to raise the bar on
how we test and roll out new generative AI models and applications,"
said Chris Sheehan, EVP of High Tech & AI, Applause. "Given massive
investment in the technology, we'd like to see more developers
incorporate AI-powered productivity tools throughout the SDLC, and
bolster reliability and safety through rigorous end-to-end testing.
Agentic AI is ramping up at a speed and scale we could hardly have
imagined, so the risks are now amplified. Our global clients are already
ahead of the curve by baking broad AI testing measures into development
earlier, from training models with diverse, high-quality datasets to
employing testing best practices like red teaming."
Key findings:
Embedding AI throughout development delivers powerful competitive advantages, but many organizations are slow to adopt.
- Over half of the software professionals surveyed believe Gen AI tools improve productivity significantly, with 25% estimating a boost of 25-49% and another 27% seeing increases of 50-74%.
- Yet, 23% of software professionals say their integrated development environment (IDE) lacks embedded Gen AI tools (e.g., GitHub Copilot, OpenAI Codex), 16% aren't sure if the tools are integrated with their IDE, and 5% have no IDE at all.
- While red teaming, or adversarial testing, is a best practice to help mitigate risks of inaccuracy, bias, toxicity and worse, only 33% of respondents reported using this technique.
- The top AI testing activities involving humans include prompt and response grading (61%), UX testing (57%) and accessibility testing (54%). Humans are also essential in training industry-specific or niche models; 41% of developers and QA professionals lean on domain experts for AI training.
Businesses are investing heavily in AI to enhance customer experiences and reduce operational costs - but flaws are still reaching users.
- Over 70% of the developers and QA professionals who responded said their organization is developing AI applications and features. Chatbots and customer support tools are the top AI-powered solutions being built (55%), and just over 19% have started to build AI agents.
- Within the past three months, 65% of users reported encountering problems using Gen AI, including responses that lacked detail (40%), misunderstood prompts (38%), showed bias (35%), contained hallucinations (32%), were clearly incorrect (23%) or included offensive content (17%). Only 6% fewer users experienced hallucinations than in last year's survey.
- Gen AI users are fickle: 30% have swapped one service for another, and 34% prefer different Gen AI services for different tasks.
Additional insights:
- Consumer demand for multimodal capabilities has increased.
78% of consumers say multimodal functionality, or the ability to interpret multiple types of media, is important to them in a Gen AI tool, up from 62% last year.
- GitHub Copilot (37%) and OpenAI Codex (34%) are still the AI-powered coding tools of choice.
They were the favorites in 2024, too, but the usage gap is closing: last year, GitHub Copilot was preferred by 41% of respondents and OpenAI Codex by just 24%.
- QA professionals are turning to AI for basic support of the testing process.
The top three use cases are test case generation (66%), text generation for test data (59%) and test reporting (58%).
Sheehan continued, "Enterprises best positioned to capture value with customer-facing generative AI applications understand the important role human intelligence can play. While every generative AI use case requires a custom approach to quality, human intelligence can be applied to many parts of the development process, including model data, model evaluation and comprehensive testing in the real world. As AI seeps into every part of our existence, we need to ensure these solutions provide the exceptional experiences users demand while mitigating the risks that are inherent to the technology."