Applause released the results of its third annual State of Digital Quality in AI Survey, highlighting a significant disconnect between substantial investments in generative AI (Gen AI) and the adoption of essential quality assurance (QA) practices within the software development lifecycle (SDLC). Given the rapid, global rise of Gen AI apps and agentic AI, which enables autonomous decision-making and execution without human intervention, rigorous crowdtesting throughout the SDLC is critical to mitigating the technology's expanding risks. More than 4,400 independent software developers, QA professionals and consumers worldwide participated in the survey, which explored common AI use cases, tools and challenges, as well as user experiences and preferences.
"The
results of our annual AI survey underscore the need to raise the bar on
how we test and roll out new generative AI models and applications,"
said Chris Sheehan, EVP of High Tech & AI, Applause. "Given massive
investment in the technology, we'd like to see more developers
incorporate AI-powered productivity tools throughout the SDLC, and
bolster reliability and safety through rigorous end-to-end testing.
Agentic AI is ramping up at a speed and scale we could hardly have
imagined, so the risks are now amplified. Our global clients are already
ahead of the curve by baking broad AI testing measures into development
earlier, from training models with diverse, high-quality datasets to
employing testing best practices like red teaming."
Key findings:
Embedding AI throughout development delivers powerful competitive advantages, but many organizations are slow to adopt.
- Over half of the software professionals surveyed believe Gen AI tools improve productivity significantly, with 25% estimating a boost of 25-49% and another 27% seeing increases of 50-74%.
- Yet, 23% of software professionals say their integrated development environment (IDE) lacks embedded Gen AI tools (e.g., GitHub Copilot, OpenAI Codex), 16% aren't sure if the tools are integrated with their IDE, and 5% have no IDE at all.
- While red teaming, or adversarial testing, is a best practice to help mitigate risks of inaccuracy, bias, toxicity and worse, only 33% of respondents reported using this technique.
- The top AI testing activities involving humans include prompt and response grading (61%), UX testing (57%) and accessibility testing (54%). Humans are also essential in training industry-specific or niche models; 41% of developers and QA professionals lean on domain experts for AI training.
Businesses are investing heavily in AI to enhance customer experiences and reduce operational costs - but flaws are still reaching users.
- Over 70% of the developers and QA professionals who responded said their organization is developing AI applications and features. Chatbots and customer support tools are the top AI-powered solutions being built (55%), and just over 19% have started to build AI agents.
- Within the past three months, 65% of users reported encountering problems using Gen AI, including responses that lacked detail (40%), misunderstood prompts (38%), showed bias (35%), contained hallucinations (32%), were clearly incorrect (23%) or included offensive content (17%). Only 6% fewer users experienced hallucinations than in last year's survey.
- Gen AI users are fickle: 30% have swapped one service for another, and 34% prefer different Gen AI services for different tasks.
Additional insights:
- Consumer demand for multimodal capabilities has increased.
78% of consumers say multimodal functionality, or the ability to interpret multiple types of media, is important to them in a Gen AI tool, up from 62% last year.
- GitHub Copilot (37%) and OpenAI Codex (34%) are still the AI-powered coding tools of choice.
They were the favorites in 2024, too, but the usage gap is closing: last year, GitHub Copilot was preferred by 41% of respondents and OpenAI Codex by just 24%.
- QA professionals are turning to AI for basic support of the testing process.
The top three use cases are test case generation (66%), text generation for test data (59%) and test reporting (58%).
Sheehan continued, "Enterprises best positioned to capture value with customer-facing generative AI applications understand the important role human intelligence can play. While every generative AI use case requires a custom approach to quality, human intelligence can be applied to many parts of the development process, including model data, model evaluation and comprehensive testing in the real world. As AI seeps into every part of our existence, we need to ensure these solutions provide the exceptional experiences users demand while mitigating the risks that are inherent to the technology."