More than half of all ChatGPT answers to software engineering questions are wrong, according to a new study by Purdue University. Researchers also found that 34% of users prefer answers to programming questions created by ChatGPT over those posted by human users on Stack Overflow, despite the errors in the AI-generated answers. An expert told Tech Monitor that individual programmers’ professional reputations are at risk if they continue to rely on ChatGPT to solve their coding dilemmas.
OpenAI launched its chatbot in November 2022, initially based on the GPT-3 large language model. It has since added a premium tier with access to GPT-4, code interpretation and third-party plugins. The underlying models also power GitHub Copilot, Microsoft’s coding assistant, which is widely used by developers.
The Purdue study is the first to comprehensively examine the characteristics and usability of ChatGPT’s responses to the types of questions regularly shared online. The team asked the platform to answer 517 questions previously posted on Stack Overflow, each of which had a correct answer known to human users of the platform.
Earlier this year, as ChatGPT’s popularity rapidly grew, Stack Overflow banned AI-generated responses. At the time, the company described ChatGPT’s answers as superficially good but systematically incorrect. “Publishing answers created by ChatGPT and other generative AI technologies is significantly harmful to the site and to users who ask questions and search for correct answers,” a Stack Overflow spokesperson explained at the time.
OpenAI has made incremental improvements to the platform and its underlying models since that first release, most notably through GPT-4, but its output is still not always accurate. Stack Overflow has also since adopted AI, but as a way of categorizing its content rather than answering questions.
The new study found that half of the answers were wrong because ChatGPT did not correctly grasp the concept behind the question. “Even when it manages to understand the question, it fails to show how to solve the problem,” the authors write. “[It] often focuses on the wrong part of the question or gives high-level solutions without fully understanding the finer details of a problem.” The researchers found that it also has limited reasoning ability, leading it to produce solutions, code and formulae without thinking through the outcome.
Users prefer ChatGPT responses to software questions
OpenAI has since added a code interpreter to ChatGPT, which allows the AI to run the code it generates in a sandbox, check it for errors and assess the quality of its output. This in turn allows it to verify the final answer, make changes and present a more accurate solution. However, the feature remains in beta and is only available to ChatGPT Plus subscribers.
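OpenAI has not published the internal mechanics of the feature, but the broad pattern it enables, generating code, running it in an isolated process and feeding any errors back into a revised attempt, can be sketched in a few lines of Python. In the sketch below, generate_answer() is a hypothetical placeholder for a model call rather than any real API, and the “sandbox” is simply a separate Python process with a timeout.

```python
# Minimal sketch of a generate-run-revise loop, similar in spirit to what a
# sandboxed code interpreter enables. generate_answer() is a hypothetical
# stand-in for a language-model call; it is not part of any real API.
import subprocess
import sys
import tempfile


def generate_answer(question: str, feedback: str = "") -> str:
    # Hypothetical placeholder: in practice this would call a model and return
    # Python source code answering the question, optionally using error
    # feedback from a previous failed run.
    return "print(sum(range(1, 11)))"  # toy answer: sum of 1..10


def run_in_sandbox(code: str, timeout: int = 5) -> tuple[bool, str]:
    # Execute the generated code in a separate Python process with a timeout,
    # capturing stderr so any errors can be fed back for revision.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return result.returncode == 0, result.stderr
    except subprocess.TimeoutExpired:
        return False, "timed out"


def answer_with_checks(question: str, max_attempts: int = 3) -> str:
    feedback = ""
    code = ""
    for _ in range(max_attempts):
        code = generate_answer(question, feedback)
        ok, errors = run_in_sandbox(code)
        if ok:
            return code      # code ran cleanly; present it as the answer
        feedback = errors    # otherwise revise using the error output
    return code              # give up after max_attempts and return last try


if __name__ == "__main__":
    print(answer_with_checks("What is the sum of the integers 1 to 10?"))
```

A production system would use far stricter isolation than a bare subprocess, but the loop structure, generate, execute, inspect, revise, is the point.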
Despite its obvious drawbacks, and the fact that 77% of its responses are more verbose than those from human contributors, many users still rely on ChatGPT to answer their burning coding questions. “ChatGPT responses are still preferred in 39.34% of cases due to their comprehensiveness and well-articulated linguistic style,” the authors said. “Our result implies the need for careful review and rectification of errors in ChatGPT, [while] making its users aware of the risks associated with seemingly correct ChatGPT answers.”
Owen Morris, director of enterprise architecture at IT intelligence firm Doherty Associates, told Tech Monitor that there are many advantages to using AI, but also disadvantages that users should always consider before turning to platforms like ChatGPT. “Tools like ChatGPT offer insights based on the data they are trained on (including data drawn from the internet and other sources) and will retain their biases, so human involvement remains essential for accuracy and added value,” says Morris. “It’s important to remember to engage your team so they can contribute their own domain-specific knowledge and data to improve the applicability of the models.”
He warned that without human oversight to contextualize and critically evaluate the responses ChatGPT generates about software, “there is a considerable risk that you will incorporate incorrect or harmful information into your work, jeopardizing its quality and, more broadly, your professional reputation.”