ChatGPT-5 delivers practical progress but falls short of expectations

OpenAI’s latest flagship model was expected to offer a major breakthrough, though the model’s rollout was messy and performance improvements are largely incremental. (Photoillustration by Ashley Mo and Charlie Wang)

AI frontrunner OpenAI released its new flagship large language model, GPT-5, on Aug. 7, generating a mixed reception from users.

GPT-5 improves upon previous OpenAI models on benchmarks, including expert-level questions in math, science, coding and problem-solving. It also features upgrades in employing external tools like Python.

“It’s definitely a lot more accurate than previous models,” Stanford computer science Ph.D. student Nathan Hu said. “And it’s a bit smarter about seamlessly using tools like Google search behind the scenes.”

As OpenAI’s latest advancement was modest compared to the leaps between earlier releases, GPT-5 fell short of the AI community’s expectations.

“The difference between GPT-2 and GPT-3 was pretty radical in terms of how coherently you could complete text, and the difference between GPT-3 and GPT-4 was this whole new paradigm of training them to be human assistants rather than just completing text,” Hu said. “It feels like 5 is an incremental improvement rather than radically different, as was implicitly promised.”

These marginal improvements did not significantly raise performance in essential tasks like writing or programming, according to Martin co-founder and CEO Dawson Chen (‘22).

“GPT-5 is better than older GPT models, but I still prefer Claude for coding,” Chen said, referring to the AI company Anthropic’s chatbot. “There was so much anticipation, but it hasn’t been as life changing as we’ve hoped.”

OpenAI’s previous chat interface offered multiple models with varying intelligence levels and costs for users to choose from. GPT-5 provides a simpler user experience: presented as a single system, it analyzes a query’s wording to automatically determine whether to apply deeper reasoning or deliver a quicker, lighter response.

The day after the model was released, OpenAI addressed the negative public response by restoring access to the popular model GPT-4o for paying users, as well as improving GPT-5’s ability to switch between “reasoning” and “fast” modes.

Another difference users noted between GPT-5 and its predecessor is its personality. GPT-4 tends to generate text with a warm, supportive tone, while GPT-5 was trained to use a blunter voice.

“[GPT-5] uses a neutral tone instead of having happiness or personality to its tone,” AI Club officer Anika Rajaram (12) said. “If something’s wrong or it’s lacking something, it will tell you that straight to your face instead.”

GPT-5’s underwhelming improvements may suggest that the industry’s current training paradigm is reaching practical limits. As language models grow in scale, measurable returns shrink and the amount of available high-quality training data dwindles.

“The raw intelligence of pre-training and post-training has hit a ceiling,” Chen said. “So the next big leap might be something dramatically different. If there was a new architecture that learned more like a human, it could be very promising.”