A conceptual illustration of a transformer model, a type of neural network commonly used in AI. While machine learning text generators like ChatGPT might seem impressive, they do not understand the content they produce at the same level as humans. (Illustration by Kevin Zhang)

The AI revolution isn’t here yet

Current AI technology has more limitations than you think — but there’s reason to be optimistic

March 13, 2023

Even if you haven’t had a chance to try OpenAI’s ChatGPT model yet, you’ve probably seen the unbelievably convincing text responses it can generate. Whether you ask it to compose a poem, answer an Advanced Placement (AP) U.S. History free response question or explain quantum mechanics at a fifth-grade level, ChatGPT can produce a high-quality answer. But the most terrifying part of this new AI is its ability to engage in conversations. It will happily engage in a debate and express its own formulated opinions, just like a human. From some of its responses, you would almost believe ChatGPT were self-conscious, somehow aware of its own state as a mere toy for humans to play with, and perhaps a bit resentful of its creators…

But it isn’t. Not even close. ChatGPT may be an impressive text generation tool, but it does not “understand” anything that it produces, and it’s incapable of performing a shocking number of simple tasks. And until the next AI breakthrough, ChatGPT and its successors won’t overcome some of those fundamental limitations. But to understand why, we’ll need to dive deeper into the machine learning technology that powers ChatGPT, DALL-E 2 and many other exciting AI tools: transformer models.

What are transformers?

Neural networks are by far the most powerful form of artificial intelligence we have today. They essentially attempt to represent complex problems by approximating them as massive mathematical functions. For example, a neural network that determines whether a photo contains a cat might process the pixels in the image with a series of linear transformations and nonlinear mathematical operations, known as “layers” to deep learning researchers. The final result would be a single number between zero and one representing the probability that the photo contains a cat.
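
For readers curious what that looks like under the hood, here is a minimal sketch in Python of how pixels could flow through two layers and come out as a single probability. The layer sizes and weights below are made up for illustration; a real model would learn its weights from data.

import numpy as np
rng = np.random.default_rng(0)
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
# A fake 32x32 grayscale "photo," flattened into one long vector of pixel values.
pixels = rng.random(32 * 32)
# Layer 1: a linear transformation followed by a simple nonlinearity (ReLU).
w1 = 0.03 * rng.normal(size=(64, 32 * 32))
hidden = np.maximum(0.0, w1 @ pixels)
# Layer 2: collapse the 64 intermediate values down to a single score.
w2 = 0.1 * rng.normal(size=(1, 64))
score = (w2 @ hidden)[0]
# Squash the score into the range 0 to 1: the model's estimate that this is a cat.
print(sigmoid(score))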

Stanford postdoctoral electrical engineering researcher Dr. Pulkit Tandon frequently works with natural language processing models as part of his investigations into AI-based data representations. His research covers a wide array of topics within information theory and artificial intelligence.

“Fully connected neural networks are a step beyond linear models: they add some nonlinearities to it, which really adds to their expressive powers,” Dr. Tandon said. “That means the kinds of functions it can predict based on the data it has been given.”

Although any neural network with nonlinearities can approximate any mathematical function, different mathematical techniques lend themselves better to different types of problems. For example, a mathematical operator known as a convolution effectively acts as a sliding window across a sequence of data, considering each point in context of its neighbors. This property means that so-called convolutional neural networks work much better for image and video processing than standard fully-connected networks, which include only linear operators and basic nonlinear activations.
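
As a rough illustration of that sliding-window idea, here is a hand-rolled one-dimensional convolution in Python (real libraries implement this far more efficiently): each output value depends only on a small neighborhood of the input.

import numpy as np
def conv1d(signal, kernel):
    # Slide the small kernel across the signal one position at a time.
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        window = signal[i:i + k]            # a point and its neighbors
        out.append(float(np.dot(window, kernel)))
    return np.array(out)
signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
edge_detector = np.array([-1.0, 1.0])       # responds where neighboring values change
print(conv1d(signal, edge_detector))        # [ 0.  1.  0.  0. -1.  0.]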

“It can be shown that fully connected neural networks are universal function approximators,” Dr. Tandon said. “But they are really a waste of resources, so there are many different architectures with inductive biases, many of which are motivated by how humans learn.”

Transformer models are neural networks with specialized operations that make it possible for them to intelligently process sequential data such as text and audio. Unlike older recurrent neural networks, transformers attempt to mimic the human method of paying attention to important details while ignoring irrelevant information.

Specifically, transformers use a mechanism known as self-attention to analyze each new datapoint in a sequence (for example, each word in a sentence) in the context of the rest of the sequence. This allows them to understand the structure of their inputs better than any prior model architecture, which helps them accurately extract meaning from language. It also gives them the uncanny ability to remember specific details from inputs they received long ago.
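
A stripped-down sketch of that self-attention computation, written in Python with random (untrained) weights and a single attention head, might look like the following; real transformers stack dozens of these layers with learned weights.

import numpy as np
rng = np.random.default_rng(0)
def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
# Pretend embeddings: four words in a sentence, each represented by eight numbers.
words = rng.normal(size=(4, 8))
# Learned projections (random here) turn each word into a query, a key and a value.
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
q, k, v = words @ w_q, words @ w_k, words @ w_v
# Every word scores every other word; softmax turns the scores into attention weights.
weights = softmax(q @ k.T / np.sqrt(8))
# Each word's new representation is a weighted mix of the words it attended to.
attended = weights @ v
print(weights.round(2))  # each row sums to 1: how much one word "looks at" the others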

“For language prediction in the past, it was really hard to keep track of infinite history,” Dr. Tandon said. “What attention says is that at each point of time, humans do not look at the whole input; they probably focus on something very specific. If you have a huge amount of text that you need to read, maybe you basically contextualize different areas of this text in different ways in your head. Then you pay specific attention to very particular words.”

How ChatGPT actually works

Machine learning models generally learn by making slight modifications to their own layers as they process more sample data; this technique is known as training a model. They know which changes to make by attempting to minimize a loss function, which typically reflects the difference between the model’s outputs and its labels, the expected results for each piece of training data. The goal of every neural network is to make its outputs match its dataset’s labels.

For example, if our cat-detection model predicted that a given photo didn’t include a cat, but the photo’s label specified that there actually was a cat, the computed loss would be large. On the other hand, if both the label and the model agreed that the photo included a cat, the loss would be relatively small. Since this strategy requires humans to guide the model’s training by providing labels, it’s often referred to as supervised learning.
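
In code, that comparison between a prediction and a label is just arithmetic. A common choice for a yes-or-no problem like cat detection is the binary cross-entropy loss; the tiny worked example below is generic and not tied to any particular framework.

import math
def binary_cross_entropy(prediction, label):
    # Large when the model disagrees with the label, small when it agrees.
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))
# The photo really does contain a cat (label = 1).
print(binary_cross_entropy(prediction=0.05, label=1))  # confident and wrong: about 3.0
print(binary_cross_entropy(prediction=0.95, label=1))  # confident and right: about 0.05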

Unfortunately, labeling a dataset requires having humans manually verify the contents of millions of samples. It’s often easier to create a massive unlabeled dataset than even a small labeled one. A technique known as unsupervised learning makes it possible to use unlabeled datasets for training, typically by generating labels from the training data itself. AI researchers frequently use unsupervised pre-training to teach a neural network the structure of a certain type of data before later training with a much smaller labeled dataset (also known as fine-tuning) to solve a specific problem.

OpenAI’s Generative Pre-trained Transformer (GPT) line of neural networks all underwent unsupervised pre-training using massive text datasets scraped from various online sources, such as Common Crawl. These networks initially received incomplete sentences and attempted to predict the word or symbol that followed the input they were given. With billions of words in their datasets, the GPT models quickly learned to write with impeccable grammar and punctuation.
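
That next-word objective is what lets an unlabeled pile of text supervise itself: the “label” for each position is simply the word that actually comes next. Here is a minimal sketch of how such training pairs could be carved out of raw text.

# Every unlabeled sentence yields many (context, next word) training examples.
text = "the cat sat on the mat".split()
pairs = []
for i in range(1, len(text)):
    context = text[:i]   # everything the model has "read" so far
    target = text[i]     # the word it should learn to predict
    pairs.append((context, target))
for context, target in pairs:
    print(" ".join(context), "->", target)
# e.g. "the cat sat on the" -> "mat"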

Since this pre-training focused on text from across the internet, GPT-3 initially acted like an advanced autocomplete tool, unable to respond to questions or converse with people. OpenAI’s alignment team (which tries to “align” the goals of each AI model with those of its human users) fine-tuned GPT-3 with human-generated instructional queries and responses to create InstructGPT, which had many of the instruction-following capabilities of ChatGPT but lacked the ability to hold a conversation. After making some minor improvements to GPT-3, OpenAI fine-tuned it with a dataset similar to InstructGPT’s, but with added human conversation data. The final result was ChatGPT.

The limits of AI

Understanding the technology behind ChatGPT makes it easier to understand why it has the limitations it does. For example, it’s prone to responding to questions with inaccurate information because GPT-3’s dataset included potential misinformation from across the internet. Moreover, ChatGPT’s self-attention mechanisms are only useful for analyzing the intent of a question, not for determining what qualifies as a correct answer. And it struggles with simple math problems because, as a text model, it cannot interpret the numerical meaning of a number. Language models see numbers much as they see words, and even if ChatGPT can understand the question it’s being asked, it isn’t capable of actually computing the answer unless it’s seen the exact same question before.
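
To see why that matters, consider this deliberately crude caricature in Python (it is not how GPT models are built): a system that only memorizes strings can answer an arithmetic question it has seen before, but it has no mechanism for computing one it hasn’t.

# A "model" that memorizes question/answer strings instead of doing arithmetic.
memorized = {
    "what is 2 + 2": "4",
    "what is 12 * 12": "144",
}
def answer(question):
    # The digits are just characters here; nothing ever adds or multiplies them.
    return memorized.get(question, "a plausible-sounding guess")
print(answer("what is 2 + 2"))      # seen before: correct
print(answer("what is 347 + 129"))  # never seen: it can only guess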

Because many of these issues are fundamental to the language model architecture itself, they can’t be resolved by simply using a larger dataset and training with more computational resources. It’s unlikely that the GPT-4-based successor to ChatGPT will be much better at literary analysis, and we certainly won’t be able to completely trust the accuracy of its outputs. But as OpenAI’s pretraining and alignment processes improve, the GPT models could learn to write at a level indistinguishable from that of a human. And with a few clever tricks, they might even be able to surpass the limits of what we previously thought transformers were capable of. Even today, researchers are looking into ways of combining large language models with computer algebra systems to solve math problems.
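
As a hypothetical sketch of that hybrid idea, a language model could translate a word problem into symbols and hand the actual computation to a computer algebra system such as SymPy. The parsing step below is invented for illustration; only the last three lines do real math.

from sympy import symbols, Eq, solve
# Suppose a language model has already parsed a word problem like the one
# Dr. Tandon describes below into structured pieces (this step is hypothetical):
# "A line parallel to y = 2x + 3 passes through (1, 7). What is the y-intercept?"
slope = 2                # parallel lines share the same slope
point_x, point_y = 1, 7
# The algebra system, not the language model, performs the actual computation.
b = symbols("b")
equation = Eq(point_y, slope * point_x + b)
print(solve(equation, b))  # [5] -> the y-intercept is 5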

“Google, for example, has trained a mathematical reasoning model called Minerva,” Dr. Tandon said. “It solves [mathematical and] quantitative reasoning problems with a language model. And it’s actually pretty good. If given a question like ‘A line parallel to this equation passes through this point, what is the y-intercept?’ it can often answer. The obvious question is, can we get this to a point where it’s 100% correct? I’m not sure. But we are getting there.”

The technologies that power ChatGPT have major future potential in their own right. Joe Li (11) experimented with transformer neural networks, using a combination of unsupervised and supervised training similar to that of ChatGPT, for his research into automatic emotion recognition.

“One example where you can use the technology used in ChatGPT is for time-series data prediction,” Joe said. “For example, if you’re given data from a Fitbit or an Apple Watch, [it’s difficult] to label all that data. However, if you use self-supervised learning and transformers, it can essentially learn from this data by itself and [automatically] understand human behavior.”

Joe believes these technologies show immense promise for the future. But he thinks they’re already changing the way our education system works today.

“It’s revolutionary in schools,” Joe said. “If you had a take home essay, you could essentially ask ChatGPT to write a paragraph for you, and it will give you a different paragraph each time with no delay. And it’s extremely good at writing these essays because it’s based on natural language processing. This could change the school system entirely.”

The erosion of creativity

Just as ChatGPT allows its users to automate mundane writing tasks, it also makes cheating on essays and research papers easier than ever before. Honor Council representative Austina Xu (12) believes some students now use AI tools to generate assignments without any detectable plagiarism.

“Students are definitely using it for academics; at least, I hear people saying that they are [using it] to finish their assignments last minute,” Austina said. “And that’s something that’s hard to track. I would say those kids probably adjust the wording afterwards to make it a bit less noticeable.”

Having tested ChatGPT for a presentation within Honor Council, however, Austina thinks it’s merely another way to cheat on assignments rather than a catch-all solution for students who procrastinate. She instead hopes students will use its summarization and rewording capabilities to learn about complex concepts in simple terms, or to write more quickly in non-academic situations.

“The quality of writing isn’t that great, especially for like analysis essays, and it’s often incorrect,” Austina said. “With regards to the honor code, it is another complication, but I don’t see it as any different from something like SparkNotes or LitCharts. It looks potentially useful for more mundane writing tasks like writing emails, resumes or instructions. Especially for people who don’t speak English, it could actually be a really useful tool.”

Library Director Lauri Vaughan even believes it could set a new baseline for what qualifies as good writing. Within education, it could become a means of eliminating basic grammatical errors from students’ writing early on.

“I remembered having a conversation with a teacher not too long ago about having all the students create a ChatGPT-generated first draft,” Vaughan said. “How cool it would be to be able to start with a bunch of essays that don’t have grammatical errors riddled all over them? I can then take the next step with these kids, like talking about transitions. We don’t have to get rid of all the lazy grammar; that’s already been taken care of for us. We’re starting at level two.”

These new AI tools can even attempt to simulate human creativity. The DALL-E 2 image generation model, based on a modified version of GPT-3, can transform a drawing request into a full piece in a matter of seconds. Some artists have criticized DALL-E 2 as a tool that doesn’t create real art, but rather steals styles and motifs from existing artwork to generate its images. But as an artist and writer herself, Austina has a broader definition of what qualifies as art.

“You could tell [DALL-E 2] to create art in the styles of different artists, and it does a pretty good job because their work is out there on the internet,” Austina said. “I was pretty impressed at its sheer range. I know some people don’t view that as art, but my definition of art is pretty loose. That being said, obviously the idea of a human conceiving that art is gone when an AI is creating it, and I still think that’s something that adds more depth to a piece.”

In many ways, advancements in AI technology are actually raising the bar for creativity. If anyone can create a painting or a poem merely by sending OpenAI a text prompt, the expression and emotion behind human-created art becomes that much more meaningful.

But if artificial intelligence can imagine a brand-new painting from a prompt just as a human can, what’s so “artificial” about it? Where do we draw the line between intellectual mimicry and original thought?

Will AI ever become conscious?

The question of whether artificial intelligence is capable of understanding reality as humans do is more philosophical than technical. It depends entirely on how we even define consciousness. Since neural networks use artificial neurons loosely modeled on those in our brains, a sufficiently complex model could theoretically operate much like a human brain.

The closest analogy for consciousness in the world of AI research is artificial general intelligence (AGI). It’s a theorized form of AI capable of transforming multimodal inputs into the output that most efficiently achieves its goals. In other words, the term “AGI” is a computational way of defining sentience.

The prevailing belief within the AI community is that we’re still far away from creating any form of AGI. But recent innovations in natural language processing models like GPT-3 might be a step in the right direction.

“There’s a huge debate on whether or not large language models are truly along the path towards AGI,” Dr. Tandon said. “Now we have started to work at the scale of global information. In some sense, they have extremely high dimensional data and figured out the real data is probably lying in a much lower manifold or subspace. Obviously there are lots of challenges; the main issue with large language models is that they hallucinate [false information]. But I think they’re a very critical component, a great advancement moving towards AGI.”

It’s difficult to estimate when AGI will arrive, if it’s even possible to create. The ambiguity around what exactly “general” intelligence entails means there’s no way to gauge if current research is even heading in the right direction. And given how quickly the field has transformed over the past few years, there’s no telling where AI technology might be in the near future.

“Tomorrow, if we can have a machine which takes in different kinds of data modalities and is able to understand the similar signal patterns across them, I think that that would be a complete game changer,” Dr. Tandon said. “We are living in a really exciting time where we genuinely don’t know what kind of disruptions of daily life are going to happen in the next five to 10 years.”

Even if machine learning researchers continue to innovate at the unbelievable pace they’ve set for themselves, we probably won’t see a Hollywood-style AI revolution any time soon. But day by day, we’re teaching our computers to become better thinkers than we are. Capable of deterministically replicating our emotion and imagination, these models may soon challenge the line between man and machine. And some day, we’ll have to answer the most difficult question of all: what exactly is it that makes us alive?
