How Smart is ChatGPT?

February 8, 2023

To test ChatGPT’s accuracy and ability to imitate a high school student, Harker Aquila entered three sample free-response questions from past Advanced Placement (AP) exams into the chatbot and asked teachers to grade its solutions.

1. AP United States History

How would you grade ChatGPT?

Scroll down to see answer and analysis.

Answer: Pass

Explanation (given by upper school history teacher James Tate): “There’s a lot of correct content there that does address the question and provide a relatively good answer. The order in which it is presented strikes me as a bit odd. The chatbot says, ‘One specific historical change in relations was the signing of the Treaty of Greenville in 1795.’ The signing of a treaty might encompass or represent or delineate changes in relations, but the signing of a treaty itself was not a change in relations. If ChatGPT had opened with something like ‘A historical change in relations between American Indians and the United States was the Native American tribes in the Ohio Valley turning to a more peaceful set of relations which is encompassed in this particular treaty,’ that would logically make more sense to me. The ordering of the presentation of information strikes me as a bit odd; the word ‘artificial’ immediately comes to mind. Maybe I wouldn’t be thinking of the word ‘artificial’ if I didn’t know an AI wrote it, but it still feels a bit odd the way that that’s presented. If I had read this, I wouldn’t immediately think AI, but I would think it’s weird because it addresses the question in sort of an oblique way. It comes at it from a logic that’s at least very different from the way I would think about it.”

2. AP Calculus BC

How would you grade ChatGPT?

Scroll down to see answer and analysis.

Answer: Fail

Explanation (given by upper school mathematics teacher Bradley Stoll): “From an AP standpoint, there’s a couple of things that are wrong. For one, the name of the function was changed at various times. It went from capital G to lowercase g, and those are actually two different functions. Second of all, from an AP standpoint, they would not accept an answer that was rounded to the tenths, not the thousandths, so that could lose a point. Another thing was it didn’t use proper notation. It used the word ‘Integral,’ and it didn’t use the integral symbol from one to four, and the College Board does not accept calculator notation. Even without verifying that the answer was right, there were three mistakes in there. The math was all right, integrating from one to four, dividing by the length of the interval, and then setting that equal. So it’s kind of impressive that it did all that. It’s just a little bit interesting that it changed a few things in there. I also think if a student went into that depth of writing an explanation like ChatGPT did, I think it would be pretty obvious that they got it from somewhere else.”
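For readers unfamiliar with the method, the steps Stoll describes (integrating from one to four, then dividing by the length of the interval) appear to match the standard average-value formula from calculus. The exam's actual function is not reproduced in this article, so $g$ below is only a placeholder name, written with the integral symbol Stoll says the College Board expects:

```latex
g_{\text{avg}} = \frac{1}{4 - 1} \int_{1}^{4} g(x)\,dx = \frac{1}{3} \int_{1}^{4} g(x)\,dx
```

On an AP response, the final decimal value would then be reported to the thousandths place, per Stoll's comment about rounding.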

3. AP Computer Science A

How would you grade ChatGPT?

Scroll down to see answer and analysis.

Answer: Pass

Explanation (given by upper school computer science teacher Anu Datar): “The answer is beautifully composed, and the code is written quite well because [ChatGPT] followed all Java naming conventions and coding standards. It did use a construct that is not on the AP curriculum, and it used a random class instead of using ‘Math.random’. But that’s a minor thing because as an AP reader, I’ve also been told that even if the construct is outside the AP curriculum, we are supposed to award points if it works. Although, I would recognize that this is either a student who self-studied or a student who knows more than what they’re expected to for the AP because they’re using constructs that are beyond the AP curriculum. I’ve seen a response like this written by an actual student on the AP when I’ve been grading, but I know that those students either self-study or are taking the AP after working in Java for three or four years.”
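To illustrate the distinction Datar draws, here is a minimal sketch of the two approaches. It is not ChatGPT's actual response or the exam question, neither of which is reproduced here; the class name, method names and the six-sided die scenario are all hypothetical. It assumes the "random class" Datar mentions is `java.util.Random`.

```java
import java.util.Random;

// Hypothetical example: two ways to produce a die roll from 1 to sides.
class RandomRollSketch {

    // The style AP students are taught: Math.random() returns a double
    // in the range [0.0, 1.0), which is scaled and truncated to an int.
    static int rollWithMathRandom(int sides) {
        return (int) (Math.random() * sides) + 1;
    }

    // The style outside the AP curriculum: java.util.Random's
    // nextInt(bound) returns an int in the range [0, bound).
    static int rollWithRandomClass(Random rng, int sides) {
        return rng.nextInt(sides) + 1;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println("Math.random roll: " + rollWithMathRandom(6));
        System.out.println("Random class roll: " + rollWithRandomClass(rng, 6));
    }
}
```

Both methods give a whole number from 1 to 6, which is consistent with Datar's point that a grader can award credit for either one as long as it works.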

While ChatGPT nails the general problem-solving process of the questions and excels at a surface-level examination, it falls short in the specifics of arriving at a definitive answer and properly imitating the structure of a student-produced response. Even so, both Tate and Stoll expressed concerns about students plagiarizing the chatbot and submitting its work as their own.

“I think that while there might be an argument to be made that using ChatGPT isn’t technically plagiarism, doing that goes against the spirit of the law, which is that work done in an academic setting is supposed to represent not only your ideas but also your effort,” Tate said. “Using an AI is neither your ideas nor your effort.”

ChatGPT’s performance on the problems demonstrates that the AI can now successfully impersonate a human student, for the most part. Although platforms like GPTZero are already uncovering ways of detecting ChatGPT’s writing, they have yet to reach the same popularity as the program they were designed to target. For now, students will have to use their own judgment to decide if using a chatbot for their homework is really the best decision.

Harker Aquila • Copyright 2024