
It’s easy to get caught up in the hype around artificial intelligence tutors. But so far, the evidence suggests caution.
Some studies have shown that chatbot tutors can backfire: students lean on them too heavily, get spoon-fed solutions and fail to absorb the material. Even when AI tutors are designed not to give away answers, they haven’t always produced better results than old-fashioned learning without AI.
Yet the researchers who carried out these skeptical studies have not given up hope. Some are still experimenting and trying to create better AI tutors.
A promising idea has less to do with how an AI tutor explains concepts and more to do with what it asks students to practice next.
A team from the University of Pennsylvania, which included AI skeptics, recently tested this approach in a study of nearly 800 Taiwanese high school students learning Python programming. All students used the same AI tutor, designed not to give answers.
But there was one essential difference. Half of the students were randomly assigned to a fixed sequence of practice problems, progressing from easy to difficult. The other half received a personalized sequence with the AI tutor continuously adjusting the difficulty of each problem based on the student’s performance and interaction with the chatbot.
The idea is based on what educators call the “zone of proximal development.” When problems are too easy, students get bored. When they are too hard, students become frustrated. The goal is to keep students in an ideal situation: challenged, but not overwhelmed.
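The "challenged, but not overwhelmed" logic can be sketched as a simple difficulty staircase. This is an illustrative toy, not the study's actual algorithm: step difficulty up after a clean solve, down after a struggle, and keep it inside a fixed band.

```python
def next_difficulty(current: int, solved: bool, attempts: int,
                    min_level: int = 1, max_level: int = 10) -> int:
    """Toy 'zone of proximal development' staircase.

    Raises difficulty after a first-try solve (too easy), lowers it
    after a failure or many attempts (too hard), and holds steady
    otherwise. Thresholds and levels are invented for illustration;
    the Penn study's real system uses richer interaction signals.
    """
    if solved and attempts == 1:        # too easy: step up
        current += 1
    elif not solved or attempts >= 3:   # too hard: step down
        current -= 1
    return max(min_level, min(current, max_level))
```

A fixed sequence, by contrast, would ignore `solved` and `attempts` entirely and simply march from level 1 to 10.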
The researchers found that students in the personalized group performed better on a final exam than students in the fixed-sequence group. The difference was characterized as the equivalent of six to nine months of additional schooling, a striking claim for an after-school online course that lasted only five months. The creator of the AI tutor, Angel Chung, a doctoral student at the Wharton School, acknowledged that his conversion of statistical units was “not a perfect estimate.” (A draft paper describing the experiment was posted online in March 2026 but has not yet been published in a peer-reviewed journal.)
Still, this is early evidence that small adjustments – in this case, tailoring the difficulty of practice problems to the student – can make a difference.
Chung said ChatGPT’s answers can already feel very personal because they directly answer a student’s unique questions. But this level of customization is not enough. “Students generally don’t know what they don’t know,” Chung said. “The student does not have the ability to ask the right questions to benefit from the best tutoring.”
To solve this problem, Chung’s team combined a large language model with a separate machine learning algorithm that analyzes how students interact with the online course platform (how they answer practice questions, how many times they review or edit their coding, and the quality of their conversations with the chatbot) and uses that information to decide which problem to solve next.
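A minimal sketch of that kind of pipeline follows. The feature names, weights and scoring rule are invented for illustration, since the team's actual model is not described in detail; the sketch only shows the shape of the idea: turn interaction signals into a skill estimate, then pick the problem that sits just above it.

```python
from dataclasses import dataclass

@dataclass
class StudentSignals:
    """Interaction features named in the article (values illustrative)."""
    correct_rate: float   # fraction of practice answers correct (0-1)
    edit_count: int       # times the student revised their code
    chat_quality: float   # rough 0-1 score of chatbot conversation depth

def estimate_skill(s: StudentSignals) -> float:
    # Weighted blend of signals; the weights are invented for illustration.
    revision_factor = 1 - min(s.edit_count, 10) / 10
    skill = 0.6 * s.correct_rate + 0.2 * s.chat_quality + 0.2 * revision_factor
    return max(0.0, min(1.0, skill))

def pick_next_problem(s: StudentSignals,
                      problems: list[tuple[str, float]]) -> str:
    # Aim slightly above the estimated skill level, mirroring the
    # "challenged but not overwhelmed" goal; each problem is a
    # (name, difficulty 0-1) pair.
    target = estimate_skill(s) + 0.1
    return min(problems, key=lambda p: abs(p[1] - target))[0]
```

In the real system, a large language model handles the conversation while a separate model like this selector decides the sequencing; the two jobs are deliberately split.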
[Chart: How different students interact with the chatbot tutor]

In other words, personalization isn’t just about tailoring explanations. It’s about adapting the learning path itself.
This idea is not new.
Long before the invention of generative AI tools like ChatGPT, education researchers developed “intelligent tutoring systems” that attempted to do something similar: estimate what a student knew and serve up the right next problem. These earlier systems couldn’t hold natural conversations, but they could provide instant hints and feedback. Rigorous studies found that well-designed versions helped students learn significantly more.
Their Achilles heel was engagement. Many students simply didn’t want to use them.
Today’s AI tools could help solve this problem. Students might be more interested in a chatbot that converses with them in an almost human-like way.
In the University of Pennsylvania study, students in the personalized group spent more time practicing, about three minutes more per problem, adding up to roughly an hour per module of the Python course, compared with half an hour or less for students in the comparison group. The researchers believe these students performed better because they were more engaged in their practice work.
Students’ prior knowledge of a topic affected how well personalized sequencing worked. Students who were new to Python gained more than those with prior Python experience, who did just as well with the fixed sequence of practice problems. Students at less elite high schools also appeared to benefit more.
[Chart: How student background affected results]

All of the Taiwanese students in this study had volunteered for a computer programming elective that could strengthen their college applications. Many were highly motivated, had highly educated parents and already had prior coding experience.
It’s unclear whether the chatbot would work as well with students who are less motivated, behind in school, and most in need of extra help.
A possible solution: merge the new and the old.
Ken Koedinger, a professor at Carnegie Mellon University and a pioneer of intelligent tutoring systems, is experimenting with using new AI models to alert remote human tutors, who can step in to motivate struggling students who stray off track. “We’re having more success,” Koedinger said.
Humans are not yet obsolete.
Contact staff writer Jill Barshay at 212-678-3595, jillbarshay.35 on Signal or barshay@hechingerreport.org.
This story about AI tutors was produced by The Hechinger Report, an independent, nonprofit news organization that covers education. Sign up for the Proof Points newsletter and other Hechinger newsletters.
The article The Quest to Create a Better AI Tutor appeared first on The Hechinger Report.