Research Revelations Episode 9: Grading vocal performance recordings with Saanvi Bhargava

by Ashley Mo and Jonathan Szeto

February 16, 2025

This is the ninth installment of Research Revelations: Conversations with Our Student Researchers, a podcast where Aquila staff members talk to student researchers about their projects and research goals. In this episode, STEM Editors Ashley Mo (11) and Jonathan Szeto (11) meet with senior Saanvi Bhargava to discuss her work using artificial intelligence to grade the vocal performance of singers.


Ashley: Hi everyone, I’m Ashley.

Jonathan: I’m Jonathan.

Ashley: And welcome back to Research Revelations: Conversations with Our Student Researchers.

Jonathan: Today, we’re here with senior Saanvi Bhargava to talk about her work using artificial intelligence to grade vocal performance recordings.

Ashley: Thanks for joining us today, Saanvi. We want to first ask you, what’s your research all about, and how did you get started?

Saanvi: My research is about an AI model for automatically grading vocal performances—basically using machine learning to compare a singing recording to a piano recording and just grading it on a scale.

Ashley: And why did you choose this topic, and what got you interested in it?

Saanvi: I chose the topic around my freshman or sophomore year because I really wanted to improve my singing skills, and I felt like I wasn’t getting enough feedback from my teachers. A lot of online platforms were paid only, and I wanted to come up with something that could help me and other students improve on their vocal skills.

Ashley: Could you explain your internal motivations on why you chose to go into CS and a singing project?

Saanvi: I’ve been doing performing arts since fourth grade, so that’s definitely where I found my research problem. And the reason I used CS was because I’ve done CS projects before, and I had extensive background in it, so I knew that CS would provide a good solution with machine learning. And AI is something you hear a lot about, but I think that using it to solve different problems in different fields that don’t just impact tech is really important, and I’m glad I found a way to do that.

Jonathan: Can you walk us through the steps you took in your research process?

Saanvi: The first thing was defining the problem, and this involved choosing a subset, so choosing a certain problem from a variety [of problems]. I decided to focus on singing instead of instrumentals, just because that was the problem I faced. And I wanted to focus on first giving a numerical grade to a singer, and then later on I moved to choosing and then giving more specific feedback on specific parts of the singing. But at first I wanted to focus on giving an overall numerical grade, and so that was a problem I chose. And then, after I chose the problem, I had to find a data set that I could use for my research. And I would say that one of the biggest challenges was locating a data set.

Ashley: Can you talk more about how your model specifically works? I know you mentioned comparing it to a piano or instrumental piece?

Saanvi: The data set contained a reference piano recording, a sample singing recording and a grade from one to four, one being the worst and four being the best. What I did was I trained a K-nearest neighbors machine learning algorithm, a simple supervised learning algorithm, on the performances and the reference piano recordings with the scores. And I trained the model and tested it, and I got up to 80% accuracy.
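
For readers curious about what a K-nearest neighbors classifier does, here is a minimal sketch in plain Python. It is not Saanvi's code: the two features per recording (mean pitch error and fraction of off-key notes), the toy numbers and the function name `knn_predict` are all assumptions made up for illustration. The idea is only that a new recording receives the grade that is most common among the most similar graded recordings.

```python
from collections import Counter
import math


def knn_predict(train_features, train_grades, query, k=3):
    """Predict a grade for `query` by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every graded training example.
    distances = [
        (math.dist(features, query), grade)
        for features, grade in zip(train_features, train_grades)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_grades = [grade for _, grade in distances[:k]]
    # Most common grade among the neighbors; on a tie, the grade of the
    # closer neighbor wins because it appears first in the list.
    return Counter(nearest_grades).most_common(1)[0][0]


# Hypothetical toy data: (mean pitch error in semitones, fraction of notes off-key).
train_features = [
    (0.1, 0.05), (0.2, 0.10),  # graded 4 (best)
    (0.8, 0.30), (0.9, 0.35),  # graded 3
    (1.8, 0.60), (2.0, 0.70),  # graded 2
    (3.0, 0.90), (3.2, 0.95),  # graded 1 (worst)
]
train_grades = [4, 4, 3, 3, 2, 2, 1, 1]

predicted = knn_predict(train_features, train_grades, (0.15, 0.07), k=3)
print(predicted)
```

In practice the value of `k` and the choice of features would need to be tuned on held-out recordings; the interview does not say which settings were used.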

Jonathan: Do you have any insight on what the model was looking for? Was it matching the pitches of the piano or something similar?

Saanvi: I was specifically focused on matching pitches and ignoring timing differences. And the reason I wanted to focus on this was because sometimes the singers sang it slower than the piano had played it, but I didn’t want that to mark it as wrong. So I took some liberty in choosing that. So what I did was I used dynamic time warping, which basically takes two sequences of different lengths. It’s sort of similar to DNA sequences, but it takes two sequences of different lengths, which would be the piano recording’s pitches and then the singing recording’s pitches, and it would match pitches as closely as possible, ignoring time.
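
As a rough illustration of dynamic time warping, the sketch below uses the textbook dynamic-programming formulation on two short pitch sequences. The pitch values (MIDI note numbers, where 60 is middle C), the absolute-difference cost and the function name `dtw_distance` are assumptions for this example, not details taken from the project.

```python
def dtw_distance(reference, performance):
    """Total pitch mismatch after optimally stretching one sequence onto the other."""
    n, m = len(reference), len(performance)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i reference pitches
    # with the first j performance pitches.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(reference[i - 1] - performance[j - 1])
            # A pitch may be held (repeat i or j) or both sequences may advance.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


piano = [60, 62, 64, 65]                        # C, D, E, F
slow_singer = [60, 60, 62, 62, 64, 64, 65, 65]  # same notes, half speed
off_key_singer = [60, 63, 64, 66]               # two notes a semitone sharp

print(dtw_distance(piano, slow_singer))     # 0.0
print(dtw_distance(piano, off_key_singer))  # 2.0
```

The slow but accurate singer gets a distance of zero, while the singer with wrong pitches does not, which matches the goal of not penalizing tempo.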

Jonathan: You said one of the biggest challenges was getting the data. So how did you get [the recordings of] the singers’ singing and then the scores?

Saanvi: In getting the data, I had to individually reach out to a lot of data set creators, because when I first found the data set, it didn’t have the scores matched up with the recordings. It only had piano recordings and singing recordings. So I reached out to the data set creators, and they gave me the scores, which they had still been working on. So I kind of got a beta version of the data set.

Jonathan: So how is the score calculated?

Saanvi: The score is based on a panel of five judges, five experts, and they each individually give scores from one to four. And so it takes the most common score, and it also lists whether they all agreed on the score or not.
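
The labeling rule described here, taking the most common score and recording whether the judges agreed, can be sketched in a few lines of Python. The function name `aggregate_scores` and the tie-breaking behavior are assumptions; the interview does not say how the data set creators handled ties.

```python
from collections import Counter


def aggregate_scores(judge_scores):
    """Return (most common score, whether all judges gave the same score)."""
    counts = Counter(judge_scores)
    # Assumption: on a tie, the score that appears first in the list wins.
    most_common_score, _ = counts.most_common(1)[0]
    unanimous = len(counts) == 1
    return most_common_score, unanimous


print(aggregate_scores([3, 3, 4, 3, 2]))  # (3, False)
print(aggregate_scores([4, 4, 4, 4, 4]))  # (4, True)
```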

Ashley: That’s really cool. What would you say was your favorite part of the research process?

Saanvi: My favorite part of my research process was probably what came after. After I used the machine learning algorithm and got an accuracy of up to 80%, I wanted to try giving more specific feedback, since that’s more useful for performers. So I worked on a one-to-one correspondence algorithm, which basically took, again, the piano and the vocal recording, and it took individual notes and matched notes to each other. So I could really get down to where a singer forgot to sing in one place or sang too long in one place. And I really enjoyed tweaking that and figuring out how, for an individual recording, I could match pitches.
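
One simple way to picture note-level matching is a sequence diff between the reference notes and the sung notes. The sketch below uses Python's standard-library `difflib.SequenceMatcher` as a stand-in; it is not the one-to-one correspondence algorithm from the project, and the note names and the function name `note_feedback` are hypothetical. It also ignores note durations, so it can flag a missed or wrong note but not a note held too long.

```python
from difflib import SequenceMatcher


def note_feedback(reference_notes, sung_notes):
    """List the places where the sung notes differ from the reference notes."""
    feedback = []
    matcher = SequenceMatcher(a=reference_notes, b=sung_notes, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # Notes in the reference with no counterpart in the singing.
            feedback.append(f"missed {reference_notes[i1:i2]} at reference position {i1}")
        elif tag == "insert":
            # Notes the singer added that are not in the reference.
            feedback.append(f"extra {sung_notes[j1:j2]} near reference position {i1}")
        elif tag == "replace":
            feedback.append(
                f"sang {sung_notes[j1:j2]} instead of {reference_notes[i1:i2]} "
                f"at reference position {i1}"
            )
    return feedback


reference = ["C4", "D4", "E4", "F4"]
sung = ["C4", "E4", "F4"]  # the singer skipped D4
for line in note_feedback(reference, sung):
    print(line)
```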

Jonathan: Was there anything unexpected you found in your research?

Saanvi: I didn’t realize how hard it would be to match up individual notes from one recording to another. I wasn’t able to scale that to multiple different recordings. I was only able to do that for individual ones, but I think that’s something I would want to work on in the future.

Ashley: What would you say is the biggest thing you took away from your project?

Saanvi: I think the biggest thing I took away was that I could do research without a mentor. I think it was hard to get started at first without someone to guide me. But as I reached out to individual data set owners and researchers, I gained little tidbits of advice. So even though I didn’t have one specific research mentor, I was able to get guidance from a lot of different people.

Jonathan: Were there any people or mentors who were especially impactful for you during the process?

Saanvi: There was one professor from India, from IIT Bombay, and after I reached out to her she responded, and she read the paper that I had written thus far. We scheduled multiple meetings, and I got to talk to her for maybe two hours total, and she gave me some really good feedback and pointed me in new directions, like being able to give more specific feedback. She then even offered to give me a research position over the summer, so I think that was really great.

Ashley: I also heard about your work in presenting this at different conferences or competitions. Could you maybe talk about that and what that was like?

Saanvi: I presented this work at Harker Symposium and the Association for Popular Music Education, and I also submitted it to the Harvard Undergraduate Research Journal High School Research Competition. I think doing the different competitions and submitting it to different conferences helped me talk about my research in different ways, so that was really exciting. Giving a 20-minute presentation at the conference on my research and being asked questions that I hadn’t thought about before — it helps me think about next steps to take in my research and also how to articulate my research better.

Ashley: And speaking of next steps, do you have any future goals for your project?

Saanvi: I would want to build it into a web application for something that people can actually use since right now it’s all local code on my laptop.

Jonathan: So what impact do you think your project will have on singers who want to improve?

Saanvi: I’m hoping it will have the biggest impact on beginner singers, ones who haven’t really figured out how to tell when they’re singing the correct thing or not, so middle schoolers who are just starting off and who aren’t able to access vocal coaches. I’m hoping that this will help them improve on specific songs that they’re singing, or overall their abilities.

Ashley: Overall, what is one piece of advice you would give to someone who’s just starting out in research?

Saanvi: I would say to not be afraid to reach out to as many people as you need to until you figure out the answer to your question or the resource that you need or anything like that. I think I cold-emailed up to 30 professors in my research journey, and I’m glad I did.

Jonathan: If you are a student researcher and would like to be featured next, please feel free to email us at [email protected].

Ashley: This is Ashley.

Jonathan: This is Jonathan.

Ashley: And we’ll see you next time.

About the Contributors

Ashley Mo, STEM Editor

Ashley Mo (11) is a STEM editor for Harker Aquila and the Winged Post, and this is her third year on staff. This year, Ashley hopes to further connect with the journalism team and create impactful STEM pieces and pages. In her free time, Ashley enjoys playing golf and spending time with her friends.

Jonathan Szeto, STEM Editor

Jonathan Szeto (11) is a STEM editor for Harker Aquila and the Winged Post, and this is his third year on staff. This year, he hopes to improve his photography skills and interview more people around campus. In his free time, he enjoys playing piano and violin and learning more about aerospace.