When Shazam Met "Baby Shark": A Weekend Adventure With GPT-4 and Whisper

Last weekend, I introduced my daughters to Shazam. We tested it with a random track I played on my phone, and it identified the song flawlessly. Excited, they wanted to see if Shazam could recognize their rendition of “Baby Shark”. And, of course, it did not work. They impatiently waited for it to complete all possible searches, only to be informed that it didn’t recognize their version of perhaps the most iconic children’s song. They were not impressed :)

This got me thinking: Could we bridge this gap with a robust speech-to-text tool combined with a large language model? If the system could identify the lyrics as text, it could then tap into an LLM to determine the song. Inspired, I promised my girls a program where they could sing to the computer, and it would identify their song. Their eyes lit up with excitement, and I got to work.

A singing cat, generated by Midjourney

I knew that Whisper is very good at recognizing speech. It can handle accents (including non-native English speakers) and fast speech, at least according to the demo page. I hoped it wouldn't have issues with singing either. And indeed, it doesn't.
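For readers who want to try the transcription step, here is a minimal sketch. It assumes the hosted Whisper model ("whisper-1") called through the official openai Python package (version 1.0 or later); the post does not say whether the app uses the hosted API or a local Whisper model, and the function name transcribe is made up for illustration.

```python
from pathlib import Path


def transcribe(audio_path, client, model="whisper-1"):
    """Send an audio file to a hosted Whisper model and return the text.

    `client` is assumed to be an `openai.OpenAI()` instance (openai>=1.0).
    The function name and default model are illustrative assumptions.
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    # Collapse stray line breaks and repeated spaces into single spaces.
    return " ".join(result.text.split())
```

With an API key in the OPENAI_API_KEY environment variable, this could be called as transcribe("recorded_audio.mp3", OpenAI()). Passing the client in as an argument keeps the function easy to test without network access.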

Here’s a short clip of Maira singing “Baby Shark”:        

And here is what Whisper hears: Baby Shark do do do do do, Baby Shark do do do do do do… Baby Shark, do do-do-do-do-do, Baby Shark!

For the more adventurous, here’s me singing “Livin’ on a Prayer” by Bon Jovi:

Whisper transcription: Whoa, we’re halfway there, whoa, living on the prayer, take my hand we’ll make it.                

And here is the actual output from the application:    

$ python main.py
Capturing audio...
Audio captured to: recorded_audio.mp3
Transcript: Whoa, we're halfway there, 
whoa, living on the prayer, take my hand we'll make it
Recognized song: Livin' on a Prayer - Bon Jovi          
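The output above suggests three steps: capture audio, transcribe it, and ask the model for the song. Below is a rough sketch of how such a main.py could be wired together. The function run and its three callable parameters are hypothetical, not taken from the repo; only the printed messages and the file name come from the output above.

```python
def run(capture, transcribe, recognize, audio_path="recorded_audio.mp3"):
    """Run the capture -> transcribe -> recognize pipeline once.

    All three steps are passed in as callables (hypothetical names):
      capture(path)     records audio and writes it to `path`
      transcribe(path)  returns the transcript of the file as a string
      recognize(text)   returns "Title - Artist" for the given lyrics
    """
    print("Capturing audio...")
    capture(audio_path)
    print(f"Audio captured to: {audio_path}")

    transcript = transcribe(audio_path)
    print(f"Transcript: {transcript}")

    song = recognize(transcript)
    print(f"Recognized song: {song}")
    return song
```

Keeping the steps as plain callables means each one (microphone capture, Whisper, GPT-4) can be swapped or stubbed out independently.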


For song recognition, I’m using a single call to GPT-4 via the chat completion API. Here’s a glimpse at the code:    

messages = [
    # System prompt: sets the model up as a song expert.
    {"role": "system",
     "content": "You are an expert in songs with perfect "
                "knowledge of titles, artists, release dates, and lyrics."},
    # User prompt: processed_audio holds the transcribed lyrics.
    {"role": "user",
     "content": f"What song is this text from: \"{processed_audio}\". "
                "Output only the song title and artist name (when applicable) and nothing else."},
]
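To show where these prompts plug in, here is a sketch of the full call, assuming the official openai Python package (version 1.0 or later) and its chat completions endpoint. The names build_messages and recognize_song are made up for illustration, and temperature=0 is my assumption to make answers more repeatable; the post does not say which settings the app uses.

```python
def build_messages(processed_audio):
    """Build the system and user messages around the transcribed lyrics."""
    return [
        {"role": "system",
         "content": "You are an expert in songs with perfect "
                    "knowledge of titles, artists, release dates, and lyrics."},
        {"role": "user",
         "content": f"What song is this text from: \"{processed_audio}\". "
                    "Output only the song title and artist name "
                    "(when applicable) and nothing else."},
    ]


def recognize_song(processed_audio, client, model="gpt-4"):
    """Ask the chat model which song the lyrics belong to.

    `client` is assumed to be an `openai.OpenAI()` instance.
    """
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(processed_audio),
        temperature=0,  # assumption: favor repeatable answers
    )
    # The reply text lives on the first choice's message.
    return response.choices[0].message.content.strip()
```

Because the model name is a parameter, comparing GPT-4 with GPT-3.5 only requires passing a different model string.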

With these prompts, GPT-4 delivers reliable, though not perfect, results. Understandably, it can't identify songs released after the model's training cut-off. Performance is also weaker for non-English songs. I experimented with GPT-3.5, but the results were less consistent, even in English. Perhaps they could be improved with some prompt engineering or more context; I just went with the simplest solution and upgraded to the bigger model. The knowledge cut-off could potentially be addressed with LangChain, which would let the model search the internet for specific lyrics. This might be a future addition.
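One cheap form of the prompt engineering mentioned above is few-shot prompting: showing the model a solved example before the real question. The sketch below is untested against either model, so whether it actually helps GPT-3.5 is an open question. The helper names are hypothetical, and the single example pair reuses the transcript from earlier in this post.

```python
SYSTEM_PROMPT = (
    "You are an expert in songs with perfect "
    "knowledge of titles, artists, release dates, and lyrics."
)

# Solved examples shown to the model before the real query (illustrative).
EXAMPLES = [
    ("Whoa, we're halfway there, whoa, living on the prayer",
     "Livin' on a Prayer - Bon Jovi"),
]


def question(lyrics):
    """Format the user question for one snippet of lyrics."""
    return (
        f'What song is this text from: "{lyrics}". '
        "Output only the song title and artist name "
        "(when applicable) and nothing else."
    )


def build_few_shot_messages(processed_audio, examples=EXAMPLES):
    """Return chat messages with solved examples ahead of the real query."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for lyrics, answer in examples:
        # Each example is a user question followed by the ideal reply.
        messages.append({"role": "user", "content": question(lyrics)})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": question(processed_audio)})
    return messages
```

The returned list can be passed as the messages argument of a chat completion call in place of the two-message prompt.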

If you’re curious about how this all fits together, check out my repo. Happy singing!

All images created with Midjourney.

This post was originally published on LinkedIn by Michał Prządka.
