Scraping Recipes from TikTok with AI
I have a problem. I save TikTok cooking videos compulsively. Someone makes a gorgeous one-pot pasta or a five-ingredient dessert, and I hit save, fully intending to cook it later. I never cook it later. The video sits in my saved folder, buried under 200 other saved videos, and I end up ordering takeout. So I built ReciMe: a tool that downloads my saved TikTok videos, transcribes them, extracts the recipe, and puts it in a searchable database. Problem solved. Mostly.
The Pipeline
ReciMe is a Python application with four stages. First, it downloads the TikTok video using yt-dlp (the Swiss Army knife of video downloaders). Second, it extracts the audio and transcribes it using Faster-Whisper, an optimized version of OpenAI’s Whisper model that runs locally. Third, it sends the transcript to Ollama (a local LLM runner) with a prompt that says “extract the recipe from this transcript, including ingredients and steps.” Fourth, it stores the structured recipe in a SQLite database with full-text search.
Each stage has its own quirks. The download stage needs to handle TikTok’s aggressive anti-scraping measures, which change frequently. The transcription stage needs to handle background music, overlapping audio, and the fact that many TikTok creators talk extremely fast. The extraction stage needs to handle the difference between someone talking through a recipe and someone telling a life story with a recipe embedded somewhere in the middle. The storage stage needs to handle duplicate recipes, variations, and the fact that “a pinch” is not a standardized unit of measurement.
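The four stages can be sketched as a simple orchestration function. This is a minimal sketch, not ReciMe's actual code: the stage functions are passed in as callables, standing in for the real yt-dlp, Faster-Whisper, Ollama, and SQLite integrations, so each stage's quirks can be handled (and tested) in isolation.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    """Everything we know about one video after a pipeline run."""
    url: str
    transcript: str = ""
    recipe: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)

def run_pipeline(url, download, transcribe, extract, store):
    """Run one video through all four stages, recording any failure.

    download(url) -> local video path        (stage 1: yt-dlp)
    transcribe(path) -> transcript text      (stage 2: Faster-Whisper)
    extract(transcript) -> recipe dict       (stage 3: Ollama)
    store(result) -> None                    (stage 4: SQLite)
    """
    result = PipelineResult(url=url)
    try:
        video_path = download(url)
        result.transcript = transcribe(video_path)
        result.recipe = extract(result.transcript)
        store(result)
    except Exception as exc:
        # A failure in any stage is recorded rather than crashing the batch,
        # since TikTok's anti-scraping changes make stage-1 failures routine.
        result.errors.append(str(exc))
    return result
```

Keeping the stages as plug-in callables makes it easy to swap one out, e.g. replacing the transcriber with the caption-and-comments fallback path.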
The Transcription Challenge
Faster-Whisper was the obvious choice for transcription because it runs locally (no API costs) and handles multiple languages well. But TikTok audio is a special kind of chaos. Creators talk over music, use sound effects, speak in fragments, and rely heavily on visual context that the audio alone doesn’t capture.
The first time I ran the transcriber against my saved videos, about 40% of the transcripts were usable. The rest were either garbled by background music, too fragmented to parse, or contained more commentary than recipe (“and then my roommate came in and said this smelled amazing, which, like, obviously”).
I improved the accuracy by pre-processing the audio: separating vocals from background music using a lightweight source separation model, normalizing volume levels, and trimming silence. This got the usable rate up to about 70%. For the remaining 30%, I added a fallback path that uses the video’s caption and comments to supplement the transcript — because TikTok creators often post the full ingredient list in their caption or pin it in the first comment.
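The normalization and silence-trimming part of that pre-processing can be done with stock ffmpeg filters. A sketch, assuming ffmpeg is on the PATH (vocal separation with a model like Demucs would run as a separate step before this); the filter parameters here are illustrative defaults, not the exact values ReciMe uses:

```python
import subprocess

def preprocess_cmd(in_path: str, out_path: str) -> list:
    """Build the ffmpeg command as a list, so it can be inspected or executed."""
    filters = ",".join([
        # EBU R128 loudness normalization: evens out whisper-quiet vs shouted intros.
        "loudnorm=I=-16:TP=-1.5:LRA=11",
        # Trim leading silence below -40 dB before the creator starts talking.
        "silenceremove=start_periods=1:start_threshold=-40dB",
    ])
    return [
        "ffmpeg", "-y", "-i", in_path,
        "-af", filters,
        "-ar", "16000",   # 16 kHz mono is what Whisper-family models expect
        "-ac", "1",
        out_path,
    ]

def preprocess(in_path: str, out_path: str) -> None:
    """Run the clean-up pass; raises if ffmpeg exits non-zero."""
    subprocess.run(preprocess_cmd(in_path, out_path), check=True)
```

Downsampling to 16 kHz mono also shrinks the files, which matters when batch-processing a few hundred saved videos.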
Teaching the LLM to Extract Recipes
The extraction step is where things get creative. A raw transcript of a cooking TikTok looks nothing like a recipe. It looks like someone narrating their day while occasionally mentioning food. The LLM’s job is to turn that into a structured recipe with a title, ingredient list, and ordered steps.
I use Ollama to run a local model (usually Llama or Mistral, depending on what’s performing best that week). The prompt is carefully engineered to handle common failure modes:
It tells the model to ignore personal anecdotes and focus on cooking instructions. It specifies that “throw in some garlic” means “add garlic” and that “cook it until it looks done” needs to be translated into an approximate time. It handles the common TikTok pattern of showing ingredients on screen without saying them aloud by cross-referencing the caption text. And it standardizes units, converting “a big glug of olive oil” into “2-3 tablespoons olive oil.”
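A sketch of what that prompt construction looks like, plus the defensive parsing you need because local models love to wrap JSON in pleasantries. The rule wording and field names below are illustrative, not the exact prompt ReciMe ships:

```python
import json

EXTRACTION_RULES = """\
You are a recipe extractor. From the transcript and caption below, return ONLY
a JSON object with keys: title, ingredients (list of strings with standardized
units), steps (ordered list of strings), cook_time_minutes, dietary_tags.
Rules:
- Ignore personal anecdotes; keep only cooking instructions.
- Standardize vague amounts ("a big glug of olive oil" -> "2-3 tablespoons olive oil").
- Cross-reference the caption for ingredients shown on screen but never spoken.
- Turn vague timing ("until it looks done") into an approximate time.
"""

def build_prompt(transcript: str, caption: str = "") -> str:
    """Combine the rules, the TikTok caption, and the transcript into one prompt."""
    return (
        f"{EXTRACTION_RULES}\n"
        f"Caption:\n{caption or '(none)'}\n\n"
        f"Transcript:\n{transcript}"
    )

def parse_recipe(model_output: str) -> dict:
    """Pull the first JSON object out of the reply, tolerating surrounding prose."""
    start, end = model_output.find("{"), model_output.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model output")
    return json.loads(model_output[start : end + 1])
```

The prompt string would then go to the local model via Ollama's chat API; the parsing helper is what keeps a chatty "Sure, here's your recipe!" preamble from crashing the pipeline.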
The extraction accuracy is about 85% — meaning 85% of the time, the output is a recipe I could actually follow without going back to the original video. The other 15% usually fails because the original video didn’t actually contain enough information to reconstruct the recipe (some TikTok “recipes” are more vibes than instructions).
The Database
SQLite was the obvious choice for storage. It’s local, serverless, and its FTS5 (Full-Text Search) extension is shockingly good for a built-in feature. I can search my recipe collection by ingredient (“chicken thigh”), cuisine (“Korean”), cooking method (“air fryer”), or any combination.
The schema stores the raw transcript, the extracted recipe, the original TikTok URL, the creator’s username, and metadata like cook time, difficulty, and dietary tags (the LLM generates these during extraction). I also store the video thumbnail as a BLOB, so I can browse recipes visually without needing network access.
The FTS index means I can do queries like “show me all pasta recipes that use fewer than 5 ingredients and take less than 30 minutes” and get results instantly. It’s infinitely more useful than scrolling through a saved folder hoping to recognize a thumbnail.
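A sketch of how that schema and query fit together, assuming FTS5 is compiled into the sqlite3 module (it is in standard CPython builds). Column names are illustrative; the FTS table mirrors the searchable text columns via an external-content index and an insert trigger, so numeric filters like ingredient count can be joined in from the base table:

```python
import sqlite3

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS recipes (
            id INTEGER PRIMARY KEY,
            url TEXT UNIQUE,            -- original TikTok URL
            creator TEXT,
            title TEXT,
            transcript TEXT,            -- raw Faster-Whisper output
            recipe_json TEXT,           -- extracted structured recipe
            cook_time_minutes INTEGER,
            ingredient_count INTEGER,
            dietary_tags TEXT,
            thumbnail BLOB              -- for offline visual browsing
        );
        -- External-content FTS5 index over the searchable text columns.
        CREATE VIRTUAL TABLE IF NOT EXISTS recipes_fts USING fts5(
            title, transcript, recipe_json, dietary_tags,
            content='recipes', content_rowid='id'
        );
        -- Keep the index in sync when a recipe is stored.
        CREATE TRIGGER IF NOT EXISTS recipes_ai AFTER INSERT ON recipes BEGIN
            INSERT INTO recipes_fts(rowid, title, transcript, recipe_json, dietary_tags)
            VALUES (new.id, new.title, new.transcript, new.recipe_json, new.dietary_tags);
        END;
    """)
    return conn

def search(conn, query, max_ingredients=None, max_minutes=None):
    """Full-text match plus optional numeric filters from the base table."""
    sql = """
        SELECT r.title
        FROM recipes_fts
        JOIN recipes r ON r.id = recipes_fts.rowid
        WHERE recipes_fts MATCH ?
    """
    params = [query]
    if max_ingredients is not None:
        sql += " AND r.ingredient_count < ?"
        params.append(max_ingredients)
    if max_minutes is not None:
        sql += " AND r.cook_time_minutes < ?"
        params.append(max_minutes)
    return [row[0] for row in conn.execute(sql, params)]
```

With that in place, the "pasta, under 5 ingredients, under 30 minutes" query is just `search(conn, "pasta", max_ingredients=5, max_minutes=30)`.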
What I Actually Cook Now
The honest answer is: more than I used to, but still not as much as the database suggests I should. Having 300+ searchable recipes on my laptop is genuinely useful — I can search “what can I make with chickpeas and whatever’s in the fridge” and get actual results. But the gap between “saving a recipe” and “actually cooking it” turns out to be more about motivation than information access.
Still, ReciMe solved the problem I built it to solve. My TikTok saved folder is no longer an unsearchable graveyard of good intentions. It’s an organized, searchable cookbook that I actually use — even if “use” sometimes means browsing recipes while eating takeout and thinking about what I’ll cook tomorrow.
The Technical Takeaway
ReciMe taught me that the most interesting AI projects aren’t the ones with the fanciest models — they’re the ones that chain simple tools together in clever ways. A video downloader, a transcription model, a language model, and a database. None of these are cutting-edge. But the pipeline they form solves a real problem that no single tool could solve alone.
It also taught me the value of running AI locally. Faster-Whisper and Ollama both run on my machine with no API calls, no usage fees, and no data leaving my laptop. For a project that processes hundreds of personal videos, that privacy guarantee matters. And the speed is good enough — processing a 60-second TikTok takes about 30 seconds end-to-end on my M1 Mac.
The whole project took about two weeks from idea to working prototype. Not because the code was hard (it wasn’t), but because getting each stage to handle real-world messiness — bad audio, inconsistent formats, vague instructions — required constant iteration. That’s the unglamorous truth about AI projects: the model is 10% of the work. The other 90% is cleaning data, handling edge cases, and building the plumbing that connects everything together.