<code>whisper-tui</code>: Interactively Grepping Audio with Whisper and SQLite

written Sun Jun 18 2023 03:38:52 GMT+0000 (Coordinated Universal Time)

Another tool I impulsively made to solve an issue I had.

In a Discord server I’m a part of, a fellow member asked me to share a link to a podcast episode I mentioned when discussing how physicists are awesome. I wanted to share with him the timestamp where the relevant discussion happened, since the podcast is pretty long and I didn’t expect him to watch the entire thing. The experience of deriving the timestamp was pretty awful; I awkwardly scrubbed through the player as I fished for the specific section I was looking for. It took me like five minutes to do this, which was pretty embarrassing. ^[1]

I realized in that moment that just being able to search the captions of a video would have been a lot nicer, as text is the universal interface. I have some previous experience in this domain, but that work is much less ad hoc: here, I have no need to keep the captions around once I have the snippets I need, and I don’t mind generating the captions again, since Whisper runs fast enough on the GPU. ^[2]

So, I made a tool which does just that. It’s a 50-line Python script which takes an audio file, transcribes it ^[3], inserts the individual lines of the transcription into an in-memory SQLite FTS5 virtual table, and then drops the user into a REPL where they can search for captions in that table.
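To make that pipeline concrete, here is a minimal sketch of the same idea; it is not the actual script. The function names (`transcribe`, `build_index`, `search`, `repl`), the table name `captions`, and its two columns are all made up for illustration. It assumes the `openai-whisper` package is installed and that your Python's SQLite build includes FTS5.

```python
import sqlite3


def transcribe(path, model_name="tiny.en"):
    """Return (start_seconds, text) pairs for an audio file.

    Assumes the openai-whisper package; it is imported lazily so the
    indexing and search parts of this sketch work without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return [(seg["start"], seg["text"].strip()) for seg in result["segments"]]


def build_index(segments):
    """Load (start_seconds, text) pairs into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # `start` is stored but not indexed; only `text` is searchable.
    conn.execute("CREATE VIRTUAL TABLE captions USING fts5(start UNINDEXED, text)")
    conn.executemany("INSERT INTO captions (start, text) VALUES (?, ?)", segments)
    return conn


def search(conn, query):
    """Return matching (start_seconds, text) rows in playback order."""
    return conn.execute(
        "SELECT start, text FROM captions WHERE captions MATCH ? ORDER BY start",
        (query,),
    ).fetchall()


def repl(conn):
    """Prompt for FTS5 queries until EOF (Ctrl-D) or Ctrl-C."""
    while True:
        try:
            query = input("search> ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if not query:
            continue
        try:
            hits = search(conn, query)
        except sqlite3.OperationalError as err:
            # Malformed FTS5 query syntax ends up here.
            print(f"bad query: {err}")
            continue
        for start, text in hits:
            minutes, seconds = divmod(int(start), 60)
            print(f"[{minutes:02d}:{seconds:02d}] {text}")
```

With those pieces, something like `repl(build_index(transcribe("episode.mp3")))` would drop you into the search prompt (the filename is a placeholder). Keeping `start` as an `UNINDEXED` column means each hit carries its timestamp without the numbers polluting the full-text index.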

For being so scrappy, the script achieves its goal pretty well. As always, SQLite rocks; there’s basically no other Python library that allows for in-memory document search ^[4], and it’s part of the standard library. Whisper is quite fast with a GPU; it can chew through the podcast episode I mentioned earlier in two-and-a-half minutes.

Overall, I’m pretty happy with the project; if my description of the tool interests you, you can clone it. Making productivity tools like this has been the driving force behind why I write software; extending myself with my machine always brings me joy.

1. I could have used the autogenerated captions on the YouTube copy of the podcast to find the relevant section, but I hadn’t thought of it at the time.

2. I wanted to go even faster with faster-whisper, but had issues getting it to work with my graphics card’s version of CUDA. I love the Python ML ecosystem. :)

3. It uses the tiny.en model by default for maximum performance; it’s good enough.

4. The only one I know of is this embedding of Meilisearch’s indexer; I wanted a fuzzy search solution but couldn’t find an all-in-one package.