Whisper Transcript with Diarization
Use this script to generate a transcript from an audio file with OpenAI's Whisper speech-to-text (STT) model. It runs the Whisper model directly instead of calling the API, so it doesn't incur any charges and no data is shared externally: https://github.com/ddeepak95/whisper-transcript-w-diarization/blob/main/whisper-diarization.ipynb
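To give a rough idea of what "transcript with diarization" involves, here is a minimal sketch of one common way to attach speaker labels to transcript segments: each segment gets the speaker whose turn overlaps it the most. This is not taken from the notebook, which may work differently. The `Segment` and `Turn` classes, the `assign_speakers` function, and the `SPEAKER_00`-style labels are all hypothetical names used for illustration; the only assumption about the inputs is that both the transcript segments and the speaker turns carry start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of speech (hypothetical container)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """A span of time attributed to one speaker (hypothetical container)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turn overlaps it most.

    Segments that overlap no turn at all are labeled with `unknown`.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled


if __name__ == "__main__":
    segments = [Segment(0.0, 2.0, "Hello"), Segment(2.0, 5.0, "Hi there")]
    turns = [Turn(0.0, 2.2, "SPEAKER_00"), Turn(2.2, 6.0, "SPEAKER_01")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Picking the speaker by maximum overlap is a simple heuristic; it keeps every transcript segment intact but cannot split a segment in which two people speak.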
A second script can then be used to clean the generated transcript using LLMs: https://github.com/ddeepak95/llm-transcript-cleaner
The two scripts could be combined, but I have kept them separate intentionally. I usually run the Whisper transcription on Google Colab's resources, where files are not persistent, so I have to be vigilant throughout the process. After transcription I download the files to my local computer and run the transcript cleaning locally.
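Because Colab storage is temporary, it can help to bundle the transcript files into a single archive before downloading them. The following is a minimal sketch of that step, not part of either script. The function name `bundle_outputs`, the directory and archive paths, and the file patterns are hypothetical and should be adjusted to whatever the notebook actually writes.

```python
import zipfile
from pathlib import Path


def bundle_outputs(output_dir, archive_path, patterns=("*.txt", "*.json", "*.srt")):
    """Zip every file in output_dir that matches one of the patterns.

    Returns the list of file names that were added to the archive.
    The default patterns are an assumption about the output formats.
    """
    output_dir = Path(output_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for pattern in patterns:
            for path in sorted(output_dir.glob(pattern)):
                # Store only the file name so the archive has a flat layout.
                zf.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

In a Colab session, the resulting archive can then be fetched with `files.download("transcripts.zip")` after `from google.colab import files`, where `transcripts.zip` stands for whatever archive path was passed in.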