Whisper Transcript with Diarization

Use this script to generate a transcript from an audio file using OpenAI's Whisper speech-to-text (STT) model. It runs the Whisper model directly instead of calling the API, so it doesn't incur any charges and no data is shared outside: https://github.com/ddeepak95/whisper-transcript-w-diarization/blob/main/whisper-diarization.ipynb
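
To illustrate what "transcript with diarization" means in practice, here is a minimal sketch of one common way to attach speaker labels to transcribed segments: give each segment the speaker whose turn overlaps it the most. This is not taken from the notebook. It assumes segments are dicts with `start`, `end`, and `text` keys (the shape of the `segments` list that the open-source `whisper` package returns) and that diarization turns are available as `(start, end, speaker)` tuples. The names `assign_speakers` and `format_transcript` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed shape of a diarization turn: (start_sec, end_sec, speaker_label).
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Dict], turns: List[Turn], unknown: str = "UNKNOWN"
) -> List[Dict]:
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Copy the segment so the input list is left untouched.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


def format_transcript(labeled: List[Dict]) -> str:
    """Render labeled segments as one '[start-end] SPEAKER: text' line each."""
    return "\n".join(
        f'[{seg["start"]:.1f}-{seg["end"]:.1f}] {seg["speaker"]}: {seg["text"].strip()}'
        for seg in labeled
    )
```

Segments that overlap no turn keep the `UNKNOWN` label, which makes gaps in the diarization easy to spot when reviewing the transcript.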

A second script can then be used to clean the generated transcript using LLMs: https://github.com/ddeepak95/llm-transcript-cleaner

The two scripts could be combined, but I have intentionally kept them separate. I usually run the Whisper transcription on Google Colab's resources, where files are not persistent, so I have to be vigilant throughout the process. After transcription, I download the files to my local computer and run the transcript cleaning locally.
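
Because Colab storage is temporary, it can help to bundle all transcription outputs into a single archive before downloading. The sketch below uses only the Python standard library. The function name `bundle_outputs` and the default archive name `transcripts` are hypothetical and not part of either script.

```python
import shutil
from pathlib import Path


def bundle_outputs(output_dir: str, archive_stem: str = "transcripts") -> Path:
    """Zip everything under output_dir and return the path to the archive.

    archive_stem is the archive path without the '.zip' suffix; keep it
    outside output_dir so the archive does not try to include itself.
    """
    src = Path(output_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Output directory not found: {src}")
    # make_archive appends '.zip' and returns the full archive path.
    archive_path = shutil.make_archive(archive_stem, "zip", root_dir=src)
    return Path(archive_path)
```

In a Colab notebook, the resulting file could then be passed to `google.colab.files.download` (for example `files.download(str(bundle_outputs("outputs")))`, where `outputs` is an assumed directory name) to save it locally before the runtime is recycled.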