ASR for classrooms

Challenges
- babble noise
- room acoustics
- child speech
Speech-native vs. LLM-centric Modeling
LLM-Based ASR
- Contextual biasing
  - Supply additional context about the audio, such as metadata
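A minimal sketch of turning classroom metadata into a text prompt for contextual biasing. The helper `build_bias_prompt`, its parameters, and the metadata keys are hypothetical. One assumed use is passing the returned string as `initial_prompt` to openai-whisper's `model.transcribe(...)`, or prepending it to the text prompt of an LLM-based ASR system. The character cap is an arbitrary illustrative limit, since these models only use a bounded amount of prompt context.

```python
from typing import Mapping, Sequence


def build_bias_prompt(metadata: Mapping[str, str],
                      keywords: Sequence[str] = (),
                      max_chars: int = 400) -> str:
    """Build a short biasing prompt from audio metadata (hypothetical helper)."""
    parts = []
    # Each non-empty metadata field becomes one short sentence, e.g. "Topic: fractions."
    for key, value in metadata.items():
        if value:
            parts.append(f"{key}: {value}.")
    if keywords:
        # Deduplicate while preserving order so each domain term appears once.
        unique = list(dict.fromkeys(k.strip() for k in keywords if k.strip()))
        if unique:
            parts.append("Vocabulary: " + ", ".join(unique) + ".")
    prompt = " ".join(parts)
    # Assumed cap: keep the prompt short because prompt context is limited.
    return prompt[:max_chars]


if __name__ == "__main__":
    meta = {"Subject": "Grade 4 math", "Topic": "fractions"}
    print(build_bias_prompt(meta, ["numerator", "denominator", "numerator"]))
```

For the example above the printed prompt is `Subject: Grade 4 math. Topic: fractions. Vocabulary: numerator, denominator.`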
Whisper works well out of the box compared to Wav2Vec
- But if you have data to fine-tune on, Wav2Vec could be the better option
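Whether a fine-tuned Wav2Vec model beats Whisper on classroom audio is an empirical question, typically settled by comparing word error rate (WER) on a held-out set. The following is a minimal WER sketch using word-level edit distance. The lowercase-and-whitespace normalization is a simplifying assumption; real evaluations usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Running both systems over the same transcripts and comparing their mean WER gives a like-for-like basis for choosing between them.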