STT Translate & Transcribe!

Gradio_GenerativeAI Tool To Transcribe & Translate Audio

`About` 📝

> Using OpenAI Whisper Base model to transcribe audio files into text Google Madlad model to translate transcribed texts into multiple languages. 
> Enabling users to convert spoken words into written text. 
> Supporting various use cases, including transcription of audio files, detection of phrases, speech-to-text generation, and translation of text.

`How it Works` 🫶

Upload an audio file or record a new one directly in the app.
Transcribe the audio into text, allow copy and paste function for further use.
Or/ Translates the transcribed text into multiple languages (400+ languages)

`Usage` 🤗

Transcribe audio files for note-taking, research, or content creation
Detect phrases or keywords in audio recordings for data analysis or market research
Generate text from speech for speech-to-text applications, such as subtitles, closed captions, or voice assistants
Use the app for language learning, by transcribing audio files in a foreign language and practicing pronunciation
Translate the transcribed text into multiple languages for global communication

`Features` 🎉

Supports audio files and live recording 📻
Uses the OpenAI Whisper Base model for speech recognition 💬
Displays transcribed text in the app 📝
Allows you to copy and paste the transcribed text for further use 📋
Translates the transcribed text into multiple languages using the Google Madlad model 🌎

`Model Details` 📊

" "🐤 OpenAI Whisper | " "🧑‍💻 Google Madlad |" "

The OpenAI Whisper Base model is a pre-trained model for automatic speech recognition (ASR) and speech translation.
It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without fine-tuning.

Size	Parameters	English-only	Multilingual
tiny	39 M	✓	✓
base	74 M	✓	✓
small	244 M	✓	✓
medium	769 M	✓	✓
large	1550 M	x	✓
large-v2	1550 M	x	✓
large-v3	1550 M	x	✓

For both the machine translation and language model, MADLAD-400 (Multilingual (400+ languages)) is used. For the machine translation model, a combination of parallel datasources covering 157 languages is also used. Using a Sentence Piece Model with 256k tokens shared on both the encoder and decoder side. Each input sentence has a <2xx> token prepended to the source sentence to indicate the target language.

image/png

See the research paper for further details.

`🙅‍♂️Disclaimer`

This app is licensed under AGPL-3.0 License and is for personal use only and should not be used for commercial purposes. The OpenAI Whisper Base model and Google Madlad model are pre-trained models and may not always produce accurate results.

`Get Involved!` 😌

I hope you found this project informative and engaging! 😊
If you’re interested in collaborating and contributing to the project, please let me know! I’d love to hear from you.

`Getting Started` 🚀

To get started with this project, you’ll need to:

Clone the repository
Install the required dependencies using pip install -r requirements.txt📦
Run the application using python app.py 🤖

Enjoy working with the content! 😊

🎙Speech To Text & Translation

STT Translate & Transcribe!

About 📝

How it Works 🫶

Usage 🤗

Features 🎉

Model Details 📊

🙅‍♂️Disclaimer

Get Involved! 😌

Getting Started 🚀