WAAS
Overview of WAAS
WAAS: Whisper as a Service - GUI and API for OpenAI Whisper
WAAS (Whisper as a Service) is an open-source project that provides a GUI and API for OpenAI's Whisper, making audio and video transcription more accessible and user-friendly. It offers both a graphical user interface (GUI) for easy file upload and transcription and an API for programmatic access.
What is WAAS?
WAAS provides an interface to upload and transcribe audio or video files. After transcription, users receive an email with download links for the transcription in various formats, including Jojo-file, SRT, or plain text. A key feature is the local browser-based editor for correcting transcription errors.
Key Features
- GUI for Upload and Transcription: Simple interface for uploading audio and video files.
- Email Notifications: Receive email notifications with download links after transcription.
- Multiple Output Formats: Download transcriptions in Jojo-file, SRT, or plain text formats.
- Local Browser-Based Editor: Correct transcription errors within the browser.
- API Access: Programmatic access to transcription services via API.
How does WAAS work?
WAAS allows users to upload audio or video files through a GUI (named Jojo) or via an API. The uploaded file is then processed using OpenAI's Whisper model for transcription. Once the transcription is complete, the user receives an email containing links to download the transcription in various formats. The browser-based editor allows users to refine and correct any errors in the transcription before saving the final result.
API Documentation
The WAAS API provides several endpoints for transcription and related tasks:
- POST /v1/transcribe: Adds a new transcription job to the queue.
  - Required parameters: email_callback or webhook_id.
  - Optional parameters: language, model, task, filename.
  - Body: Raw audio data.
- OPTIONS /v1/transcribe: Retrieves available options for the transcription route.
- POST /v1/detect: Detects the language of the audio file.
  - Optional parameter: model.
  - Body: Raw audio data.
- OPTIONS /v1/detect: Retrieves available options for the detect route.
- GET /v1/download/<job_id>: Retrieves the completed transcription in the requested output format.
  - Optional parameter: output (json, timecode_txt, txt, vtt, srt).
- OPTIONS /v1/download/<job_id>: Retrieves available options for the download route.
- GET /v1/jobs/<job_id>: Retrieves the status and metadata of the specified job.
- GET /v1/queue: Retrieves the current length of the queue.
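To make the endpoint list concrete, here is a minimal Python sketch that builds a POST /v1/transcribe request and a download URL without sending anything. It assumes that the parameters are passed in the query string and that the audio goes in the request body with an audio Content-Type header; neither detail is confirmed above. The helper names and the host waas.example.org are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Output formats listed for GET /v1/download/<job_id>.
DOWNLOAD_FORMATS = {"json", "timecode_txt", "txt", "vtt", "srt"}


def build_transcribe_request(base_url, audio, *, email_callback=None,
                             webhook_id=None, content_type="audio/mpeg",
                             **options):
    """Build (but do not send) a POST /v1/transcribe request.

    Assumption: parameters travel in the query string. `options` may carry
    the optional parameters language, model, task and filename.
    """
    if not (email_callback or webhook_id):
        raise ValueError("email_callback or webhook_id is required")
    params = {"email_callback": email_callback, "webhook_id": webhook_id}
    params.update(options)
    # Drop unset parameters so they do not appear in the URL.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/v1/transcribe?{query}"
    return Request(url, data=audio, method="POST",
                   headers={"Content-Type": content_type})


def download_url(base_url, job_id, output="srt"):
    """Return the URL for GET /v1/download/<job_id> in the given format."""
    if output not in DOWNLOAD_FORMATS:
        raise ValueError(f"unsupported output format: {output}")
    job = quote(str(job_id), safe="")
    return f"{base_url.rstrip('/')}/v1/download/{job}?{urlencode({'output': output})}"
```

The request object can then be passed to urllib.request.urlopen against a running instance, for example with build_transcribe_request("https://waas.example.org", audio_bytes, email_callback="me@example.org", language="en").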
Webhook Integration
WAAS supports webhook notifications. Upon successful or failed transcription, a POST request is sent to the configured webhook URL with a JSON payload and an X-WAAS-Signature header for content verification.
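A receiver would typically check the X-WAAS-Signature header before trusting the payload. The sketch below assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, keyed with the webhook token. That scheme is an assumption, since the text above does not specify it, so check the project documentation before relying on it. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_payload(token: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 of the raw body (assumed signature scheme)."""
    return hmac.new(token.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_valid_signature(token: str, body: bytes, header_value) -> bool:
    """Compare the received X-WAAS-Signature value with the expected one."""
    expected = sign_payload(token, body)
    received = header_value or ""
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("utf-8"),
                               received.encode("utf-8"))
```

The signature must be computed over the raw bytes as received, before any JSON parsing, because re-serialising the payload can change its byte representation.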
Who is WAAS for?
- Researchers needing to transcribe interviews or lectures.
- Journalists working with audio or video content.
- Developers integrating transcription services into their applications.
- Anyone needing to quickly and accurately transcribe audio or video files.
Installation
To install and run WAAS, follow these steps:
- Clone the repository.
- Create a virtual environment.
- Install the required Python packages using pip install -r requirements.txt.
- Configure environment variables such as BASE_URL, EMAIL_SENDER_ADDRESS, EMAIL_SENDER_PASSWORD, and EMAIL_SENDER_HOST.
- Run the setup using Docker Compose.
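Before starting the services, it can help to confirm that the variables named above are actually set. The following is a small sketch of such a check. The list covers only the four variables mentioned here and may not be exhaustive, and the helper name missing_settings is hypothetical.

```python
import os

# Variables named in the installation steps; the real project may need more.
REQUIRED_SETTINGS = (
    "BASE_URL",
    "EMAIL_SENDER_ADDRESS",
    "EMAIL_SENDER_PASSWORD",
    "EMAIL_SENDER_HOST",
)


def missing_settings(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS
            if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All listed environment variables are set.")
```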
Running with Docker Compose
- Create a .envrc file with the necessary environment variables.
- Add an allowed_webhooks.json file (if using webhooks) with valid webhook URLs and tokens.
- Run docker-compose --env-file .envrc up.
Using NVIDIA CUDA
To enable GPU acceleration with NVIDIA CUDA:
- Install NVIDIA Docker.
- Edit the docker-compose.yml file to use the Dockerfile.gpu and uncomment the device reservation.
- Run docker-compose --env-file .envrc up.
Why choose WAAS?
WAAS offers a user-friendly interface and API for leveraging OpenAI's Whisper model. Its features like email notifications, multiple output formats, and local browser-based editing make it a convenient and efficient solution for audio and video transcription needs. The flexibility to run it locally or integrate it into existing systems via the API makes it a versatile tool for various use cases.
WAAS is open source and straightforward to set up, which makes it a practical choice for both personal and professional transcription work.