
# Speech-to-Text MCP Server
A high-performance Speech-to-Text server built with FastAPI that implements the Model Context Protocol (MCP) architecture. This server provides a unified interface for multiple STT providers (OpenAI Whisper, Groq) while following MCP standards for model serving and context management.
## 🎯 About MCP
The Model Context Protocol (MCP) is a standardized protocol for model serving that:
- Defines how AI models should expose their capabilities
- Manages model contexts and state
- Provides consistent interfaces across different model providers
- Enables seamless model switching and fallback strategies
Our STT server implements MCP through FastAPI-MCP, allowing:
- Standardized model routing
- Unified context handling
- Consistent error management
- Provider-agnostic model serving
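The fallback behavior described above can be sketched as a simple priority loop. This is an illustrative sketch, not the server's actual API; the names `route_with_fallback` and `ProviderError` are assumptions for the example:

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised when a provider fails to handle a request."""


def route_with_fallback(
    providers: Sequence[str],
    handlers: dict[str, Callable[[bytes], str]],
    audio: bytes,
) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, transcript)."""
    errors = []
    for name in providers:
        handler = handlers.get(name)
        if handler is None:
            continue
        try:
            return name, handler(audio)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

With this shape, a request that fails on the preferred provider transparently falls through to the next one, and the caller still learns which provider actually served the request.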
## 🌟 Features
- **MCP Implementation**:
  - Full Model Context Protocol support
  - FastAPI-MCP integration
  - Standardized model interfaces
  - Context-aware request handling
- **Multi-Provider Support**:
  - OpenAI Whisper
  - Groq
  - Extensible architecture for adding more providers
- **Audio Processing**:
  - Automatic audio format conversion
  - Sample-rate normalization (16 kHz)
  - Duration validation
  - Support for multiple input formats (wav, mp3, ogg, flac, m4a)
- **API Features**:
  - Fast async processing
  - MCP-compliant provider selection
  - Language detection and specification
  - Comprehensive error handling
  - Detailed logging
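Of the audio-processing steps, duration validation is easy to illustrate: the clip length can be read from the container header before any provider is called. A minimal sketch for WAV input using Python's standard `wave` module (the helper names are illustrative; the server's real limit comes from `max_audio_duration_seconds` in its configuration):

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Read a WAV header and return the clip duration in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validate_duration(data: bytes, max_seconds: float) -> None:
    """Reject clips longer than the configured maximum."""
    duration = wav_duration_seconds(data)
    if duration > max_seconds:
        raise ValueError(f"audio is {duration:.1f}s, limit is {max_seconds}s")
```

Non-WAV formats would first be converted (e.g. via FFmpeg, which is already a prerequisite) before this kind of check.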
## 🚀 Quick Start

### Prerequisites
- Python 3.9 or higher
- FFmpeg installed on your system
- Poetry for dependency management
- API keys for providers (OpenAI, Groq)
### Installation

1. Clone the repository:

   ```bash
   git clone <your-repository-url>
   cd stt
   ```

2. Install dependencies using Poetry:

   ```bash
   poetry install
   ```

3. Set up environment variables:

   ```bash
   # Create a .env file from the template
   cp .env.example .env
   ```

   Then edit `.env` with your API keys:

   ```bash
   OPENAI_API_KEY=sk-...
   GROQ_API_KEY=gk-...
   ```
### Running the Server

```bash
poetry run uvicorn stt.main:app --reload
```

The server starts at http://localhost:8000.
## 📝 API Usage

### Transcribe Audio

```bash
curl -X POST "http://localhost:8000/v1/transcribe" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "engine=auto" \
  -F "language=auto"
```
Parameters:

- `file`: Audio file to transcribe
- `engine`: Transcription engine choice (`auto`, `openai`, `groq`)
- `language`: Language of the audio (`auto` for detection)
Response:

```json
{
  "text": "Transcribed text content...",
  "model": "openai/whisper-1",
  "duration_ms": 12345,
  "language": "en"
}
```
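On the client side, this response maps cleanly onto a typed structure. A minimal sketch using Python dataclasses, with field names taken from the example response above (the class and function names are illustrative):

```python
import json
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    text: str
    model: str
    duration_ms: int
    language: str


def parse_response(body: str) -> TranscriptionResult:
    """Parse a /v1/transcribe JSON body into a typed result."""
    return TranscriptionResult(**json.loads(body))
```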
## ⚙️ Configuration

### MCP Configuration

```python
# config.py
MCP_CONFIG = {
    "protocol_version": "1.0",
    "provider_configs": {
        "openai": {
            "context_window": 240000,  # 4 minutes in ms
            "max_tokens": None,  # No token limit for audio
        },
        "groq": {
            "context_window": 300000,  # 5 minutes in ms
            "max_tokens": None,
        },
    },
}
```
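The `context_window` values above bound how long a clip each provider will accept, so routing can filter providers by clip length before dispatching. A sketch of that check (the function name is an assumption for illustration; the millisecond limits mirror the config above):

```python
def eligible_providers(duration_ms: int, windows_ms: dict[str, int]) -> list[str]:
    """Return the providers whose context window can hold the clip."""
    return [name for name, window in windows_ms.items() if duration_ms <= window]
```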
### General Settings

- `max_audio_duration_seconds`: Maximum allowed audio duration
- `default_engine_policy`: MCP-based policy for selecting the transcription engine
- `default_engine`: Fallback engine if the policy fails
## 🔧 Development

### Project Structure

```
stt/
├── adapters/              # MCP-compliant provider implementations
├── audio_utils.py         # Audio processing utilities
├── config.py              # MCP and general configuration
├── main.py                # FastAPI application and MCP routes
├── router.py              # MCP-based model routing logic
├── schemas.py             # Pydantic models and MCP schemas
└── common_exceptions.py   # MCP-compliant exceptions
```
### Adding a New Provider

1. Create a new MCP-compliant adapter in `adapters/`.
2. Implement the MCP provider interface:

   ```python
   class NewProviderAdapter(MCPModelAdapter):
       async def process_context(self, context: MCPContext) -> MCPResponse:
           ...  # Implementation
   ```

3. Register the provider in `router.py`.
4. Update the MCP configuration as needed.
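The registration step is typically a name-to-adapter mapping so that the `engine` request parameter can select a provider. A hypothetical sketch of what that registry might look like; none of these names (`STTAdapter`, `ADAPTERS`, `register_adapter`) are the project's actual identifiers:

```python
from typing import Protocol


class STTAdapter(Protocol):
    """Minimal provider interface (hypothetical stand-in for MCPModelAdapter)."""

    async def transcribe(self, audio: bytes, language: str) -> str: ...


ADAPTERS: dict[str, STTAdapter] = {}


def register_adapter(name: str, adapter: STTAdapter) -> None:
    """Make a provider selectable via the `engine` request parameter."""
    if name in ADAPTERS:
        raise ValueError(f"adapter {name!r} already registered")
    ADAPTERS[name] = adapter
```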
### MCP Integration Example

```python
from fastapi_mcp import FastApiMCP, MCPContext


@app.post("/v1/transcribe")
async def transcribe_audio(context: MCPContext):
    # MCP handles provider selection and execution
    response = await mcp_router.process(context)
    return response
```
## 📚 Documentation
For more detailed information about MCP and its implementation:
- Model Context Protocol Specification
- FastAPI-MCP Documentation
- MCP Provider Implementation Guide
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🔗 Links
- Model Context Protocol
- FastAPI Documentation
- FastAPI-MCP Documentation
- LiteLLM Documentation