
speech to text

A FastAPI-based Speech-to-Text service that supports multiple speech-recognition engines and follows the MCP protocol.

Repository Info

  • Stars: 0
  • Forks: 0
  • Watchers: 0
  • Issues: 0
  • Language: Python
  • License: -


Documentation

Speech-to-Text MCP Server

A high-performance Speech-to-Text server built with FastAPI that implements the Model Context Protocol (MCP) architecture. This server provides a unified interface for multiple STT providers (OpenAI Whisper, Groq) while following MCP standards for model serving and context management.

🎯 About MCP

The Model Context Protocol (MCP) is a standardized protocol for model serving that:

  • Defines how AI models should expose their capabilities
  • Manages model contexts and state
  • Provides consistent interfaces across different model providers
  • Enables seamless model switching and fallback strategies

Our STT server implements MCP through FastAPI-MCP, allowing:

  • Standardized model routing
  • Unified context handling
  • Consistent error management
  • Provider-agnostic model serving

🌟 Features

  • MCP Implementation:
    • Full Model Context Protocol support
    • FastAPI-MCP integration
    • Standardized model interfaces
    • Context-aware request handling
  • Multi-Provider Support:
    • OpenAI Whisper
    • Groq
    • Extensible architecture for adding more providers
  • Audio Processing:
    • Automatic audio format conversion
    • Sample rate normalization (16kHz)
    • Duration validation
    • Support for multiple input formats (wav, mp3, ogg, flac, m4a)
  • API Features:
    • Fast async processing
    • MCP-compliant provider selection
    • Language detection and specification
    • Comprehensive error handling
    • Detailed logging
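The audio-processing features above (format conversion plus 16 kHz resampling) can be sketched as a thin FFmpeg wrapper. This is an illustrative sketch, not the server's actual audio_utils.py code; the function names and the exact FFmpeg flags are assumptions.

```python
import subprocess
from pathlib import Path

TARGET_SAMPLE_RATE = 16_000  # matches the 16 kHz normalization noted above


def build_ffmpeg_cmd(src: Path, dst: Path) -> list[str]:
    """Build an FFmpeg command converting any supported input
    (wav, mp3, ogg, flac, m4a) to 16 kHz mono PCM WAV."""
    return [
        "ffmpeg", "-y",                  # overwrite output without prompting
        "-i", str(src),                  # input in any FFmpeg-readable format
        "-ar", str(TARGET_SAMPLE_RATE),  # resample to 16 kHz
        "-ac", "1",                      # downmix to mono
        "-f", "wav",                     # force WAV container
        str(dst),
    ]


def normalize_audio(src: Path, dst: Path) -> None:
    """Run the conversion, raising CalledProcessError on FFmpeg failure."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
```

Duration validation would then run against the normalized file, where one sample equals exactly 1/16000 of a second.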

🚀 Quick Start

Prerequisites

  • Python 3.9 or higher
  • FFmpeg installed on your system
  • Poetry for dependency management
  • API keys for providers (OpenAI, Groq)

Installation

  1. Clone the repository:
git clone <your-repository-url>
cd stt
  2. Install dependencies using Poetry:
poetry install
  3. Set up environment variables:
# Create .env file
cp .env.example .env

# Edit .env with your API keys
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gk-...

Running the Server

poetry run uvicorn stt.main:app --reload

The server will start at http://localhost:8000.

📝 API Usage

Transcribe Audio

curl -X POST "http://localhost:8000/v1/transcribe" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  -F "engine=auto" \
  -F "language=auto"

Parameters:

  • file: Audio file to transcribe
  • engine: Transcription engine choice (auto, openai, groq)
  • language: Language of the audio (auto for detection)

Response:

{
  "text": "Transcribed text content...",
  "model": "openai/whisper-1",
  "duration_ms": 12345,
  "language": "en"
}
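On the client side, the JSON response above maps cleanly onto a small typed record. The dataclass below is illustrative only; it mirrors the documented response fields but is not part of the server's schemas.py.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    """Typed view of the /v1/transcribe response body shown above."""
    text: str
    model: str
    duration_ms: int
    language: str

    @classmethod
    def from_json(cls, payload: dict) -> "TranscriptionResult":
        # Field names mirror the documented response exactly.
        return cls(
            text=payload["text"],
            model=payload["model"],
            duration_ms=payload["duration_ms"],
            language=payload["language"],
        )
```

A caller would then write something like `result = TranscriptionResult.from_json(resp.json())` and work with attributes instead of raw dictionary keys.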

⚙️ Configuration

MCP Configuration

# config.py
MCP_CONFIG = {
    "protocol_version": "1.0",
    "provider_configs": {
        "openai": {
            "context_window": 240000,  # 4 minutes in ms
            "max_tokens": None,  # No token limit for audio
        },
        "groq": {
            "context_window": 300000,  # 5 minutes in ms
            "max_tokens": None,
        }
    }
}
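Given the provider configs above, a routing layer can check whether a clip fits a provider's context window before dispatching. The helpers below are a hypothetical sketch built on the documented config values, not code from the repository.

```python
MCP_CONFIG = {
    "protocol_version": "1.0",
    "provider_configs": {
        "openai": {"context_window": 240_000},  # 4 minutes in ms
        "groq": {"context_window": 300_000},    # 5 minutes in ms
    },
}


def fits_context(provider: str, duration_ms: int) -> bool:
    """True if the audio duration fits within the provider's context window."""
    cfg = MCP_CONFIG["provider_configs"][provider]
    return duration_ms <= cfg["context_window"]


def providers_for(duration_ms: int) -> list[str]:
    """Every configured provider able to handle a clip of this duration."""
    return [p for p in MCP_CONFIG["provider_configs"] if fits_context(p, duration_ms)]
```

For example, a 4.5-minute clip would be routed away from `openai` and toward `groq` under these limits.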

General Settings

  • max_audio_duration_seconds: Maximum allowed audio duration
  • default_engine_policy: MCP-based policy for selecting transcription engine
  • default_engine: Fallback engine if policy fails
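The interplay of `default_engine_policy` and `default_engine` can be sketched as follows. This is an assumed interpretation of those settings (explicit request wins, then the policy, then the fallback), not the repository's actual router code.

```python
from typing import Callable


def select_engine(
    requested: str,
    policy: Callable[[], str],
    default_engine: str = "openai",
) -> str:
    """Resolve the transcription engine:
    honor an explicit request, otherwise consult the MCP policy,
    and fall back to default_engine if the policy fails."""
    if requested != "auto":
        return requested  # caller asked for a specific engine
    try:
        return policy()  # e.g. pick by duration, cost, or availability
    except Exception:
        return default_engine  # policy failed; use the configured fallback
```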

🔧 Development

Project Structure

stt/
├── adapters/             # MCP-compliant provider implementations
├── audio_utils.py        # Audio processing utilities
├── config.py             # MCP and general configuration
├── main.py               # FastAPI application and MCP routes
├── router.py             # MCP-based model routing logic
├── schemas.py            # Pydantic models and MCP schemas
└── common_exceptions.py  # MCP-compliant exceptions

Adding a New Provider

  1. Create a new MCP-compliant adapter in adapters/
  2. Implement the MCP provider interface:
    class NewProviderAdapter(MCPModelAdapter):
        async def process_context(self, context: MCPContext) -> MCPResponse:
            ...  # call the provider's API and wrap the result in an MCPResponse
    
  3. Register the provider in router.py
  4. Update MCP configuration as needed

MCP Integration Example

from fastapi import FastAPI
from fastapi_mcp import FastApiMCP, MCPContext

app = FastAPI()

@app.post("/v1/transcribe")
async def transcribe_audio(context: MCPContext):
    # mcp_router (defined in router.py) handles provider selection and execution
    response = await mcp_router.process(context)
    return response

📚 Documentation

For more detailed information about MCP and its implementation:

  • Model Context Protocol Specification
  • FastAPI-MCP Documentation
  • MCP Provider Implementation Guide

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Related Resources

  • Model Context Protocol
  • FastAPI Documentation
  • FastAPI-MCP Documentation
  • LiteLLM Documentation

