MCP Server: toshikazu-takemasa (public)

mcp_whisper_transcription

A speech-to-text tool built on the OpenAI Whisper API, with support for multiple audio formats and additional audio-processing features.


About This Server


Model Context Protocol (MCP): this server can be connected to MCP-compatible AI applications, such as Cline, giving the assistant access to the audio tools described below.

Documentation

MCP Whisper Transcription Server

An MCP (Model Context Protocol) server for audio transcription using the OpenAI Whisper API.

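Features

Transcription: accurate speech-to-text using the OpenAI Whisper API
Audio chat: audio analysis using OpenAI's audio-capable models
Audio conversion: convert audio files between formats (mp3, wav)
Audio compression: reduce audio file size
Speech synthesis: convert text to speech (TTS)
File information: retrieve audio file metadata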

Supported audio formats

MP3
WAV
FLAC
MP4
MPEG
MPGA
M4A
OGG
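To catch unsupported files before calling a tool, a client could check the extension against the list above. This is a hypothetical helper (`is_supported_audio` is not part of this project); it only inspects the file name, and the server and the Whisper API still do their own validation.

```python
from pathlib import Path

# Extensions listed as supported in this README (lowercase, without the dot).
SUPPORTED_EXTENSIONS = {"mp3", "wav", "flac", "mp4", "mpeg", "mpga", "m4a", "ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the supported formats.

    Only the file name is inspected; the file contents are not checked.
    """
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["meeting.MP3", "notes.txt", "interview.m4a"]:
        print(name, is_supported_audio(name))
```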

Requirements

Python 3.11 or later
OpenAI API key

Installation

Using the Docker image (recommended)

# Pull the latest image from GitHub Container Registry
docker pull ghcr.io/toshikazu-takemasa/mcp_whisper_transcription:latest

# Start the MCP server in stdio mode (for use from clients such as Cline)
# Note: this command is not meant to be run by hand; put it in your MCP client configuration
docker run -i -e OPENAI_API_KEY="your-openai-api-key" \
  -v "$(pwd)/audio_files:/app/audio_files" \
  ghcr.io/toshikazu-takemasa/mcp_whisper_transcription:latest

Using Docker Compose

# Create the environment file
cp .env.example .env
# Edit .env and set your OpenAI API key

# Start the locally built version
docker-compose up

# Or use the published image
docker-compose --profile published up mcp-whisper-transcription-published

About the Docker image build

This project uses GitHub Actions to build and publish the Docker image automatically:

The image is built automatically on every push to the main branch
Images are published to GitHub Container Registry (GHCR)
Tagged releases (v*) also produce versioned images

To test the image manually:

# Run the test script
./scripts/test-docker.sh local   # test the local build
./scripts/test-docker.sh remote  # test the published image
./scripts/test-docker.sh both    # test both

Using a devcontainer

Clone this repository
Open the project in VS Code
Select "Reopen in Container"

Set the environment variable:

export OPENAI_API_KEY="your-openai-api-key"

Local setup

# Install dependencies
pip install -r requirements.txt

# Install in development (editable) mode
pip install -e .

Usage

Configuring an MCP client (e.g. Cline)

To use the server from a client such as Cline, add the following to its MCP configuration file:

{
  "mcpServers": {
    "mcp-whisper-transcription": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "OPENAI_API_KEY=your-openai-api-key",
        "-v", "/path/to/audio/files:/app/audio_files",
        "ghcr.io/toshikazu-takemasa/mcp_whisper_transcription:latest"
      ]
    }
  }
}

Running directly as an MCP server

python -m mcp_whisper_transcription

Available tools

1. transcribe_audio

Converts an audio file to text.

{
  "input_file_path": "/path/to/audio.mp3",
  "response_format": "text",
  "prompt": "Hints about the content of the audio"
}

2. transcribe_with_enhancement

Transcribes audio using a predefined prompt.

{
  "input_file_path": "/path/to/audio.mp3",
  "enhancement_type": "detailed"
}

Available enhancement_type values:

detailed: detailed transcription (including non-verbal sounds)
storytelling: natural, conversational style
professional: professional style
analytical: technical, analytical style

3. chat_with_audio

Analyzes an audio file and processes it with custom prompts.

{
  "input_file_path": "/path/to/audio.mp3",
  "system_prompt": "You are an expert in audio analysis",
  "user_prompt": "Please summarize this audio"
}

4. convert_audio

Converts an audio file to another format.

{
  "input_file_path": "/path/to/audio.wav",
  "target_format": "mp3"
}

5. compress_audio

Compresses an audio file.

{
  "input_file_path": "/path/to/large_audio.mp3",
  "max_mb": 25
}
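The `max_mb: 25` in this example matches the OpenAI Whisper API's 25 MB upload limit. How compress_audio reaches the target is not documented here; one common approach is to re-encode at a lower constant bitrate, and the arithmetic for choosing that bitrate is sketched below. `max_bitrate_kbps` and its 5% `headroom` margin are assumptions for illustration.

```python
def max_bitrate_kbps(duration_seconds: float, max_mb: float = 25, headroom: float = 0.95) -> int:
    """Highest constant bitrate (kbit/s) that keeps an encoded file under max_mb.

    File size in bits = bitrate (bit/s) * duration (s), so
    bitrate = max_mb * 1024**2 * 8 / duration.
    MB is treated as 1024 * 1024 bytes here; use 1_000_000 for a stricter bound.
    `headroom` reserves a margin for container and metadata overhead.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    max_bits = max_mb * 1024 * 1024 * 8 * headroom
    # Round down so the result never exceeds the limit.
    return int(max_bits / duration_seconds / 1000)


if __name__ == "__main__":
    # Bitrate ceiling for a one-hour recording.
    print(max_bitrate_kbps(3600))
```

For a one-hour recording this gives about 55 kbit/s. Encoders usually offer discrete bitrates (for MP3, e.g. 48 or 56 kbit/s), so pick the nearest one at or below the result.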

6. create_speech

Converts text to speech.

{
  "text_prompt": "Hello, this is a test recording.",
  "voice": "nova",
  "speed": 1.0
}
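A hypothetical client-side check of these arguments before calling the tool. The limits used (at most 4096 input characters, speed between 0.25 and 4.0) are assumptions taken from OpenAI's text-to-speech API documentation; the server may enforce its own, and the voice name is left for the API to validate. `build_speech_args` is not part of this project.

```python
# Assumed limits from OpenAI's text-to-speech API documentation at the time of
# writing; the server may apply different ones.
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_args(text: str, voice: str = "nova", speed: float = 1.0) -> dict:
    """Validate and assemble create_speech arguments (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text_prompt must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"text_prompt is longer than {MAX_INPUT_CHARS} characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    # The voice name is passed through unchecked; the API rejects unknown voices.
    return {"text_prompt": text, "voice": voice, "speed": speed}


if __name__ == "__main__":
    print(build_speech_args("Hello, this is a test recording."))
```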

7. get_file_support

Reports an audio file's information and whether the file is supported.

{
  "file_path": "/path/to/audio.mp3"
}
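For WAV files, the kind of metadata this tool reports can be read with Python's standard `wave` module. A minimal sketch: `wav_info` is a hypothetical helper, it handles only uncompressed PCM WAV (other formats need a decoder such as ffmpeg), and the 25 MB limit is an assumption based on the Whisper API.

```python
import os
import sys
import wave

# Assumed 25 MB upload limit of the OpenAI Whisper API.
WHISPER_MAX_BYTES = 25 * 1024 * 1024


def wav_info(path: str) -> dict:
    """Return basic metadata for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        info = {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_seconds": frames / rate,
        }
    size = os.path.getsize(path)
    info["size_bytes"] = size
    info["within_upload_limit"] = size <= WHISPER_MAX_BYTES
    return info


if __name__ == "__main__" and len(sys.argv) > 1:
    print(wav_info(sys.argv[1]))
```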

Configuration

Environment variables

OPENAI_API_KEY: OpenAI API key (required)
AUDIO_FILES_PATH: base path for audio files (optional)

Development

Code formatting

black src/
isort src/

Type checking

mypy src/

Running tests

pytest

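License

MIT License

Contributing

Pull requests and issue reports are welcome.

Notes

Using the OpenAI API incurs charges
Processing large audio files may take some time
Mind the upload size limit for audio files (the OpenAI Whisper API accepts files up to 25 MB; compress_audio can reduce file size)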

Quick Start

Clone the repository

git clone https://github.com/toshikazu-takemasa/mcp_whisper_transcription

Install dependencies

cd mcp_whisper_transcription
pip install -r requirements.txt

Follow the documentation

Check the repository's README.md file for specific installation and usage instructions.

Repository Details

Owner: toshikazu-takemasa

Repo: mcp_whisper_transcription

Language: Python

License: -

Last fetched: 8/10/2025

