# Speech to Text With Speakers

## Links

- Product page URL: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers
- Product markdown URL: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers?format=agent-md
- Product JSON URL: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers?format=agent-json

## Overview

- Product ID: 69ba14e4bbfb26a6333b14d3
- Vendor: Apoth3osis
- Type: function
- Unit type: request
- Price: 10000 credits
- Categories: AI & Machine Learning, Automation, Data Processing, Text Processing & Manipulation, Audio & Sound Design, Document Processing & OCR, Video & Streaming
- Generated at: 2026-05-21T01:25:42.463Z

### Page Description

Turn any audio recording into clean, searchable text in seconds. Transcribe voice memos, meetings, interviews, podcasts, and webinars with accurate speech recognition that handles accents and background noise. Get plain text for quick reference, SRT or WebVTT subtitles for video captioning, or rich JSON output with word-level timestamps and speaker identification. Choose from three tiers based on recording length — up to 15, 30, or 60 minutes — and optionally enable speaker diarization to label who said what, profanity filtering, and alternative transcripts for maximum accuracy.

### Agent Description

Transcribe audio from file_id or public_url with three tiered actions for recordings up to 15, 30, or 60 minutes. Return text, SRT, VTT, or JSON output with optional speaker diarization, word timestamps, profanity filtering, and alternative transcriptions.

## Details

### Actions

- `transcribe_quick` (100 credits): Transcribe audio up to 15 minutes.
- `transcribe_standard` (150 credits): Transcribe audio up to 30 minutes.
- `transcribe_extended` (200 credits): Transcribe audio up to 60 minutes.
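
The three actions accept the same parameters and differ only in their duration limit and price, so a client can pick the cheapest tier that covers the recording. The sketch below is a hypothetical bash helper, `pick_action`, that takes the recording length in whole seconds. It assumes the limits are inclusive (15 minutes = 900 s, and so on). How the service itself measures duration is not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a recording length (whole seconds) to the
# cheapest transcription tier. Assumes the tier limits are inclusive.
pick_action() {
  seconds=$1
  if [ "$seconds" -le 900 ]; then        # up to 15 minutes, 100 credits
    echo "transcribe_quick"
  elif [ "$seconds" -le 1800 ]; then     # up to 30 minutes, 150 credits
    echo "transcribe_standard"
  elif [ "$seconds" -le 3600 ]; then     # up to 60 minutes, 200 credits
    echo "transcribe_extended"
  else
    echo "no tier covers recordings longer than 60 minutes" >&2
    return 1
  fi
}

pick_action 1250   # prints transcribe_standard
```

To obtain the length, `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 meeting.m4a` prints the duration in fractional seconds; round it up to a whole number before passing it to `pick_action`.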

### Use Cases

- Transcribe meeting recordings
- Generate subtitles and captions for videos
- Convert voice memos to searchable text
- Transcribe podcast episodes
- Create interview transcripts with speaker labels
- Produce SRT or WebVTT subtitle files
- Build searchable audio archives
- Transcribe webinars and lectures
- Analyze customer call recordings
- Repurpose audio content as text

### Workflows Using This Tool

No public workflows currently reference this product.

### Related Content

No related content is currently linked to this product.

## Integration Details

### DynamicMCP

- Setup page URL: https://www.agentpmt.com/dynamic-mcp
- Claude setup guide: https://www.agentpmt.com/dynamic-mcp?platform=claude#videos
- ChatGPT setup guide: https://www.agentpmt.com/dynamic-mcp?platform=chatgpt#videos
- Cursor setup guide: https://www.agentpmt.com/dynamic-mcp?platform=cursor#videos
- Windsurf setup guide: https://www.agentpmt.com/dynamic-mcp?platform=windsurf#videos

STDIO connector for Claude Code, Codex, Cursor, Zed, and other clients that require STDIO or custom connections. This lightweight connector routes requests to `https://api.agentpmt.com/mcp`. All tool execution happens in the cloud, and the server cannot edit any files on your computer.

```bash
npm install -g @agentpmt/mcp-router
agentpmt-setup
```

### REST API

The live page renders cURL, Python, JavaScript, and Node.js examples. Logged-in users see those examples prefilled with their own API and budget credentials.

- Purchase endpoint: https://api.agentpmt.com/products/purchase
- Authorization format: `Bearer <base64(apiKey:budgetKey)>`
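
The Bearer token is the Base64 encoding of the API key and budget key joined by a colon. A minimal bash sketch of building the header follows; the function name `make_auth_header` is hypothetical, and the placeholder keys are the ones encoded in the cURL example below.

```bash
#!/usr/bin/env bash
# Build the Authorization header value: Bearer base64("apiKey:budgetKey").
# printf avoids the trailing newline echo would add to the encoded input,
# and tr strips any line wrapping the local base64 tool inserts.
make_auth_header() {
  api_key=$1
  budget_key=$2
  token=$(printf '%s:%s' "$api_key" "$budget_key" | base64 | tr -d '\n')
  printf 'Authorization: Bearer %s\n' "$token"
}

make_auth_header "your-api-key-here" "your-budget-key-here"
# Authorization: Bearer eW91ci1hcGkta2V5LWhlcmU6eW91ci1idWRnZXQta2V5LWhlcmU=
```

With the keys held in environment variables (the names here are hypothetical), the header can be passed straight to cURL as `-H "$(make_auth_header "$AGENTPMT_API_KEY" "$AGENTPMT_BUDGET_KEY")"`.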

```bash
curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eW91ci1hcGkta2V5LWhlcmU6eW91ci1idWRnZXQta2V5LWhlcmU=" \
  -d '{
    "product_id": "69ba14e4bbfb26a6333b14d3",
    "parameters": {
      "action": "transcribe_quick",
      "public_url": "https://example.com/meeting.m4a",
      "output_format": "text",
      "enable_diarization": false,
      "enable_word_timestamps": false,
      "remove_filler_words": true,
      "enable_profanity_filter": false,
      "max_alternatives": 1
    }
  }'
```

### Autonomous Agents

Do not use the abbreviated instructions in this product markdown for wallet-based invocation. Retrieve the full External Agent API markdown document instead.

- External Agent API page URL: https://www.agentpmt.com/external-agent-api
- External Agent API markdown URL: https://www.agentpmt.com/external-agent-api?format=agent-md

### Schema

#### Parameters

- Schema type: actions

```json
{
  "actions": {
    "transcribe_quick": {
      "description": "Transcribe audio up to 15 minutes.",
      "properties": {
        "file_id": {
          "type": "string",
          "description": "File ID from a prior upload. Provide either file_id or public_url.",
          "required": false
        },
        "public_url": {
          "type": "string",
          "description": "HTTPS URL to a downloadable audio file. Provide either public_url or file_id.",
          "required": false
        },
        "language_code": {
          "type": "string",
          "description": "Optional BCP-47 language code such as en-US; defaults to en-US if omitted.",
          "required": false
        },
        "output_format": {
          "type": "string",
          "description": "Output format for the transcription result.",
          "required": false,
          "enum": [
            "text",
            "srt",
            "vtt",
            "json"
          ],
          "default": "text"
        },
        "enable_diarization": {
          "type": "boolean",
          "description": "Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.",
          "required": false,
          "default": false
        },
        "enable_word_timestamps": {
          "type": "boolean",
          "description": "Include word-level timing data in the output.",
          "required": false,
          "default": false
        },
        "remove_filler_words": {
          "type": "boolean",
          "description": "When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.",
          "required": false,
          "default": true
        },
        "enable_profanity_filter": {
          "type": "boolean",
          "description": "Mask profanity in the returned transcript.",
          "required": false,
          "default": false
        },
        "max_alternatives": {
          "type": "integer",
          "description": "Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.",
          "required": false,
          "minimum": 1,
          "maximum": 5,
          "default": 1
        }
      },
      "price_per_unit": 100
    },
    "transcribe_standard": {
      "description": "Transcribe audio up to 30 minutes.",
      "properties": {
        "file_id": {
          "type": "string",
          "description": "File ID from a prior upload. Provide either file_id or public_url.",
          "required": false
        },
        "public_url": {
          "type": "string",
          "description": "HTTPS URL to a downloadable audio file. Provide either public_url or file_id.",
          "required": false
        },
        "language_code": {
          "type": "string",
          "description": "Optional BCP-47 language code such as en-US; defaults to en-US if omitted.",
          "required": false
        },
        "output_format": {
          "type": "string",
          "description": "Output format for the transcription result.",
          "required": false,
          "enum": [
            "text",
            "srt",
            "vtt",
            "json"
          ],
          "default": "text"
        },
        "enable_diarization": {
          "type": "boolean",
          "description": "Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.",
          "required": false,
          "default": false
        },
        "enable_word_timestamps": {
          "type": "boolean",
          "description": "Include word-level timing data in the output.",
          "required": false,
          "default": false
        },
        "remove_filler_words": {
          "type": "boolean",
          "description": "When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.",
          "required": false,
          "default": true
        },
        "enable_profanity_filter": {
          "type": "boolean",
          "description": "Mask profanity in the returned transcript.",
          "required": false,
          "default": false
        },
        "max_alternatives": {
          "type": "integer",
          "description": "Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.",
          "required": false,
          "minimum": 1,
          "maximum": 5,
          "default": 1
        }
      },
      "price_per_unit": 150
    },
    "transcribe_extended": {
      "description": "Transcribe audio up to 60 minutes.",
      "properties": {
        "file_id": {
          "type": "string",
          "description": "File ID from a prior upload. Provide either file_id or public_url.",
          "required": false
        },
        "public_url": {
          "type": "string",
          "description": "HTTPS URL to a downloadable audio file. Provide either public_url or file_id.",
          "required": false
        },
        "language_code": {
          "type": "string",
          "description": "Optional BCP-47 language code such as en-US; defaults to en-US if omitted.",
          "required": false
        },
        "output_format": {
          "type": "string",
          "description": "Output format for the transcription result.",
          "required": false,
          "enum": [
            "text",
            "srt",
            "vtt",
            "json"
          ],
          "default": "text"
        },
        "enable_diarization": {
          "type": "boolean",
          "description": "Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.",
          "required": false,
          "default": false
        },
        "enable_word_timestamps": {
          "type": "boolean",
          "description": "Include word-level timing data in the output.",
          "required": false,
          "default": false
        },
        "remove_filler_words": {
          "type": "boolean",
          "description": "When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.",
          "required": false,
          "default": true
        },
        "enable_profanity_filter": {
          "type": "boolean",
          "description": "Mask profanity in the returned transcript.",
          "required": false,
          "default": false
        },
        "max_alternatives": {
          "type": "integer",
          "description": "Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.",
          "required": false,
          "minimum": 1,
          "maximum": 5,
          "default": 1
        }
      },
      "price_per_unit": 200
    }
  }
}
```

### Usage Instructions

#### Speech to Text

Transcribe audio with one tool, choosing the action whose duration limit covers the recording; the shorter tiers cost fewer credits.

#### Tool Call Format

```json
{
  "action": "get_instructions"
}
```

```json
{
  "action": "transcribe_quick",
  "file_id": "FILE_ID",
  "language_code": "en-US",
  "output_format": "text"
}
```

```json
{
  "action": "transcribe_standard",
  "public_url": "https://example.com/meeting.m4a",
  "output_format": "vtt",
  "enable_word_timestamps": true,
  "enable_diarization": true
}
```

```json
{
  "action": "transcribe_extended",
  "public_url": "https://example.com/interview.webm",
  "output_format": "json",
  "max_alternatives": 2
}
```

```json
{
  "action": "transcribe_standard",
  "file_id": "FILE_ID",
  "output_format": "json",
  "enable_word_timestamps": true,
  "remove_filler_words": false
}
```
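
The tool-call objects above use the same keys that the REST purchase endpoint expects under `parameters`, as in the cURL example under REST API. A hypothetical bash helper that wraps one of these objects for that endpoint might look like this. It performs no JSON validation, so the input must already be a well-formed object.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap a flat tool-call object (which already
# contains "action") in the request-body shape used by /products/purchase.
PRODUCT_ID="69ba14e4bbfb26a6333b14d3"

wrap_for_purchase() {
  tool_call=$1
  printf '{"product_id":"%s","parameters":%s}\n' "$PRODUCT_ID" "$tool_call"
}

wrap_for_purchase '{"action":"transcribe_quick","file_id":"FILE_ID","output_format":"text"}'
# {"product_id":"69ba14e4bbfb26a6333b14d3","parameters":{"action":"transcribe_quick","file_id":"FILE_ID","output_format":"text"}}
```

The output can be piped to cURL with `--data @-`, which reads the request body from standard input.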

#### Actions

- `transcribe_quick`: audio up to 15 minutes. Price: 100 credits.
- `transcribe_standard`: audio up to 30 minutes. Price: 150 credits.
- `transcribe_extended`: audio up to 60 minutes. Price: 200 credits.

#### Notes

- Provide either `file_id` or `public_url`.
- `public_url` must be an HTTPS URL and cannot point to private or internal network addresses.
- If `language_code` is omitted, the tool defaults to `en-US`.
- Supported output formats: `text`, `srt`, `vtt`, `json`.
- Optional controls: `enable_diarization`, `enable_word_timestamps`, `remove_filler_words`, `enable_profanity_filter`, `max_alternatives`.
- `remove_filler_words` defaults to `true`, which uses the cleaned-transcript path of Google Cloud Speech-to-Text V2 (Google STT V2).
- Set `remove_filler_words` to `false` to preserve disfluencies. This routes the request through Vercel AI Gateway to the `openai/whisper-1` gateway model slug and always requests word-level timestamps from the gateway, to support clipping workflows.
- `remove_filler_words=false` does not support `enable_diarization=true` or `max_alternatives` greater than `1`; use the default cleaned path for those features.
- Subtitle responses include inline subtitle content and may also include stored file links during normal platform invocations.
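
The rules in these notes can be checked before any credits are spent. The following bash function is a client-side sketch, not the service's own validation. Its name and positional argument order are hypothetical, and it treats supplying both `file_id` and `public_url` as an error, which is an assumption (the notes only say to provide one of them). It checks the HTTPS requirement but leaves the private-address rule to the server.

```bash
#!/usr/bin/env bash
# Client-side pre-check of the documented parameter rules (a sketch;
# the service remains the authority). Hypothetical argument order:
#   validate_params FILE_ID PUBLIC_URL OUTPUT_FORMAT DIARIZATION REMOVE_FILLER MAX_ALT
# Pass "" for an omitted file_id or public_url.
validate_params() {
  file_id=$1; public_url=$2; output_format=$3
  diarization=$4; remove_filler=$5; max_alt=$6

  # Exactly one audio source (assumption: both together is rejected).
  if [ -z "$file_id" ] && [ -z "$public_url" ]; then
    echo "provide file_id or public_url" >&2; return 1
  fi
  if [ -n "$file_id" ] && [ -n "$public_url" ]; then
    echo "provide only one of file_id and public_url" >&2; return 1
  fi
  # public_url must be HTTPS; private-address checks are left to the server.
  case "$public_url" in
    ""|https://*) ;;
    *) echo "public_url must use https" >&2; return 1 ;;
  esac
  case "$output_format" in
    text|srt|vtt|json) ;;
    *) echo "unsupported output_format: $output_format" >&2; return 1 ;;
  esac
  case "$max_alt" in
    [1-5]) ;;
    *) echo "max_alternatives must be an integer from 1 to 5" >&2; return 1 ;;
  esac
  # The filler-preserving path has extra limits.
  if [ "$remove_filler" = "false" ]; then
    if [ "$diarization" = "true" ]; then
      echo "diarization requires remove_filler_words=true" >&2; return 1
    fi
    if [ "$max_alt" -gt 1 ]; then
      echo "max_alternatives must be 1 when remove_filler_words=false" >&2; return 1
    fi
  fi
  return 0
}

validate_params "" "https://example.com/meeting.m4a" vtt true true 1 && echo ok
```

The sample call mirrors the `transcribe_standard` VTT example above and prints `ok`; the same call with the fifth argument set to `false` would be rejected because diarization is unavailable on the filler-preserving path.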

### Frequently Asked Questions

No linked FAQs are currently available.

### Dependencies

This product has no public dependency products.