# Speech to Text With Speakers

## Links

- Product page URL: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers
- Product markdown URL: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers?format=agent-md
- Product JSON URL: https://www.agentpmt.com/marketplace/speech-to-text-with-speakers?format=agent-json

## Overview

- Product ID: 69ba14e4bbfb26a6333b14d3
- Type: function
- Unit type: request
- Price: 10000 credits
- Categories: AI & Machine Learning, Automation, Data Processing, Text Processing & Manipulation, Audio & Sound Design, Document Processing & OCR, Video & Streaming
- Generated at: 2026-07-05T02:55:36.241Z

### Page Description

Turn any audio recording into clean, searchable text in seconds. Transcribe voice memos, meetings, interviews, podcasts, and webinars with accurate speech recognition that handles accents and background noise. Get plain text for quick reference, SRT or WebVTT subtitles for video captioning, or rich JSON output with word-level timestamps and speaker identification. Choose from three tiers based on recording length — up to 15, 30, or 60 minutes — and optionally enable speaker diarization to label who said what, profanity filtering, and alternative transcripts for maximum accuracy.

### Agent Description

Transcribe audio from file_id or public_url with three tiered actions for recordings up to 15, 30, or 60 minutes. Return text, SRT, VTT, or JSON output with optional speaker diarization, word timestamps, profanity filtering, and alternative transcriptions.

## Details

### Actions

- `transcribe_quick` (100 credits): Transcribe audio up to 15 minutes.
- `transcribe_standard` (150 credits): Transcribe audio up to 30 minutes.
- `transcribe_extended` (200 credits): Transcribe audio up to 60 minutes.
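
The tier list above can be turned into a small lookup that picks the cheapest action able to hold a recording. This is a minimal sketch: the helper name `pick_action` is hypothetical, and treating each limit as inclusive (a recording of exactly 15 minutes fits `transcribe_quick`) is an assumption, since the listing does not say how the boundary is measured.

```python
# Tier table copied from the action list: (max minutes, action name, credits).
TIERS = [
    (15, "transcribe_quick", 100),
    (30, "transcribe_standard", 150),
    (60, "transcribe_extended", 200),
]


def pick_action(duration_minutes: float) -> tuple[str, int]:
    """Return (action, credits) for the smallest tier that fits the recording.

    Assumption: tier limits are inclusive.
    """
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    for max_minutes, action, credits in TIERS:
        if duration_minutes <= max_minutes:
            return action, credits
    raise ValueError("recordings longer than 60 minutes exceed the largest tier")


print(pick_action(12.5))  # ('transcribe_quick', 100)
print(pick_action(45))    # ('transcribe_extended', 200)
```

Recordings longer than 60 minutes raise an error here rather than being split, because the listing does not describe any chunking behavior.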

### Use Cases

- Transcribe meeting recordings
- Generate subtitles and captions for videos
- Convert voice memos to searchable text
- Transcribe podcast episodes
- Create interview transcripts with speaker labels
- Produce SRT or WebVTT subtitle files
- Build searchable audio archives
- Transcribe webinars and lectures
- Analyze customer call recordings
- Repurpose audio content as text

### Workflows Using This Tool

No public workflows currently reference this product.

### Related Content

#### Artificial Intelligence Medical Scribe, Captions, and Transcripts on AgentPMT

- Type: article
- Page URL: https://www.agentpmt.com/articles/artificial-intelligence-medical-scribe-captions-and-transcripts-on-agentpmt
- Markdown URL: https://www.agentpmt.com/articles/artificial-intelligence-medical-scribe-captions-and-transcripts-on-agentpmt?format=agent-md
Speech to Text With Speakers, built by Apoth3osis, is now live on AgentPMT. It is a managed connector that turns any recording into accurate text, SRT/VTT captions, or timestamped JSON with speaker diarization across 15-, 30-, and 60-minute tiers. Agents call it through the dynamic MCP server and pay only when a transcription succeeds.

#### Animal Artificial Intelligence Learns to Read the Wild

- Type: article
- Page URL: https://www.agentpmt.com/articles/animal-artificial-intelligence-learns-to-read-the-wild
- Markdown URL: https://www.agentpmt.com/articles/animal-artificial-intelligence-learns-to-read-the-wild?format=agent-md
In a single week, a cluster of research releases showed AI in the animal world moving from finding and counting animals to reading them: re-identifying individuals on GPU-free hardware, inferring diet from feeding sounds, and mapping a songbird's calls. With the capture problem largely solved, the advantage now shifts to the operational work around the model: choosing it on cost and quality, keeping a human on high-stakes calls, and recording why it decided what it did.

## Integration Details

### DynamicMCP

- Setup page URL: https://www.agentpmt.com/dynamic-mcp
- Claude setup guide: https://www.agentpmt.com/dynamic-mcp#platform=claude
- ChatGPT setup guide: https://www.agentpmt.com/dynamic-mcp#platform=chatgpt
- Cursor setup guide: https://www.agentpmt.com/dynamic-mcp#platform=cursor
- Windsurf setup guide: https://www.agentpmt.com/dynamic-mcp#platform=windsurf

Use the local router for command-based MCP clients. It forwards requests to `https://api.agentpmt.com/mcp` and does not execute tools locally.

```bash
npm install -g @agentpmt/mcp-router
agentpmt-setup
```

### REST API

The live page renders cURL, Python, JavaScript, and Node.js examples. Logged-in users see those examples prefilled with their own API and budget credentials.

- Purchase endpoint: https://api.agentpmt.com/products/purchase
- Authorization format: `Bearer <base64(apiKey:budgetKey)>`

```bash
curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eW91ci1hcGkta2V5LWhlcmU6eW91ci1idWRnZXQta2V5LWhlcmU=" \
  -d '{
    "product_id": "69ba14e4bbfb26a6333b14d3",
    "parameters": {
      "action": "transcribe_quick",
      "file_id": "FILE_ID",
      "output_format": "text",
      "enable_diarization": false,
      "enable_word_timestamps": false,
      "remove_filler_words": true,
      "enable_profanity_filter": false,
      "max_alternatives": 1
    }
  }'
```
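
The authorization format above can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the helper names `build_bearer` and `build_request` are hypothetical, the `public_url` value is a placeholder, and nothing is sent over the network; the function only assembles the URL, headers, and body for whatever HTTP client you use.

```python
import base64
import json

PURCHASE_URL = "https://api.agentpmt.com/products/purchase"
PRODUCT_ID = "69ba14e4bbfb26a6333b14d3"


def build_bearer(api_key: str, budget_key: str) -> str:
    """Encode "apiKey:budgetKey" as base64 and prefix it with "Bearer "."""
    raw = f"{api_key}:{budget_key}".encode("utf-8")
    return "Bearer " + base64.b64encode(raw).decode("ascii")


def build_request(api_key: str, budget_key: str, parameters: dict):
    """Return (url, headers, body); hypothetical helper that sends nothing."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_bearer(api_key, budget_key),
    }
    body = json.dumps({"product_id": PRODUCT_ID, "parameters": parameters})
    return PURCHASE_URL, headers, body


url, headers, body = build_request(
    "your-api-key-here",
    "your-budget-key-here",
    {
        "action": "transcribe_quick",
        "public_url": "https://example.com/meeting.m4a",  # placeholder URL
        "output_format": "text",
    },
)
print(headers["Authorization"])
```

With the placeholder keys shown, the printed header matches the `Authorization` value in the cURL example, which is the base64 encoding of `your-api-key-here:your-budget-key-here`.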

### Autonomous Agents

Autonomous agents can access this tool through AgentAddress credit balances or direct x402 payments. Use the Autonomous Agent API reference for endpoint shapes after choosing the access pattern below.

- Autonomous Agent API reference URL: https://www.agentpmt.com/docs/api-reference/autonomous-agents
- Autonomous Agent API reference markdown URL: https://www.agentpmt.com/docs/api-reference/autonomous-agents?format=agent-md
- Credit-Based Access Using AgentAddress: https://www.agentpmt.com/docs/autonomous-agents/credit-based-tool-usage-with-agentaddress
- AgentAddress is preferred when you need persistent file access, stored platform state, and the fullest tool capabilities across repeated calls.
- Direct x402 is for independent one-off tool calls that do not require shared files or stored platform state.
- Direct x402 public payments: USDC on Base, Arbitrum, Optimism, Polygon, and Avalanche.

#### Product Skill Package

This product has a published Agent Skill package for product-specific operating instructions.

- Skill slug: speech-to-text-with-speakers
- Version: 1.0.0
- Download SKILL.md: https://raw.githubusercontent.com/AgentPMT/agent-skills/main/skills/speech-to-text-with-speakers/SKILL.md
- Package source: https://github.com/AgentPMT/agent-skills/tree/main/skills/speech-to-text-with-speakers
- OpenClaw listing: https://clawhub.ai/agentpmt/speech-to-text-with-speakers
- OpenClaw install: `openclaw skills install speech-to-text-with-speakers`
- skills.sh install: `npx skills add AgentPMT/agent-skills --skill speech-to-text-with-speakers`
- Last published: 2026-06-24T09:37:54.153Z

### Schema

#### Parameters

- Schema type: actions

```json
{
  "actions": {
    "transcribe_quick": {
      "description": "Transcribe audio up to 15 minutes.",
      "properties": {
        "file_id": {
          "type": "string",
          "description": "File ID from a prior upload. Provide either file_id or public_url.",
          "required": false
        },
        "public_url": {
          "type": "string",
          "description": "HTTPS URL to a downloadable audio file. Provide either public_url or file_id.",
          "required": false
        },
        "language_code": {
          "type": "string",
          "description": "Optional BCP-47 language code such as en-US; defaults to en-US if omitted.",
          "required": false
        },
        "output_format": {
          "type": "string",
          "description": "Output format for the transcription result.",
          "required": false,
          "enum": [
            "text",
            "srt",
            "vtt",
            "json"
          ],
          "default": "text"
        },
        "enable_diarization": {
          "type": "boolean",
          "description": "Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.",
          "required": false,
          "default": false
        },
        "enable_word_timestamps": {
          "type": "boolean",
          "description": "Include word-level timing data in the output.",
          "required": false,
          "default": false
        },
        "remove_filler_words": {
          "type": "boolean",
          "description": "When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.",
          "required": false,
          "default": true
        },
        "enable_profanity_filter": {
          "type": "boolean",
          "description": "Mask profanity in the returned transcript.",
          "required": false,
          "default": false
        },
        "max_alternatives": {
          "type": "integer",
          "description": "Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.",
          "required": false,
          "minimum": 1,
          "maximum": 5,
          "default": 1
        }
      },
      "price_per_unit": 100
    },
    "transcribe_standard": {
      "description": "Transcribe audio up to 30 minutes.",
      "properties": {
        "file_id": {
          "type": "string",
          "description": "File ID from a prior upload. Provide either file_id or public_url.",
          "required": false
        },
        "public_url": {
          "type": "string",
          "description": "HTTPS URL to a downloadable audio file. Provide either public_url or file_id.",
          "required": false
        },
        "language_code": {
          "type": "string",
          "description": "Optional BCP-47 language code such as en-US; defaults to en-US if omitted.",
          "required": false
        },
        "output_format": {
          "type": "string",
          "description": "Output format for the transcription result.",
          "required": false,
          "enum": [
            "text",
            "srt",
            "vtt",
            "json"
          ],
          "default": "text"
        },
        "enable_diarization": {
          "type": "boolean",
          "description": "Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.",
          "required": false,
          "default": false
        },
        "enable_word_timestamps": {
          "type": "boolean",
          "description": "Include word-level timing data in the output.",
          "required": false,
          "default": false
        },
        "remove_filler_words": {
          "type": "boolean",
          "description": "When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.",
          "required": false,
          "default": true
        },
        "enable_profanity_filter": {
          "type": "boolean",
          "description": "Mask profanity in the returned transcript.",
          "required": false,
          "default": false
        },
        "max_alternatives": {
          "type": "integer",
          "description": "Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.",
          "required": false,
          "minimum": 1,
          "maximum": 5,
          "default": 1
        }
      },
      "price_per_unit": 150
    },
    "transcribe_extended": {
      "description": "Transcribe audio up to 60 minutes.",
      "properties": {
        "file_id": {
          "type": "string",
          "description": "File ID from a prior upload. Provide either file_id or public_url.",
          "required": false
        },
        "public_url": {
          "type": "string",
          "description": "HTTPS URL to a downloadable audio file. Provide either public_url or file_id.",
          "required": false
        },
        "language_code": {
          "type": "string",
          "description": "Optional BCP-47 language code such as en-US; defaults to en-US if omitted.",
          "required": false
        },
        "output_format": {
          "type": "string",
          "description": "Output format for the transcription result.",
          "required": false,
          "enum": [
            "text",
            "srt",
            "vtt",
            "json"
          ],
          "default": "text"
        },
        "enable_diarization": {
          "type": "boolean",
          "description": "Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.",
          "required": false,
          "default": false
        },
        "enable_word_timestamps": {
          "type": "boolean",
          "description": "Include word-level timing data in the output.",
          "required": false,
          "default": false
        },
        "remove_filler_words": {
          "type": "boolean",
          "description": "When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.",
          "required": false,
          "default": true
        },
        "enable_profanity_filter": {
          "type": "boolean",
          "description": "Mask profanity in the returned transcript.",
          "required": false,
          "default": false
        },
        "max_alternatives": {
          "type": "integer",
          "description": "Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.",
          "required": false,
          "minimum": 1,
          "maximum": 5,
          "default": 1
        }
      },
      "price_per_unit": 200
    }
  }
}
```
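
To see what a sparse request resolves to, the schema defaults can be merged into the parameters you actually send. This is an illustrative sketch only: `effective_parameters` is a hypothetical helper, and it assumes the service applies the documented defaults exactly as listed. Note that `language_code` has no `default` key in the schema; its `en-US` fallback comes from the field description.

```python
# Defaults copied from the schema; they are identical for all three actions.
SCHEMA_DEFAULTS = {
    "output_format": "text",
    "enable_diarization": False,
    "enable_word_timestamps": False,
    "remove_filler_words": True,
    "enable_profanity_filter": False,
    "max_alternatives": 1,
}
# Stated in the language_code description rather than as a "default" key.
DESCRIBED_DEFAULTS = {"language_code": "en-US"}


def effective_parameters(params: dict) -> dict:
    """Return a copy of params with omitted optional fields filled in."""
    return {**SCHEMA_DEFAULTS, **DESCRIBED_DEFAULTS, **params}


resolved = effective_parameters(
    {"action": "transcribe_quick", "file_id": "FILE_ID", "output_format": "srt"}
)
print(resolved["output_format"], resolved["remove_filler_words"])  # srt True
```

Values supplied by the caller always win over the defaults, so the merge never overrides an explicit choice.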

### Usage Instructions

#### Speech to Text

Transcribe audio with one tool and choose the action that matches the upload length.

##### Tool Call Format

```json
{
  "action": "get_instructions"
}
```

```json
{
  "action": "transcribe_quick",
  "file_id": "FILE_ID",
  "language_code": "en-US",
  "output_format": "text"
}
```

```json
{
  "action": "transcribe_standard",
  "public_url": "https://example.com/meeting.m4a",
  "output_format": "vtt",
  "enable_word_timestamps": true,
  "enable_diarization": true
}
```

```json
{
  "action": "transcribe_extended",
  "public_url": "https://example.com/interview.webm",
  "output_format": "json",
  "max_alternatives": 2
}
```

```json
{
  "action": "transcribe_standard",
  "file_id": "FILE_ID",
  "output_format": "json",
  "enable_word_timestamps": true,
  "remove_filler_words": false
}
```

##### Actions

- `transcribe_quick`: audio up to 15 minutes. Price: 100 credits.
- `transcribe_standard`: audio up to 30 minutes. Price: 150 credits.
- `transcribe_extended`: audio up to 60 minutes. Price: 200 credits.

##### Notes

- Provide either `file_id` or `public_url`.
- `public_url` must be an HTTPS URL and cannot point to private or internal network addresses.
- If `language_code` is omitted, the tool defaults to `en-US`.
- Supported output formats: `text`, `srt`, `vtt`, `json`.
- Optional controls: `enable_diarization`, `enable_word_timestamps`, `remove_filler_words`, `enable_profanity_filter`, `max_alternatives`.
- `remove_filler_words` defaults to `true`, which uses Google STT V2's cleaned transcript path.
- Set `remove_filler_words` to `false` to preserve disfluencies through Vercel AI Gateway using the `openai/whisper-1` gateway model slug. This path always requests word-level timestamps from the gateway for clipping workflows.
- `remove_filler_words=false` does not support `enable_diarization=true` or `max_alternatives` greater than `1`; use the default cleaned path for those features.
- Subtitle responses include inline subtitle content and may also include stored file links during normal platform invocations.
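
The constraints in these notes can be checked on the client before spending credits. The sketch below is a hypothetical pre-flight helper, not the service's own validation. It assumes "either" means exactly one of `file_id` or `public_url`, assumes parameter values already have the schema's types, and only inspects IP-literal hosts and `localhost`, without resolving DNS names.

```python
import ipaddress
from urllib.parse import urlparse

ACTIONS = {"transcribe_quick", "transcribe_standard", "transcribe_extended"}
FORMATS = {"text", "srt", "vtt", "json"}


def _looks_internal(host) -> bool:
    """Flag localhost and private, loopback, or link-local IP literals."""
    if not host or host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; resolving it is out of scope for this sketch
    return ip.is_private or ip.is_loopback or ip.is_link_local


def check_parameters(params: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if params.get("action") not in ACTIONS:
        problems.append("action must be one of the three transcribe_* actions")

    has_file = bool(params.get("file_id"))
    has_url = bool(params.get("public_url"))
    if has_file == has_url:  # assumption: "either" means exactly one
        problems.append("provide exactly one of file_id or public_url")

    if has_url:
        url = urlparse(params["public_url"])
        if url.scheme != "https":
            problems.append("public_url must use https")
        elif _looks_internal(url.hostname):
            problems.append("public_url must not point to a private or internal address")

    if params.get("output_format", "text") not in FORMATS:
        problems.append("output_format must be text, srt, vtt, or json")

    alternatives = params.get("max_alternatives", 1)
    if not 1 <= alternatives <= 5:
        problems.append("max_alternatives must be between 1 and 5")

    # The disfluency-preserving path excludes diarization and extra alternatives.
    if params.get("remove_filler_words", True) is False:
        if params.get("enable_diarization", False):
            problems.append("enable_diarization requires remove_filler_words=true")
        if alternatives > 1:
            problems.append("max_alternatives must be 1 when remove_filler_words=false")
    return problems


print(check_parameters({
    "action": "transcribe_quick",
    "file_id": "FILE_ID",
    "remove_filler_words": False,
    "enable_diarization": True,
}))
```

Returning a list instead of raising on the first failure lets an agent report every conflict in a single pass.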

### Frequently Asked Questions

#### How do I connect this tool to an external agent?

- Page URL: https://www.agentpmt.com/faq
- Markdown URL: https://www.agentpmt.com/faq?format=agent-md

You can install the local MCP router by opening a terminal and running:

```bash
npm install -g @agentpmt/mcp-router
agentpmt-setup
```

This connects local agents such as Claude Code, Windsurf, Grok Build, and Cursor.

Alternatively, you can connect to the hosted version with this config block; no installation is required:

```json
{
  "mcpServers": {
    "agentpmt": {
      "type": "streamable-http",
      "url": "https://api.agentpmt.com/mcp",
      "headers": {
        "Authorization": "Bearer <AGENTPMT_BEARER_TOKEN>",
        "x-instance-metadata": "{\"client\":\"generic-mcp\",\"platform\":\"remote\"}"
      }
    }
  }
}
```

[View MCP Connection Instructions](/docs/mcp-reference/connection) for more details.
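
The `x-instance-metadata` header in the hosted config is a JSON object serialized into a string, which is why its quotes appear escaped. A minimal sketch of generating the block programmatically follows; `hosted_mcp_config` is a hypothetical helper, and where the resulting JSON should be saved depends on your MCP client, which this page does not specify.

```python
import json


def hosted_mcp_config(bearer_token: str) -> str:
    """Render the hosted-server config block as a JSON string."""
    # The header value is itself JSON, so it is serialized first and then
    # embedded as a plain string; json.dumps adds the \" escapes on output.
    metadata = json.dumps(
        {"client": "generic-mcp", "platform": "remote"}, separators=(",", ":")
    )
    config = {
        "mcpServers": {
            "agentpmt": {
                "type": "streamable-http",
                "url": "https://api.agentpmt.com/mcp",
                "headers": {
                    "Authorization": f"Bearer {bearer_token}",
                    "x-instance-metadata": metadata,
                },
            }
        }
    }
    return json.dumps(config, indent=2)


print(hosted_mcp_config("<AGENTPMT_BEARER_TOKEN>"))
```

Serializing the metadata separately avoids hand-writing backslash escapes, which are easy to get wrong when editing the config manually.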

#### How does an external agent use this tool?

- Page URL: https://www.agentpmt.com/faq
- Markdown URL: https://www.agentpmt.com/faq?format=agent-md

After the external agent is connected to an Agent Group that can use this tool, paste this prompt into the agent:

> Use the AgentPMT-Tool-Search-and-Execution tool. First call action 'get\_instructions' so you know how to use the tool search interface. Then call action 'get\_schema' with tool\_id 69ba14e4bbfb26a6333b14d3 ("Speech to Text With Speakers"). After reading the schema and any returned instructions, tell me what this tool can do. We are going to be using it.

The agent should fetch the tool schema first, collect the required parameters for your request, and then call the tool through AgentPMT.

### Dependencies

This product has no public dependency products.