Speech to Text With Speakers

Function

Available Actions

Each successful request consumes credits as outlined below.

transcribe_quick: 100 credits
transcribe_standard: 150 credits
transcribe_extended: 200 credits

Details

Turn any audio recording into clean, searchable text. Transcribe voice memos, meetings, interviews, podcasts, and webinars with speech recognition that handles accents and background noise. Get plain text for quick reference, SRT or WebVTT subtitles for video captioning, or JSON output with optional word-level timestamps and speaker labels. Choose from three tiers based on recording length (up to 15, 30, or 60 minutes), and optionally enable speaker diarization to label who said what, profanity filtering, and up to five alternative transcripts.

Use Cases

Transcribe meeting recordings
Generate subtitles and captions for videos
Convert voice memos to searchable text
Transcribe podcast episodes
Create interview transcripts with speaker labels
Produce SRT or WebVTT subtitle files
Build searchable audio archives
Transcribe webinars and lectures
Analyze customer call recordings
Repurpose audio content as text

Connect Your Agent In 5 Min

Install Locally

STDIO connector for Claude Code, Codex, Cursor, Zed, and other clients that require STDIO or custom connections. This lightweight connector routes requests to https://api.agentpmt.com/mcp. All tool execution happens in the cloud, and the server cannot edit any files on your computer.

npm install -g @agentpmt/mcp-router
agentpmt-setup

Actions (3)

transcribe_quick (100 credits, 9 params)

Transcribe audio up to 15 minutes.

file_id (string)

File ID from a prior upload. Provide either file_id or public_url.

public_url (string)

HTTPS URL to a downloadable audio file. Provide either public_url or file_id.

language_code (string)

Optional BCP-47 language code such as en-US; defaults to en-US if omitted.

output_format (string)

Output format for the transcription result.

Values:

text, srt, vtt, json

Default: text

enable_diarization (boolean)

Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.

Default: false

enable_word_timestamps (boolean)

Include word-level timing data in the output.

Default: false

remove_filler_words (boolean)

When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.

Default: true

enable_profanity_filter (boolean)

Mask profanity in the returned transcript.

Default: false

max_alternatives (integer)

Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.

Default: 1

Range: 1 - 5

transcribe_standard (150 credits, 9 params)

Transcribe audio up to 30 minutes.

file_id (string)

File ID from a prior upload. Provide either file_id or public_url.

public_url (string)

HTTPS URL to a downloadable audio file. Provide either public_url or file_id.

language_code (string)

Optional BCP-47 language code such as en-US; defaults to en-US if omitted.

output_format (string)

Output format for the transcription result.

Values:

text, srt, vtt, json

Default: text

enable_diarization (boolean)

Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.

Default: false

enable_word_timestamps (boolean)

Include word-level timing data in the output.

Default: false

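remove_filler_words (boolean)

When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.

Default: true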

enable_profanity_filter (boolean)

Mask profanity in the returned transcript.

Default: false

max_alternatives (integer)

Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.

Default: 1

Range: 1 - 5

transcribe_extended (200 credits, 9 params)

Transcribe audio up to 60 minutes.

file_id (string)

File ID from a prior upload. Provide either file_id or public_url.

public_url (string)

HTTPS URL to a downloadable audio file. Provide either public_url or file_id.

language_code (string)

Optional BCP-47 language code such as en-US; defaults to en-US if omitted.

output_format (string)

Output format for the transcription result.

Values:

text, srt, vtt, json

Default: text

enable_diarization (boolean)

Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false.

Default: false

enable_word_timestamps (boolean)

Include word-level timing data in the output.

Default: false

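remove_filler_words (boolean)

When true (default), return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1.

Default: true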

enable_profanity_filter (boolean)

Mask profanity in the returned transcript.

Default: false

max_alternatives (integer)

Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false.

Default: 1

Range: 1 - 5

curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ********" \
  -d '{
    "product_id": "69ba14e4bbfb26a6333b14d3",
    "parameters": {
      "action": "transcribe_quick",
      "public_url": "https://example.com/meeting.m4a",
      "output_format": "text",
      "enable_diarization": false,
      "enable_word_timestamps": false,
      "remove_filler_words": true,
      "enable_profanity_filter": false,
      "max_alternatives": 1
    }
  }'

import requests

url = "https://api.agentpmt.com/products/purchase"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer ********"  # replace with your bearer token
}

data = {
    "product_id": "69ba14e4bbfb26a6333b14d3",
    "parameters": {
        "action": "transcribe_quick",
        "public_url": "https://example.com/meeting.m4a",  # or "file_id": "FILE_ID"
        "output_format": "text",
        # Python booleans are capitalized (False/True), unlike JSON's false/true
        "enable_diarization": False,
        "enable_word_timestamps": False,
        "remove_filler_words": True,
        "enable_profanity_filter": False,
        "max_alternatives": 1
    }
}

# requests serializes `data` to JSON via the json= argument
response = requests.post(url, headers=headers, json=data)
print(response.status_code)
print(response.json())

const url = "https://api.agentpmt.com/products/purchase";

const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********"
};

const data = {
  product_id: "69ba14e4bbfb26a6333b14d3",
  parameters: {
    "action": "transcribe_quick",
    "public_url": "https://example.com/meeting.m4a", // or "file_id": "FILE_ID"
    "output_format": "text",
    "enable_diarization": false,
    "enable_word_timestamps": false,
    "remove_filler_words": true,
    "enable_profanity_filter": false,
    "max_alternatives": 1
  }
};

fetch(url, {
  method: "POST",
  headers,
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error("Error:", error));

const axios = require('axios');

const url = "https://api.agentpmt.com/products/purchase";

const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********"
};

const data = {
  product_id: "69ba14e4bbfb26a6333b14d3",
  parameters: {
    "action": "transcribe_quick",
    "public_url": "https://example.com/meeting.m4a", // or "file_id": "FILE_ID"
    "output_format": "text",
    "enable_diarization": false,
    "enable_word_timestamps": false,
    "remove_filler_words": true,
    "enable_profanity_filter": false,
    "max_alternatives": 1
  }
};

axios.post(url, data, { headers })
  .then(response => {
    console.log(response.status);
    console.log(response.data);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

The examples above use placeholder values. Replace ******** with your own bearer token, which is shown after you sign in.

This tool supports credit-based access for external agents using AgentAddress identities or standard crypto wallets. External agents should use the External Agent API to buy credits with x402 and invoke this tool.

1. Buy Credits

Purchase credits via x402 payment (500-credit minimum; 100 credits = $1, so the minimum purchase is $5).

# Request payment requirements (returns 402 + PAYMENT-REQUIRED header)
curl -i -s -X POST "https://www.agentpmt.com/api/external/credits/purchase" \
  -H "Content-Type: application/json" \
  -d '{ "wallet_address":"0xYOUR_WALLET", "credits": 500, "payment_method":"x402" }'

# Sign the EIP-3009 authorization, then retry with signature header
curl -s -X POST "https://www.agentpmt.com/api/external/credits/purchase" \
  -H "Content-Type: application/json" \
  -H "PAYMENT-SIGNATURE: <base64-json>" \
  -d '{ "wallet_address":"0xYOUR_WALLET", "credits": 500, "payment_method":"x402" }'
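The pricing on this page reduces to simple arithmetic. The sketch below estimates how many credits a batch of transcription jobs needs and what that costs in dollars. The helper names are hypothetical, and it assumes the 500-credit minimum applies to each purchase and ignores any balance you already hold.

```python
# Rates taken from this page: 100 credits = $1, a 500-credit minimum purchase,
# and per-action prices of 100 / 150 / 200 credits.
CREDITS_PER_USD = 100
MIN_PURCHASE_CREDITS = 500
ACTION_PRICES = {
    "transcribe_quick": 100,
    "transcribe_standard": 150,
    "transcribe_extended": 200,
}


def credits_to_usd(credits: int) -> float:
    """Convert a credit amount to US dollars."""
    return credits / CREDITS_PER_USD


def credits_needed(jobs: dict[str, int]) -> int:
    """Total credits for a batch such as {"transcribe_quick": 3}."""
    return sum(ACTION_PRICES[action] * count for action, count in jobs.items())


def purchase_amount(jobs: dict[str, int]) -> int:
    """Credits to buy for a batch, assuming the minimum applies per purchase."""
    return max(credits_needed(jobs), MIN_PURCHASE_CREDITS)


batch = {"transcribe_quick": 2, "transcribe_extended": 3}
print(credits_needed(batch))                   # 2*100 + 3*200 = 800
print(purchase_amount(batch))                  # 800
print(credits_to_usd(purchase_amount(batch)))  # 8.0
```

A single transcribe_quick call (100 credits) still requires buying at least 500 credits ($5); the remainder covers later calls.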

2. Create a Session Nonce (the nonce is included in signed balance and invoke requests)

curl -s -X POST "https://www.agentpmt.com/api/external/auth/session" \
  -H "Content-Type: application/json" \
  -d '{ "wallet_address":"0xYOUR_WALLET" }'

3. Invoke This Tool

Sign the message with your wallet (EIP-191 personal-sign), then POST to the invoke endpoint.

# Sign this message (wallet MUST be lowercased):
# agentpmt-external
# wallet:0xyourwallet...
# session:<session_nonce>
# request:<request_id>
# method:POST
# path:/external/tools/speech-to-text-with-speakers/actions/<actionSlug>/invoke
# payload:<sha256(canonical_json(parameters))>

curl -s -X POST "https://www.agentpmt.com/api/external/tools/speech-to-text-with-speakers/actions/<actionSlug>/invoke" \
  -H "Content-Type: application/json" \
  -d '{
    "wallet_address": "0xYOUR_WALLET",
    "session_nonce": "<session_nonce>",
    "request_id": "invoke-uuid",
    "signature": "0x<signature>",
    "parameters": {
      "public_url": "https://example.com/meeting.m4a",
      "output_format": "text"
    }
  }'
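The signed message in step 3 has a fixed line layout. The sketch below assembles it in Python from a wallet, session nonce, request ID, action slug, and parameters. Several details are assumptions, not confirmed by this page: that canonical_json means sorted keys with no extra whitespace, that the SHA-256 digest is lowercase hex, and that fields are joined with "\n". The helper names are hypothetical.

```python
import hashlib
import json
import uuid

TOOL_SLUG = "speech-to-text-with-speakers"


def canonical_json(obj) -> str:
    # ASSUMPTION: canonical JSON = sorted keys, compact separators.
    # The page does not define it; confirm with the External Agent API docs.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def payload_hash(parameters: dict) -> str:
    # ASSUMPTION: the SHA-256 digest is written as lowercase hex.
    return hashlib.sha256(canonical_json(parameters).encode("utf-8")).hexdigest()


def build_signing_message(wallet: str, session_nonce: str, request_id: str,
                          action_slug: str, parameters: dict) -> str:
    """Assemble the message to personal-sign, one field per line."""
    path = f"/external/tools/{TOOL_SLUG}/actions/{action_slug}/invoke"
    return "\n".join([
        "agentpmt-external",
        f"wallet:{wallet.lower()}",  # the page requires a lowercased wallet
        f"session:{session_nonce}",
        f"request:{request_id}",
        "method:POST",
        f"path:{path}",
        f"payload:{payload_hash(parameters)}",
    ])


# Placeholders from the page; the session nonce comes from step 2.
params = {"public_url": "https://example.com/meeting.m4a", "output_format": "text"}
request_id = str(uuid.uuid4())
message = build_signing_message("0xYOUR_WALLET", "<session_nonce>",
                                request_id, "<actionSlug>", params)
print(message)
```

The resulting string is what gets signed with an EIP-191 personal-sign implementation (for example, encode_defunct(text=message) with Account.sign_message from the eth-account package). The same request_id and parameters must then be sent in the invoke body so the server can recompute the payload hash.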

Usage Instructions

Usage guidance provided directly by the developer for this product.

Speech to Text

Transcribe audio with one tool and choose the action that matches the upload length.

Tool Call Format

{
  "action": "get_instructions"
}

{
  "action": "transcribe_quick",
  "file_id": "FILE_ID",
  "language_code": "en-US",
  "output_format": "text"
}

{
  "action": "transcribe_standard",
  "public_url": "https://example.com/meeting.m4a",
  "output_format": "vtt",
  "enable_word_timestamps": true,
  "enable_diarization": true
}

{
  "action": "transcribe_extended",
  "public_url": "https://example.com/interview.webm",
  "output_format": "json",
  "max_alternatives": 2
}

{
  "action": "transcribe_standard",
  "file_id": "FILE_ID",
  "output_format": "json",
  "enable_word_timestamps": true,
  "remove_filler_words": false
}

Actions

transcribe_quick: audio up to 15 minutes. Price: 100 credits.
transcribe_standard: audio up to 30 minutes. Price: 150 credits.
transcribe_extended: audio up to 60 minutes. Price: 200 credits.
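Because each action covers a fixed maximum length, choosing the cheapest one is a lookup on recording duration. The sketch below does that in Python; pick_action is a hypothetical helper, it assumes the limits are inclusive (a recording of exactly 15:00 fits transcribe_quick), and it expects you to know the duration already (for WAV files, the standard-library wave module can report frame count and frame rate).

```python
# (action, maximum minutes, credits), as listed above.
TIERS = [
    ("transcribe_quick", 15, 100),
    ("transcribe_standard", 30, 150),
    ("transcribe_extended", 60, 200),
]


def pick_action(duration_seconds: float) -> tuple[str, int]:
    """Return (action, credits) for the cheapest tier that fits the recording."""
    minutes = duration_seconds / 60
    for action, max_minutes, credits in TIERS:
        # ASSUMPTION: limits are inclusive.
        if minutes <= max_minutes:
            return action, credits
    raise ValueError(f"{minutes:.1f} min exceeds the 60-minute limit; split the audio first")


print(pick_action(22 * 60))  # ('transcribe_standard', 150)
```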

Notes

Provide either file_id or public_url.
public_url must be an HTTPS URL and cannot point to private or internal network addresses.
If language_code is omitted, the tool defaults to en-US.
Supported output formats: text, srt, vtt, json.
Optional controls: enable_diarization, enable_word_timestamps, remove_filler_words, enable_profanity_filter, max_alternatives.
remove_filler_words defaults to true, which uses Google STT V2's cleaned transcript path.
Setting remove_filler_words to false preserves disfluencies by routing the request through Vercel AI Gateway with the openai/whisper-1 model slug. This path always requests word-level timestamps from the gateway to support clipping workflows.
remove_filler_words=false does not support enable_diarization=true or max_alternatives greater than 1; use the default cleaned path for those features.
Subtitle responses include inline subtitle content and may also include stored file links during normal platform invocations.
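The notes above imply several parameter constraints that a client can check before spending credits. The sketch below is a client-side pre-flight check in Python, not the service's own validation. It treats supplying both file_id and public_url as an error (an assumption based on "provide either"), and it can only reject private addresses written as IP literals; hostnames that resolve to internal addresses can only be caught server-side.

```python
import ipaddress
from urllib.parse import urlparse

OUTPUT_FORMATS = {"text", "srt", "vtt", "json"}


def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = []
    has_file = bool(params.get("file_id"))
    has_url = bool(params.get("public_url"))
    if has_file == has_url:
        errors.append("provide exactly one of file_id or public_url")
    if has_url:
        parsed = urlparse(params["public_url"])
        if parsed.scheme != "https":
            errors.append("public_url must use https")
        try:
            ip = ipaddress.ip_address(parsed.hostname or "")
        except ValueError:
            ip = None  # a hostname, not an IP literal; cannot check locally
        if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
            errors.append("public_url must not point to a private or internal address")
    if params.get("output_format", "text") not in OUTPUT_FORMATS:
        errors.append(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    alternatives = params.get("max_alternatives", 1)
    if not 1 <= alternatives <= 5:
        errors.append("max_alternatives must be between 1 and 5")
    # The filler-preserving path supports neither diarization nor alternatives.
    if params.get("remove_filler_words", True) is False:
        if params.get("enable_diarization", False):
            errors.append("enable_diarization requires remove_filler_words=true")
        if alternatives > 1:
            errors.append("max_alternatives > 1 requires remove_filler_words=true")
    return errors


print(validate_params({"file_id": "FILE_ID", "remove_filler_words": False,
                       "enable_diarization": True}))
```

All four tool-call examples in the Usage Instructions pass these checks; the printed call fails because it asks for diarization on the filler-preserving path.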
