

Speech to Text With Speakers
Function
Available Actions
Each successful request consumes credits as outlined below.
- transcribe_quick: 100 cr
- transcribe_standard: 150 cr
- transcribe_extended: 200 cr
Details
Turn any audio recording into clean, searchable text in seconds. Transcribe voice memos, meetings, interviews, podcasts, and webinars with accurate speech recognition that handles accents and background noise. Get plain text for quick reference, SRT or WebVTT subtitles for video captioning, or rich JSON output with word-level timestamps and speaker identification. Choose from three tiers based on recording length — up to 15, 30, or 60 minutes — and optionally enable speaker diarization to label who said what, profanity filtering, and alternative transcripts for maximum accuracy.
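As a rough illustration of the tier choice, the sketch below picks the cheapest action that fits a recording. The limits and prices are the ones listed on this page; the helper name `pick_action` is hypothetical, and treating each limit as inclusive is an assumption.

```python
# Tier limits and prices as listed on this page: (action, max minutes, credits).
TIERS = [
    ("transcribe_quick", 15, 100),
    ("transcribe_standard", 30, 150),
    ("transcribe_extended", 60, 200),
]


def pick_action(duration_minutes: float) -> tuple[str, int]:
    """Return (action, credits) for the cheapest tier that fits the recording.

    ASSUMPTION: "up to N minutes" is treated as inclusive of N.
    """
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    for action, max_minutes, credits in TIERS:
        if duration_minutes <= max_minutes:
            return action, credits
    raise ValueError("recordings longer than 60 minutes exceed every tier")


print(pick_action(12.5))  # ('transcribe_quick', 100)
print(pick_action(45))    # ('transcribe_extended', 200)
```

Recordings longer than 60 minutes raise an error here, since no listed action covers them.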
Use Cases
Transcribe meeting recordings, Generate subtitles and captions for videos, Convert voice memos to searchable text, Transcribe podcast episodes, Create interview transcripts with speaker labels, Produce SRT or WebVTT subtitle files, Build searchable audio archives, Transcribe webinars and lectures, Analyze customer call recordings, Content repurposing from audio to text
Install Locally
STDIO connector for Claude Code, Codex, Cursor, Zed, and other LLMs that require STDIO or custom connections. This lightweight connector routes requests to https://api.agentpmt.com/mcp. All tool execution happens in the cloud and the server cannot edit any files on your computer.
npm install -g @agentpmt/mcp-router
agentpmt-setup
Actions (3)
transcribe_quick (100 cr, 9 params)
Transcribe audio up to 15 minutes.
- file_id (string): File ID from a prior upload. Provide either file_id or public_url.
- public_url (string): HTTPS URL to a downloadable audio file. Provide either public_url or file_id.
- language_code (string): Optional BCP-47 language code such as en-US; defaults to en-US if omitted.
- output_format (string): Output format for the transcription result. Values: text, srt, vtt, json. Default: text.
- enable_diarization (boolean): Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false. Default: false.
- enable_word_timestamps (boolean): Include word-level timing data in the output. Default: false.
- remove_filler_words (boolean): When true, return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1. Default: true.
- enable_profanity_filter (boolean): Mask profanity in the returned transcript. Default: false.
- max_alternatives (integer): Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false. Default: 1. Range: 1 to 5.
transcribe_standard (150 cr, 9 params)
Transcribe audio up to 30 minutes.
- file_id (string): File ID from a prior upload. Provide either file_id or public_url.
- public_url (string): HTTPS URL to a downloadable audio file. Provide either public_url or file_id.
- language_code (string): Optional BCP-47 language code such as en-US; defaults to en-US if omitted.
- output_format (string): Output format for the transcription result. Values: text, srt, vtt, json. Default: text.
- enable_diarization (boolean): Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false. Default: false.
- enable_word_timestamps (boolean): Include word-level timing data in the output. Default: false.
- remove_filler_words (boolean): When true, return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1. Default: true.
- enable_profanity_filter (boolean): Mask profanity in the returned transcript. Default: false.
- max_alternatives (integer): Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false. Default: 1. Range: 1 to 5.
transcribe_extended (200 cr, 9 params)
Transcribe audio up to 60 minutes.
- file_id (string): File ID from a prior upload. Provide either file_id or public_url.
- public_url (string): HTTPS URL to a downloadable audio file. Provide either public_url or file_id.
- language_code (string): Optional BCP-47 language code such as en-US; defaults to en-US if omitted.
- output_format (string): Output format for the transcription result. Values: text, srt, vtt, json. Default: text.
- enable_diarization (boolean): Enable speaker diarization when supported by the audio and model. Not supported when remove_filler_words is false. Default: false.
- enable_word_timestamps (boolean): Include word-level timing data in the output. Default: false.
- remove_filler_words (boolean): When true, return a cleaned transcript with disfluencies removed. When false, preserve filler words and disfluencies; this path does not support diarization or max_alternatives greater than 1. Default: true.
- enable_profanity_filter (boolean): Mask profanity in the returned transcript. Default: false.
- max_alternatives (integer): Maximum number of alternative transcripts to return. Must be 1 when remove_filler_words is false. Default: 1. Range: 1 to 5.
cURL:
curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ********" \
  -d '{
    "product_id": "69ba14e4bbfb26a6333b14d3",
    "parameters": {
      "action": "transcribe_quick",
      "file_id": "FILE_ID",
      "output_format": "text",
      "enable_diarization": false,
      "enable_word_timestamps": false,
      "remove_filler_words": true,
      "enable_profanity_filter": false,
      "max_alternatives": 1
    }
  }'

Python:
import requests

url = "https://api.agentpmt.com/products/purchase"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer ********",
}
data = {
    "product_id": "69ba14e4bbfb26a6333b14d3",
    "parameters": {
        "action": "transcribe_quick",
        "file_id": "FILE_ID",  # placeholder; pass "public_url" instead if preferred
        "output_format": "text",
        # Python booleans are capitalized; requests serializes them to JSON true/false.
        "enable_diarization": False,
        "enable_word_timestamps": False,
        "remove_filler_words": True,
        "enable_profanity_filter": False,
        "max_alternatives": 1,
    },
}
response = requests.post(url, headers=headers, json=data)
print(response.status_code)
print(response.json())

JavaScript (fetch):
const url = "https://api.agentpmt.com/products/purchase";
const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********"
};
const data = {
  product_id: "69ba14e4bbfb26a6333b14d3",
  parameters: {
    action: "transcribe_quick",
    file_id: "FILE_ID", // placeholder; pass public_url instead if preferred
    output_format: "text",
    enable_diarization: false,
    enable_word_timestamps: false,
    remove_filler_words: true,
    enable_profanity_filter: false,
    max_alternatives: 1
  }
};
fetch(url, {
  method: "POST",
  headers,
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(result => console.log(result))
  .catch(error => console.error("Error:", error));

Node.js (axios):
const axios = require('axios');

const url = "https://api.agentpmt.com/products/purchase";
const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********"
};
const data = {
  product_id: "69ba14e4bbfb26a6333b14d3",
  parameters: {
    action: "transcribe_quick",
    file_id: "FILE_ID", // placeholder; pass public_url instead if preferred
    output_format: "text",
    enable_diarization: false,
    enable_word_timestamps: false,
    remove_filler_words: true,
    enable_profanity_filter: false,
    max_alternatives: 1
  }
};
axios.post(url, data, { headers })
  .then(response => {
    console.log(response.status);
    console.log(response.data);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

The examples above use placeholder values. Sign in to view your API and budget keys, then replace the bearer token and FILE_ID with your own.
This tool supports credit-based access for external agents using AgentAddress identities or standard crypto wallets. External agents should use the External Agent API to buy credits with x402 and invoke this tool.
1. Buy Credits
Purchase credits via x402 payment (500 credit minimum, 100 credits = $1).
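The credit arithmetic can be sketched as follows. The rate, the minimum, and the per-action prices come from this page; the helper names are hypothetical, and the sketch assumes any amount at or above the minimum can be purchased.

```python
CREDITS_PER_DOLLAR = 100  # "100 credits = $1"
MIN_PURCHASE = 500        # "500 credit minimum"

# Per-action prices as listed on this page.
PRICES = {
    "transcribe_quick": 100,
    "transcribe_standard": 150,
    "transcribe_extended": 200,
}


def credits_needed(calls: dict[str, int]) -> int:
    """Credits to buy for a planned set of calls, raised to the purchase minimum."""
    total = sum(PRICES[action] * count for action, count in calls.items())
    return max(total, MIN_PURCHASE)


def usd(credits: int) -> float:
    """Dollar cost of a credit amount at the listed rate."""
    return credits / CREDITS_PER_DOLLAR


plan = {"transcribe_quick": 2, "transcribe_extended": 3}
needed = credits_needed(plan)  # 2*100 + 3*200 = 800
print(needed, usd(needed))     # 800 8.0
```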
# Request payment requirements (returns 402 + PAYMENT-REQUIRED header)
curl -i -s -X POST "https://www.agentpmt.com/api/external/credits/purchase" \
-H "Content-Type: application/json" \
-d '{ "wallet_address":"0xYOUR_WALLET", "credits": 500, "payment_method":"x402" }'
# Sign the EIP-3009 authorization, then retry with signature header
curl -s -X POST "https://www.agentpmt.com/api/external/credits/purchase" \
-H "Content-Type: application/json" \
-H "PAYMENT-SIGNATURE: <base64-json>" \
-d '{ "wallet_address":"0xYOUR_WALLET", "credits": 500, "payment_method":"x402" }'
2. Create a Session Nonce (nonce used in signed balance/invoke)
curl -s -X POST "https://www.agentpmt.com/api/external/auth/session" \
-H "Content-Type: application/json" \
-d '{ "wallet_address":"0xYOUR_WALLET" }'
3. Invoke This Tool
Sign the message with your wallet (EIP-191 personal-sign), then POST to the invoke endpoint.
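A minimal sketch of assembling the message to sign, following the commented template in the shell snippet for this step. Several details are assumptions rather than documented behavior: that the fields are joined with newlines, that "canonical JSON" means sorted keys with compact separators, and that the SHA-256 digest is hex-encoded. The function names are hypothetical, and the wallet signing itself is left out.

```python
import hashlib
import json


def canonical_json(parameters: dict) -> str:
    # ASSUMPTION: "canonical" means sorted keys and no extra whitespace.
    return json.dumps(parameters, sort_keys=True, separators=(",", ":"))


def build_message(wallet: str, session_nonce: str, request_id: str,
                  action_slug: str, parameters: dict) -> str:
    """Assemble the text to sign; the wallet address is lowercased as required."""
    # ASSUMPTION: the payload hash is the hex digest of the UTF-8 canonical JSON.
    payload_hash = hashlib.sha256(
        canonical_json(parameters).encode("utf-8")
    ).hexdigest()
    path = f"/external/tools/speech-to-text-with-speakers/actions/{action_slug}/invoke"
    # ASSUMPTION: one field per line, joined with "\n".
    return "\n".join([
        "agentpmt-external",
        f"wallet:{wallet.lower()}",
        f"session:{session_nonce}",
        f"request:{request_id}",
        "method:POST",
        f"path:{path}",
        f"payload:{payload_hash}",
    ])


message = build_message(
    "0xYOUR_WALLET", "<session_nonce>", "invoke-uuid", "<actionSlug>",
    {"file_id": "FILE_ID", "output_format": "text"},
)
print(message)
```

The resulting string is what would be passed to the wallet's personal-sign routine; the signature then goes in the `signature` field of the invoke request.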
# Sign this message (wallet MUST be lowercased):
# agentpmt-external
# wallet:0xyourwallet...
# session:<session_nonce>
# request:<request_id>
# method:POST
# path:/external/tools/speech-to-text-with-speakers/actions/<actionSlug>/invoke
# payload:<sha256(canonical_json(parameters))>
curl -s -X POST "https://www.agentpmt.com/api/external/tools/speech-to-text-with-speakers/actions/<actionSlug>/invoke" \
-H "Content-Type: application/json" \
-d '{
"wallet_address": "0xYOUR_WALLET",
"session_nonce": "<session_nonce>",
"request_id": "invoke-uuid",
"signature": "0x<signature>",
"parameters": {
"your_param": "value"
}
}'
Usage Instructions
Usage guidance provided directly by the developer for this product.
Speech to Text
Transcribe audio with one tool and choose the action that matches the upload length.
Tool Call Format
{
  "action": "get_instructions"
}

{
  "action": "transcribe_quick",
  "file_id": "FILE_ID",
  "language_code": "en-US",
  "output_format": "text"
}

{
  "action": "transcribe_standard",
  "public_url": "https://example.com/meeting.m4a",
  "output_format": "vtt",
  "enable_word_timestamps": true,
  "enable_diarization": true
}

{
  "action": "transcribe_extended",
  "public_url": "https://example.com/interview.webm",
  "output_format": "json",
  "max_alternatives": 2
}

{
  "action": "transcribe_standard",
  "file_id": "FILE_ID",
  "output_format": "json",
  "enable_word_timestamps": true,
  "remove_filler_words": false
}
Actions
- transcribe_quick: audio up to 15 minutes. Price: 100 credits.
- transcribe_standard: audio up to 30 minutes. Price: 150 credits.
- transcribe_extended: audio up to 60 minutes. Price: 200 credits.
Notes
- Provide either file_id or public_url.
- public_url must be an HTTPS URL and cannot point to private or internal network addresses.
- If language_code is omitted, the tool defaults to en-US.
- Supported output formats: text, srt, vtt, json.
- Optional controls: enable_diarization, enable_word_timestamps, remove_filler_words, enable_profanity_filter, max_alternatives.
- remove_filler_words defaults to true, which uses Google STT V2's cleaned transcript path.
- Set remove_filler_words to false to preserve disfluencies through Vercel AI Gateway using the openai/whisper-1 gateway model slug. This path always requests word-level timestamps from the gateway for clipping workflows.
- remove_filler_words=false does not support enable_diarization=true or max_alternatives greater than 1; use the default cleaned path for those features.
- Subtitle responses include inline subtitle content and may also include stored file links during normal platform invocations.
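The constraints in these notes can be checked on the client before spending credits. The sketch below is one possible pre-flight check for the three transcribe actions; the `validate` helper is hypothetical and not part of the tool. It does not test for private or internal network addresses, and it assumes that supplying both file_id and public_url is acceptable.

```python
from urllib.parse import urlparse

ACTIONS = {"transcribe_quick", "transcribe_standard", "transcribe_extended"}
FORMATS = {"text", "srt", "vtt", "json"}


def validate(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    errors = []
    if params.get("action") not in ACTIONS:
        errors.append("unknown action")
    file_id, url = params.get("file_id"), params.get("public_url")
    if not file_id and not url:
        errors.append("provide file_id or public_url")
    if url and urlparse(url).scheme != "https":
        errors.append("public_url must use HTTPS")
    if params.get("output_format", "text") not in FORMATS:
        errors.append("unsupported output_format")
    alternatives = params.get("max_alternatives", 1)
    if not 1 <= alternatives <= 5:
        errors.append("max_alternatives must be between 1 and 5")
    # The disfluency-preserving path excludes diarization and extra alternatives.
    if params.get("remove_filler_words", True) is False:
        if params.get("enable_diarization", False):
            errors.append("diarization requires remove_filler_words=true")
        if alternatives > 1:
            errors.append("max_alternatives must be 1 when remove_filler_words=false")
    return errors


print(validate({
    "action": "transcribe_standard",
    "file_id": "FILE_ID",
    "remove_filler_words": False,
    "enable_diarization": True,
}))  # ['diarization requires remove_filler_words=true']
```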





