Available Actions
Each successful request consumes credits as outlined below.
process_document (20 credits)
Details
Hire the OCR AI model to extract text, structured entities, and page-level data from scanned documents, receipts, invoices, PDFs, and images. Supports OCR text extraction from photos of receipts, handwritten notes, printed forms, business cards, shipping labels, contracts, and other document types. Identifies structured fields such as dates, amounts, addresses, line items, tax totals, and vendor names. Accepts input as base64 content, a public URL, or an AgentPMT file storage ID. Ideal for expense tracking, invoice processing, receipt scanning, document digitization, data entry automation, bookkeeping ingestion, form parsing, and archival workflows.
Use Cases
Receipt OCR and text extraction, Invoice parsing and field extraction, PDF document text extraction, Scanned image OCR, Handwritten note digitization, Business card scanning, Expense report data capture, Automated bookkeeping ingestion, Contract and legal document text extraction, Shipping label and barcode text reading, Tax form field extraction, Medical record digitization, Insurance claim document processing, Bank statement parsing, Purchase order data extraction, Form field recognition, ID and passport text extraction, Utility bill parsing, Restaurant receipt itemization, Real estate document processing
STDIO connector for Claude Code, Codex, Cursor, Zed, and other LLMs that require STDIO or custom connections. This lightweight connector routes requests to https://api.agentpmt.com/mcp. All tool execution happens in the cloud and the server cannot edit any files on your computer.
npm install -g @agentpmt/mcp-router
agentpmt-setup
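As a rough illustration of what travels through the connector: MCP clients invoke tools with a JSON-RPC 2.0 "tools/call" message. The sketch below only builds such a message and sends nothing. The tool name and the argument layout passed to it are assumptions for illustration; the names your MCP client actually sees after setup may differ.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
example = make_tool_call(
    1,
    "process_document",
    {"file_urls": ["https://example.com/receipt.png"], "document_type": "expense"},
)
```

In normal use the MCP client (Claude Code, Cursor, and so on) builds and sends these messages itself, so this is only useful for understanding or debugging the traffic.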
Actions (1)
process_document (20 credits, 10 params)
Extract text, entities, and structured data from a document using Google Document AI. Provide exactly one input source: file_urls, file_ids, or content_base64.
document_type (string)
Document type. Use 'general' for plain OCR, or a specialized type to extract structured fields (dates, amounts, line items, etc).
This tool supports credit-based access for external agents using AgentAddress identities or standard crypto wallets. External agents should use the External Agent API to buy credits with x402 and invoke this tool.
Usage guidance provided directly by the developer for this product.
Google Document AI OCR
Extract text, entities, and structured data from PDFs, receipts, invoices, and images using Google Document AI. No credentials or project IDs needed -- the tool uses a backend service account automatically.
Overview
This tool processes documents through Google Document AI with specialized processors for different document types. Provide a file via URL, cloud file ID, or base64-encoded content, and receive extracted text, structured entities (dates, amounts, names, line items), and per-page statistics. Multiple images can be batched into a single multi-page document for processing.
Actions
process_document
Extract text and structured data from a document.
Required parameters (exactly one of):
file_urls (array of strings) -- URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.
file_ids (array of strings) -- Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.
content_base64 (string) -- Base64-encoded file content to process (single file only).
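The exactly-one-of rule above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation; the helper name pick_input_source and the constant MAX_BATCH_ITEMS are hypothetical.

```python
from typing import Optional, Sequence

MAX_BATCH_ITEMS = 10  # "up to 10" per the parameter descriptions


def pick_input_source(
    file_urls: Optional[Sequence[str]] = None,
    file_ids: Optional[Sequence[str]] = None,
    content_base64: Optional[str] = None,
) -> dict:
    """Return a one-key dict with the single input source, or raise ValueError."""
    candidates = {
        "file_urls": list(file_urls) if file_urls else None,
        "file_ids": list(file_ids) if file_ids else None,
        "content_base64": content_base64 or None,
    }
    # Keep only the sources the caller actually supplied.
    provided = {name: value for name, value in candidates.items() if value}
    if len(provided) != 1:
        raise ValueError(f"expected exactly one input source, got {len(provided)}")
    name, value = next(iter(provided.items()))
    if isinstance(value, list) and len(value) > MAX_BATCH_ITEMS:
        raise ValueError(f"{name} accepts at most {MAX_BATCH_ITEMS} entries")
    return {name: value}
```

For example, pick_input_source(file_urls=["https://example.com/a.pdf"]) returns a dict with only the file_urls key, while passing both file_urls and file_ids raises ValueError.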
Optional parameters:
document_type (string, default: "general") -- Selects the specialized processor. Options: general, bank_statement, expense, invoice, drivers_license, passport, utility, w2, w9.
mime_type (string) -- MIME type of the input (e.g., application/pdf, image/png). Auto-detected from URL headers if omitted; defaults to application/pdf when unresolvable.
max_text_chars (integer, default: 12000, min: 200, max: 250000) -- Max characters of extracted text to return.
max_entities (integer, default: 200, min: 1, max: 2000) -- Max extracted entities to return.
include_pages (boolean, default: true) -- Include per-page summary data (page dimensions, token/line/paragraph/block/table/form field counts).
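The defaults and ranges listed above can be captured in a small helper that rejects out-of-range values before a request is made. This is a sketch under the documented limits; the function name build_options is hypothetical, and the service may respond differently to invalid values.

```python
from typing import Optional

# Processor options as listed for document_type.
DOCUMENT_TYPES = {
    "general", "bank_statement", "expense", "invoice",
    "drivers_license", "passport", "utility", "w2", "w9",
}


def build_options(
    document_type: str = "general",
    mime_type: Optional[str] = None,
    max_text_chars: int = 12000,
    max_entities: int = 200,
    include_pages: bool = True,
) -> dict:
    """Return the optional parameters as a dict, validated against the documented ranges."""
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document_type: {document_type!r}")
    if not 200 <= max_text_chars <= 250000:
        raise ValueError("max_text_chars must be between 200 and 250000")
    if not 1 <= max_entities <= 2000:
        raise ValueError("max_entities must be between 1 and 2000")
    options = {
        "document_type": document_type,
        "max_text_chars": max_text_chars,
        "max_entities": max_entities,
        "include_pages": include_pages,
    }
    # Leave mime_type out entirely so the service can auto-detect it.
    if mime_type is not None:
        options["mime_type"] = mime_type
    return options
```

Calling build_options(document_type="invoice", max_text_chars=50000) raises the text limit for a long invoice while keeping the other defaults.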
Maximum input file size: 20 MB (including combined PDF in batch mode).
Maximum pages: 10 pages per PDF, or 10 images in batch mode.
Input source: Exactly one of file_urls, file_ids, or content_base64 must be provided. Providing multiple sources returns an error.
Batch mode: When 2+ URLs or file IDs are provided, all images are downloaded in parallel, combined into a single multi-page PDF (one image per page), and sent to Document AI as one request.
MIME type auto-detection: When mime_type is omitted, it is inferred from URL response headers or file metadata. Falls back to application/pdf if unresolvable.
Text truncation: Extracted text is truncated to max_text_chars characters. Increase this value for long documents.
Entity truncation: Entities are truncated to max_entities. Increase for documents with many structured fields.
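When sending content_base64, the size limit and the MIME fallback described above can be mirrored on the client. The sketch below assumes that "20 MB" means 20 * 1024 * 1024 bytes measured before base64 encoding, which the source does not specify; encode_document is a hypothetical helper name.

```python
import base64
import mimetypes

MAX_INPUT_BYTES = 20 * 1024 * 1024  # assumption: 20 MB counted on the raw bytes
FALLBACK_MIME = "application/pdf"   # same fallback the tool documents


def encode_document(data: bytes, filename: str = "") -> dict:
    """Base64-encode raw file bytes and guess a MIME type from the filename."""
    if len(data) > MAX_INPUT_BYTES:
        raise ValueError(f"input is {len(data)} bytes; limit is {MAX_INPUT_BYTES}")
    # guess_type returns (None, None) when the extension is missing or unknown.
    guessed, _ = mimetypes.guess_type(filename)
    return {
        "content_base64": base64.b64encode(data).decode("ascii"),
        "mime_type": guessed or FALLBACK_MIME,
    }
```

Passing mime_type explicitly is optional, since the tool infers it when omitted; supplying it here simply avoids relying on the application/pdf fallback for image inputs.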
Dependencies
3 dependencies will be automatically added when you enable this product.
Automatically redline any signed contract or agreement against its original and produce an exhaustive change report before counter-signing. Upload the returned signed document (PDF, DOCX, or scanned image), name the original stored in Google Drive (DOCX or native Google Doc), and the workflow OCRs the signed copy, locates and downloads the original from Drive, converts both to clean text, and surfaces every difference categorized by type: substantive wording and clause changes with section numbers and side-by-side quotes, filled-in fields such as parties, effective dates, dollar amounts, addresses, and signer names and titles, signature block label differences, DocuSign and other e-signature artifacts, OCR rendering artifacts to ignore, and shared typos worth fixing in the original. Built for legal contract review, NDA comparison, MSA and SOW intake, vendor agreement onboarding, employment offer letter audits, partnership and referral agreement review, sales contract redlining, real estate purchase agreement comparison, insurance policy diff, lease and rental agreement review, and any returned-document intake workflow where you need to know exactly what changed before filing or counter-signing. Eliminates manual side-by-side reading, accelerates legal and operations review cycles, and prevents accidental acceptance of unfavorable revisions hidden inside a returned signed document.
Upload a photo of your handwritten or printed grocery list, and the agent will extract the items using OCR, search Kroger for each item to find the best-priced match, add them to your Kroger cart, then send you a notification that your order is ready for checkout.
Processes employee expense reports by accepting receipt uploads, extracting receipt data via OCR, categorizing expenses, booking them to Zoho Books with correct expense accounts, generating an expense breakdown chart, and sending the compiled report for manager approval. Streamlines the entire expense reimbursement process.
Accepts uploaded bank statements, extracts transactions via OCR, categorizes each transaction by expense type, logs categorized data to a spreadsheet, and generates a spending breakdown chart. Perfect for personal finance analysis or small business bookkeeping.
Automates accounts payable by accepting uploaded vendor invoices, extracting invoice data via OCR (vendor name, invoice number, date, line items, totals), categorizing expenses to the correct chart of accounts, booking them as bills in Zoho Books, and logging a processing summary. Eliminates manual invoice data entry for accounting teams.
Automates bank account reconciliation in Zoho Books. Accepts bank statement files (PDF, images, scans) from the user, uploads them to File Management, runs OCR to extract transaction data, then cross-references extracted transactions against unmatched bank feed transactions in Zoho Books. For each unmatched transaction, finds potential matching records (expenses, invoices, payments) and reconciles them. Generates a reconciliation summary and notifies the user when complete.
Automates the process of collecting receipt images, uploading them to File Management, running OCR to extract expense data, and adding them as categorized expenses in Zoho Books. Receipts paid with a credit card ending in 9018 are mapped to the Chase Example CC payment account, and receipts paid with a credit card ending in 0999 are mapped to the Bank Of America Example CC payment account. Receipts are processed through OCR 10 at a time. A human is notified when all receipts have been uploaded to the bookkeeping software.
Automates multi-stop route planning from photos of addresses. Collects the user's starting address, time needed at each stop, and departure time. Processes uploaded images through OCR to extract addresses, compiles them into a CSV, optimizes the route order, calculates arrival and departure times for each location, and delivers the final plan with a map image, detailed schedule, and Google Maps link.