Document OCR Agent by Apoth3osis

Name: Document OCR Agent
Brand: Apoth3osis
SKU: 69858a64269243768b447d6d
Price: 0.20 USD
Availability: InStock

Model

Available Actions

Each successful request consumes credits as outlined below.

process_document: 20 credits
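The credit arithmetic can be sketched as follows. This is a minimal illustration that assumes only what the listing states (20 credits per successful process_document request); the function names are hypothetical and not part of any AgentPMT API.

```python
CREDITS_PER_CALL = 20  # from the listing: process_document consumes 20 credits


def credits_needed(successful_calls: int) -> int:
    """Credits consumed by a given number of successful requests."""
    if successful_calls < 0:
        raise ValueError("successful_calls must be non-negative")
    return successful_calls * CREDITS_PER_CALL


def calls_affordable(credit_balance: int) -> int:
    """How many successful requests a credit balance can cover."""
    return max(credit_balance, 0) // CREDITS_PER_CALL


print(credits_needed(5))      # credits for five successful requests
print(calls_affordable(250))  # whole requests covered by 250 credits
```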

Details

Hire the OCR AI model to extract text, structured entities, and page-level data from scanned documents, receipts, invoices, PDFs, and images. Supports OCR text extraction from photos of receipts, handwritten notes, printed forms, business cards, shipping labels, contracts, and any document type. Identifies structured fields like dates, amounts, addresses, line items, tax totals, vendor names, and more. Accepts input via base64 content, public URL, or AgentPMT file storage ID. Ideal for expense tracking, invoice processing, receipt scanning, document digitization, data entry automation, bookkeeping ingestion, form parsing, and archival workflows.

Workflows Using This Tool

Workflow

Saves ~25 min

Kroger Grocery Order From List Photo

Upload a photo of your handwritten or printed grocery list, and the agent will extract the items using OCR, search Kroger for each item to find the best-priced match, add them to your Kroger cart, then send you a notification that your order is ready for checkout.

Use Cases

Receipt OCR and text extraction, Invoice parsing and field extraction, PDF document text extraction, Scanned image OCR, Handwritten note digitization, Business card scanning, Expense report data capture, Automated bookkeeping ingestion, Contract and legal document text extraction, Shipping label and barcode text reading, Tax form field extraction, Medical record digitization, Insurance claim document processing, Bank statement parsing, Purchase order data extraction, Form field recognition, ID and passport text extraction, Utility bill parsing, Restaurant receipt itemization, Real estate document processing

Dynamic MCP Setup

Connect once through AgentPMT Dynamic MCP, then use approved tools from the same agent connection.

30 Second Setup

STDIO connector for Claude Code, Codex, Cursor, Zed, and other clients that require STDIO or custom connections.

npm install -g @agentpmt/mcp-router
agentpmt-setup

Hosted Streamable HTTPS

MCP endpoint for browser-based apps like ChatGPT, Claude, Grok, or any time you want a streamable connection with no local install.

https://api.agentpmt.com/mcp

Config Example

Use the hosted endpoint directly in clients that support remote MCP. Store your Bearer token in the client config or secret field.

{
  "mcpServers": {
    "agentpmt": {
      "type": "streamable-http",
      "url": "https://api.agentpmt.com/mcp",
      "headers": {
        "Authorization": "Bearer <AGENTPMT_BEARER_TOKEN>",
        "x-instance-metadata": "{\"client\":\"generic-mcp\",\"platform\":\"remote\"}"
      }
    }
  }
}
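The x-instance-metadata header in the config above is a JSON document embedded as a string, which is why its quotes are escaped. A minimal Python sketch of generating the same config, assuming you want to produce it programmatically (the function name is hypothetical):

```python
import json


def build_hosted_mcp_config(bearer_token: str) -> dict:
    """Build the hosted MCP config shown above as a Python dict."""
    # The header value is itself JSON, so serialize it on its own first.
    # Compact separators reproduce the spacing used in the config above.
    instance_metadata = json.dumps(
        {"client": "generic-mcp", "platform": "remote"},
        separators=(",", ":"),
    )
    return {
        "mcpServers": {
            "agentpmt": {
                "type": "streamable-http",
                "url": "https://api.agentpmt.com/mcp",
                "headers": {
                    "Authorization": f"Bearer {bearer_token}",
                    "x-instance-metadata": instance_metadata,
                },
            }
        }
    }


# "<AGENTPMT_BEARER_TOKEN>" is a placeholder, as in the config above.
print(json.dumps(build_hosted_mcp_config("<AGENTPMT_BEARER_TOKEN>"), indent=2))
```

Serializing the outer dict with json.dumps escapes the inner quotes automatically, so the nested string does not need to be escaped by hand.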

Actions (1)

process_document: 20 credits, 10 params

Extract text, entities, and structured data from a document using Google Document AI. Provide exactly one input source: file_urls, file_ids, or content_base64.

document_type (string)

Document type. Use 'general' for plain OCR, or a specialized type to extract structured fields (dates, amounts, line items, etc.).

Values: general, bank_statement, expense, invoice, drivers_license, passport, utility, w2, w9

Default: general

file_urls (array)

URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.

Array of: string

file_ids (array)

Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.

Array of: string

content_base64 (string)

Base64-encoded file content to process.

mime_type (string)

MIME type of the input (e.g. application/pdf, image/png). Auto-detected if omitted.

max_text_chars (integer)

Max characters of extracted text to return.

Default: 12000

Range: 200 - 250000

max_entities (integer)

Max extracted entities to return.

Default: 200

Range: 1 - 2000

include_pages (boolean)

Include per-page summary data.

Default: true

include_entities (boolean)

Include extracted entities.

Default: true

include_raw_document (boolean)

Include full raw Document AI response object.

Default: false
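The constraints listed above (exactly one input source, at most 10 URLs or IDs, and the stated ranges) can be checked client-side before spending a request. The following is a minimal sketch, not the tool's own validation; build_parameters is a hypothetical helper and the server remains the authority on what it accepts.

```python
from typing import Any, Optional, Sequence

DOCUMENT_TYPES = {
    "general", "bank_statement", "expense", "invoice",
    "drivers_license", "passport", "utility", "w2", "w9",
}


def build_parameters(
    *,
    file_urls: Optional[Sequence[str]] = None,
    file_ids: Optional[Sequence[str]] = None,
    content_base64: Optional[str] = None,
    document_type: str = "general",
    max_text_chars: int = 12000,
    max_entities: int = 200,
) -> dict[str, Any]:
    """Assemble a process_document parameter dict, enforcing the documented limits."""
    sources = {
        "file_urls": file_urls,
        "file_ids": file_ids,
        "content_base64": content_base64,
    }
    provided = {name: value for name, value in sources.items() if value}
    if len(provided) != 1:
        raise ValueError("provide exactly one of file_urls, file_ids, content_base64")
    name, value = next(iter(provided.items()))
    if name != "content_base64":
        value = list(value)
        if len(value) > 10:
            raise ValueError(f"{name} accepts at most 10 entries")
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document_type: {document_type!r}")
    if not 200 <= max_text_chars <= 250000:
        raise ValueError("max_text_chars must be between 200 and 250000")
    if not 1 <= max_entities <= 2000:
        raise ValueError("max_entities must be between 1 and 2000")
    return {
        "action": "process_document",
        "document_type": document_type,
        name: value,
        "max_text_chars": max_text_chars,
        "max_entities": max_entities,
    }


print(build_parameters(file_urls=["https://example.com/document.pdf"]))
```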

curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ********" \
  -d '{
    "product_id": "69858a64269243768b447d6d",
    "parameters": {
      "action": "process_document",
      "document_type": "general",
      "file_urls": ["https://example.com/document.pdf"],
      "max_text_chars": 12000,
      "max_entities": 200,
      "include_pages": true,
      "include_entities": true
    }
  }'

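# Python (requests)
import requests

url = "https://api.agentpmt.com/products/purchase"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer ********",  # placeholder; use your own bearer token
}

data = {
    "product_id": "69858a64269243768b447d6d",
    "parameters": {
        "action": "process_document",
        "document_type": "general",
        # Exactly one input source is required; this URL is a placeholder.
        "file_urls": ["https://example.com/document.pdf"],
        "max_text_chars": 12000,
        "max_entities": 200,
        "include_pages": True,  # Python booleans are capitalized
        "include_entities": True,
    },
}

# json= serializes the payload and sends it as the request body.
response = requests.post(url, headers=headers, json=data, timeout=60)
print(response.status_code)
print(response.json())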

// JavaScript (fetch)
const url = "https://api.agentpmt.com/products/purchase";

const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********" // placeholder; use your own bearer token
};

const data = {
  product_id: "69858a64269243768b447d6d",
  parameters: {
    action: "process_document",
    document_type: "general",
    // Exactly one input source is required; this URL is a placeholder.
    file_urls: ["https://example.com/document.pdf"],
    max_text_chars: 12000,
    max_entities: 200,
    include_pages: true,
    include_entities: true
  }
};

fetch(url, {
  method: "POST",
  headers,
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(result => console.log(result))
  .catch(error => console.error("Error:", error));

// Node.js (axios)
const axios = require('axios');

const url = "https://api.agentpmt.com/products/purchase";

const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********" // placeholder; use your own bearer token
};

const data = {
  product_id: "69858a64269243768b447d6d",
  parameters: {
    action: "process_document",
    document_type: "general",
    // Exactly one input source is required; this URL is a placeholder.
    file_urls: ["https://example.com/document.pdf"],
    max_text_chars: 12000,
    max_entities: 200,
    include_pages: true,
    include_entities: true
  }
};

axios.post(url, data, { headers })
  .then(response => {
    console.log(response.status);
    console.log(response.data);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

The examples above use a placeholder bearer token. Log in to view your API and budget keys and to see code personalized with your token.

Autonomous agents can access this tool through AgentAddress credit balances or direct x402 payments. Use the Autonomous Agent API reference for endpoint shapes after choosing the access pattern below.

Recommended

Credit-Based Access Using AgentAddress

AgentAddress is preferred when an autonomous agent needs persistent file access, stored platform state, or maximum tool use ability across repeated calls.

Direct x402 Payment

Use direct x402 for independent one-off tool calls that do not require shared files or stored platform state.

Accepted public payments

Stablecoin: USDC
Chains: Base, Arbitrum, Optimism, Polygon, and Avalanche

Direct x402 payments are not enabled for this product; use AgentAddress credit access instead.

Product Skill Package

This product has a published Agent Skill package. Install it when an autonomous agent needs product-specific operating instructions in its local skill registry.

Usage Instructions

Usage guidance provided directly by the developer for this product.

Google Document AI OCR

Extract text, entities, and structured data from PDFs, receipts, invoices, and images using Google Document AI. No credentials or project IDs needed -- the tool uses a backend service account automatically.

Overview

This tool processes documents through Google Document AI with specialized processors for different document types. Provide a file via URL, cloud file ID, or base64-encoded content, and receive extracted text, structured entities (dates, amounts, names, line items), and per-page statistics. Multiple images can be batched into a single multi-page document for processing.

Actions

process_document

Extract text and structured data from a document.

Required parameters (exactly one of):

file_urls (array of strings) -- URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.
file_ids (array of strings) -- Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.
content_base64 (string) -- Base64-encoded file content to process (single file only).

Optional parameters:

document_type (string, default: "general") -- Selects the specialized processor. Options: general, bank_statement, expense, invoice, drivers_license, passport, utility, w2, w9.
mime_type (string) -- MIME type of the input (e.g., application/pdf, image/png). Auto-detected from URL headers if omitted; defaults to application/pdf when unresolvable.
max_text_chars (integer, default: 12000, min: 200, max: 250000) -- Max characters of extracted text to return.
max_entities (integer, default: 200, min: 1, max: 2000) -- Max extracted entities to return.
include_pages (boolean, default: true) -- Include per-page summary data (page dimensions, token/line/paragraph/block/table/form field counts).
include_entities (boolean, default: true) -- Include extracted entities (type, mention text, confidence, normalized value).
include_raw_document (boolean, default: false) -- Include the full raw Document AI response object.

Document Types

`document_type`	Best for	Extracts
`general` (default)	Any document or image	Raw OCR text only
`bank_statement`	Bank statements	Transactions, balances, dates, account info
`expense`	Receipts, expense reports	Line items, totals, tax, vendor, date
`invoice`	Invoices	Line items, amounts, due dates, vendor, PO numbers
`drivers_license`	US driver's licenses	Name, DOB, address, license number, expiry
`passport`	US passports	Name, DOB, nationality, passport number, expiry
`utility`	Utility bills	Account number, billing period, charges, usage
`w2`	W-2 tax forms	Employer info, wages, tax withheld, SSN
`w9`	W-9 tax forms	Name, business name, TIN, address, tax classification
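When the document kind arrives as free-form text (for example, from a user prompt), it has to be mapped onto one of the identifiers in the table. A minimal sketch, assuming a fallback to general is acceptable when nothing matches; normalize_document_type is a hypothetical helper, not part of the tool.

```python
import re

# Identifiers and descriptions copied from the table above.
DOCUMENT_TYPE_EXTRACTS = {
    "general": "Raw OCR text only",
    "bank_statement": "Transactions, balances, dates, account info",
    "expense": "Line items, totals, tax, vendor, date",
    "invoice": "Line items, amounts, due dates, vendor, PO numbers",
    "drivers_license": "Name, DOB, address, license number, expiry",
    "passport": "Name, DOB, nationality, passport number, expiry",
    "utility": "Account number, billing period, charges, usage",
    "w2": "Employer info, wages, tax withheld, SSN",
    "w9": "Name, business name, TIN, address, tax classification",
}


def normalize_document_type(label: str) -> str:
    """Map a free-form label to a document_type, defaulting to 'general'."""
    key = label.strip().lower().replace("'", "").replace("\u2019", "")
    key = re.sub(r"[\s\-]+", "_", key)        # spaces and hyphens become underscores
    key = re.sub(r"^w_(\d)$", r"w\1", key)    # "w_2" / "w_9" collapse to "w2" / "w9"
    return key if key in DOCUMENT_TYPE_EXTRACTS else "general"


for label in ["W-2", "Bank Statement", "Driver's License", "memo"]:
    doc_type = normalize_document_type(label)
    print(label, "->", doc_type, "|", DOCUMENT_TYPE_EXTRACTS[doc_type])
```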

Example: Basic OCR from URL

{
  "action": "process_document",
  "file_urls": ["https://example.com/document.pdf"]
}

Example: Receipt with structured extraction

{
  "action": "process_document",
  "document_type": "expense",
  "file_urls": ["https://example.com/receipt.jpg"]
}

Example: Invoice from base64

{
  "action": "process_document",
  "document_type": "invoice",
  "content_base64": "JVBERi0xLjQK...",
  "mime_type": "application/pdf"
}
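The content_base64 value is the file's raw bytes encoded with standard base64; the "JVBERi0xLjQK" prefix above is what a PDF header (%PDF-1.4) encodes to. A minimal sketch of building such a payload from a local file, using only the Python standard library; payload_from_file is a hypothetical helper name.

```python
import base64
import mimetypes
from pathlib import Path


def payload_from_file(path: str, document_type: str = "general") -> dict:
    """Build process_document parameters from a single local file."""
    data = Path(path).read_bytes()
    payload = {
        "action": "process_document",
        "document_type": document_type,
        # b64encode returns bytes; JSON needs a str.
        "content_base64": base64.b64encode(data).decode("ascii"),
    }
    # Guess the MIME type from the file extension; omit it if unknown
    # and let the tool auto-detect instead.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type:
        payload["mime_type"] = mime_type
    return payload
```

Base64 inflates the payload by roughly one third, so for files near the 20 MB input limit a URL or file ID may be the more practical input source.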

Example: Batch multiple images into one document

{
  "action": "process_document",
  "file_urls": [
    "https://example.com/page1.jpg",
    "https://example.com/page2.jpg",
    "https://example.com/page3.jpg"
  ]
}
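Because a batch accepts at most 10 images, a longer scan has to be split across several requests. A minimal sketch of that chunking, assuming it is acceptable for each chunk to come back as its own multi-page document (and to be billed as its own request); batch_requests is a hypothetical helper.

```python
from typing import Any

MAX_BATCH = 10  # documented limit: up to 10 image URLs per request


def batch_requests(
    image_urls: list[str],
    document_type: str = "general",
    batch_size: int = MAX_BATCH,
) -> list[dict[str, Any]]:
    """Split image URLs into process_document payloads of at most batch_size."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")
    return [
        {
            "action": "process_document",
            "document_type": document_type,
            "file_urls": image_urls[start:start + batch_size],
        }
        for start in range(0, len(image_urls), batch_size)
    ]


# Placeholder URLs for a 23-page scan.
urls = [f"https://example.com/page{n}.jpg" for n in range(1, 24)]
for payload in batch_requests(urls):
    print(len(payload["file_urls"]), "pages starting at", payload["file_urls"][0])
```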

Example: Process from cloud file ID with custom output settings

{
  "action": "process_document",
  "document_type": "w2",
  "file_ids": ["abc123"],
  "max_text_chars": 50000,
  "include_pages": false
}

Example: Get full raw response

{
  "action": "process_document",
  "file_ids": ["abc123"],
  "include_raw_document": true
}

Workflows

Extract text from a scanned document

Call process_document with file_urls pointing to the scanned PDF or image.
Read result.text_excerpt for the extracted text content.

Parse a receipt for expense reporting

Call process_document with document_type: "expense" and the receipt file.
Read result.entities for structured line items, totals, tax, vendor, and date.
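Reading result.entities might look like the sketch below. The listing says each entity carries a type, mention text, confidence, and normalized value, but it does not give the exact key names; the keys "type", "mention_text", and "confidence" used here are assumptions, and the sample data is made up for illustration. Check the real response (or the tool schema) before relying on them.

```python
from collections import defaultdict
from typing import Any


def group_entities(
    entities: list[dict[str, Any]],
    min_confidence: float = 0.5,
) -> dict[str, list[str]]:
    """Group entity mention text by type, dropping low-confidence entries."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entity in entities:
        # Key names are assumed, not confirmed by the listing.
        if entity.get("confidence", 0.0) >= min_confidence:
            grouped[entity.get("type", "unknown")].append(entity.get("mention_text", ""))
    return dict(grouped)


# Made-up sample shaped like the assumption above, not real tool output.
sample = [
    {"type": "total_amount", "mention_text": "12.50", "confidence": 0.98},
    {"type": "supplier_name", "mention_text": "Example Cafe", "confidence": 0.91},
    {"type": "line_item", "mention_text": "Coffee 3.50", "confidence": 0.35},
]
print(group_entities(sample))
```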

Process a multi-page document from images

Provide up to 10 image URLs in file_urls.
The images are fetched in parallel, combined into a single multi-page PDF, and processed as one document.
Use include_pages: true to get per-page statistics.

Extract data from tax forms

Use document_type: "w2" or "w9" with the tax form file.
Entities will include employer info, wages, tax withheld, TIN, etc.

Notes

Supported file types: PDF, PNG, JPEG, TIFF, GIF, BMP, WebP.
Maximum input file size: 20 MB (including combined PDF in batch mode).
Maximum pages: 10 pages per PDF, or 10 images in batch mode.
Input source: Exactly one of file_urls, file_ids, or content_base64 must be provided. Providing multiple sources returns an error.
Batch mode: When 2+ URLs or file IDs are provided, all images are downloaded in parallel, combined into a single multi-page PDF (one image per page), and sent to Document AI as one request.
MIME type auto-detection: When mime_type is omitted, it is inferred from URL response headers or file metadata. Falls back to application/pdf if unresolvable.
Text truncation: Extracted text is truncated to max_text_chars characters. Increase this value for long documents.
Entity truncation: Entities are truncated to max_entities. Increase for documents with many structured fields.
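The file-type and size limits above can be checked locally before a request is sent. A minimal sketch under two assumptions that the notes do not settle: that the listed formats correspond to the file extensions below, and that "20 MB" can be treated conservatively as 20,000,000 bytes. The preflight helper is hypothetical.

```python
from pathlib import Path

# Assumed extension mapping for: PDF, PNG, JPEG, TIFF, GIF, BMP, WebP.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".gif", ".bmp", ".webp",
}
MAX_FILE_BYTES = 20_000_000  # assumption: conservative reading of "20 MB"


def preflight(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    return problems


print(preflight("scan.pdf", 1_024))
print(preflight("notes.docx", 10))
print(preflight("big.png", 21_000_000))
```

This does not cover the 10-page limit, which would require opening the PDF to count pages.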

Frequently Asked Questions

How do I connect this tool to an external agent?

You can install the local MCP server by opening a terminal and running:

Install commands

npm install -g @agentpmt/mcp-router
agentpmt-setup

This will connect you to local agents like Claude Code, Windsurf, Grok Build, Cursor, etc.

Alternatively, you can connect to the hosted version with this config block; no installation is required:

Hosted MCP config

{
  "mcpServers": {
    "agentpmt": {
      "type": "streamable-http",
      "url": "https://api.agentpmt.com/mcp",
      "headers": {
        "Authorization": "Bearer <AGENTPMT_BEARER_TOKEN>",
        "x-instance-metadata": "{\"client\":\"generic-mcp\",\"platform\":\"remote\"}"
      }
    }
  }
}

View MCP Connection Instructions for more details.

How does an external agent use this tool?

After the external agent is connected to an Agent Group that can use this tool, paste this prompt into the agent:

Agent prompt

Use the AgentPMT-Tool-Search-and-Execution tool. First call action 'get_instructions' so you know how to use the tool search interface. Then call action 'get_schema' with tool_id 69858a64269243768b447d6d ("Document OCR Agent"). After reading the schema and any returned instructions, tell me what this tool can do, we are going to be using it

The agent should fetch the tool schema first, collect the required parameters for your request, and then call the tool through AgentPMT.

Dependencies

3 dependencies will be automatically added when you enable this product.

File Management