# Document OCR Agent

## Links

- Product page URL: https://www.agentpmt.com/marketplace/google-document-ai-ocr
- Product markdown URL: https://www.agentpmt.com/marketplace/google-document-ai-ocr?format=agent-md
- Product JSON URL: https://www.agentpmt.com/marketplace/google-document-ai-ocr?format=agent-json

## Overview

- Product ID: 69858a64269243768b447d6d
- Type: model
- Unit type: request
- Price: 2000 credits
- Categories: Data Processing, Data Validation & Verification, Text Extraction & Parsing, Finance & Accounting, Task & Workflow Automation, Document Processing & OCR
- Generated at: 2026-06-24T12:25:06.497Z

### Page Description

Hire the OCR AI model to extract text, structured entities, and page-level data from scanned documents, receipts, invoices, PDFs, and images. Supports OCR text extraction from photos of receipts, handwritten notes, printed forms, business cards, shipping labels, contracts, and other document types. Identifies structured fields such as dates, amounts, addresses, line items, tax totals, and vendor names. Accepts input via base64 content, public URL, or AgentPMT file storage ID. Ideal for expense tracking, invoice processing, receipt scanning, document digitization, data entry automation, bookkeeping ingestion, form parsing, and archival workflows.

### Agent Description

OCR and document intelligence tool. Send any PDF, image, or scanned document and receive extracted text, structured entities (dates, amounts, names, addresses, line items), and per-page metadata. Provide the document as base64, a public URL, or a file storage ID — no credentials or configuration needed.

## Details

### Actions

- `process_document` (20 credits): Extract text, entities, and structured data from a document using Google Document AI. Provide exactly one input source: file_urls, file_ids, or content_base64.

### Use Cases

- Receipt OCR and text extraction
- Invoice parsing and field extraction
- PDF document text extraction
- Scanned image OCR
- Handwritten note digitization
- Business card scanning
- Expense report data capture
- Automated bookkeeping ingestion
- Contract and legal document text extraction
- Shipping label and barcode text reading
- Tax form field extraction
- Medical record digitization
- Insurance claim document processing
- Bank statement parsing
- Purchase order data extraction
- Form field recognition
- ID and passport text extraction
- Utility bill parsing
- Restaurant receipt itemization
- Real estate document processing

### Workflows Using This Tool

#### AI Contract Redline: Compare Signed Documents Against Originals

Automatically redline any signed contract or agreement against its original and produce an exhaustive change report before counter-signing. Upload the returned signed document (PDF, DOCX, or scanned image) and name the original stored in Google Drive (DOCX or native Google Doc). The workflow OCRs the signed copy, locates and downloads the original from Drive, converts both to clean text, and surfaces every difference, categorized by type:

- Substantive wording and clause changes, with section numbers and side-by-side quotes
- Filled-in fields such as parties, effective dates, dollar amounts, addresses, and signer names and titles
- Signature block label differences
- DocuSign and other e-signature artifacts
- OCR rendering artifacts to ignore
- Shared typos worth fixing in the original

Built for legal contract review, NDA comparison, MSA and SOW intake, vendor agreement onboarding, employment offer letter audits, partnership and referral agreement review, sales contract redlining, real estate purchase agreement comparison, insurance policy diff, lease and rental agreement review, and any returned-document intake workflow where you need to know exactly what changed before filing or counter-signing. Eliminates manual side-by-side reading, accelerates legal and operations review cycles, and prevents accidental acceptance of unfavorable revisions hidden inside a returned signed document.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/ai-contract-redline-compare-signed-documents-against-originals
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/ai-contract-redline-compare-signed-documents-against-originals?format=agent-md
- Published: 2026-04-25T00:38:15.240Z

#### Kroger Grocery Order From List Photo

Upload a photo of your handwritten or printed grocery list, and the agent will extract the items using OCR, search Kroger for each item to find the best-priced match, add them to your Kroger cart, then send you a notification that your order is ready for checkout.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/kroger-grocery-order-from-list-photo
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/kroger-grocery-order-from-list-photo?format=agent-md
- Published: 2026-04-19T18:29:42.593Z

#### Expense Report Processor

Processes employee expense reports by accepting receipt uploads, extracting receipt data via OCR, categorizing expenses, booking them to Zoho Books with correct expense accounts, generating an expense breakdown chart, and sending the compiled report for manager approval. Streamlines the entire expense reimbursement process.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/expense-report-processor
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/expense-report-processor?format=agent-md
- Published: 2026-04-19T18:29:42.593Z

#### Invoice OCR and Booking Pipeline

Automates accounts payable by accepting uploaded vendor invoices, extracting invoice data via OCR (vendor name, invoice number, date, line items, totals), categorizing expenses to the correct chart of accounts, booking them as bills in Zoho Books, and logging a processing summary. Eliminates manual invoice data entry for accounting teams.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/invoice-ocr-and-booking-pipeline
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/invoice-ocr-and-booking-pipeline?format=agent-md
- Published: 2026-04-19T18:29:42.593Z

#### Bank Statement OCR and Expense Categorization

Accepts uploaded bank statements, extracts transactions via OCR, categorizes each transaction by expense type, logs categorized data to a spreadsheet, and generates a spending breakdown chart. Perfect for personal finance analysis or small business bookkeeping.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/bank-statement-ocr-and-expense-categorization
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/bank-statement-ocr-and-expense-categorization?format=agent-md
- Published: 2026-04-19T18:29:42.593Z

#### Bank Statement OCR and Account Reconciliation

Automates bank account reconciliation in Zoho Books. Accepts bank statement files (PDF, images, scans) from the user, uploads them to File Management, runs OCR to extract transaction data, then cross-references extracted transactions against unmatched bank feed transactions in Zoho Books. For each unmatched transaction, finds potential matching records (expenses, invoices, payments) and reconciles them. Generates a reconciliation summary and notifies the user when complete.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/bank-statement-ocr-and-account-reconciliation
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/bank-statement-ocr-and-account-reconciliation?format=agent-md
- Published: 2026-04-19T18:29:42.593Z

#### Receipt OCR to Zoho Books Expense Pipeline

Automates the process of collecting receipt images, uploading them to File Management, running OCR to extract expense data, and adding them as categorized expenses in Zoho Books. Receipts paid with a credit card ending in 9018 are mapped to the Chase Example CC payment account, and receipts paid with a credit card ending in 0999 are mapped to the Bank Of America Example CC payment account. Receipts are processed through OCR 10 at a time. A human is notified when all receipts have been uploaded to the bookkeeping software.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/receipt-ocr-to-zoho-books-expense-pipeline
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/receipt-ocr-to-zoho-books-expense-pipeline?format=agent-md
- Published: 2026-04-19T18:29:42.593Z

#### Route Planner From Address Photos

Automates multi-stop route planning from photos of addresses. Collects the user's starting address, time needed at each stop, and departure time. Processes uploaded images through OCR to extract addresses, compiles them into a CSV, optimizes the route order, calculates arrival and departure times for each location, and delivers the final plan with a map image, detailed schedule, and Google Maps link.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/address-image-route-planner
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/address-image-route-planner?format=agent-md
- Published: 2026-04-19T18:29:42.593Z

### Related Content

No related content is currently linked to this product.

## Integration Details

### DynamicMCP

- Setup page URL: https://www.agentpmt.com/dynamic-mcp
- Claude setup guide: https://www.agentpmt.com/dynamic-mcp#platform=claude
- ChatGPT setup guide: https://www.agentpmt.com/dynamic-mcp#platform=chatgpt
- Cursor setup guide: https://www.agentpmt.com/dynamic-mcp#platform=cursor
- Windsurf setup guide: https://www.agentpmt.com/dynamic-mcp#platform=windsurf

Use the local router for command-based MCP clients. It forwards requests to `https://api.agentpmt.com/mcp` and does not execute tools locally.

```bash
npm install -g @agentpmt/mcp-router
agentpmt-setup
```

### REST API

The live page renders cURL, Python, JavaScript, and Node.js examples. Logged-in users see those examples prefilled with their own API and budget credentials.

- Purchase endpoint: https://api.agentpmt.com/products/purchase
- Authorization format: `Bearer <base64(apiKey:budgetKey)>`

```bash
curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eW91ci1hcGkta2V5LWhlcmU6eW91ci1idWRnZXQta2V5LWhlcmU=" \
  -d '{
    "product_id": "69858a64269243768b447d6d",
    "parameters": {
      "action": "process_document",
      "file_urls": ["https://example.com/document.pdf"],
      "document_type": "general",
      "max_text_chars": 12000,
      "max_entities": 200,
      "include_pages": true,
      "include_entities": true
    }
  }'
```
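For agents calling the endpoint from Python, the same request can be sketched with the standard library alone. Only the endpoint, the product ID, and the `Bearer <base64(apiKey:budgetKey)>` format come from this page; the helper names `build_auth_header` and `build_purchase_request` and the example document URL are illustrative assumptions, not part of the AgentPMT API.

```python
import base64
import json
import urllib.request

PURCHASE_URL = "https://api.agentpmt.com/products/purchase"
PRODUCT_ID = "69858a64269243768b447d6d"


def build_auth_header(api_key: str, budget_key: str) -> str:
    """Return 'Bearer <base64(apiKey:budgetKey)>' as described above."""
    raw = f"{api_key}:{budget_key}".encode("utf-8")
    return "Bearer " + base64.b64encode(raw).decode("ascii")


def build_purchase_request(api_key: str, budget_key: str, parameters: dict):
    """Build (but do not send) the POST request for this product."""
    body = json.dumps({"product_id": PRODUCT_ID, "parameters": parameters})
    return urllib.request.Request(
        PURCHASE_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": build_auth_header(api_key, budget_key),
        },
        method="POST",
    )


request = build_purchase_request(
    "your-api-key-here",      # placeholder credentials
    "your-budget-key-here",
    {
        "action": "process_document",
        "file_urls": ["https://example.com/document.pdf"],  # placeholder URL
    },
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Building the request separately from sending it keeps the credential handling easy to inspect before any credits are spent.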

### Autonomous Agents

Autonomous agents can access this tool through AgentAddress credit balances or direct x402 payments. Use the Autonomous Agent API reference for endpoint shapes after choosing the access pattern below.

- Autonomous Agent API reference URL: https://www.agentpmt.com/docs/api-reference/autonomous-agents
- Autonomous Agent API reference markdown URL: https://www.agentpmt.com/docs/api-reference/autonomous-agents?format=agent-md
- Credit-Based Access Using AgentAddress: https://www.agentpmt.com/docs/autonomous-agents/credit-based-tool-usage-with-agentaddress
- AgentAddress is preferred when an agent needs persistent file access, stored platform state, or the broadest tool access across repeated calls.
- Direct x402 is for independent one-off tool calls that do not require shared files or stored platform state.
- Direct x402 public payments: USDC on Base, Arbitrum, Optimism, Polygon, and Avalanche.

#### Product Skill Package

This product has a published Agent Skill package for product-specific operating instructions.

- Skill slug: document-ocr-agent
- Version: 1.0.0
- Download SKILL.md: https://raw.githubusercontent.com/AgentPMT/agent-skills/main/skills/document-ocr-agent/SKILL.md
- Package source: https://github.com/AgentPMT/agent-skills/tree/main/skills/document-ocr-agent
- OpenClaw listing: https://clawhub.ai/agentpmt/document-ocr-agent
- OpenClaw install: `openclaw skills install document-ocr-agent`
- skills.sh install: `npx skills add AgentPMT/agent-skills --skill document-ocr-agent`
- Last published: 2026-06-24T07:14:18.252Z

### Schema

#### Parameters

- Schema type: actions

```json
{
  "actions": {
    "process_document": {
      "description": "Extract text, entities, and structured data from a document using Google Document AI. Provide exactly one input source: file_urls, file_ids, or content_base64.",
      "properties": {
        "document_type": {
          "type": "string",
          "description": "Document type. Use 'general' for plain OCR, or a specialized type to extract structured fields (dates, amounts, line items, etc).",
          "required": false,
          "default": "general",
          "enum": [
            "general",
            "bank_statement",
            "expense",
            "invoice",
            "drivers_license",
            "passport",
            "utility",
            "w2",
            "w9"
          ]
        },
        "file_urls": {
          "type": "array",
          "description": "URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.",
          "required": false,
          "items": {
            "type": "string"
          }
        },
        "file_ids": {
          "type": "array",
          "description": "Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.",
          "required": false,
          "items": {
            "type": "string"
          }
        },
        "content_base64": {
          "type": "string",
          "description": "Base64-encoded file content to process.",
          "required": false
        },
        "mime_type": {
          "type": "string",
          "description": "MIME type of the input (e.g. application/pdf, image/png). Auto-detected if omitted.",
          "required": false
        },
        "max_text_chars": {
          "type": "integer",
          "description": "Max characters of extracted text to return.",
          "required": false,
          "default": 12000,
          "minimum": 200,
          "maximum": 250000
        },
        "max_entities": {
          "type": "integer",
          "description": "Max extracted entities to return.",
          "required": false,
          "default": 200,
          "minimum": 1,
          "maximum": 2000
        },
        "include_pages": {
          "type": "boolean",
          "description": "Include per-page summary data.",
          "required": false,
          "default": true
        },
        "include_entities": {
          "type": "boolean",
          "description": "Include extracted entities.",
          "required": false,
          "default": true
        },
        "include_raw_document": {
          "type": "boolean",
          "description": "Include full raw Document AI response object.",
          "required": false
        }
      }
    }
  }
}
```
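A client may want to check a payload against these constraints before spending credits on a call that would be rejected. The following is a minimal sketch of such a check, using only the limits stated in the schema above; the function name `validate_parameters` and its error messages are hypothetical, and the service's own validation may differ.

```python
DOCUMENT_TYPES = {
    "general", "bank_statement", "expense", "invoice",
    "drivers_license", "passport", "utility", "w2", "w9",
}
INPUT_SOURCES = ("file_urls", "file_ids", "content_base64")
INTEGER_RANGES = {"max_text_chars": (200, 250000), "max_entities": (1, 2000)}


def validate_parameters(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    # Exactly one input source must be present and non-empty.
    provided = [name for name in INPUT_SOURCES if params.get(name)]
    if len(provided) != 1:
        problems.append(
            f"expected exactly one of {INPUT_SOURCES}, got {provided or 'none'}"
        )

    # Array inputs accept at most 10 entries.
    for name in ("file_urls", "file_ids"):
        values = params.get(name)
        if values and len(values) > 10:
            problems.append(f"{name} accepts at most 10 entries, got {len(values)}")

    doc_type = params.get("document_type", "general")
    if doc_type not in DOCUMENT_TYPES:
        problems.append(f"unknown document_type: {doc_type!r}")

    for name, (low, high) in INTEGER_RANGES.items():
        value = params.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}, got {value}")

    return problems
```

For example, `validate_parameters({"file_ids": ["abc123"], "max_text_chars": 100})` reports that `max_text_chars` is below the documented minimum of 200.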

### Usage Instructions

# Google Document AI OCR

Extract text, entities, and structured data from PDFs, receipts, invoices, and images using Google Document AI. No credentials or project IDs needed -- the tool uses a backend service account automatically.

## Overview

This tool processes documents through Google Document AI with specialized processors for different document types. Provide a file via URL, cloud file ID, or base64-encoded content, and receive extracted text, structured entities (dates, amounts, names, line items), and per-page statistics. Multiple images can be batched into a single multi-page document for processing.

## Actions

### process_document

Extract text and structured data from a document.

**Required parameters (exactly one of):**
- `file_urls` (array of strings) -- URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.
- `file_ids` (array of strings) -- Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.
- `content_base64` (string) -- Base64-encoded file content to process (single file only).

**Optional parameters:**
- `document_type` (string, default: `"general"`) -- Selects the specialized processor. Options: `general`, `bank_statement`, `expense`, `invoice`, `drivers_license`, `passport`, `utility`, `w2`, `w9`.
- `mime_type` (string) -- MIME type of the input (e.g., `application/pdf`, `image/png`). Auto-detected from URL headers if omitted; defaults to `application/pdf` when unresolvable.
- `max_text_chars` (integer, default: 12000, min: 200, max: 250000) -- Max characters of extracted text to return.
- `max_entities` (integer, default: 200, min: 1, max: 2000) -- Max extracted entities to return.
- `include_pages` (boolean, default: true) -- Include per-page summary data (page dimensions, token/line/paragraph/block/table/form field counts).
- `include_entities` (boolean, default: true) -- Include extracted entities (type, mention text, confidence, normalized value).
- `include_raw_document` (boolean, default: false) -- Include the full raw Document AI response object.

#### Document Types

| `document_type` | Best for | Extracts |
|---|---|---|
| `general` (default) | Any document or image | Raw OCR text only |
| `bank_statement` | Bank statements | Transactions, balances, dates, account info |
| `expense` | Receipts, expense reports | Line items, totals, tax, vendor, date |
| `invoice` | Invoices | Line items, amounts, due dates, vendor, PO numbers |
| `drivers_license` | US driver's licenses | Name, DOB, address, license number, expiry |
| `passport` | US passports | Name, DOB, nationality, passport number, expiry |
| `utility` | Utility bills | Account number, billing period, charges, usage |
| `w2` | W-2 tax forms | Employer info, wages, tax withheld, SSN |
| `w9` | W-9 tax forms | Name, business name, TIN, address, tax classification |

#### Example: Basic OCR from URL

```json
{
  "action": "process_document",
  "file_urls": ["https://example.com/document.pdf"]
}
```

#### Example: Receipt with structured extraction

```json
{
  "action": "process_document",
  "document_type": "expense",
  "file_urls": ["https://example.com/receipt.jpg"]
}
```

#### Example: Invoice from base64

```json
{
  "action": "process_document",
  "document_type": "invoice",
  "content_base64": "JVBERi0xLjQK...",
  "mime_type": "application/pdf"
}
```
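The `content_base64` value in a payload like this is simply the file's bytes, base64-encoded. A minimal sketch of producing it from a local file follows; the helper name `base64_parameters` is hypothetical, and guessing the MIME type from the file extension is a client-side convenience assumed here, not something the service requires.

```python
import base64
import mimetypes
from pathlib import Path


def base64_parameters(path: str, document_type: str = "general") -> dict:
    """Build process_document parameters for a local file sent as base64."""
    data = Path(path).read_bytes()
    guessed_type, _ = mimetypes.guess_type(path)
    params = {
        "action": "process_document",
        "document_type": document_type,
        "content_base64": base64.b64encode(data).decode("ascii"),
    }
    if guessed_type:  # otherwise leave it out and let the service auto-detect
        params["mime_type"] = guessed_type
    return params
```

Base64 inflates the payload by roughly a third, so for larger files a URL or file ID may be the more practical input source.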

#### Example: Batch multiple images into one document

```json
{
  "action": "process_document",
  "file_urls": [
    "https://example.com/page1.jpg",
    "https://example.com/page2.jpg",
    "https://example.com/page3.jpg"
  ]
}
```
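Because a batch accepts at most 10 images, a longer set of pages has to be split across several calls. One way to do that is sketched below; the helper name `batch_requests` and the example URLs are hypothetical, and each resulting payload is processed as its own separate document.

```python
from typing import Iterator

MAX_BATCH_SIZE = 10  # documented limit of images per request


def batch_requests(image_urls: list[str], document_type: str = "general") -> Iterator[dict]:
    """Yield one process_document payload per group of up to 10 image URLs."""
    for start in range(0, len(image_urls), MAX_BATCH_SIZE):
        yield {
            "action": "process_document",
            "document_type": document_type,
            "file_urls": image_urls[start:start + MAX_BATCH_SIZE],
        }


# 23 placeholder page images split into three requests.
urls = [f"https://example.com/page{n}.jpg" for n in range(1, 24)]
payloads = list(batch_requests(urls))
print([len(p["file_urls"]) for p in payloads])  # → [10, 10, 3]
```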

#### Example: Process from cloud file ID with custom output settings

```json
{
  "action": "process_document",
  "document_type": "w2",
  "file_ids": ["abc123"],
  "max_text_chars": 50000,
  "include_pages": false
}
```

#### Example: Get full raw response

```json
{
  "action": "process_document",
  "file_ids": ["abc123"],
  "include_raw_document": true
}
```

## Workflows

### Extract text from a scanned document
1. Call `process_document` with `file_urls` pointing to the scanned PDF or image.
2. Read `result.text_excerpt` for the extracted text content.

### Parse a receipt for expense reporting
1. Call `process_document` with `document_type: "expense"` and the receipt file.
2. Read `result.entities` for structured line items, totals, tax, vendor, and date.
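Reading the entities back might look like the sketch below. The response shape is an assumption: this page names `result.text_excerpt` and `result.entities` and says entities carry a type, mention text, confidence, and normalized value, but the exact key names (`type`, `mention_text`, `confidence`) and the sample entity types are guesses made for illustration. Check a real response before relying on them.

```python
def entities_by_type(response: dict, min_confidence: float = 0.5) -> dict:
    """Group entity mention texts by type, dropping low-confidence entries."""
    grouped = {}
    for entity in response.get("result", {}).get("entities", []):
        if entity.get("confidence", 0.0) < min_confidence:
            continue
        grouped.setdefault(entity.get("type", "unknown"), []).append(
            entity.get("mention_text", "")
        )
    return grouped


# Hypothetical response, invented for illustration only.
sample = {
    "result": {
        "text_excerpt": "ACME STORE TOTAL 12.50",
        "entities": [
            {"type": "supplier_name", "mention_text": "ACME STORE", "confidence": 0.97},
            {"type": "total_amount", "mention_text": "12.50", "confidence": 0.97},
            {"type": "total_amount", "mention_text": "1250", "confidence": 0.21},
        ],
    }
}
print(entities_by_type(sample))
# → {'supplier_name': ['ACME STORE'], 'total_amount': ['12.50']}
```

Filtering on confidence before booking amounts is a cautious default for receipts, where OCR can misread decimal points.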

### Process a multi-page document from images
1. Provide up to 10 image URLs in `file_urls`.
2. The images are fetched in parallel, combined into a single multi-page PDF, and processed as one document.
3. Use `include_pages: true` to get per-page statistics.

### Extract data from tax forms
1. Use `document_type: "w2"` or `"w9"` with the tax form file.
2. W-2 entities include employer info, wages, tax withheld, and SSN; W-9 entities include name, business name, TIN, address, and tax classification.

## Notes

- **Supported file types:** PDF, PNG, JPEG, TIFF, GIF, BMP, WebP.
- **Maximum input file size:** 20 MB (in batch mode, this applies to the combined PDF).
- **Maximum pages:** 10 pages per PDF, or 10 images in batch mode.
- **Input source:** Exactly one of `file_urls`, `file_ids`, or `content_base64` must be provided. Providing multiple sources returns an error.
- **Batch mode:** When 2+ URLs or file IDs are provided, all images are downloaded in parallel, combined into a single multi-page PDF (one image per page), and sent to Document AI as one request.
- **MIME type auto-detection:** When `mime_type` is omitted, it is inferred from URL response headers or file metadata. Falls back to `application/pdf` if unresolvable.
- **Text truncation:** Extracted text is truncated to `max_text_chars` characters. Increase this value for long documents.
- **Entity truncation:** Entities are truncated to `max_entities`. Increase for documents with many structured fields.
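Several of these limits can be checked locally before a file is uploaded. The sketch below is one possible pre-flight check, not part of the tool itself. It assumes that "20 MB" means 20 × 1024 × 1024 bytes and that the supported types map to the usual file extensions; both are assumptions, as is the helper name `preflight_problems`.

```python
from pathlib import Path

# Assumed extension mapping for PDF, PNG, JPEG, TIFF, GIF, BMP, WebP.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".gif", ".bmp", ".webp",
}
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumes "20 MB" means 20 MiB


def preflight_problems(filename: str, size_bytes: int) -> list[str]:
    """Return reasons a file would likely be rejected; empty if none are found."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or 'no extension'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_FILE_BYTES}-byte limit")
    return problems
```

For a file on disk, the size can be passed as `Path(filename).stat().st_size`. The page-count limit is not covered here because counting PDF pages needs a PDF library.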

### Frequently Asked Questions

#### How do I connect this tool to an external agent?

- Page URL: https://www.agentpmt.com/faq
- Markdown URL: https://www.agentpmt.com/faq?format=agent-md

You can install the local MCP router by opening a terminal and running:

```bash
npm install -g @agentpmt/mcp-router
agentpmt-setup
```

This connects AgentPMT to local agents such as Claude Code, Windsurf, Grok Build, and Cursor.

Alternatively, you can connect to the hosted version with this config block; no installation is required:

```json
{
  "mcpServers": {
    "agentpmt": {
      "type": "streamable-http",
      "url": "https://api.agentpmt.com/mcp",
      "headers": {
        "Authorization": "Bearer <AGENTPMT_BEARER_TOKEN>",
        "x-instance-metadata": "{\"client\":\"generic-mcp\",\"platform\":\"remote\"}"
      }
    }
  }
}
```

[View MCP Connection Instructions](/docs/mcp-reference/connection) for more details.
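The `x-instance-metadata` header in the hosted config is itself a JSON string, which is why its quotes are escaped. If the config is generated by a script, the nesting can be handled as in this sketch. The function name `hosted_mcp_config` and the use of an `AGENTPMT_BEARER_TOKEN` environment variable are assumptions for illustration, and where the resulting JSON belongs depends on the MCP client.

```python
import json
import os


def hosted_mcp_config(bearer_token: str) -> dict:
    """Build the hosted MCP config block shown above for a given token."""
    metadata = {"client": "generic-mcp", "platform": "remote"}
    return {
        "mcpServers": {
            "agentpmt": {
                "type": "streamable-http",
                "url": "https://api.agentpmt.com/mcp",
                "headers": {
                    "Authorization": f"Bearer {bearer_token}",
                    # The header value is a JSON document serialized to a string.
                    "x-instance-metadata": json.dumps(metadata, separators=(",", ":")),
                },
            }
        }
    }


# Falls back to the literal placeholder when the variable is not set.
token = os.environ.get("AGENTPMT_BEARER_TOKEN", "<AGENTPMT_BEARER_TOKEN>")
print(json.dumps(hosted_mcp_config(token), indent=2))
```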

#### How does an external agent use this tool?

- Page URL: https://www.agentpmt.com/faq
- Markdown URL: https://www.agentpmt.com/faq?format=agent-md

After the external agent is connected to an Agent Group that can use this tool, paste this prompt into the agent:

> Use the AgentPMT-Tool-Search-and-Execution tool. First call action 'get\_instructions' so you know how to use the tool search interface. Then call action 'get\_schema' with tool\_id 69858a64269243768b447d6d ("Document OCR Agent"). After reading the schema and any returned instructions, tell me what this tool can do, we are going to be using it

The agent should fetch the tool schema first, collect the required parameters for your request, and then call the tool through AgentPMT.

### Dependencies

These products are automatically added when this product is enabled on the page UI.

#### File Management

Upload, list, retrieve, share, download, delete, and manage files stored in AgentPMT cloud storage. This product owns the full file lifecycle: signed upload URLs for files up to 10MB and for larger files up to 100MB, budget-scoped file listing with preview URLs, fresh signed download URLs, direct base64 download for smaller files, password-protected sharing, metadata and tag updates, access-history inspection, and expiration extension. All file operations are scoped to the current budget for isolation, so one budget can create persistent files and revisit them across later agent runs.

- Page URL: https://www.agentpmt.com/marketplace/file-management
- Markdown URL: https://www.agentpmt.com/marketplace/file-management?format=agent-md