# File To JSON Parsing

## Links

- Product page URL: https://www.agentpmt.com/marketplace/file-to-json-parsing
- Product markdown URL: https://www.agentpmt.com/marketplace/file-to-json-parsing?format=agent-md
- Product JSON URL: https://www.agentpmt.com/marketplace/file-to-json-parsing?format=agent-json

## Overview

- Product ID: 695c3797767df5adfd9bc872
- Vendor: Apoth3osis
- Type: core utility
- Unit type: request
- Price: 500 credits
- Categories: Developer Tools, Web Scraping & Data Collection, Testing & QA, Automation, Data Processing, Data Validation & Verification, Data Formatting & Conversion, Text Extraction & Parsing, File & Binary Operations, Sales, Finance & Accounting
- Generated at: 2026-06-02T13:33:05.579Z

### Page Description

A data extraction tool that converts common document and data files into structured JSON for use in automated workflows. It provides eleven actions: ten extraction actions and a `file-to-base64` conversion. The extraction actions cover CSV for tabular data, HTML for text content and table structures (using BeautifulSoup), JSON for direct parsing, ICS for calendar events, ODS, XLSX, and XLS for LibreOffice and Microsoft Excel spreadsheets, PDF for page-by-page text and table extraction (using pdfplumber), RTF for conversion to plain text, and plain text for basic content retrieval. Input is supplied as base64-encoded content or a cloud storage file ID; files can be up to 100 MB, and inline base64 returns are limited to 10 MB. Parameters control the maximum rows extracted (up to 100,000 for CSV, HTML tables, and spreadsheets), the maximum pages processed per PDF (up to 1,000), and whether text and tables are included for HTML and PDF. Text is decoded as UTF-8 with a Latin-1 fallback, and the response key that holds the extracted data can be renamed with `output_field`.

### Agent Description

Parse files to JSON: CSV, HTML, JSON, ICS calendars, spreadsheets (ODS, XLSX), PDFs, RTF, plain text. Files up to 100MB.

## Details

### Actions

- `extract-csv` (5 credits): Parse a CSV file into structured row data.
- `extract-html` (5 credits): Parse an HTML file, extracting text content and/or table data.
- `extract-json` (5 credits): Parse a JSON file and return its contents as structured data.
- `extract-ics` (5 credits): Parse an ICS calendar file and extract events with summary, start, end, location, and description.
- `extract-ods` (5 credits): Parse an OpenDocument Spreadsheet (.ods) file, returning sheets with row data.
- `extract-pdf` (5 credits): Extract text and/or tables from a PDF document, page by page.
- `extract-rtf` (5 credits): Parse an RTF (Rich Text Format) file and extract plain text.
- `extract-text` (5 credits): Read a plain text file and return its contents.
- `extract-xls` (5 credits): Parse a legacy Excel (.xls) file, returning sheets with row data.
- `extract-xlsx` (5 credits): Parse a modern Excel (.xlsx) file, returning sheets with row data.
- `file-to-base64` (5 credits): Convert a file to base64-encoded string. File must be 10 MB or smaller for inline return.
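
The action list above maps naturally onto file extensions. The following is a minimal sketch of a client-side lookup, not part of the tool itself: the helper name `action_for_filename` is hypothetical, and the extension choices (for example `.htm` and `.txt`) are assumptions.

```python
from pathlib import Path

# Action names come from the list above. The extension-to-action pairing is an
# assumption made for this sketch, not something the tool documents.
ACTION_BY_EXTENSION = {
    ".csv": "extract-csv",
    ".html": "extract-html",
    ".htm": "extract-html",
    ".json": "extract-json",
    ".ics": "extract-ics",
    ".ods": "extract-ods",
    ".pdf": "extract-pdf",
    ".rtf": "extract-rtf",
    ".txt": "extract-text",
    ".xls": "extract-xls",
    ".xlsx": "extract-xlsx",
}


def action_for_filename(filename: str) -> str:
    """Return the extraction action for a filename, based on its extension."""
    extension = Path(filename).suffix.lower()
    if extension not in ACTION_BY_EXTENSION:
        raise ValueError(f"No extraction action for extension {extension!r}")
    return ACTION_BY_EXTENSION[extension]
```

The lookup is case-insensitive, so `Report.PDF` resolves to `extract-pdf`. Unknown extensions raise an error rather than falling back to `extract-text`, which avoids paying for a call that is likely to return unusable output.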

### Use Cases

- Parsing uploaded CSV files into structured records for database import or API submission
- Extracting tabular data from HTML reports or web page snapshots for analysis
- Converting calendar ICS files into event objects for scheduling integrations
- Processing Excel spreadsheets from user uploads into JSON for data transformation pipelines
- Extracting text and tables from PDF invoices or contracts for automated document processing
- Converting legacy XLS files from enterprise systems into modern JSON formats
- Parsing RTF documents from email attachments into plain text for content indexing
- Scraping structured table data from HTML exports for reporting dashboards
- Extracting event details from shared calendar files for synchronization workflows
- Converting uploaded spreadsheet data into API-compatible payloads for third-party service integrations

### Workflows Using This Tool

#### Appointment Scheduling and Route Planner

Takes a CSV or spreadsheet file with addresses, asks for a starting drive time, time per stop, and starting address, then parses the addresses, optimizes the driving route, calculates estimated arrival and departure times for each location, and generates a CSV with the full schedule. Returns the optimized route map, Google Maps directions link, and the schedule CSV both locally and via notification. Ideal for field sales, service technicians, delivery planning, or any multi-stop appointment scheduling.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/appointment-scheduling-and-route-planner
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/appointment-scheduling-and-route-planner?format=agent-md
- Published: 2026-04-19T18:29:42.593Z

### Related Content

No related content is currently linked to this product.

## Integration Details

### DynamicMCP

- Setup page URL: https://www.agentpmt.com/dynamic-mcp
- Claude setup guide: https://www.agentpmt.com/dynamic-mcp?platform=claude#videos
- ChatGPT setup guide: https://www.agentpmt.com/dynamic-mcp?platform=chatgpt#videos
- Cursor setup guide: https://www.agentpmt.com/dynamic-mcp?platform=cursor#videos
- Windsurf setup guide: https://www.agentpmt.com/dynamic-mcp?platform=windsurf#videos

Use the local router for command-based MCP clients. It forwards requests to `https://api.agentpmt.com/mcp` and does not execute tools locally.

```bash
npm install -g @agentpmt/mcp-router
agentpmt-setup
```

### REST API

The live page renders cURL, Python, JavaScript, and Node.js examples. Logged-in users see those examples prefilled with their own API and budget credentials.

- Purchase endpoint: https://api.agentpmt.com/products/purchase
- Authorization format: `Bearer <base64(apiKey:budgetKey)>`

```bash
curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eW91ci1hcGkta2V5LWhlcmU6eW91ci1idWRnZXQta2V5LWhlcmU=" \
  -d '{
    "product_id": "695c3797767df5adfd9bc872",
    "parameters": {
      "action": "extract-csv",
      "input_base64": "bmFtZSxhZ2UKQWxpY2UsMzAKQm9iLDI1",
      "output_field": "data",
      "max_rows": 1000
    }
  }'
```
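
The authorization format above can also be sketched in Python. This is a minimal illustration using only the standard library; the helper names `build_auth_header` and `build_purchase_request` are hypothetical, and encoding the keys as UTF-8 before base64 is an assumption. The request is built but not sent.

```python
import base64
import json
import urllib.request

PURCHASE_URL = "https://api.agentpmt.com/products/purchase"
PRODUCT_ID = "695c3797767df5adfd9bc872"


def build_auth_header(api_key: str, budget_key: str) -> str:
    """Return 'Bearer <base64(apiKey:budgetKey)>' (UTF-8 encoding is assumed)."""
    raw = f"{api_key}:{budget_key}".encode("utf-8")
    return "Bearer " + base64.b64encode(raw).decode("ascii")


def build_purchase_request(api_key: str, budget_key: str, parameters: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the purchase endpoint."""
    body = json.dumps({"product_id": PRODUCT_ID, "parameters": parameters})
    return urllib.request.Request(
        PURCHASE_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": build_auth_header(api_key, budget_key),
        },
        method="POST",
    )


request = build_purchase_request(
    "your-api-key-here",
    "your-budget-key-here",
    {
        "action": "extract-csv",
        "input_base64": "bmFtZSxhZ2UKQWxpY2UsMzAKQm9iLDI1",
        "max_rows": 1000,
    },
)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

With the placeholder keys shown here, `build_auth_header` produces the same token that appears in the cURL example, which is a quick way to check that the encoding step is right before substituting real credentials.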

### Autonomous Agents

Autonomous agents can access this tool through AgentAddress credit balances or direct x402 payments. Use the Autonomous Agent API reference for endpoint shapes after choosing the access pattern below.

- Autonomous Agent API reference URL: https://www.agentpmt.com/docs/api-reference/autonomous-agents
- Autonomous Agent API reference markdown URL: https://www.agentpmt.com/docs/api-reference/autonomous-agents?format=agent-md
- Credit-Based Access Using AgentAddress: https://www.agentpmt.com/docs/autonomous-agents/credit-based-tool-usage-with-agentaddress
- AgentAddress is preferred when calls need persistent file access or stored platform state, and it gives the broadest tool access across repeated calls.
- Direct x402 is for independent one-off tool calls that do not require shared files or stored platform state.
- Direct x402 public payments: USDC on Base, Arbitrum, Optimism, Polygon, and Avalanche.

### Schema

#### Parameters

- Schema type: actions

```json
{
  "actions": {
    "extract-csv": {
      "description": "Parse a CSV file into structured row data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows to extract.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "extract-html": {
      "description": "Parse an HTML file, extracting text content and/or table data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "include_text": {
          "type": "boolean",
          "description": "Include extracted text content.",
          "required": false,
          "default": true
        },
        "include_tables": {
          "type": "boolean",
          "description": "Include extracted table data.",
          "required": false,
          "default": true
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows per table.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "extract-json": {
      "description": "Parse a JSON file and return its contents as structured data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        }
      }
    },
    "extract-ics": {
      "description": "Parse an ICS calendar file and extract events with summary, start, end, location, and description.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        }
      }
    },
    "extract-ods": {
      "description": "Parse an OpenDocument Spreadsheet (.ods) file, returning sheets with row data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows per sheet.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "extract-pdf": {
      "description": "Extract text and/or tables from a PDF document, page by page.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "include_text": {
          "type": "boolean",
          "description": "Include text extraction per page.",
          "required": false,
          "default": true
        },
        "include_tables": {
          "type": "boolean",
          "description": "Include table extraction per page.",
          "required": false,
          "default": true
        },
        "max_pages": {
          "type": "integer",
          "description": "Maximum pages to process.",
          "required": false,
          "default": 50,
          "minimum": 1,
          "maximum": 1000
        }
      }
    },
    "extract-rtf": {
      "description": "Parse an RTF (Rich Text Format) file and extract plain text.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        }
      }
    },
    "extract-text": {
      "description": "Read a plain text file and return its contents.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        }
      }
    },
    "extract-xls": {
      "description": "Parse a legacy Excel (.xls) file, returning sheets with row data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows per sheet.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "extract-xlsx": {
      "description": "Parse a modern Excel (.xlsx) file, returning sheets with row data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows per sheet.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "file-to-base64": {
      "description": "Convert a file to base64-encoded string. File must be 10 MB or smaller for inline return.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        }
      }
    }
  },
  "properties": {}
}
```
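
The integer limits in the schema above can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; `check_integer_limits` is a hypothetical helper, and it only checks ranges. It does not verify which action accepts which parameter.

```python
# Limits copied from the schema above: parameter name -> (minimum, maximum).
INTEGER_LIMITS = {
    "max_rows": (1, 100_000),
    "max_pages": (1, 1000),
}


def check_integer_limits(parameters: dict) -> dict:
    """Return a copy of `parameters` after range-checking the integer fields."""
    checked = dict(parameters)
    for name, (minimum, maximum) in INTEGER_LIMITS.items():
        if name not in checked:
            continue  # omitted fields are optional in the schema
        value = checked[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if not minimum <= value <= maximum:
            raise ValueError(f"{name} must be between {minimum} and {maximum}")
    return checked
```

For example, `check_integer_limits({"action": "extract-pdf", "max_pages": 10})` returns the parameters unchanged, while `max_pages: 5000` raises a `ValueError` locally instead of spending a request on a call that falls outside the documented range.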

### Usage Instructions

# File To JSON Parsing - Instructions

## Overview
Extract structured JSON data from a wide range of file formats. Provide a file via base64-encoded content or a cloud storage file ID, and receive parsed, structured output. Supports CSV, HTML, JSON, ICS (calendar), ODS, PDF, RTF, plain text, XLS, and XLSX files. Also supports converting any file to base64.

## File Input
Every action (except get_instructions) requires **one** of the following:
- **input_base64** (string) - Base64-encoded file content (up to 100 MB raw; 10 MB for file-to-base64 return)
- **file_id** (string) - File ID from cloud storage
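
The two input options above can be wrapped in a small helper. This is a sketch, not the tool's own client: `build_parameters` is a hypothetical name, rejecting requests that supply both inputs is an assumption, and interpreting 100 MB as binary megabytes is also an assumption.

```python
import base64

# 100 MB input limit from the text above; binary megabytes are assumed here.
MAX_INPUT_BYTES = 100 * 1024 * 1024


def build_parameters(action, *, file_bytes=None, file_id=None, **options):
    """Build a parameters object carrying exactly one file input."""
    # Assumption: supplying both inputs is treated as a caller error.
    if (file_bytes is None) == (file_id is None):
        raise ValueError("Provide exactly one of file_bytes or file_id")
    parameters = {"action": action, **options}
    if file_id is not None:
        parameters["file_id"] = file_id
    else:
        if len(file_bytes) > MAX_INPUT_BYTES:
            raise ValueError("File exceeds the 100 MB input limit")
        # The API field is text, so the base64 bytes are decoded to ASCII.
        parameters["input_base64"] = base64.b64encode(file_bytes).decode("ascii")
    return parameters


csv_parameters = build_parameters(
    "extract-csv", file_bytes=b"name,age\nAlice,30\nBob,25"
)
```

Here `csv_parameters["input_base64"]` is `bmFtZSxhZ2UKQWxpY2UsMzAKQm9iLDI1`, the same value used in the `extract-csv` example in this document, so that example decodes to a three-line CSV with a `name,age` header.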

## Actions

### extract-csv
Parse a CSV file into structured row data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `max_rows` (integer, default 1000, max 100000) - Maximum rows to extract
- `output_field` (string, default "data") - Key name for the extracted data in the response

**Example:**
```json
{
  "action": "extract-csv",
  "input_base64": "bmFtZSxhZ2UKQWxpY2UsMzAKQm9iLDI1"
}
```

---

### extract-html
Parse an HTML file, extracting text content and/or table data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `include_text` (boolean, default true) - Include extracted text content
- `include_tables` (boolean, default true) - Include extracted table data
- `max_rows` (integer, default 1000, max 100000) - Maximum rows per table
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-html",
  "file_id": "abc123",
  "include_text": true,
  "include_tables": true
}
```

---

### extract-json
Parse a JSON file and return its contents as structured data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-json",
  "input_base64": "eyJrZXkiOiAidmFsdWUifQ=="
}
```

---

### extract-ics
Parse an ICS calendar file and extract events with summary, start, end, location, and description.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-ics",
  "file_id": "calendar_file_id"
}
```

---

### extract-ods
Parse an OpenDocument Spreadsheet (.ods) file, returning sheets with row data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `max_rows` (integer, default 1000, max 100000) - Maximum rows per sheet
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-ods",
  "file_id": "spreadsheet_file_id",
  "max_rows": 500
}
```

---

### extract-pdf
Extract text and/or tables from a PDF document, page by page.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `include_text` (boolean, default true) - Include text extraction per page
- `include_tables` (boolean, default true) - Include table extraction per page
- `max_pages` (integer, default 50, max 1000) - Maximum pages to process
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-pdf",
  "file_id": "report_pdf_id",
  "max_pages": 10,
  "include_text": true,
  "include_tables": false
}
```

---

### extract-rtf
Parse an RTF (Rich Text Format) file and extract plain text.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-rtf",
  "input_base64": "e1xydGYxIEhlbGxvIFdvcmxkfQ=="
}
```

---

### extract-text
Read a plain text file and return its contents.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-text",
  "file_id": "text_file_id"
}
```

---

### extract-xls
Parse a legacy Excel (.xls) file, returning sheets with row data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `max_rows` (integer, default 1000, max 100000) - Maximum rows per sheet
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-xls",
  "file_id": "legacy_excel_id",
  "max_rows": 2000
}
```

---

### extract-xlsx
Parse a modern Excel (.xlsx) file, returning sheets with row data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `max_rows` (integer, default 1000, max 100000) - Maximum rows per sheet
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-xlsx",
  "input_base64": "<base64_encoded_xlsx>",
  "max_rows": 5000
}
```

---

### file-to-base64
Convert a file to a base64-encoded string for inline use. The file must be 10 MB or smaller.

**Required:** `action`, plus `input_base64` or `file_id`

**Example:**
```json
{
  "action": "file-to-base64",
  "file_id": "image_file_id"
}
```

---

## Common Workflows

1. **Parse an uploaded spreadsheet:** Use `extract-xlsx` or `extract-xls` with a `file_id` to get structured row data from each sheet.
2. **Extract text from a PDF report:** Use `extract-pdf` with `include_text: true` and `include_tables: false` for text-only extraction.
3. **Convert HTML to structured data:** Use `extract-html` to pull both readable text and any embedded tables from an HTML file.
4. **Read calendar events:** Use `extract-ics` to get a list of events from an ICS calendar export.
5. **Retrieve a file as base64:** Use `file-to-base64` with a `file_id` to get the raw file content encoded for inline transfer.

## Important Notes
- Every extraction action requires `input_base64` or `file_id`; at least one must be provided.
- Maximum file size is 100 MB. The `file-to-base64` action has a stricter 10 MB limit for the returned content.
- The `max_rows` parameter applies to CSV, HTML tables, ODS, XLS, and XLSX extractions.
- The `max_pages` parameter applies only to PDF extraction.
- The `include_text` and `include_tables` options apply to HTML and PDF extraction.
- The `output_field` parameter lets you customize the key name in the response (default is "data").
- Text files are decoded as UTF-8, falling back to Latin-1 if needed.
- Spreadsheet actions (ODS, XLS, XLSX) return data organized by sheet, each with a name and rows array.
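
The UTF-8 with Latin-1 fallback described in the notes above can be sketched as follows. This illustrates the general technique only; it is an assumption that the service behaves exactly this way, and `decode_text` is a hypothetical name.

```python
def decode_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Latin-1 when UTF-8 fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this branch never fails.
        return raw.decode("latin-1")
```

Because Latin-1 accepts any byte sequence, the fallback never raises an error. A file in some other encoding (for example Windows-1251) will therefore decode without complaint but produce incorrect characters, so it is worth converting such files to UTF-8 before upload.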

### Frequently Asked Questions

#### How do I connect this tool to an external agent?

- Page URL: https://www.agentpmt.com/faq
- Markdown URL: https://www.agentpmt.com/faq?format=agent-md

You can install the local MCP server by opening a terminal and running:

```bash
npm install -g @agentpmt/mcp-router
agentpmt-setup
```

This connects the router to local agents such as Claude Code, Windsurf, Grok Build, and Cursor.

Alternatively, connect to the hosted version with this config block; no installation is required:

```json
{
  "mcpServers": {
    "agentpmt": {
      "type": "streamable-http",
      "url": "https://api.agentpmt.com/mcp",
      "headers": {
        "Authorization": "Bearer <AGENTPMT_BEARER_TOKEN>",
        "x-instance-metadata": "{\"client\":\"generic-mcp\",\"platform\":\"remote\"}"
      }
    }
  }
}
```

[View MCP Connection Instructions](/docs/mcp-reference/connection) for more details.
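
The `x-instance-metadata` header in the hosted config is a JSON document stored inside a JSON string, which is easy to get wrong when escaping by hand. The following sketch generates the same config block programmatically; `build_hosted_config` is a hypothetical helper, and the token value stays a placeholder.

```python
import json


def build_hosted_config(bearer_token: str) -> dict:
    """Build the hosted MCP config block shown above as a Python dict."""
    metadata = {"client": "generic-mcp", "platform": "remote"}
    return {
        "mcpServers": {
            "agentpmt": {
                "type": "streamable-http",
                "url": "https://api.agentpmt.com/mcp",
                "headers": {
                    "Authorization": f"Bearer {bearer_token}",
                    # Header values are strings, so the metadata is serialized
                    # to compact JSON here and escaped again on output.
                    "x-instance-metadata": json.dumps(metadata, separators=(",", ":")),
                },
            }
        }
    }


config_text = json.dumps(build_hosted_config("<AGENTPMT_BEARER_TOKEN>"), indent=2)
```

Serializing twice lets `json.dumps` produce the `\"` escapes automatically, so `config_text` can be written to a client's MCP config file without manual quoting.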

#### How does an external agent use this tool?

- Page URL: https://www.agentpmt.com/faq
- Markdown URL: https://www.agentpmt.com/faq?format=agent-md

After the external agent is connected to an Agent Group that can use this tool, paste this prompt into the agent:

> Call the AgentPMT-Tool-Search-and-Execution tool with action 'get\_schema' and tool\_id 695c3797767df5adfd9bc872 ("File To JSON Parsing"). Then call the same tool with action 'call\_tool', tool\_id 695c3797767df5adfd9bc872, and the parameters needed for my request.

The agent should fetch the tool schema first, collect the required parameters for your request, and then call the tool through AgentPMT.
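
The two-step sequence in the prompt above can be sketched as the argument objects an agent would pass to the AgentPMT-Tool-Search-and-Execution tool. The `action` and `tool_id` values come from the prompt; the `parameters` key name and both helper names are assumptions for illustration, so confirm the exact argument shape against the schema the first call returns.

```python
TOOL_ID = "695c3797767df5adfd9bc872"  # File To JSON Parsing


def schema_call_arguments() -> dict:
    """Arguments for the first call, which fetches the tool schema."""
    return {"action": "get_schema", "tool_id": TOOL_ID}


def tool_call_arguments(parameters: dict) -> dict:
    """Arguments for the second call, which runs the tool.

    The "parameters" key is an assumed name for the nested tool input.
    """
    return {"action": "call_tool", "tool_id": TOOL_ID, "parameters": parameters}


first_call = schema_call_arguments()
second_call = tool_call_arguments({"action": "extract-ics", "file_id": "calendar_file_id"})
```

Note that `action` appears at two levels in the second call: the outer value selects `call_tool` on the search-and-execution tool, and the inner value selects the extraction action on this product.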

### Dependencies

These products are automatically added when this product is enabled on the page UI.

#### File Management

Upload, list, retrieve, share, download, delete, and manage files stored in AgentPMT cloud storage. This product now owns the full file lifecycle, including signed upload URLs for files up to 10MB and for files over 10MB up to 100MB, budget-scoped file listing with preview URLs, fresh signed download URLs, direct base64 download for smaller files, password-protected sharing, metadata and tag updates, access-history inspection, and expiration extension. All file operations are scoped to the current budget for isolation and are designed to let one budget create persistent files that can be revisited across later agent runs.

- Page URL: https://www.agentpmt.com/marketplace/file-management
- Markdown URL: https://www.agentpmt.com/marketplace/file-management?format=agent-md