# File To JSON Parsing

## Links

- Product page URL: https://www.agentpmt.com/marketplace/file-to-json-parsing
- Product markdown URL: https://www.agentpmt.com/marketplace/file-to-json-parsing?format=agent-md
- Product JSON URL: https://www.agentpmt.com/marketplace/file-to-json-parsing?format=agent-json

## Overview

- Product ID: 695c3797767df5adfd9bc872
- Vendor: Apoth3osis
- Type: core utility
- Unit type: request
- Price: 500 credits
- Categories: Developer Tools, Web Scraping & Data Collection, Testing & QA, Automation, Data Processing, Data Validation & Verification, Data Formatting & Conversion, Text Extraction & Parsing, File & Binary Operations, Sales, Finance & Accounting
- Generated at: 2026-04-18T08:08:10.942Z

### Page Description

A data extraction tool that converts a wide variety of file formats into structured JSON output for seamless processing in automated workflows. This function supports eleven actions covering the most common document and data formats: CSV for tabular data parsing, HTML for extracting text content and table structures using BeautifulSoup, JSON for direct parsing, ICS for calendar event extraction, ODS and XLSX/XLS for spreadsheet processing across LibreOffice and Microsoft Excel formats, PDF for page-by-page text and table extraction using pdfplumber, RTF for rich text conversion, and plain text for basic content retrieval. Input can be provided as base64-encoded content or a cloud storage file ID, with support for files up to 100MB and inline base64 returns up to 10MB. Configurable parameters fine-tune extraction behavior, including row limits up to 100,000 for spreadsheets, page limits up to 1,000 for PDFs, and toggles for text and table inclusion in applicable formats. The function automatically detects character encoding and returns consistently structured JSON with customizable output field names, making it a bridge between raw file uploads and downstream data processing pipelines.

### Agent Description

Parse files to JSON: CSV, HTML, JSON, ICS calendars, spreadsheets (ODS, XLSX), PDFs, RTF, plain text. Files up to 100MB.

## Details Tab

### Details

A data extraction tool that converts a wide variety of file formats into structured JSON output for seamless processing in automated workflows. This function supports eleven actions covering the most common document and data formats: CSV for tabular data parsing, HTML for extracting text content and table structures using BeautifulSoup, JSON for direct parsing, ICS for calendar event extraction, ODS and XLSX/XLS for spreadsheet processing across LibreOffice and Microsoft Excel formats, PDF for page-by-page text and table extraction using pdfplumber, RTF for rich text conversion, and plain text for basic content retrieval. Input can be provided as base64-encoded content or a cloud storage file ID, with support for files up to 100MB and inline base64 returns up to 10MB. Configurable parameters fine-tune extraction behavior, including row limits up to 100,000 for spreadsheets, page limits up to 1,000 for PDFs, and toggles for text and table inclusion in applicable formats. The function automatically detects character encoding and returns consistently structured JSON with customizable output field names, making it a bridge between raw file uploads and downstream data processing pipelines.

### Actions

- `extract-csv` (5 credits): Parse a CSV file into structured row data.
- `extract-html` (5 credits): Parse an HTML file, extracting text content and/or table data.
- `extract-json` (5 credits): Parse a JSON file and return its contents as structured data.
- `extract-ics` (5 credits): Parse an ICS calendar file and extract events with summary, start, end, location, and description.
- `extract-ods` (5 credits): Parse an OpenDocument Spreadsheet (.ods) file, returning sheets with row data.
- `extract-pdf` (5 credits): Extract text and/or tables from a PDF document, page by page.
- `extract-rtf` (5 credits): Parse an RTF (Rich Text Format) file and extract plain text.
- `extract-text` (5 credits): Read a plain text file and return its contents.
- `extract-xls` (5 credits): Parse a legacy Excel (.xls) file, returning sheets with row data.
- `extract-xlsx` (5 credits): Parse a modern Excel (.xlsx) file, returning sheets with row data.
- `file-to-base64` (5 credits): Convert a file to a base64-encoded string. The file must be 10 MB or smaller for inline return.

### Use Cases

- Parsing uploaded CSV files into structured records for database import or API submission
- Extracting tabular data from HTML reports or web page snapshots for analysis
- Converting calendar ICS files into event objects for scheduling integrations
- Processing Excel spreadsheets from user uploads into JSON for data transformation pipelines
- Extracting text and tables from PDF invoices or contracts for automated document processing
- Converting legacy XLS files from enterprise systems into modern JSON formats
- Parsing RTF documents from email attachments into plaintext for content indexing
- Scraping structured table data from HTML exports for reporting dashboards
- Extracting event details from shared calendar files for synchronization workflows
- Converting uploaded spreadsheet data into API-compatible payloads for third-party service integrations

### Workflows Using This Tool

#### Appointment Scheduling and Route Planner

Takes a CSV or spreadsheet file with addresses, asks for a starting drive time, time per stop, and starting address, then parses the addresses, optimizes the driving route, calculates estimated arrival and departure times for each location, and generates a CSV with the full schedule. Returns the optimized route map, Google Maps directions link, and the schedule CSV both locally and via notification. Ideal for field sales, service technicians, delivery planning, or any multi-stop appointment scheduling.

- Page URL: https://www.agentpmt.com/agent-workflow-skills/appointment-scheduling-and-route-planner
- Markdown URL: https://www.agentpmt.com/agent-workflow-skills/appointment-scheduling-and-route-planner?format=agent-md
- Published: 2026-02-12T18:10:09.565Z

### Related Content

No related content is currently linked to this product.

## Advanced Tab

### DynamicMCP

- Setup page URL: https://www.agentpmt.com/dynamic-mcp
- Claude setup guide: https://www.agentpmt.com/dynamic-mcp?platform=claude#videos
- ChatGPT setup guide: https://www.agentpmt.com/dynamic-mcp?platform=chatgpt#videos
- Cursor setup guide: https://www.agentpmt.com/dynamic-mcp?platform=cursor#videos
- Windsurf setup guide: https://www.agentpmt.com/dynamic-mcp?platform=windsurf#videos

STDIO connector for Claude Code, Codex, Cursor, Zed, and other LLMs that require STDIO or custom connections. This lightweight connector routes requests to `https://api.agentpmt.com/mcp`. All tool execution happens in the cloud and the server cannot edit any files on your computer.

```bash
npm install -g @agentpmt/mcp-router
agentpmt-setup
```

### REST API

The live page renders cURL, Python, JavaScript, and Node.js examples. Logged-in users see those examples prefilled with their own API and budget credentials.

- Purchase endpoint: https://api.agentpmt.com/products/purchase
- Authorization format: `Bearer <base64(apiKey:budgetKey)>`

```bash
curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eW91ci1hcGkta2V5LWhlcmU6eW91ci1idWRnZXQta2V5LWhlcmU=" \
  -d '{
    "product_id": "695c3797767df5adfd9bc872",
    "parameters": {
      "action": "extract-csv",
      "output_field": "data",
      "max_rows": 1000
    }
  }'
```
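The cURL call above translates directly to other languages. A minimal Python sketch using only the standard library, assuming the same placeholder keys (the `Authorization` value is `base64(apiKey:budgetKey)`):

```python
import base64
import json
from urllib import request

# Placeholder credentials; replace with your own API and budget keys.
api_key = "your-api-key-here"
budget_key = "your-budget-key-here"

# Build the Bearer token: base64 of "apiKey:budgetKey".
token = base64.b64encode(f"{api_key}:{budget_key}".encode()).decode("ascii")

payload = {
    "product_id": "695c3797767df5adfd9bc872",
    "parameters": {"action": "extract-csv", "output_field": "data", "max_rows": 1000},
}

req = request.Request(
    "https://api.agentpmt.com/products/purchase",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    method="POST",
)
# body = request.urlopen(req).read()  # uncomment to send the request
```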

### Autonomous Agents

Do not use the abbreviated instructions in this product markdown for wallet-based invocation. Retrieve the full External Agent API markdown document instead.

- External Agent API page URL: https://www.agentpmt.com/external-agent-api
- External Agent API markdown URL: https://www.agentpmt.com/external-agent-api?format=agent-md

### Schema

#### Parameters

- Schema type: actions

```json
{
  "actions": {
    "extract-csv": {
      "description": "Parse a CSV file into structured row data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows to extract.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "extract-html": {
      "description": "Parse an HTML file, extracting text content and/or table data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "include_text": {
          "type": "boolean",
          "description": "Include extracted text content.",
          "required": false,
          "default": true
        },
        "include_tables": {
          "type": "boolean",
          "description": "Include extracted table data.",
          "required": false,
          "default": true
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows per table.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "extract-json": {
      "description": "Parse a JSON file and return its contents as structured data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        }
      }
    },
    "extract-ics": {
      "description": "Parse an ICS calendar file and extract events with summary, start, end, location, and description.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        }
      }
    },
    "extract-ods": {
      "description": "Parse an OpenDocument Spreadsheet (.ods) file, returning sheets with row data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows per sheet.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "extract-pdf": {
      "description": "Extract text and/or tables from a PDF document, page by page.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "include_text": {
          "type": "boolean",
          "description": "Include text extraction per page.",
          "required": false,
          "default": true
        },
        "include_tables": {
          "type": "boolean",
          "description": "Include table extraction per page.",
          "required": false,
          "default": true
        },
        "max_pages": {
          "type": "integer",
          "description": "Maximum pages to process.",
          "required": false,
          "default": 50,
          "minimum": 1,
          "maximum": 1000
        }
      }
    },
    "extract-rtf": {
      "description": "Parse an RTF (Rich Text Format) file and extract plain text.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        }
      }
    },
    "extract-text": {
      "description": "Read a plain text file and return its contents.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        }
      }
    },
    "extract-xls": {
      "description": "Parse a legacy Excel (.xls) file, returning sheets with row data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows per sheet.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "extract-xlsx": {
      "description": "Parse a modern Excel (.xlsx) file, returning sheets with row data.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        },
        "output_field": {
          "type": "string",
          "description": "Key name for the extracted data in the response.",
          "required": false,
          "default": "data"
        },
        "max_rows": {
          "type": "integer",
          "description": "Maximum rows per sheet.",
          "required": false,
          "default": 1000,
          "minimum": 1,
          "maximum": 100000
        }
      }
    },
    "file-to-base64": {
      "description": "Convert a file to base64-encoded string. File must be 10 MB or smaller for inline return.",
      "properties": {
        "input_base64": {
          "type": "string",
          "description": "Base64-encoded file content.",
          "required": false
        },
        "file_id": {
          "type": "string",
          "description": "File ID from cloud storage.",
          "required": false
        }
      }
    }
  },
  "properties": {}
}
```

### Usage Instructions

# File To JSON Parsing - Instructions

## Overview
Extract structured JSON data from a wide range of file formats. Provide a file via base64-encoded content or a cloud storage file ID, and receive parsed, structured output. Supports CSV, HTML, JSON, ICS (calendar), ODS, PDF, RTF, plain text, XLS, and XLSX files. Also supports converting any file to base64.

## File Input
Every action (except `get_instructions`) requires **one** of the following:
- **input_base64** (string) - Base64-encoded file content (up to 100 MB raw; 10 MB for file-to-base64 return)
- **file_id** (string) - File ID from cloud storage
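For example, `input_base64` is just the base64 encoding of the raw file bytes. A quick Python sketch (the commented filename is illustrative; the inline bytes stand in for a real file read):

```python
import base64

# In practice, read raw bytes from disk, e.g.:
#   raw = open("contacts.csv", "rb").read()
raw = b"name,age\nAlice,30\nBob,25"  # stand-in file contents

# Base64-encode the bytes; the resulting string is the input_base64 value.
input_base64 = base64.b64encode(raw).decode("ascii")

params = {"action": "extract-csv", "input_base64": input_base64}
```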

## Actions

### extract-csv
Parse a CSV file into structured row data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `max_rows` (integer, default 1000, max 100000) - Maximum rows to extract
- `output_field` (string, default "data") - Key name for the extracted data in the response

**Example:**
```json
{
  "action": "extract-csv",
  "input_base64": "bmFtZSxhZ2UKQWxpY2UsMzAKQm9iLDI1"
}
```

---

### extract-html
Parse an HTML file, extracting text content and/or table data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `include_text` (boolean, default true) - Include extracted text content
- `include_tables` (boolean, default true) - Include extracted table data
- `max_rows` (integer, default 1000) - Maximum rows per table
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-html",
  "file_id": "abc123",
  "include_text": true,
  "include_tables": true
}
```

---

### extract-json
Parse a JSON file and return its contents as structured data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-json",
  "input_base64": "eyJrZXkiOiAidmFsdWUifQ=="
}
```
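To see what that example payload carries, the `input_base64` value can be decoded locally. A quick Python check:

```python
import base64
import json

# Decode the example payload's base64 content back to the original JSON text.
encoded = "eyJrZXkiOiAidmFsdWUifQ=="
text = base64.b64decode(encoded).decode("utf-8")
data = json.loads(text)
# data is the parsed object from the file
```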

---

### extract-ics
Parse an ICS calendar file and extract events with summary, start, end, location, and description.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-ics",
  "file_id": "calendar_file_id"
}
```

---

### extract-ods
Parse an OpenDocument Spreadsheet (.ods) file, returning sheets with row data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `max_rows` (integer, default 1000, max 100000) - Maximum rows per sheet
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-ods",
  "file_id": "spreadsheet_file_id",
  "max_rows": 500
}
```

---

### extract-pdf
Extract text and/or tables from a PDF document, page by page.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `include_text` (boolean, default true) - Include text extraction per page
- `include_tables` (boolean, default true) - Include table extraction per page
- `max_pages` (integer, default 50, max 1000) - Maximum pages to process
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-pdf",
  "file_id": "report_pdf_id",
  "max_pages": 10,
  "include_text": true,
  "include_tables": false
}
```

---

### extract-rtf
Parse an RTF (Rich Text Format) file and extract plain text.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-rtf",
  "input_base64": "e1xydGYxIEhlbGxvIFdvcmxkfQ=="
}
```

---

### extract-text
Read a plain text file and return its contents.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-text",
  "file_id": "text_file_id"
}
```

---

### extract-xls
Parse a legacy Excel (.xls) file, returning sheets with row data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `max_rows` (integer, default 1000, max 100000) - Maximum rows per sheet
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-xls",
  "file_id": "legacy_excel_id",
  "max_rows": 2000
}
```

---

### extract-xlsx
Parse a modern Excel (.xlsx) file, returning sheets with row data.

**Required:** `action`, plus `input_base64` or `file_id`
**Optional:**
- `max_rows` (integer, default 1000, max 100000) - Maximum rows per sheet
- `output_field` (string, default "data")

**Example:**
```json
{
  "action": "extract-xlsx",
  "input_base64": "<base64_encoded_xlsx>",
  "max_rows": 5000
}
```

---

### file-to-base64
Convert a file to a base64-encoded string for inline use. The file must be 10 MB or smaller for inline return.

**Required:** `action`, plus `input_base64` or `file_id`

**Example:**
```json
{
  "action": "file-to-base64",
  "file_id": "image_file_id"
}
```

---

## Common Workflows

1. **Parse an uploaded spreadsheet:** Use `extract-xlsx` or `extract-xls` with a `file_id` to get structured row data from each sheet.
2. **Extract text from a PDF report:** Use `extract-pdf` with `include_text: true` and `include_tables: false` for text-only extraction.
3. **Convert HTML to structured data:** Use `extract-html` to pull both readable text and any embedded tables from an HTML file.
4. **Read calendar events:** Use `extract-ics` to get a list of events from an ICS calendar export.
5. **Retrieve a file as base64:** Use `file-to-base64` with a `file_id` to get the raw file content encoded for inline transfer.
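As an illustration, workflow 2 (text-only PDF extraction) corresponds to a payload like the following; the `file_id` and `output_field` values are placeholders:

```python
# Text-only PDF extraction: skip tables, cap pages, and rename the output key.
payload = {
    "action": "extract-pdf",
    "file_id": "report_pdf_id",   # placeholder cloud-storage file ID
    "include_text": True,
    "include_tables": False,      # text-only: omit table extraction
    "max_pages": 50,              # default shown explicitly; maximum is 1000
    "output_field": "pages",      # parsed pages return under "pages" instead of "data"
}
```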

## Important Notes
- Every extraction action requires `input_base64` or `file_id`; at least one must be provided.
- Maximum file size is 100 MB. The `file-to-base64` action has a stricter 10 MB limit for the returned content.
- The `max_rows` parameter applies to CSV, HTML tables, ODS, XLS, and XLSX extractions.
- The `max_pages` parameter applies only to PDF extraction.
- The `include_text` and `include_tables` options apply to HTML and PDF extraction.
- The `output_field` parameter lets you customize the key name in the response (default is "data").
- Text files are decoded as UTF-8, falling back to Latin-1 if needed.
- Spreadsheet actions (ODS, XLS, XLSX) return data organized by sheet, each with a name and rows array.
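These constraints can be checked client-side before spending credits. A hypothetical helper (not part of the API; the service performs its own validation):

```python
import base64

MAX_FILE_BYTES = 100 * 1024 * 1024  # 100 MB raw file limit

def validate_params(params: dict) -> None:
    """Raise ValueError if params violate the documented limits."""
    if "input_base64" not in params and "file_id" not in params:
        raise ValueError("provide input_base64 or file_id")
    if "input_base64" in params:
        raw = base64.b64decode(params["input_base64"])
        if len(raw) > MAX_FILE_BYTES:
            raise ValueError("decoded file exceeds the 100 MB limit")
    if not 1 <= params.get("max_rows", 1000) <= 100_000:
        raise ValueError("max_rows must be between 1 and 100000")
    if not 1 <= params.get("max_pages", 50) <= 1_000:
        raise ValueError("max_pages must be between 1 and 1000")

validate_params({"action": "extract-csv", "file_id": "abc123"})  # passes silently
```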

### About The Developer

- Vendor name: Apoth3osis
- Website: apoth3osis.io

We build tools that enable AI agents to excel in the mathematical realm.

Our small team develops experimental and unique solutions in the AI arena, with a strong focus on modular computing for agentic applications and custom model deployment. We have handled projects for a variety of applications across many sectors, from algorithmic trading and financial analysis, to molecular simulations and predictions, to habitat and biodiversity monitoring and wildlife conservation.

### Frequently Asked Questions

No linked FAQs are currently available.

### Dependencies

These products are automatically added when this product is enabled on the page UI.

#### File Management

Upload, list, retrieve, share, download, delete, and manage files stored in AgentPMT cloud storage. This product owns the full file lifecycle, including signed upload URLs for files up to 10MB and for larger files up to 100MB, budget-scoped file listing with preview URLs, fresh signed download URLs, direct base64 download for smaller files, password-protected sharing, metadata and tag updates, access-history inspection, and expiration extension. All file operations are scoped to the current budget for isolation, letting one budget create persistent files that can be revisited across later agent runs.

- Page URL: https://www.agentpmt.com/marketplace/file-management
- Markdown URL: https://www.agentpmt.com/marketplace/file-management?format=agent-md