File To JSON Parsing - Instructions
Overview
Extract structured JSON data from a wide range of file formats. Provide a file via base64-encoded content or a cloud storage file ID, and receive parsed, structured output. Supports CSV, HTML, JSON, ICS (calendar), ODS, PDF, RTF, plain text, XLS, and XLSX files. Also supports converting any file to base64.
File Input
Every action (except get_instructions) requires one of the following:
- input_base64 (string) - Base64-encoded file content (up to 100 MB raw; 10 MB for file-to-base64 return)
- file_id (string) - File ID from cloud storage
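A minimal sketch of assembling a request payload on the client side, assuming Python. The helper name build_request, the path argument, and the reading of "100 MB" as 100 * 1024 * 1024 bytes are assumptions for illustration; only the action, input_base64, and file_id keys come from these instructions.

```python
import base64
from pathlib import Path
from typing import Any, Optional

# Assumption: "100 MB" means 100 * 1024 * 1024 bytes of raw (pre-encoding) content.
MAX_RAW_BYTES = 100 * 1024 * 1024


def build_request(action: str, path: Optional[str] = None,
                  file_id: Optional[str] = None, **options: Any) -> dict:
    """Hypothetical helper: build a payload carrying input_base64 or file_id."""
    if path is None and file_id is None:
        raise ValueError("provide a local path (sent as input_base64) or a file_id")
    payload: dict = {"action": action, **options}
    if file_id is not None:
        # A cloud storage ID is passed through unchanged.
        payload["file_id"] = file_id
    else:
        raw = Path(path).read_bytes()
        if len(raw) > MAX_RAW_BYTES:
            raise ValueError("file exceeds the 100 MB raw size limit")
        payload["input_base64"] = base64.b64encode(raw).decode("ascii")
    return payload
```

In this sketch a file_id takes precedence when both inputs are supplied; that ordering is a design choice of the example, not documented behavior.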
Actions
extract-csv
Parse a CSV file into structured row data.
Required: action, plus input_base64 or file_id
Optional:
- max_rows (integer, default 1000, max 100000) - Maximum rows to extract
- output_field (string, default "data") - Key name for the extracted data in the response
Example:
{
  "action": "extract-csv",
  "input_base64": "bmFtZSxhZ2UKQWxpY2UsMzAKQm9iLDI1"
}
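To see what the example input contains, the payload can be decoded locally. This is a small Python sketch; the local csv parsing is only a preview of the rows and says nothing about the exact shape of the tool's response.

```python
import base64
import csv
import io

example_payload = "bmFtZSxhZ2UKQWxpY2UsMzAKQm9iLDI1"

# Decode the payload back to the original CSV text.
csv_text = base64.b64decode(example_payload).decode("utf-8")

# Parse locally with the standard library to preview the rows.
rows = list(csv.DictReader(io.StringIO(csv_text)))
```

The decoded text is a header line (name,age) followed by two data rows, one for Alice and one for Bob.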
extract-html
Parse an HTML file, extracting text content and/or table data.
Required: action, plus input_base64 or file_id
Optional:
- include_text (boolean, default true) - Include extracted text content
- include_tables (boolean, default true) - Include extracted table data
- max_rows (integer, default 1000) - Maximum rows per table
- output_field (string, default "data")
Example:
{
  "action": "extract-html",
  "file_id": "abc123",
  "include_text": true,
  "include_tables": true
}
extract-json
Parse a JSON file and return its contents as structured data.
Required: action, plus input_base64 or file_id
Optional:
- output_field (string, default "data")
Example:
{
  "action": "extract-json",
  "input_base64": "eyJrZXkiOiAidmFsdWUifQ=="
}
extract-ics
Parse an ICS calendar file and extract events with summary, start, end, location, and description.
Required: action, plus input_base64 or file_id
Optional:
- output_field (string, default "data")
Example:
{
  "action": "extract-ics",
  "file_id": "calendar_file_id"
}
extract-ods
Parse an OpenDocument Spreadsheet (.ods) file, returning sheets with row data.
Required: action, plus input_base64 or file_id
Optional:
- max_rows (integer, default 1000, max 100000) - Maximum rows per sheet
- output_field (string, default "data")
Example:
{
  "action": "extract-ods",
  "file_id": "spreadsheet_file_id",
  "max_rows": 500
}
extract-pdf
Extract text and/or tables from a PDF document, page by page.
Required: action, plus input_base64 or file_id
Optional:
- include_text (boolean, default true) - Include text extraction per page
- include_tables (boolean, default true) - Include table extraction per page
- max_pages (integer, default 50, max 1000) - Maximum pages to process
- output_field (string, default "data")
Example:
{
  "action": "extract-pdf",
  "file_id": "report_pdf_id",
  "max_pages": 10,
  "include_text": true,
  "include_tables": false
}
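A request like the one above could be built with a small client-side check of the documented max_pages limit. This is a sketch only: pdf_request is a hypothetical helper, and rejecting values below 1 is an assumption, since the instructions state only the default (50) and the maximum (1000).

```python
from typing import Optional

DEFAULT_MAX_PAGES = 50   # documented default
MAX_PAGES_LIMIT = 1000   # documented maximum


def pdf_request(file_id: str, max_pages: Optional[int] = None,
                include_text: bool = True, include_tables: bool = True) -> dict:
    """Hypothetical helper: build an extract-pdf payload with checked options."""
    pages = DEFAULT_MAX_PAGES if max_pages is None else max_pages
    # Assumption: values below 1 are rejected; only the maximum is documented.
    if not 1 <= pages <= MAX_PAGES_LIMIT:
        raise ValueError(f"max_pages must be between 1 and {MAX_PAGES_LIMIT}")
    return {
        "action": "extract-pdf",
        "file_id": file_id,
        "max_pages": pages,
        "include_text": include_text,
        "include_tables": include_tables,
    }
```

Calling pdf_request("report_pdf_id", max_pages=10, include_tables=False) produces the same payload as the example above.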
extract-rtf
Parse an RTF (Rich Text Format) file and extract plain text.
Required: action, plus input_base64 or file_id
Optional:
- output_field (string, default "data")
Example:
{
  "action": "extract-rtf",
  "input_base64": "e1xydGYxIEhlbGxvIFdvcmxkfQ=="
}
extract-text
Read a plain text file and return its contents.
Required: action, plus input_base64 or file_id
Optional:
- output_field (string, default "data")
Example:
{
  "action": "extract-text",
  "file_id": "text_file_id"
}
extract-xls
Parse a legacy Excel (.xls) file, returning sheets with row data.
Required: action, plus input_base64 or file_id
Optional:
- max_rows (integer, default 1000, max 100000) - Maximum rows per sheet
- output_field (string, default "data")
Example:
{
  "action": "extract-xls",
  "file_id": "legacy_excel_id",
  "max_rows": 2000
}
extract-xlsx
Parse a modern Excel (.xlsx) file, returning sheets with row data.
Required: action, plus input_base64 or file_id
Optional:
- max_rows (integer, default 1000, max 100000) - Maximum rows per sheet
- output_field (string, default "data")
Example:
{
  "action": "extract-xlsx",
  "input_base64": "<base64_encoded_xlsx>",
  "max_rows": 5000
}
file-to-base64
Convert a cloud-stored file to base64 for inline use. The file must be 10 MB or smaller.
Required: action, plus input_base64 or file_id
Example:
{
  "action": "file-to-base64",
  "file_id": "image_file_id"
}
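When planning inline transfer, it helps to remember that base64 output is roughly a third larger than the raw file. The Python sketch below estimates the encoded length and checks a raw size against the limit. The function names are hypothetical, and treating "10 MB" as 10 * 1024 * 1024 bytes measured on the raw file is an assumption.

```python
# Assumption: "10 MB" means 10 * 1024 * 1024 bytes, measured on the raw file.
LIMIT_BYTES = 10 * 1024 * 1024


def base64_length(raw_size: int) -> int:
    """Character count of standard padded base64 for raw_size bytes."""
    # Every group of up to 3 raw bytes becomes 4 output characters.
    return 4 * ((raw_size + 2) // 3)


def fits_file_to_base64(raw_size: int) -> bool:
    """Check a raw file size against the assumed 10 MB limit."""
    return raw_size <= LIMIT_BYTES
```

For instance, the 24-byte CSV used in the extract-csv example encodes to 32 characters, which matches the length of its input_base64 string.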
Common Workflows
- Parse an uploaded spreadsheet: Use extract-xlsx or extract-xls with a file_id to get structured row data from each sheet.
- Extract text from a PDF report: Use extract-pdf with include_text: true and include_tables: false for text-only extraction.
- Convert HTML to structured data: Use extract-html to pull both readable text and any embedded tables from an HTML file.
- Read calendar events: Use extract-ics to get a list of events from an ICS calendar export.
- Retrieve a file as base64: Use file-to-base64 with a file_id to get the raw file content encoded for inline transfer.
Important Notes
- Every extraction action requires either input_base64 or file_id; at least one must be provided.
- Maximum file size is 100 MB. The file-to-base64 action has a stricter 10 MB limit for the returned content.
- The max_rows parameter applies to CSV, HTML tables, ODS, XLS, and XLSX extractions.
- The max_pages parameter applies only to PDF extraction.
- The include_text and include_tables options apply to HTML and PDF extraction.
- The output_field parameter lets you customize the key name in the response (default is "data").
- Text files are decoded as UTF-8, falling back to Latin-1 if needed.
- Spreadsheet actions (ODS, XLS, XLSX) return data organized by sheet, each with a name and rows array.
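
The UTF-8 to Latin-1 fallback mentioned in the notes can be illustrated with a short Python sketch. The function decode_text is hypothetical and shows the general technique, not the tool's actual implementation.

```python
def decode_text(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Latin-1 when the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this cannot fail.
        return raw.decode("latin-1")
```

Because Latin-1 accepts any byte sequence, the fallback always yields text, although files in some other legacy encoding may come out with wrong characters.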







