Synthetic Data Generator - Instructions
Overview
Generate realistic synthetic data for testing, development, and prototyping. Supports individual records (person, company, family, technical, financial, edge cases) and complete relational datasets (e-commerce, auth system, CRM). Data is locale-aware with support for 10 regions. Use a seed for reproducible results across runs.
Actions
generate
Generate synthetic data of a specified type.
Required Fields:
- action (string): "generate"
- data_type (string): Type of data to generate. One of: person, company, family, technical, financial, edge_cases, ecommerce_dataset, auth_system_dataset, crm_dataset
Optional Fields:
- count (integer, default: 1): Number of records to generate (1-1000). For dataset types, this is ignored in favor of the size parameter.
- locale (string, default: "en_US"): Region for names, addresses, and phone formats. Options: en_US, en_GB, de_DE, fr_FR, es_ES, it_IT, pt_BR, nl_NL, pl_PL, ja_JP
- seed (integer, default: null): Random seed for reproducible output. Same seed + same parameters = same data every time. Omit for random data.
- include_details (boolean, default: true): Include extended fields (addresses, financials, relationships). Set false for minimal records.
- include_edge_cases (boolean, default: false): Mix in unicode, special characters, and boundary values for robustness testing.
- size (string, default: "medium"): Only for dataset types (ecommerce_dataset, auth_system_dataset, crm_dataset). Options: small (~100 records), medium (~500 records), large (~2000+ records). Ignored for non-dataset types.
- options (object): Type-specific advanced options (see below).
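The field rules above can be checked on the client before a request is sent. The following is a minimal Python sketch, assuming requests are plain dicts shaped like the JSON examples in this document; the helper name build_generate_request is hypothetical, not part of the tool, and it covers only the common fields (include_details and include_edge_cases can be added to the returned dict the same way).

```python
from typing import Any, Optional

# Allowed values copied from the field descriptions above.
RECORD_TYPES = {"person", "company", "family", "technical", "financial", "edge_cases"}
DATASET_TYPES = {"ecommerce_dataset", "auth_system_dataset", "crm_dataset"}
LOCALES = {"en_US", "en_GB", "de_DE", "fr_FR", "es_ES",
           "it_IT", "pt_BR", "nl_NL", "pl_PL", "ja_JP"}
SIZES = {"small", "medium", "large"}


def build_generate_request(data_type: str, *, count: int = 1,
                           locale: str = "en_US", seed: Optional[int] = None,
                           size: str = "medium",
                           options: Optional[dict] = None) -> dict[str, Any]:
    """Hypothetical helper: validate common fields and return a request dict."""
    if data_type not in RECORD_TYPES | DATASET_TYPES:
        raise ValueError(f"unknown data_type: {data_type!r}")
    if locale not in LOCALES:
        raise ValueError(f"unsupported locale: {locale!r}")
    request: dict[str, Any] = {"action": "generate", "data_type": data_type,
                               "locale": locale}
    if data_type in DATASET_TYPES:
        # Dataset types are sized with `size`; `count` would be ignored anyway.
        if size not in SIZES:
            raise ValueError(f"size must be one of {sorted(SIZES)}")
        request["size"] = size
    else:
        # `size` is ignored for record types, so it is not sent.
        if not isinstance(count, int) or not 1 <= count <= 1000:
            raise ValueError("count must be an integer from 1 to 1000")
        request["count"] = count
    if seed is not None:  # omitting the seed yields random data
        request["seed"] = seed
    if options:
        request["options"] = options
    return request
```

For example, build_generate_request("ecommerce_dataset", size="small", seed=42) yields the same fields as the ecommerce_dataset example in this document, plus an explicit "locale": "en_US".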
Data Types
person
Individual profiles with names, emails, addresses, and demographics.
Options:
- age_range (array of 2 integers, 0-120): Filter by age range, e.g. [25, 65]
Example:
{
"action": "generate",
"data_type": "person",
"count": 5,
"locale": "fr_FR",
"options": { "age_range": [25, 45] }
}
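When a test depends on age_range, it can be worth checking both the option and the returned records. This sketch validates the documented constraints (two integers within 0-120) and assumes the minimum comes first; it also assumes each person record exposes an integer age field, which is a guess rather than a documented field name.

```python
def validate_age_range(age_range: list[int]) -> tuple[int, int]:
    """Check the documented constraints: 2 integers, each within 0-120."""
    if len(age_range) != 2 or not all(isinstance(a, int) for a in age_range):
        raise ValueError("age_range must be an array of 2 integers")
    low, high = age_range
    # Assumption: [min, max] order, as in the example [25, 65].
    if not 0 <= low <= high <= 120:
        raise ValueError("age_range must satisfy 0 <= min <= max <= 120")
    return low, high


def ages_out_of_range(people: list[dict], age_range: list[int]) -> list[dict]:
    """Return records whose (assumed) "age" field falls outside age_range."""
    low, high = validate_age_range(age_range)
    return [p for p in people if not low <= p["age"] <= high]
```

An empty result from ages_out_of_range is what a passing check looks like.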
company
Business profiles with industry, size, revenue, and sample employees.
Options:
- industry_filter (string): Filter to a specific industry. Options: Technology, Healthcare, Finance, Manufacturing, Retail, Education
- size_category (string): Company size. Options: small (1-50 employees), medium (51-500), large (501-5000), enterprise (5000+)
Example:
{
"action": "generate",
"data_type": "company",
"count": 3,
"options": { "industry_filter": "Technology", "size_category": "medium" }
}
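The size_category bands map directly to employee counts, which is handy when asserting on generated companies. A sketch, assuming the company record carries an employee count (a field such as employee_count is hypothetical). The documented bands overlap at 5000 (large is 501-5000, enterprise is 5000+), so this sketch treats exactly 5000 as large; that boundary is an assumption.

```python
# Documented bands as (category, min, max); None means no upper bound.
SIZE_BANDS = [
    ("small", 1, 50),
    ("medium", 51, 500),
    ("large", 501, 5000),
    ("enterprise", 5001, None),  # assumption: 5000 itself counts as "large"
]


def size_category(employees: int) -> str:
    """Map an employee count to the documented size_category label."""
    for name, low, high in SIZE_BANDS:
        if employees >= low and (high is None or employees <= high):
            return name
    raise ValueError("employee count must be at least 1")
```

For the request above, each returned company would be expected to satisfy size_category(count) == "medium".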
family
Family units with parents, children, shared addresses, and relationship mappings.
Options:
- family_size_range (array of 2 integers, 2-10): Min and max family members, e.g. [3, 5]
Example:
{
"action": "generate",
"data_type": "family",
"count": 2,
"include_details": true,
"options": { "family_size_range": [3, 5] }
}
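Family units are described as having shared addresses and a bounded member count, both of which can be asserted in tests. This sketch assumes a hypothetical record shape, {"members": [{"address": ...}, ...]}, and assumes every member lists the same address; since addresses are extended fields, it is only meaningful when include_details is true.

```python
def check_family(family: dict, size_range: tuple[int, int]) -> list[str]:
    """Return problems found in one family record (empty list if none).

    Assumes a hypothetical shape: {"members": [{"address": ...}, ...]}.
    """
    problems: list[str] = []
    members = family.get("members", [])
    low, high = size_range
    if not low <= len(members) <= high:
        problems.append(f"{len(members)} members, expected {low}-{high}")
    if members:
        # Assumption: all members of one family share a single address.
        first = members[0].get("address")
        if any(m.get("address") != first for m in members[1:]):
            problems.append("members do not share one address")
    return problems
```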
technical
IPs, UUIDs, URLs, domains, API keys, tokens, and system info.
Options:
- data_types (array of strings): Which technical types to include. Options: ip, ipv6, mac, uuid, url, domain, email, user_agent, api_key, token. Default: ["ip", "uuid", "url", "email", "user_agent"]
Example:
{
"action": "generate",
"data_type": "technical",
"count": 10,
"options": { "data_types": ["ip", "ipv6", "mac", "uuid", "api_key"] }
}
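Generated technical values can be sanity-checked with the Python standard library before they are used as fixtures. A minimal sketch: the kind names mirror the data_types option values, but the colon-separated MAC format is an assumption, and kinds other than ip, ipv6, uuid, and mac are deliberately not covered.

```python
import ipaddress
import re
import uuid

# Assumption: MACs look like "00:1a:2b:3c:4d:5e" (colon-separated hex pairs).
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")

# Standard-library constructors that raise ValueError on malformed input.
PARSERS = {
    "ip": ipaddress.IPv4Address,
    "ipv6": ipaddress.IPv6Address,
    "uuid": uuid.UUID,
}


def is_valid(kind: str, value: str) -> bool:
    """Check one generated value against its technical data type name."""
    if kind == "mac":
        return MAC_RE.fullmatch(value) is not None
    parser = PARSERS[kind]  # KeyError for kinds this sketch does not cover
    try:
        parser(value)
    except ValueError:  # ipaddress.AddressValueError subclasses ValueError
        return False
    return True
```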
financial
Credit cards, bank accounts, balances, and transaction history.
Options:
- currency (string, default: "USD"): ISO 4217 currency code (e.g. USD, EUR, GBP, JPY)
- include_transactions (boolean, default: false): Include 5-50 detailed transactions per record
Example:
{
"action": "generate",
"data_type": "financial",
"count": 3,
"options": { "currency": "EUR", "include_transactions": true }
}
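This document states that card numbers are fake but does not say whether they pass a Luhn checksum. If the code under test validates cards with Luhn, it is worth checking the generated numbers first. The sketch below implements the standard Luhn (mod 10) algorithm; the helper names are illustrative, and the currency helper only checks the three-uppercase-letter shape of an ISO 4217 code, not whether the code is actually assigned.

```python
import re


def luhn_valid(number: str) -> bool:
    """Standard Luhn (mod 10) checksum; spaces and hyphens are ignored."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not re.fullmatch(r"[0-9]+", cleaned):
        return False  # empty, or contains a non-digit character
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def looks_like_currency_code(code: str) -> bool:
    """Shape check only: ISO 4217 alphabetic codes are three uppercase letters."""
    return re.fullmatch(r"[A-Z]{3}", code) is not None
```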
edge_cases
Unicode, special characters, boundary values, injection patterns, and malformed data for robustness testing.
Options:
- severity_level (string, default: "medium"): Options: low, medium, high. Higher severity produces more extreme test values.
- categories (array of strings): Which edge case types to include. Options: unicode, length, null, boundary, malformed, injection, special_chars, numeric. Default: ["unicode", "length", "null", "boundary"]
Example:
{
"action": "generate",
"data_type": "edge_cases",
"count": 5,
"options": { "severity_level": "high", "categories": ["unicode", "injection", "boundary", "malformed"] }
}
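A typical way to use edge_cases output is to push every value through the code under test and record which ones trigger unexpected exceptions. The sketch below assumes the generated values have already been extracted into a flat list (the response shape is not specified here); normalize_username is a hypothetical stand-in for the function being tested, and treating ValueError as a clean rejection is a policy choice you may want to change.

```python
from typing import Any, Callable


def run_edge_cases(func: Callable[[Any], Any], values: list[Any],
                   allowed: tuple[type[BaseException], ...] = (ValueError,)
                   ) -> list[tuple[Any, str]]:
    """Call func on each value; collect values that fail unexpectedly.

    Exceptions listed in `allowed` count as a correct rejection of bad input.
    """
    failures = []
    for value in values:
        try:
            func(value)
        except allowed:
            continue  # input was rejected cleanly
        except Exception as exc:  # anything else is a robustness bug
            failures.append((value, type(exc).__name__))
    return failures


def normalize_username(raw: Any) -> str:
    """Hypothetical function under test."""
    if not isinstance(raw, str) or not raw.strip():
        raise ValueError("username required")
    return raw.strip().lower()[:32]
```

A non-empty list from run_edge_cases points directly at the inputs worth turning into regression tests.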
ecommerce_dataset
Complete e-commerce dataset with interlinked customers, products, and orders.
Example:
{
"action": "generate",
"data_type": "ecommerce_dataset",
"size": "small",
"seed": 42
}
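Because the collections are interlinked, a useful smoke test is to confirm that every reference resolves. This sketch assumes collections named customers, products, and orders, with each order carrying a customer_id and a list of items that each carry a product_id; those names are guesses, so adjust them to the actual response.

```python
def dangling_references(dataset: dict) -> list[str]:
    """List order references that point at missing customers or products.

    Field names (id, customer_id, items, product_id) are assumptions.
    """
    customer_ids = {c["id"] for c in dataset["customers"]}
    product_ids = {p["id"] for p in dataset["products"]}
    problems = []
    for order in dataset["orders"]:
        if order["customer_id"] not in customer_ids:
            problems.append(f"order {order['id']}: unknown customer {order['customer_id']}")
        for item in order.get("items", []):
            if item["product_id"] not in product_ids:
                problems.append(f"order {order['id']}: unknown product {item['product_id']}")
    return problems
```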
auth_system_dataset
Authentication/authorization dataset with users, roles, permissions, and sessions.
Example:
{
"action": "generate",
"data_type": "auth_system_dataset",
"size": "medium",
"locale": "en_GB"
}
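A common use for this dataset is exercising authorization logic that resolves a user's permissions through their roles. The sketch below assumes users carry a list of role IDs (role_ids) and roles carry a list of permission names (permissions); both field names are hypothetical and may differ in the real output.

```python
def effective_permissions(user: dict, roles: list[dict]) -> set[str]:
    """Union of permissions granted by every role assigned to the user.

    Assumes hypothetical fields: user["role_ids"], role["id"], role["permissions"].
    """
    by_id = {role["id"]: role for role in roles}
    granted: set[str] = set()
    for role_id in user.get("role_ids", []):
        role = by_id.get(role_id)
        if role is None:
            # A dangling role reference would break the interlinking guarantee.
            raise KeyError(f"user {user['id']} references unknown role {role_id}")
        granted.update(role.get("permissions", []))
    return granted
```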
crm_dataset
CRM dataset with companies, contacts, and deals/opportunities with pipeline stages.
Example:
{
"action": "generate",
"data_type": "crm_dataset",
"size": "large",
"seed": 123
}
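Deals with pipeline stages lend themselves to funnel-style summaries, for example to check that a large dataset populates every stage. This sketch counts deals and totals their value per stage, assuming each deal has stage and amount fields; both names, and the stage labels in any test data, are hypothetical.

```python
from collections import defaultdict


def pipeline_summary(deals: list[dict]) -> dict[str, dict[str, float]]:
    """Count deals and sum their (assumed) "amount" for each "stage"."""
    summary: dict[str, dict[str, float]] = defaultdict(
        lambda: {"count": 0, "total": 0.0})
    for deal in deals:
        bucket = summary[deal["stage"]]
        bucket["count"] += 1
        bucket["total"] += deal["amount"]
    return dict(summary)  # plain dict so missing stages raise KeyError
```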
Common Workflows
Reproducible Test Fixtures
Use a seed to generate identical data across environments:
{
"action": "generate",
"data_type": "person",
"count": 100,
"seed": 12345,
"locale": "en_US"
}
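Since the same seed and parameters always produce the same data, the request itself can serve as a cache key for stored fixtures. A sketch, assuming generated output is saved to local files; the fixture_path helper and the fixtures/ directory are hypothetical. Note that a default written out explicitly (for example "locale": "en_US") hashes differently from an omitted one, so keep request dicts consistent.

```python
import hashlib
import json
from pathlib import Path


def fixture_path(request: dict, root: str = "fixtures") -> Path:
    """Derive a stable file name from a seeded request dict."""
    if request.get("seed") is None:
        raise ValueError("only seeded requests are reproducible")
    # sort_keys makes the key independent of the order fields were written in.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return Path(root) / f"{request['data_type']}-{digest}.json"
```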
Security and Validation Testing
Combine edge cases with regular data:
{
"action": "generate",
"data_type": "person",
"count": 50,
"include_edge_cases": true
}
Minimal Data for Quick Tests
Disable extended details for lightweight records:
{
"action": "generate",
"data_type": "company",
"count": 20,
"include_details": false
}
Important Notes
- count ranges from 1 to 1000. For dataset types, use size instead to control volume.
- Dataset types (ecommerce_dataset, auth_system_dataset, crm_dataset) return multiple interlinked collections with a record_counts summary.
- Single-record requests (count=1) return the object directly; multiple records return a list wrapped in a named key.
- All generated data is fake and safe for testing. Financial data (credit cards, accounts) is not real.
- Edge case injection patterns are safe strings for testing input validation; they do not perform actual attacks.
- The locale parameter affects names, addresses, and phone number formats but not all fields (e.g., currency must be set separately via options).
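Because the response shape changes with count, client code often normalizes it to a list first. A minimal sketch: the name of the wrapping key is not documented here, so it is passed in as a parameter, and "people" in the test data is a hypothetical example. A single record that happens to contain a list under the same key would be misread, so pick the key per data type.

```python
from typing import Any


def as_record_list(response: Any, key: str) -> list[Any]:
    """Return generated records as a list regardless of count.

    `key` is the (undocumented) name wrapping multi-record results.
    """
    if isinstance(response, dict) and isinstance(response.get(key), list):
        return response[key]  # count > 1: records wrapped in a named key
    return [response]  # count=1: the response is the record itself
```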