Turn messy support data into AI-ready datasets.

Pay once, upload your Salesforce, Zendesk, or Intercom export, show us the cleaned format you want, and get back a PII-redacted dataset for RAG, fine-tuning, or evals.

The mess

Your support data is gold. It's also a disaster.

Someone above you said “use the support data for AI.” Now it's on you. Here's what you're actually staring at.

Custom-field chaos

Salesforce exports come out as custom_field_3829103. Nobody on your team remembers what those map to.

PII everywhere

Free-text fields hide names, account IDs, internal codes. Chat Agent catches the easy ones, misses the domain-specific ones and your legal team will not be amused.

Half noise

Autoresponders, one-liners, abandoned threads. If you train on it raw, your model learns to ping back “Got it, looking into this!” forever.

No ML team to spare

The right answer is a 6-step pipeline. Your ML team isn't going to spend a week on data plumbing, they're already buried.

Who it's for

Built for the person who got the email.

This is for you if

You got handed a Salesforce, Zendesk, or Intercom export and told to “clean it for AI.”
You're not sure if it's for RAG, fine-tuning, or evals — and you need clean data either way.
You're worried about PII leaking, but you don't have an ML team to build the pipeline.
Your options today are: spend a week hand-rolling it, hire a contractor for $5K, or buy enterprise tools at $30K/yr.

How it works

Four steps. One clean dataset.

Most cleaning projects move from intake to delivery without you opening a single notebook.

01.

Pay for a project

Choose Standard or Rush and pay once through Stripe.

02.

Upload the export

Add your raw support export and a small sample showing how you want the cleaned data structured.

03.

ChatClean cleans it

Redact, normalize, dedupe, filter, format. Processed on infrastructure we control. Source files deleted within 7 days.

04.

You get the dataset

JSONL, Parquet, or CSV — plus a full audit log and a 1-page summary you can hand your manager.

What's in the box

Every project ships with all of this.

Cleaned dataset

Output in your format of choice — JSONL for fine-tuning, Parquet for RAG pipelines, CSV for analysis.

PII redaction audit log

Every entity caught, every type, every confidence score. Defensible to your legal and compliance teams.

Quality report

Duplicates removed, low-quality examples filtered, multi-turn threads reconstructed from message-level rows.

Format-normalized output

Salesforce custom-field gibberish translated to human-readable labels. Threads stitched into conversation turns.

One-page summary

What you got. What was kept. What was dropped. Why. Hand it to your manager and move on.

Pricing

One export. One clean AI-ready dataset.

Pay once, upload your support export, show us the format you want, and get back cleaned data ready for RAG, fine-tuning, or evals.

Standard

$299/ project

One support export, no emergency.

5 business day turnaround
Up to 10,000 records
PII redaction + basic quality report
JSONL / Parquet / CSV output
Desired-output sample collected after checkout

Most pickedRush

$599/ project

One support export, faster turnaround.

24-hour turnaround
Up to 10,000 records
PII redaction + basic quality report
JSONL / Parquet / CSV output
Desired-output sample collected after checkout

10k+ records or a recurring pipeline? intake@chat-clean.com →

FAQ

The questions people actually ask.

How do you handle our PII during processing?

Encrypted file transfer. Processed on infrastructure we operate ourselves — either local models or enterprise-tier cloud models in accounts we control, never sent to consumer AI services. Source files deleted within 7 days. NDA and DPA available on request.

What formats do you accept?

Generic CSV and JSON exports with chat or ticket data today — named-platform importers (Zendesk, Intercom, Salesforce, Freshdesk) are rolling out. Not sure if yours fits? Email us — we've likely seen it.

What if I'm not sure what my AI project is?

Most common case. The cleaning work is the same regardless of downstream task — we deliver the dataset in a structure that works for RAG, fine-tuning, and evals. You decide later.

Why not just use Microsoft Presidio?

Presidio handles one step (generic PII redaction) and does it reasonably well. It doesn't do format normalization, quality filtering, audit reporting, or domain-specific PII (customer IDs, internal employee names, product codes). Most teams that try it spend 2–3 days on setup before their first clean record — and still need to build the rest of the pipeline. We deliver the whole pipeline as a service.

Do you handle non-English data?

Yes — Spanish, French, German, Portuguese, Italian, Dutch, and Japanese are supported in v1. Custom recognizers for other languages on request.

Can this scale beyond one-off projects?

Yes. After your first project, we can stand up a recurring pipeline — weekly or monthly automated clean exports — at custom pricing. Email after delivery to discuss.

Ready when you are