Turn messy support data into AI-ready datasets.

ChatClean helps support teams turn messy customer support chat exports into clean, PII-redacted, AI-ready datasets.

Your support data is gold. It's also a disaster.

Someone above you said “use the support data for AI.” Now it's on you. Here's what you're actually staring at.

01

Custom-field chaos

Salesforce exports come out with column names like custom_field_3829103. Nobody on your team remembers what those map to.

02

PII everywhere

Free-text fields hide names, account IDs, and internal codes. Chat Agent catches the easy ones and misses the domain-specific ones, and your legal team will not be amused.

03

Half noise

Autoresponders, one-liners, abandoned threads. If you train on it raw, your model learns to ping back “Got it, looking into this!” forever.

04

No ML team to spare

The right answer is a 6-step pipeline. Your ML team isn't going to spend a week on data plumbing. They're already buried.

Built for the person who got the email.

This is for you if

  • You got handed a Salesforce, Zendesk, or Intercom export and told to “clean it for AI.”
  • You're not sure if it's for RAG, fine-tuning, or evals — and you need clean data either way.
  • You're worried about PII leaking, but you don't have an ML team to build the pipeline.
  • Your options today are: spend a week hand-rolling it, hire a contractor for $5K, or buy enterprise tools at $30K/yr.

Four steps. One clean dataset.

Most cleaning projects move from intake to delivery without you opening a single notebook.

01.

Get Started

Tell me about your sample data and your desired output format.

02.

Send the export

Encrypted transfer via your preferred method.

03.

I clean it

Redact, normalize, dedupe, filter, format. Processed locally. Source files deleted within 7 days.

04.

You get the dataset

JSONL, Parquet, or CSV — plus a full audit log and a 1-page summary you can hand your manager.
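
For the curious, here is a minimal sketch of what the normalize, dedupe, and filter part of step 03 can look like. It is an illustration under stated assumptions, not the actual ChatClean pipeline: the autoresponder phrases, the `MIN_WORDS` threshold, and the function names are all hypothetical.

```python
import re

# Hypothetical autoresponder phrases; a real list would be derived from the export.
AUTOREPLY_PATTERNS = [
    re.compile(r"^got it, looking into this!?$", re.IGNORECASE),
    re.compile(r"^thanks for contacting", re.IGNORECASE),
]
MIN_WORDS = 4  # assumed cutoff for one-liner noise


def normalize(text: str) -> str:
    """Collapse runs of whitespace so near-identical messages compare equal."""
    return " ".join(text.split())


def is_noise(text: str) -> bool:
    """Flag autoresponders and very short messages."""
    if any(p.search(text) for p in AUTOREPLY_PATTERNS):
        return True
    return len(text.split()) < MIN_WORDS


def clean(messages: list[str]) -> list[str]:
    """Normalize, drop noise, and remove case-insensitive duplicates (order preserved)."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in messages:
        text = normalize(raw)
        key = text.lower()
        if is_noise(text) or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept
```

A real export needs fuzzier matching than this (templated replies with names filled in, for example), but the shape is the same: normalize first, then filter, then dedupe.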

Every project ships with all of this.

Cleaned dataset

Output in your format of choice — JSONL for fine-tuning, Parquet for RAG pipelines, CSV for analysis.

PII redaction audit log

Every entity caught, every type, every confidence score. Defensible to your legal and compliance teams.
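
To make the audit log concrete, here is a small sketch of redaction that records every entity, its type, and a confidence score. It is a simplified stand-in using plain regular expressions. The `CUST-` ID format, the fixed confidence values, and the `Finding` structure are assumptions for illustration, and the sketch assumes matches from different recognizers do not overlap.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    entity_type: str
    start: int
    end: int
    confidence: float


# Hypothetical recognizers: (entity type, pattern, fixed confidence score).
RECOGNIZERS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), 0.95),
    ("CUSTOMER_ID", re.compile(r"\bCUST-\d{6}\b"), 0.80),  # made-up ID format
]


def redact(text: str) -> tuple[str, list[Finding]]:
    """Replace matched entities with <TYPE> tags and return the audit entries."""
    findings: list[Finding] = []
    for entity_type, pattern, score in RECOGNIZERS:
        for match in pattern.finditer(text):
            findings.append(Finding(entity_type, match.start(), match.end(), score))

    # Replace from the end of the string so earlier offsets stay valid.
    redacted = text
    for item in sorted(findings, key=lambda f: f.start, reverse=True):
        redacted = redacted[:item.start] + f"<{item.entity_type}>" + redacted[item.end:]

    findings.sort(key=lambda f: f.start)  # audit log in reading order
    return redacted, findings
```

Each `Finding` keeps offsets into the original text, so a reviewer can check what was caught without the log itself storing the raw PII.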

Quality report

Duplicates removed, low-quality examples filtered, multi-turn threads reconstructed from message-level rows.
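
Thread reconstruction from message-level rows can be sketched as below. The column names (`conversation_id`, `sent_at`, `role`, `text`) are hypothetical, since every helpdesk export names them differently, and the sketch assumes timestamps are ISO 8601 strings in a single timezone so that string order matches time order.

```python
from collections import defaultdict


def build_threads(rows: list[dict]) -> list[dict]:
    """Group message rows by conversation and order each group by timestamp."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["conversation_id"]].append(row)

    threads = []
    for conv_id, msgs in grouped.items():
        # Same-format, same-timezone ISO 8601 strings sort chronologically.
        msgs.sort(key=lambda r: r["sent_at"])
        threads.append({
            "conversation_id": conv_id,
            "turns": [{"role": m["role"], "text": m["text"]} for m in msgs],
        })
    return threads
```

Writing each resulting thread with `json.dumps` on its own line gives a JSONL file with one conversation per record.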

Format-normalized output

Salesforce custom-field gibberish translated to human-readable labels. Threads stitched into conversation turns.
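
As a rough illustration of the field translation, the sketch below renames opaque keys using a lookup table. The label `product_tier` is invented for the example. In practice the mapping comes from your admin's field definitions, and unknown keys are left untouched rather than guessed.

```python
# Hypothetical mapping from opaque export keys to human-readable labels.
FIELD_LABELS = {
    "custom_field_3829103": "product_tier",
}


def rename_fields(record: dict, labels: dict = FIELD_LABELS) -> dict:
    """Return a copy of the record with known custom-field keys renamed."""
    return {labels.get(key, key): value for key, value in record.items()}
```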

One-page summary

What you got. What was kept. What was dropped. Why. Hand it to your manager and move on.

Two prices. Zero surprises.

Payment via Stripe.

Standard
$299 / project

For teams with a deadline, not an emergency.

  • 2-day turnaround
  • Up to 10,000 records
  • Full audit log + quality report
  • JSONL / Parquet / CSV output
  • 30-min handover call
Get Started

10k+ records or a recurring pipeline? Email intake@chat-clean.com.

The questions people actually ask.

How do you handle our PII during processing?

Encrypted file transfer. Processed locally on a dedicated workstation. Source files deleted within 7 days. NDA signed before any data moves. DPA available on request.

What formats do you accept?

Salesforce Service Cloud exports (CSV/JSON), Zendesk full ticket exports, Intercom conversation exports, Freshdesk exports, and generic CSVs with chat data. Stack not listed? Email me. I've likely seen it.

What if I'm not sure what my AI project is?

Most common case. The cleaning work is the same regardless of downstream task: I deliver the dataset in a structure that works for RAG, fine-tuning, and evals. You decide later.

Why not just use Microsoft Presidio?

Presidio handles one step (generic PII redaction) and does it reasonably well. It doesn't do format normalization, quality filtering, audit reporting, or domain-specific PII (customer IDs, internal employee names, product codes). Most teams that try it spend 2–3 days on setup before their first clean record, and still need to build the rest of the pipeline. I deliver the whole pipeline as a service.

Do you handle non-English data?

Yes. Spanish, French, German, Portuguese, Italian, Dutch, and Japanese are supported in v1. Custom recognizers for other languages on request.

Can this scale beyond one-off projects?

Yes. After your first project, I can stand up a recurring pipeline (weekly or monthly automated clean exports) at custom pricing. Email after delivery to discuss.

Use cleaner data than your competitors.

Get Started