AI and Protecting Client Data

Workshop · 8 min read

Travel advisors handle some of the most sensitive personal data of any profession. The answer is neither to avoid AI nor to paste everything into ChatGPT and hope for the best. Sort your client data into two buckets: data that creates legal liability if it leaks, and data you just strongly do not want leaking. You probably won't build these systems yourself, but ask every AI vendor you use how they protect your data, or whether they are sending your client data raw to OpenAI and hoping.

Travel advisors hold some of the most sensitive personal data of any profession. Passport scans, payment cards, home addresses, family information, medical needs, supplier rates, and the kind of lifestyle preferences that would be uncomfortable in the wrong hands. AI is a useful tool and refusing to use it is not really an option anymore. Pasting client information into ChatGPT without thinking about where it goes is also not the answer.

The real question is how to use AI on real client work without anything weird happening to that data.

The goal is not to stop using AI on client data. The goal is to use AI on client data without leaking it.

Two classes of client data

Not all sensitive data is the same. There are two classes, and they need to be handled differently.

Class one is data where a leak makes you legally liable and you have zero risk tolerance: payment card numbers and passport information. If this data ends up somewhere it should not, the consequences are not a bad customer experience; they are lawyers, fines, and regulatory action.

Class two is data you would strongly prefer not to leak: family member names, home addresses, internal notes about VIP clients. The reputational and trust cost of leaking this data is high, but the legal exposure is lower.

The mistake most people make is treating everything as one class. They either get paranoid about all of it (and prefer to store it all in an underground bunker) or they get casual about all of it (and paste everything into ChatGPT). Once you sort your fields into the two buckets, deciding which tools to use for which workflow gets a lot easier.

What actually happens when you paste into ChatGPT

This is the question advisors ask most often, and the truthful answer is more nuanced than yes or no.

No major model provider is selling your conversations or training a model specifically on what you typed. That version of the fear does not match reality.

What is true is that consumer tiers and enterprise tiers handle your data very differently.

On the consumer side, as of late 2025, ChatGPT Free, Plus, and Pro accounts have model training on by default unless you turn it off in settings (see OpenAI's policy on how your data is used to improve model performance and the corresponding help article on managing data sharing). Anthropic shifted to a similar model around the same time, where Claude Free, Pro, and Max train on data by default unless the user opts out (see Anthropic's update to consumer terms and the TechCrunch coverage of the change). Google Gemini and Microsoft Copilot have their own variants of the same pattern.

On the enterprise side, ChatGPT Team, ChatGPT Enterprise, and the OpenAI API contractually do not use your data for training (see OpenAI's business data page). Claude for Work, Claude Enterprise, and API access carry the same protection (see Anthropic's privacy article on model training). The protection is contractual, not optional, and it does not depend on a toggle.

The practical takeaway is direct: if you or your team are using consumer ChatGPT or consumer Claude on client data, your default position is that prompts can be used to improve future models.

That is only one part of the picture. The bigger conversation is not "will they train on this," it is "the data is sitting on someone else's servers, who else can see it, for how long, and what happens if they get breached." That is the real question.

Your vendors already have it. So do their vendors.

WhatsApp has client data. Gmail has client data. Your itinerary tool has client data. Your CRM has client data. The booking system you use has client data. None of this is news.

The relevant question is who they pass it to. Modern software is built on top of other software. Your itinerary tool runs on AWS or Google Cloud. It probably uses a model provider for any AI features (OpenAI, Anthropic, sometimes both). It probably uses an email service for outbound messaging. Each of those is a separate company with its own privacy posture.

The right frame is software supply chain. Every vendor you use is forwarding pieces of your data to a small constellation of other vendors behind the scenes. You do not get to opt out of that chain entirely if you want to operate. You do get to ask better questions about who is in the chain and what they actually do with what they receive.

Less data shared is safer, but less useful

Every system in software has the same trade-off, and AI makes it sharper. The less data you give a system, the lower the risk if it leaks. Also: the less it can do for you. AI in particular needs context to do real, useful, autonomous work. An AI that does not know who your client is, what trip they are on, or what you have already discussed with them is an AI that is going to ask you fifteen clarifying questions before it can write the simplest email.

The right answer is not "give nothing." It is making deliberate choices about what each vendor sees, with eyes open, knowing that you are choosing capability and you are choosing exposure at the same time.

Don't give the AI the data in the first place where you can

The most reliable protection is the data the model never sees. If a model never receives a passport number, that passport number cannot leak through that model.

This is a design choice that lives upstream of any AI tool. Instead of asking clients to send their passport details over WhatsApp or email, where they end up in your message history (and therefore in any AI tool that touches your message history), send them a link to a secure intake vault where the structured fields go directly into a system you control. You read them when you need them. The AI tools you use for itinerary drafting and supplier comms never see the raw values, because the raw values are not in those channels in the first place.

What if data does reach AI

You probably won't be building these AI tools yourself, or the protective layers that keep PII (Personally Identifiable Information) away from the AI. This section is meant to show you how masking client data works so you can ask your software providers the right questions.

A lot of the time, data does flow. Hotel confirmations come as PDFs with the guest name and passport number baked into the body. Supplier emails arrive with everything in the message text. Visa documents are images. So the question becomes: when sensitive data does need to pass through an AI workflow, what does responsible handling actually look like?

The mental model: detect, swap, run, restore

The core pipeline has four steps. Detect what is sensitive in the input. Swap each sensitive value for a placeholder. Run the AI on the masked version. Restore the real values in the output where you (the advisor) need to see them.

A simple before-and-after.

The advisor pastes: "Mrs. Anya Petrov, passport AB1234567, arriving at Hotel Splendido on June 14 from her home at 12 Eaton Square, London."

The system replaces it with: "[CLIENT_1], passport [PASSPORT_1], arriving at Hotel Splendido on June 14 from her home at [ADDRESS_1]."

The AI drafts whatever it was asked to draft, working with the masked version.

When the result comes back, the system swaps the placeholders for the real values, and the advisor sees a complete, useful output.

The model never saw the raw data. The advisor never saw the placeholders. The mapping is held only on your side, never sent to the model.
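The four steps can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the passport pattern is a simplified assumption (real formats vary by country), the names `mask`, `restore`, and `run_masked` are hypothetical, and `call_model` stands in for whatever AI provider a vendor actually uses.

```python
import re
from typing import Callable

# Simplified, illustrative pattern. Real passport formats differ by country.
PATTERNS = {"PASSPORT": re.compile(r"\b[A-Z]{2}\d{7}\b")}


def mask(text: str, known_values: dict[str, list[str]]) -> tuple[str, dict[str, str]]:
    """Detect sensitive values and swap each one for a numbered placeholder."""
    mapping: dict[str, str] = {}   # placeholder -> real value, kept on your side
    counters: dict[str, int] = {}

    def placeholder_for(label: str, value: str) -> str:
        # Reuse the placeholder if this value has already been seen.
        for placeholder, real in mapping.items():
            if real == value:
                return placeholder
        counters[label] = counters.get(label, 0) + 1
        placeholder = f"[{label}_{counters[label]}]"
        mapping[placeholder] = value
        return placeholder

    # Values already known from a client record are replaced literally.
    for label, values in known_values.items():
        for value in values:
            if value in text:
                text = text.replace(value, placeholder_for(label, value))

    # Pattern-shaped values are found by regex.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, lbl=label: placeholder_for(lbl, m.group(0)), text)

    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the real values back into the model's output."""
    for placeholder, real in mapping.items():
        text = text.replace(placeholder, real)
    return text


def run_masked(text: str, known_values: dict[str, list[str]],
               call_model: Callable[[str], str]) -> str:
    """Detect, swap, run, restore. The mapping never leaves this function."""
    masked, mapping = mask(text, known_values)
    reply = call_model(masked)          # the model only ever sees placeholders
    return restore(reply, mapping)
```

Fed the example sentence above, with the client name and address supplied as known values, `mask` returns the placeholder version and a mapping that stays local. Only the masked string is passed to `call_model`.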

Three types of redaction rules, in order of difficulty

Not every kind of sensitive data needs the same kind of handling.

The first layer is static rules. These are deterministic patterns: regex for credit card numbers, passport-shaped strings, phone numbers, email addresses, IBAN codes. They are fast, cheap, and reliable. They catch a surprising amount of what advisors actually paste.
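A static rule can be as small as the sketch below. The patterns are deliberately simplified assumptions, and the Luhn checksum is an addition here (not something the workshop prescribes) to avoid masking every long number that merely looks like a card. The function name `redact_static` is hypothetical.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Simplified email pattern, good enough for illustration.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum used by payment cards."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_static(text: str) -> str:
    """Mask card numbers that pass the checksum, and all email addresses."""
    def card_sub(match: re.Match) -> str:
        return "[CARD]" if luhn_valid(match.group(0)) else match.group(0)

    text = CARD_CANDIDATE.sub(card_sub, text)
    return EMAIL.sub("[EMAIL]", text)
```

Rules like these are deterministic, so the same input always produces the same masked output, which makes them easy to test and audit.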

The second layer is dynamic rules. Take your own client database, your own supplier list, your own contact records. Every name, every address, every supplier code in those systems becomes something the redaction layer specifically watches for. A static rule cannot know that "Rose Waterman" is a VIP client who does not want their contact details in any system. A dynamic rule fed by your CRM can.
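One way to picture a dynamic rule: build the watch list from your own records each time they change. The sketch below assumes the names arrive as a plain list exported from a CRM; `build_dynamic_matcher` and `redact_dynamic` are hypothetical names, and a real system would also need to handle nicknames, misspellings, and reordered names.

```python
import re


def build_dynamic_matcher(terms: list[str]) -> re.Pattern:
    """Compile one case-insensitive pattern from a list of known sensitive terms."""
    cleaned = {t.strip() for t in terms if t.strip()}
    if not cleaned:
        return re.compile(r"(?!)")   # a pattern that never matches
    # Longest first, so a longer name wins over a shorter name it contains.
    ordered = sorted(cleaned, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # Lookarounds stop matches inside longer words.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def redact_dynamic(text: str, matcher: re.Pattern, label: str = "CLIENT") -> str:
    """Replace every known term with a placeholder."""
    return matcher.sub(f"[{label}]", text)
```

For example, a matcher built from `["Rose Waterman"]` masks "rose waterman" in a pasted email regardless of capitalization, while leaving unrelated words that happen to contain the same letters alone.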

The third layer is generative rules. Some kinds of sensitive data, like street addresses, are hard to detect with rules. They look like a number followed by some words, but so does almost every line of text written by a human. For these, the best option is a private AI model that makes the judgment. It does not have to be a frontier model, but it should run on private, secure machines and not store any of the data.

The raw data never has to leave your private environment. The small local model sees the unredacted text, decides what to mask, swaps in placeholders, and only the masked version goes outbound to the frontier model. This is sometimes called the proxy pattern. From the user's perspective the request looks like it went to OpenAI or Anthropic. In reality it went through a small filter on your side first, and what reached the frontier model was the safe version.

The order matters. Do as much as you can with the cheap deterministic layers, then escalate to dynamic rules, and only reach for AI judgment for what genuinely needs it.
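The layering and the proxy pattern can be sketched together. Everything here is illustrative: the patterns are simplified, `local_model` is a hypothetical stand-in for a small private model that returns the substrings it judges sensitive, and `send_to_frontier` stands in for the outbound call to a provider.

```python
import re
from typing import Callable

Layer = Callable[[str], str]


def static_layer(text: str) -> str:
    """Layer 1: cheap deterministic pattern (simplified passport shape)."""
    return re.sub(r"\b[A-Z]{2}\d{7}\b", "[PASSPORT]", text)


def make_dynamic_layer(names: list[str]) -> Layer:
    """Layer 2: terms taken from your own records."""
    if not names:
        return lambda text: text
    pattern = re.compile("|".join(re.escape(n) for n in names))
    return lambda text: pattern.sub("[CLIENT]", text)


def make_generative_layer(local_model: Callable[[str], list[str]]) -> Layer:
    """Layer 3: a private model flags whatever the rules could not catch."""
    def layer(text: str) -> str:
        for span in local_model(text):
            text = text.replace(span, "[ADDRESS]")
        return text
    return layer


def proxy(prompt: str, layers: list[Layer],
          send_to_frontier: Callable[[str], str]) -> str:
    """Run the layers in order, then forward only the masked prompt."""
    for layer in layers:
        prompt = layer(prompt)
    return send_to_frontier(prompt)
```

Because the layers run in list order, the private model only has to judge text that the cheaper layers have already partly masked.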

PDFs and email attachments

Most of the documents an advisor handles every day, hotel confirmations, supplier invoices, visa documents, are text-based PDFs. The text inside them can be extracted with normal PDF parsing tools before AI is involved at all. Once it is text, the same redaction layers apply to it.

There is no good reason for a hotel confirmation PDF to be uploaded to OpenAI as raw content. A more responsible workflow extracts the text, masks the sensitive fields (guest name, passport number, address), and only then hands the masked text to the AI for whatever the actual task was: parsing the booking, populating the CRM, or drafting a confirmation email.

The same logic applies to emails and their attachments. The body is already text, so it can go straight through the redaction layer, and attachments should go through it first as well.

Images and screenshots

Images are where this gets meaningfully harder. The simplest approach is to use a private AI to extract the text from the image, redact the text, and send the cleaned text to the model. This works, but it strips out all other context in the image.

The pattern that holds up better is the bounding-box approach. Use a vision model to detect what regions of the image need redaction. Have a second step draw bounding boxes around those regions. Then black out those regions before the image goes anywhere else. The image is now safe to send onward, and you have preserved the layout that made it useful in the first place.
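The final blackout step is simple once the boxes exist. The sketch below assumes a grayscale image held as a plain grid of pixel values and boxes that have already been produced by an upstream detection step, which is not shown. `Box` and `black_out` are hypothetical names. A real tool would use an imaging library instead of nested lists.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int    # exclusive
    bottom: int   # exclusive


def black_out(pixels: list[list[int]], boxes: list[Box]) -> list[list[int]]:
    """Return a copy of the image with every boxed region set to black (0)."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    redacted = [row[:] for row in pixels]   # leave the original untouched
    for box in boxes:
        # Clamp each box to the image bounds before painting.
        for y in range(max(0, box.top), min(height, box.bottom)):
            for x in range(max(0, box.left), min(width, box.right)):
                redacted[y][x] = 0
    return redacted
```

The pixels outside the boxes are unchanged, which is what preserves the layout of the document for the model that receives it.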

Ask your vendors

With a better understanding of where sensitive information comes from and how to categorize it, you can ask your software providers how they protect that information when they use AI. There are three questions to ask:

What data will the system have access to?
Where and how do you store that data?
What is your PII redaction strategy before data goes to AI models?

A quick word on your team

Telling your team "do not paste anything sensitive into ChatGPT" is not a real strategy. They will. Not because they are reckless, but because the alternative is doing the work the slow manual way while the rest of the world races ahead.

The only durable answer is giving them tools that handle the data responsibly by default, so the responsible path and the easy path are the same path. That moves the burden from "every employee has to remember the rules" to "the system enforces the rules without anyone thinking about it."