The Challenge
This organisation operates in a highly regulated industry where data privacy isn't optional; it's existential. Patient records, financial details, and personal identifiers flow through every process. Their team recognised the transformative potential of frontier AI models for document summarisation, data classification, and internal knowledge retrieval. But there were hard blockers:
- Regulatory frameworks prohibited sending personally identifiable information (PII) to any external service or cloud-based AI platform
- Manual redaction was eating up to 20 hours per week across the compliance team — and still missing edge cases
- Existing off-the-shelf redaction tools were pattern-based and failed on unstructured text, missing names embedded in sentences, contextual identifiers, and composite references
- The team had effectively given up on using AI for anything involving real data
Our Approach
We deployed a locally-hosted large language model specifically fine-tuned for PII detection and redaction. The architecture ensures that no raw data ever leaves the organisation's network — the local LLM acts as a secure gateway between sensitive internal data and powerful external AI models.
- Local LLM deployment: Installed and configured on the client's own infrastructure (on-premises server), with no internet-facing endpoints
- Multi-pass PII detection: The model runs multiple detection passes — entity recognition, contextual analysis, and pattern matching — to achieve comprehensive coverage
- Secure pipeline: Only after PII is fully stripped does the sanitised data get passed to a frontier model (e.g., for summarisation or classification). Results flow back through the same secure channel
- Audit trail: Every redaction is logged with the original text hash (not content) so compliance can verify what was processed and when
- Human review interface: For the initial calibration period, flagged edge cases were surfaced to a compliance officer for verification before the model was fully trusted
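The multi-pass detection and hashed audit trail described above can be sketched roughly as follows. This is a minimal illustration, not the client's implementation: in the real system the fine-tuned local LLM performs the entity and contextual passes, whereas here each pass is a stand-in regex heuristic, and `pattern_pass`, `entity_pass`, and `redact` are hypothetical names.

```python
import hashlib
import re
from datetime import datetime, timezone

def pattern_pass(text):
    """Pattern matching: structured identifiers (emails, phone numbers)."""
    spans = []
    for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        spans.append((m.start(), m.end(), "EMAIL"))
    for m in re.finditer(r"\b\d{3}[- ]\d{3}[- ]\d{4}\b", text):
        spans.append((m.start(), m.end(), "PHONE"))
    return spans

def entity_pass(text):
    """Entity recognition: in production the local LLM labels names here."""
    spans = []
    for m in re.finditer(r"\b(?:Mr|Ms|Dr)\.? [A-Z][a-z]+\b", text):
        spans.append((m.start(), m.end(), "NAME"))
    return spans

def redact(text, passes):
    """Run every detection pass, then replace spans from the end backwards
    so earlier offsets stay valid. Log only a hash of each redacted span,
    never the content itself."""
    spans = sorted({s for p in passes for s in p(text)}, reverse=True)
    audit = []
    for start, end, label in spans:
        original = text[start:end]
        audit.append({
            "sha256": hashlib.sha256(original.encode()).hexdigest(),
            "label": label,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        text = text[:start] + f"[{label}]" + text[end:]
    return text, audit

sanitised, log = redact(
    "Contact Dr. Smith at smith@example.com or 555-123-4567.",
    [pattern_pass, entity_pass],
)
# sanitised == "Contact [NAME] at [EMAIL] or [PHONE]."
```

Only `sanitised` would ever leave the network; `log` gives compliance a verifiable record (hash, label, timestamp) without retaining the sensitive text.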
Runs Locally — Your Data Never Leaves Your Network
This is not a cloud AI solution with a privacy policy. The LLM runs entirely on local hardware within the client's own infrastructure. Here's what that means in practice:
- Zero external data transmission: Raw data never touches the internet. The local model processes everything in-house before any sanitised output is sent externally
- No third-party access: No vendor has access to the data, the model weights, or the processing logs. The client owns and controls everything
- Air-gapped option available: For maximum security, the local LLM can run on a fully air-gapped machine with no network connectivity at all
- Regulatory alignment: This architecture satisfies the data residency and processing requirements for healthcare (HIPAA-aligned), financial services (APRA CPS 234), and legal (client privilege) contexts
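One way to enforce the "sanitised output only" guarantee above is a final egress check that refuses any external call while residual PII patterns survive. The sketch below is a hypothetical defence-in-depth layer under stated assumptions: `safe_to_send`, `call_frontier_model`, and the two-pattern list are illustrative only, and the deployed pipeline's checks are model-based rather than regex-based.

```python
import re

# Illustrative residual-PII patterns, not the client's actual ruleset.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),      # emails
    re.compile(r"\b\d{3}[- ]\d{3}[- ]\d{4}\b"),  # phone numbers
]

def safe_to_send(text: str) -> bool:
    """Final egress check before sanitised text leaves the network."""
    return not any(p.search(text) for p in PII_PATTERNS)

def call_frontier_model(text: str) -> str:
    """Gateway wrapper: external calls happen only after the check passes."""
    if not safe_to_send(text):
        raise ValueError("Residual PII detected; refusing external call")
    # Placeholder for the real external API call (provider unspecified).
    return f"summary of: {text[:40]}"
```

Failing closed here means a redaction miss becomes a blocked request and an alert, rather than a data exposure.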
The Results
- 100% PII detection rate
- 80% less manual redaction
- 20 hours/week of compliance time saved
- 0 data breaches
The organisation can now use frontier AI models for document summarisation, classification, and internal search — all without compromising on data privacy. The compliance team went from spending most of their week on manual redaction to overseeing an automated pipeline that handles it in minutes.
"We'd written off AI entirely because of the privacy risk. This solution gave us the best of both worlds — cutting-edge AI capabilities with zero data exposure. The compliance team actually trusts it."
— Head of Compliance, Healthcare Organisation (name withheld)