When we put a capture agent on a finance clerk's machine, the first question the operator asks is rarely about the redesign we're there to do. It's "what are you collecting, and where does it go." That question deserves an answer from the architecture rather than a reassurance, because the only honest way to say "we don't keep what we don't need" is to build a system that physically cannot keep it. The design choice that follows from taking the question seriously is the difference between an observation tool a security team will approve and a screen recorder they'll escalate to legal. We capture the structure of the work (which steps run and in what order), and we don't record the worker. Almost everything below is downstream of that one decision.
Scoped to the surface, not the screen
A screen recorder is the lazy way to learn a process, and it's the wrong way. It captures everything in front of the clerk, which means it captures the personal email tab, the HR portal, the banking app the clerk forgot to close, and the family photo on the desktop, none of which the redesign needs and all of which a buyer's security team now has to reason about. We scope capture to the application and process surface the engagement covers, so the agent records that an invoice was opened in the ERP, which fields were keyed, which validation fired, and how long the step took, without taking a picture of the desktop to get there. The unit of capture is an event with structure ("a line item was added to a purchase order," "a vendor record was looked up"), not a frame of pixels we'd have to interpret after the fact. We narrow the capture on purpose, because the redesign needs to know the sequence and frequency of steps far more than it needs to see the cell contents the clerk happened to be staring at.
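To make the "event with structure" idea concrete, here is a minimal sketch of what a scoped capture record might look like. The names (IN_SCOPE_APPS, CaptureEvent, record) are hypothetical and not the agent's actual schema, and the sketch assumes the event carries the name of the field that was keyed rather than its contents.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist: the application surfaces this engagement covers.
IN_SCOPE_APPS = {"erp"}


@dataclass(frozen=True)
class CaptureEvent:
    app: str               # which application surface emitted the event
    action: str            # e.g. "invoice_opened", "line_item_added"
    field: Optional[str]   # name of the field touched (assumption: never its contents)
    duration_ms: int       # how long the step took


def record(app: str, action: str, field: Optional[str], duration_ms: int) -> Optional[CaptureEvent]:
    """Return a structured event for in-scope apps, and nothing at all otherwise."""
    if app not in IN_SCOPE_APPS:
        # Out-of-scope surfaces produce no record, rather than a record that is dropped later.
        return None
    return CaptureEvent(app, action, field, duration_ms)
```

Under these assumptions, record("webmail", "tab_focused", None, 1200) returns None, while record("erp", "line_item_added", "quantity", 840) returns an event that says a step happened and how long it took, with no pixels involved.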
Redaction at the endpoint, before anything leaves
Whatever the scoped surface does still pick up gets processed where it's captured. PII detection and redaction run at the point of capture, on the endpoint, so that names, account numbers, and other identifiers are detected and masked before any record crosses the network. The principle is that raw identifiers never leave the machine; what leaves is the redacted event. This matters for a reason that's more architectural than rhetorical: a redaction step that runs in the cloud means the raw data was already in the cloud, and at that point you're trusting a deletion policy instead of a network boundary. Doing it at the endpoint moves the guarantee from "we promise to delete it" to "it was never transmitted." It also costs us something, since on-device detection has to be conservative and will occasionally mask a field the redesign would have liked to read, but that trade goes the right way every time.
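A rough sketch of redaction at the point of capture might look like the following. The two patterns are illustrative assumptions, not the detector the agent ships with, and a real implementation would cover many more identifier types. The point is the ordering: masking happens before the event is handed to anything that can transmit it.

```python
import re

# Illustrative patterns only (assumption); a real detector would be much broader.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ACCOUNT": re.compile(r"\b\d{8,17}\b"),  # deliberately broad: any 8 to 17 digit run
}


def redact(text: str) -> str:
    """Mask anything that looks like an identifier."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def prepare_for_send(event: dict) -> dict:
    """Redact every string value; only the returned copy goes to the network layer."""
    return {k: redact(v) if isinstance(v, str) else v for k, v in event.items()}
```

The broad ACCOUNT pattern shows the cost described above: redact("PO 20240117") masks the purchase order number too, because an on-device detector that has to be conservative cannot tell it apart from an account number.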
Minimum retention, raw data discarded after extraction
We keep the minimum the redesign requires, and no more. The capture agent's job is to produce a model of the process, which is a set of features: which steps happen, in what order, how often, with what exceptions, taking how long. Once those features are extracted, the raw capture data they were derived from has served its purpose and is discarded. The feature table is what we baseline against and redesign from; the underlying capture is not an asset we hold "in case." This is the same discipline that makes the discovery phase defensible, where the deliverable is a description of how the work actually runs rather than a stored copy of everything the clerk did. It's also the posture that lets us tell a security reviewer exactly what exists after the engagement ends: the model, not the recording.
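As a sketch of the extract-then-discard step, the function below reduces a list of events to step counts, step-to-step transitions, and mean durations, then clears the raw list. The event shape (dicts with "action" and "duration_ms" keys) and the function names are assumptions for illustration. Clearing an in-memory list stands in for whatever deletion the real pipeline performs on persisted data.

```python
from collections import Counter
from statistics import mean


def extract_features(events: list[dict]) -> dict:
    """Reduce raw events to step frequencies, transitions, and mean durations."""
    steps = [e["action"] for e in events]
    durations: dict[str, list[int]] = {}
    for e in events:
        durations.setdefault(e["action"], []).append(e["duration_ms"])
    return {
        "step_counts": dict(Counter(steps)),
        # Count each adjacent (previous step, next step) pair to recover the ordering.
        "transitions": dict(Counter(zip(steps, steps[1:]))),
        "mean_duration_ms": {a: mean(d) for a, d in durations.items()},
    }


def extract_and_discard(raw_events: list[dict]) -> dict:
    """Return the feature table and empty the raw event list in place."""
    features = extract_features(raw_events)
    raw_events.clear()  # after this, only the features remain
    return features
```

What survives the call is the feature table. The caller's raw list is empty afterward, which is the property a reviewer would want to verify against the real storage layer.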
Processing inside the customer's own tenant
The raw stream, for the brief window it exists before extraction, stays inside the customer's own environment. Processing happens in the customer's tenant or VPC over private networking, not in a shared multi-tenant store we operate, with encryption in transit and at rest throughout. A multi-tenant store is a category of risk a buyer's security team has to evaluate on our behalf, since it asks them to trust our isolation between their data and the next customer's. Keeping the processing inside their boundary removes that question entirely; their data is never commingled with anyone else's, because it never leaves a network they control. This is also why the shape of an aligned engagement puts the infrastructure inside the boundary the customer's own controls already cover, rather than asking them to extend trust to a boundary we operate.
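One way to make the boundary checkable in code is a guard that refuses any processing endpoint outside the customer's private address space. The sketch below is an assumption about how such a check could look, not a description of the deployed agent. The function name and the example network are hypothetical, it accepts only IP literals (no DNS resolution), and it covers encryption in transit only, not encryption at rest.

```python
import ipaddress
from urllib.parse import urlparse


def is_in_boundary(endpoint: str, allowed_networks: list[str]) -> bool:
    """True only for https endpoints whose IP literal is inside an allowed private network."""
    url = urlparse(endpoint)
    if url.scheme != "https" or url.hostname is None:
        return False  # plain http or a malformed URL is rejected outright
    try:
        addr = ipaddress.ip_address(url.hostname)
    except ValueError:
        return False  # hostnames are rejected in this sketch; nothing is resolved
    return addr.is_private and any(
        addr in ipaddress.ip_network(net) for net in allowed_networks
    )
```

With a hypothetical customer range of 10.20.0.0/16, an endpoint such as https://10.20.0.15:8443/ingest passes, while a public address, a plain http URL, or an unresolved hostname does not.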
What a security team can actually audit against
None of this is meaningful unless a reviewer can test it, so each choice maps to a control category they already use. The AICPA Trust Services Criteria that underpin a SOC 2 examination give the framework: scoped capture and endpoint redaction speak to confidentiality and to privacy; in-tenant processing over private networking with encryption everywhere speaks to security; the discipline of extracting features faithfully and discarding the rest speaks to processing integrity. A reviewer doesn't have to take our word for any of it, because each control is a thing you can observe in the architecture and the logs. For EU exposure the same design satisfies two of the principles in Article 5 of the GDPR: purpose limitation, since we capture only for the redesign and use the capture for nothing else, and data minimization, since we keep the minimum that purpose requires and discard the rest. The federal baseline in the United States, the Electronic Communications Privacy Act, governs the monitoring of communications and turns on business purpose and consent. The legal mechanics of doing this lawfully are their own subject, and the behavioral question of whether watching feels like shadowing or like surveillance is a third. The architecture here is what makes the legal and the behavioral answers credible rather than aspirational.
Why the leaner model is also the more useful one
A reasonable counter is that we're giving up signal: a full screen recording captures everything, and surely more data produces a better redesign than a redacted event stream does. There's a version of that worry worth taking seriously, because the long tail of edge cases is exactly where automation projects stall, and you do need to see the messy reality rather than the happy path. But the thing the redesign actually consumes is the structure of the work, the sequence and frequency and exception rate of the steps, and a screen recording leaves that structure implicit in hours of footage someone then has to watch and transcribe before any of it becomes usable. The redacted event stream is the structure, already extracted, which means the leaner capture is not a compromise we accept for the sake of privacy but the form the useful signal was going to take regardless. The model that's safest to hold and the model that's most useful to redesign from turn out to be the same model, which is the agreeable case where the defensible architecture and the good one don't pull against each other.