01 — Architecture
Evidence Tracer runs as a stateless Cloudflare Worker at Cloudflare's edge — TypeScript compiled to a single bundled script. There are no persistent VMs, no Kubernetes, no polling daemons sitting inside your cloud. Scan state and evidence are persisted to Cloudflare D1 (edge SQLite) for the duration of the 30-day retention window, then deleted.
No long-lived AWS credentials. No write permissions. No installation in your account beyond the read-only IAM role you provision via CloudFormation.
02 — Collection
The agent calls AWS APIs directly, signing each request with a hand-rolled AWS Signature Version 4 implementation rather than a heavy SDK, which keeps dependencies and surface area minimal. Calls run in parallel where possible and in sequence where AWS requires it (for example, paginating IAM principals before querying their MFA devices).
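To make the signing step concrete, here is a minimal sketch of the Signature Version 4 key-derivation chain as AWS documents it. It is an illustration only, not Evidence Tracer's actual signer: the function names are hypothetical, it uses `node:crypto` for brevity (a Worker would more likely use Web Crypto), and it leaves out building the canonical request and string to sign.

```typescript
import { createHmac } from "node:crypto";

// HMAC-SHA256 helper returning raw bytes, so each result can key the next step.
function hmac(key: string | Buffer, data: string): Buffer {
  return createHmac("sha256", key).update(data, "utf8").digest();
}

// SigV4 signing key: four chained HMACs over date, region, service,
// and the fixed terminator "aws4_request".
function deriveSigningKey(
  secretAccessKey: string,
  dateStamp: string, // YYYYMMDD in UTC
  region: string,
  service: string,
): Buffer {
  const kDate = hmac("AWS4" + secretAccessKey, dateStamp);
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, service);
  return hmac(kService, "aws4_request");
}

// Signs an already-built string to sign and returns the hex signature
// that goes into the Authorization header.
function signString(signingKey: Buffer, stringToSign: string): string {
  return createHmac("sha256", signingKey).update(stringToSign, "utf8").digest("hex");
}

// Usage with AWS's published example secret (not a real credential).
const exampleKey = deriveSigningKey(
  "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
  "20150830",
  "us-east-1",
  "iam",
);
const exampleSignature = signString(exampleKey, "hypothetical string to sign");
```

Because the key is scoped to one date, region, and service, a derived key cannot be reused to sign requests for a different service or day.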
GetAccountSummary, ListUsers, ListMFADevices, GenerateCredentialReport, ListBuckets, GetBucketEncryption, GetBucketPolicy, GetPublicAccessBlock, DescribeTrails, GetTrailStatus, GetEventSelectors, DescribeConfigRules, GetComplianceSummary, DescribeConfigurationRecorders, DescribeVolumes, DescribeSecurityGroups, DescribeInstances, DescribeAlarms, ListMetrics, DescribeLogGroups
Coverage extends to KMS, Lambda, RDS, and SNS for specific control dependencies. Raw API responses (XML or JSON) are truncated where necessary for context-window efficiency, then chunked into D1 as discrete evidence items with their source endpoint, timestamp, and a content hash.
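As a rough sketch of what one evidence item might look like, the snippet below truncates a raw response and records its endpoint, timestamp, and hash. The `EvidenceItem` shape, the `MAX_BODY_CHARS` limit, and the choice to hash the stored (truncated) body are all assumptions made for illustration; the real schema is not described here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical evidence record; field names are illustrative only.
interface EvidenceItem {
  endpoint: string;    // e.g. "iam.amazonaws.com/GetAccountSummary"
  collectedAt: string; // ISO 8601 UTC
  body: string;        // response body as stored (possibly truncated)
  truncated: boolean;
  sha256: string;      // hex digest of `body`
}

// Hypothetical truncation limit, not a documented value.
const MAX_BODY_CHARS = 8000;

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Builds one evidence item. Assumption: the hash covers the body as stored,
// so the stored record can be re-verified later without the original response.
function toEvidenceItem(
  endpoint: string,
  rawBody: string,
  collectedAt: Date,
  maxChars: number = MAX_BODY_CHARS,
): EvidenceItem {
  const truncated = rawBody.length > maxChars;
  const body = truncated ? rawBody.slice(0, maxChars) : rawBody;
  return {
    endpoint,
    collectedAt: collectedAt.toISOString(),
    body,
    truncated,
    sha256: sha256Hex(body),
  };
}
```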
03 — Control mapping
Each SOC 2 Type I control is mapped deterministically — not learned, not inferred — to the specific AWS services and API response fields it depends on. This is proprietary schema work, and it's the thing that gets sharper with every pilot.
iam:GetAccountSummary, iam:ListUsers, MFA device enumeration, KMS key rotation status.
iam:ListRoles, policy attachment graph, service-role scoping evidence.
s3:GetBucketEncryption, ec2:DescribeVolumes, rds:DescribeDBInstances encryption status.
config:DescribeConfigRules, compliance summary, configuration recorder status.
cloudtrail:DescribeTrails, multi-region flag, log-file validation, event selectors.
cloudwatch:DescribeAlarms, critical alarm coverage, log-group retention.
Because the mapping is deterministic, every gap finding traces directly back to a specific API call and response field. There is no "the AI decided" — there is "this API call returned Encrypted: false, here it is."
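A deterministic mapping of this kind can be pictured as a plain lookup table. The sketch below is hypothetical: the control keys (`encryption-at-rest`, `audit-logging`) are placeholder labels rather than the actual schema, and only a handful of AWS response fields are shown.

```typescript
// One API call plus the response field a control depends on.
interface EvidenceSource {
  action: string; // service:Operation
  field: string;  // response field inspected
}

// Hypothetical control keys; the real mapping schema is proprietary.
const CONTROL_MAP: Record<string, readonly EvidenceSource[]> = {
  "encryption-at-rest": [
    { action: "s3:GetBucketEncryption", field: "ServerSideEncryptionConfiguration" },
    { action: "ec2:DescribeVolumes", field: "Encrypted" },
    { action: "rds:DescribeDBInstances", field: "StorageEncrypted" },
  ],
  "audit-logging": [
    { action: "cloudtrail:DescribeTrails", field: "IsMultiRegionTrail" },
    { action: "cloudtrail:DescribeTrails", field: "LogFileValidationEnabled" },
  ],
};

// Produces the trace line for a gap, and refuses any call/field pair that
// is not explicitly mapped to the control.
function traceGap(controlId: string, action: string, field: string, observed: unknown): string {
  const sources = CONTROL_MAP[controlId] ?? [];
  const mapped = sources.some((s) => s.action === action && s.field === field);
  if (!mapped) {
    throw new Error(`${action}/${field} is not mapped to ${controlId}`);
  }
  return `${controlId}: ${action} returned ${field}: ${JSON.stringify(observed)}`;
}
```

Since the table is static data, the same evidence always resolves to the same control, and an unmapped pair fails loudly instead of being guessed at.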
04 — The reasoning layer
Raw AWS evidence is structured and passed to Claude Sonnet 4.5 (Anthropic's production reasoning model) with a per-control analysis prompt. Claude is not making the compliance decision from scratch — it's interpreting structured evidence against an explicit prompt contract that specifies the control requirements, severity weighting, and output schema. It returns structured JSON: gap findings, severity labels, and proposed remediation.
We use Claude because the interpretive work — distinguishing a technically-present-but-misconfigured password policy from a genuinely compliant one, recognizing when a CloudTrail trail exists but has a delivery gap — is brittle with pure pattern matching. Claude handles ambiguity that rule engines can't, and the structured output contract keeps its responses inside predictable bounds.
Every finding Claude returns is anchored to the source evidence item in D1, which means every claim in the final report is reproducible: the auditor can see the exact API response Claude reasoned over.
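One way to enforce a structured output contract is to validate the model's JSON before anything reaches the report. The following is a sketch under assumed names: the `GapFinding` fields and the four severity labels are hypothetical, since the actual output schema is not published here.

```typescript
// Hypothetical severity labels.
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical finding shape; evidenceId points at the stored evidence item.
interface GapFinding {
  controlId: string;
  severity: Severity;
  summary: string;
  remediation: string;
  evidenceId: string;
}

const SEVERITIES: readonly string[] = ["low", "medium", "high", "critical"];

// Type guard: true only if every required field is present and well-typed.
function isGapFinding(value: unknown): value is GapFinding {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const severity = v["severity"];
  const evidenceId = v["evidenceId"];
  return (
    typeof v["controlId"] === "string" &&
    typeof severity === "string" &&
    SEVERITIES.includes(severity) &&
    typeof v["summary"] === "string" &&
    typeof v["remediation"] === "string" &&
    typeof evidenceId === "string" &&
    evidenceId.length > 0
  );
}

// Parses model output and rejects the whole batch if any finding breaks
// the contract, including a finding with no evidence anchor.
function parseFindings(modelOutput: string): GapFinding[] {
  const parsed: unknown = JSON.parse(modelOutput);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array of findings");
  }
  return parsed.map((entry: unknown, i: number) => {
    if (!isGapFinding(entry)) {
      throw new Error(`finding ${i} violates the output contract`);
    }
    return entry;
  });
}
```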
05 — Scoring
Gap Score (0–100)
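The severity-weighted percentage of assessed control checkpoints that meet SOC 2 best-practice thresholds, computed per criterion on a 0 to 100 scale. Each checkpoint is weighted by its audit-risk severity, so a failed high-severity checkpoint lowers the score more than a failed low-severity one.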
A score above 80 indicates low audit risk for that control; 60–80 indicates moderate risk with likely auditor findings; below 60 indicates high audit risk that should be remediated before engaging an auditor.
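The arithmetic can be sketched as a weighted pass rate followed by a band lookup. The weights and the rounding below are assumptions for illustration (the actual severity weights are not published); only the band boundaries come from the text above.

```typescript
// One assessed checkpoint; `weight` is a hypothetical severity weight.
interface Checkpoint {
  passed: boolean;
  weight: number;
}

// Weighted share of passing checkpoints, scaled to 0-100 and rounded.
function gapScore(checkpoints: readonly Checkpoint[]): number {
  const total = checkpoints.reduce((sum, c) => sum + c.weight, 0);
  if (total === 0) {
    throw new Error("no weighted checkpoints assessed");
  }
  const met = checkpoints.reduce((sum, c) => sum + (c.passed ? c.weight : 0), 0);
  return Math.round((met / total) * 100);
}

type RiskBand = "low" | "moderate" | "high";

// Bands as stated: above 80 is low risk, 60 to 80 moderate, below 60 high.
function riskBand(score: number): RiskBand {
  if (score > 80) return "low";
  if (score >= 60) return "moderate";
  return "high";
}

// Example: passing weights 3 and 1 out of a total of 5 gives 80.
const exampleScore = gapScore([
  { passed: true, weight: 3 },
  { passed: false, weight: 1 },
  { passed: true, weight: 1 },
]);
```

Note that a score of exactly 80 falls in the moderate band, since only scores above 80 count as low risk.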
Freshness Score (0–100)
Measures recency of security configurations against SOC 2 time-based thresholds. Scored independently from Gap Score — a control can be fully configured but use stale credentials, or actively monitored but misconfigured.
Inputs:
- IAM access key age relative to the 90-day rotation threshold
- Credential last-used dates for stale-account detection
- CloudTrail log delivery recency
- AWS Config rule last-evaluation timestamps
A score below 70 indicates configurations that auditors commonly flag for access-key management and monitoring gaps.
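To illustrate just the first input, the sketch below checks access-key ages against the 90-day rotation threshold and reports the share of keys still within it. The scoring rule (percentage of fresh keys) and the choice to treat a key as stale only once it is older than 90 days are assumptions; how the four inputs are actually combined is not specified here.

```typescript
const ROTATION_THRESHOLD_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical minimal view of an IAM access key.
interface AccessKey {
  id: string;
  createdAt: Date;
}

// Whole days elapsed between creation and `now`.
function ageInDays(createdAt: Date, now: Date): number {
  return Math.floor((now.getTime() - createdAt.getTime()) / MS_PER_DAY);
}

// Keys older than the rotation threshold.
function staleKeys(keys: readonly AccessKey[], now: Date): AccessKey[] {
  return keys.filter((k) => ageInDays(k.createdAt, now) > ROTATION_THRESHOLD_DAYS);
}

// Hypothetical sub-score: percentage of keys within the threshold.
// An account with no keys is treated as fully fresh for this input.
function keyFreshness(keys: readonly AccessKey[], now: Date): number {
  if (keys.length === 0) return 100;
  const fresh = keys.length - staleKeys(keys, now).length;
  return Math.round((fresh / keys.length) * 100);
}
```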
06 — Traceability
The whole point of the system. Every finding in the final report is anchored to:
- The exact AWS API endpoint called (e.g. iam.amazonaws.com/GetAccountSummary)
- The request timestamp in ISO 8601 UTC
- The raw response body (truncated but preserved)
- A SHA-256 hash of the evidence item for tamper-evidence
- The AWS region the call was issued against
Your auditor can take any finding from the report and re-run the exact API call themselves, from their own machine, with their own credentials. If the AWS response at their end doesn't match ours within a reasonable drift window, something's wrong — and they'll see it immediately. That's what verifiable means.
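The tamper-evidence half of that check is simple to picture: recompute the SHA-256 of the stored body and compare it with the recorded hash. The sketch below assumes a hypothetical `StoredEvidence` shape in which the hash covers the stored body; it does not model the drift window, which is not defined here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stored record, reduced to the fields needed for verification.
interface StoredEvidence {
  endpoint: string;
  region: string;
  requestedAt: string; // ISO 8601 UTC
  body: string;
  sha256: string;      // hex digest recorded at collection time
}

type Verdict = "intact" | "tampered";

// Recomputes the digest of the stored body and compares it with the
// recorded one. Any change to the body after collection flips the verdict.
function verifyEvidence(item: StoredEvidence): Verdict {
  const actual = createHash("sha256").update(item.body, "utf8").digest("hex");
  return actual === item.sha256.toLowerCase() ? "intact" : "tampered";
}
```

An auditor who re-runs the call can then compare their own response against a body that is known to be unmodified since collection.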
What we're not yet claiming.
The compliance-software category is noisy. Everyone claims 99% accuracy, trained-on-millions-of-audits, proprietary ML, the lot. Some of it is real; most of it is marketing. We'd rather tell you what we don't have than pretend.
- No trained ML model. There is no proprietary transformer, no graph neural network, no anomaly detector trained on 10,000 audit cycles. The reasoning layer is Claude — a third-party model we send structured prompts to. That's more honest and, frankly, more accurate than anything a small team could train today.
- No published accuracy number. We've seen "94% accurate" on competitor sites. Without a benchmark dataset and published methodology, that number means nothing. We'll publish one when we have pilots audited by independent firms who can validate it.
- No zero-persistence claim. We store evidence in Cloudflare D1 during the 30-day retention window. "Zero data retention" on a compliance tool is usually cosmetic — the reports have to come from somewhere. We keep what's necessary; we delete it on schedule.
- Not a Type II continuous-monitoring tool yet. Evidence Tracer (EVT) is currently a point-in-time Type I scan. Continuous monitoring and Type II coverage are on the roadmap, not shipped.
We'd rather earn the bigger claim than make it.