Methodology — LoxeAI

01 — Architecture

Evidence Tracer runs as a stateless Cloudflare Worker at the edge: TypeScript compiled to a single bundled script. There are no persistent VMs, no Kubernetes clusters, and no polling daemons sitting inside your cloud. The Worker itself holds no state between requests; scan state and evidence are persisted to Cloudflare D1 (edge SQLite) for the 30-day retention window, then deleted.

01→Customer POSTs IAM Role ARN + ExternalId to the Worker

02→Worker calls STS AssumeRole for short-lived session credentials

03→Scan fans out across AWS services using SigV4-signed requests

04→Evidence items chunked & staged in D1 per scan

05→Claude (Sonnet 4.5) runs per-control analysis against structured prompts

06→Report assembled from control results, returned as HTML or JSON

07→Evidence data deleted 30 days after report delivery

No long-lived AWS credentials. No write permissions. No installation in your account beyond the read-only IAM role you provision via CloudFormation.
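Steps 01 and 02 can be sketched as follows. This is a minimal illustration, not the production code: the `ScanRequest` shape, the function name, the session name, and the 900-second duration are all assumptions. Only the STS query parameter names (`Action`, `Version`, `RoleArn`, `RoleSessionName`, `ExternalId`, `DurationSeconds`) come from the public STS API.

```typescript
// Hypothetical shape of the POST body described in step 01.
interface ScanRequest {
  roleArn: string;
  externalId: string;
}

// Matches IAM role ARNs in the standard "aws" partition, for example
// arn:aws:iam::123456789012:role/ReadOnly
const ROLE_ARN_PATTERN = /^arn:aws:iam::\d{12}:role\/[\w+=,.@\/-]+$/;

// Builds the form-encoded body of an STS AssumeRole request.
// The session name and the duration are illustrative assumptions.
function buildAssumeRoleBody(
  req: ScanRequest,
  sessionName: string = "evidence-tracer-scan",
): string {
  if (!ROLE_ARN_PATTERN.test(req.roleArn)) {
    throw new Error(`Not an IAM role ARN: ${req.roleArn}`);
  }
  const params: Array<[string, string]> = [
    ["Action", "AssumeRole"],
    ["Version", "2011-06-15"],
    ["RoleArn", req.roleArn],
    ["RoleSessionName", sessionName],
    ["ExternalId", req.externalId],
    ["DurationSeconds", "900"], // 900 seconds is the shortest session STS allows
  ];
  return params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}
```

Rejecting malformed ARNs before any network call keeps obviously bad input from ever reaching STS, and requesting the shortest session keeps the credentials as short-lived as the scan allows.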

02 — Collection

The agent makes direct AWS API calls with hand-rolled AWS Signature Version 4 signing: no heavy SDK dependencies, minimal surface area. Services are called in parallel where possible and serialized where AWS requires it (for example, paginating IAM principals before querying their MFA devices).
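The core of hand-rolled Signature Version 4 signing is the signing-key derivation, sketched below. The HMAC chain (date, region, service, then the literal `aws4_request`) follows the public SigV4 specification. The function names are illustrative, and `node:crypto` is used here only to keep the sketch synchronous; a Worker could equally use the Web Crypto API, and which one Evidence Tracer uses is not stated in this document.

```typescript
import { createHmac } from "node:crypto";

// HMAC-SHA256 returning raw bytes, so results can be chained as keys.
function hmacSha256(key: string | Uint8Array, data: string): Uint8Array {
  return createHmac("sha256", key).update(data, "utf8").digest();
}

// Derives the SigV4 signing key. dateStamp is the UTC date as YYYYMMDD.
function deriveSigningKey(
  secretAccessKey: string,
  dateStamp: string,
  region: string,
  service: string,
): Uint8Array {
  const kDate = hmacSha256("AWS4" + secretAccessKey, dateStamp);
  const kRegion = hmacSha256(kDate, region);
  const kService = hmacSha256(kRegion, service);
  return hmacSha256(kService, "aws4_request");
}

// Signs an already-built SigV4 "string to sign" and returns the hex
// signature that goes into the Authorization header.
function signStringToSign(signingKey: Uint8Array, stringToSign: string): string {
  return createHmac("sha256", signingKey).update(stringToSign, "utf8").digest("hex");
}
```

Because the key is scoped to one date, region, and service, a signature produced for one service endpoint cannot be replayed against another.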

IAM

GetAccountSummary
ListUsers
ListMFADevices
GenerateCredentialReport

S3

ListBuckets
GetBucketEncryption
GetBucketPolicy
GetPublicAccessBlock

CloudTrail

DescribeTrails
GetTrailStatus
GetEventSelectors

AWS Config

DescribeConfigRules
GetComplianceSummary
DescribeConfigurationRecorders

EC2

DescribeVolumes
DescribeSecurityGroups
DescribeInstances

CloudWatch

DescribeAlarms
ListMetrics
DescribeLogGroups

Coverage extends to KMS, Lambda, RDS, and SNS for specific control dependencies. Raw API responses — XML or JSON — are truncated where necessary for context-window efficiency, then chunked into D1 as discrete evidence items with their source endpoint, timestamp, and a content hash.
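The truncate-then-stage step can be sketched roughly as follows. The record shape, the 8,000-character limit, and the batch helper are assumptions made for illustration; the content hash is left out of this sketch to keep it dependency-free.

```typescript
// Hypothetical shape of one staged evidence item (content hash omitted).
interface StagedEvidence {
  endpoint: string;   // for example "iam.amazonaws.com/GetAccountSummary"
  timestamp: string;  // ISO 8601 UTC
  body: string;       // raw XML or JSON, possibly truncated
  truncated: boolean; // records whether the body was cut
}

// Assumed limit; the real threshold is not stated in this document.
const MAX_BODY_CHARS = 8000;

// Wraps a raw API response as an evidence item, truncating if needed.
function stageEvidence(
  endpoint: string,
  rawBody: string,
  now: Date,
  maxChars: number = MAX_BODY_CHARS,
): StagedEvidence {
  const truncated = rawBody.length > maxChars;
  return {
    endpoint,
    timestamp: now.toISOString(),
    body: truncated ? rawBody.slice(0, maxChars) : rawBody,
    truncated,
  };
}

// Splits items into fixed-size batches, for example one batch per D1 write.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Keeping an explicit `truncated` flag on each item means a reader of the report can tell a shortened response from a complete one.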

03 — Control mapping

Each SOC 2 Type I control is mapped deterministically — not learned, not inferred — to the specific AWS services and API response fields it depends on. This is proprietary schema work, and it's the thing that gets sharper with every pilot.

CC6.1 Logical access controls. Sources: iam:GetAccountSummary, iam:ListUsers, MFA device enumeration, KMS key rotation status.

CC6.3 Role-based access. Sources: iam:ListRoles, policy attachment graph, service-role scoping evidence.

CC6.6 Encryption at rest. Sources: s3:GetBucketEncryption, ec2:DescribeVolumes, rds:DescribeDBInstances encryption status.

CC6.7 Transmission & disposal. Sources: EBS volume encryption state, bucket-level public access blocks, TLS configuration evidence.

CC7.1 Baseline configuration. Sources: config:DescribeConfigRules, compliance summary, configuration recorder status.

CC7.2 System monitoring. Sources: cloudtrail:DescribeTrails, multi-region flag, log-file validation, event selectors.

CC7.3 Anomaly detection. Sources: cloudwatch:DescribeAlarms, critical alarm coverage, log-group retention.

CC8.1 Change management. Sources: CloudTrail management events, Config rule evaluation history, IaC evidence where present.

Because the mapping is deterministic, every gap finding traces directly back to a specific API call and response field. There is no "the AI decided" — there is "this API call returned Encrypted: false, here it is."
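A deterministic mapping of this kind can be pictured as a plain lookup table plus field-level checks. The sketch below is illustrative only: it includes just the API calls named explicitly above, and the type names, function names, and gap record shape are assumptions rather than the proprietary schema.

```typescript
type ControlId = "CC6.1" | "CC6.3" | "CC6.6" | "CC7.1" | "CC7.2" | "CC7.3";

interface ControlMapping {
  title: string;
  apiCalls: string[];
}

// Subset of the mapping, limited to API calls named in this section.
const CONTROL_MAP: Record<ControlId, ControlMapping> = {
  "CC6.1": { title: "Logical access controls", apiCalls: ["iam:GetAccountSummary", "iam:ListUsers"] },
  "CC6.3": { title: "Role-based access", apiCalls: ["iam:ListRoles"] },
  "CC6.6": {
    title: "Encryption at rest",
    apiCalls: ["s3:GetBucketEncryption", "ec2:DescribeVolumes", "rds:DescribeDBInstances"],
  },
  "CC7.1": { title: "Baseline configuration", apiCalls: ["config:DescribeConfigRules"] },
  "CC7.2": { title: "System monitoring", apiCalls: ["cloudtrail:DescribeTrails"] },
  "CC7.3": { title: "Anomaly detection", apiCalls: ["cloudwatch:DescribeAlarms"] },
};

// Reverse lookup: which controls depend on a given API call?
function controlsDependingOn(apiCall: string): ControlId[] {
  return (Object.keys(CONTROL_MAP) as ControlId[]).filter((id) =>
    CONTROL_MAP[id].apiCalls.some((call) => call === apiCall),
  );
}

// The two DescribeVolumes response fields this check reads.
interface VolumeEvidence {
  VolumeId: string;
  Encrypted: boolean;
}

// Every gap names the control, the API call, and the response field behind it.
function unencryptedVolumeGaps(volumes: VolumeEvidence[]) {
  return volumes
    .filter((v) => !v.Encrypted)
    .map((v) => ({
      control: "CC6.6" as ControlId,
      apiCall: "ec2:DescribeVolumes",
      field: "Encrypted",
      resource: v.VolumeId,
    }));
}
```

Because each gap record carries the API call and field name, a finding can be traced back without consulting any model output.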

04 — The reasoning layer

Raw AWS evidence is structured and passed to Claude Sonnet 4.5 (Anthropic's production reasoning model) with a per-control analysis prompt. Claude is not making the compliance decision from scratch — it's interpreting structured evidence against an explicit prompt contract that specifies the control requirements, severity weighting, and output schema. It returns structured JSON: gap findings, severity labels, and proposed remediation.

We use Claude because the interpretive work — distinguishing a technically-present-but-misconfigured password policy from a genuinely compliant one, recognizing when a CloudTrail trail exists but has a delivery gap — is brittle with pure pattern matching. Claude handles ambiguity that rule engines can't, and the structured output contract keeps its responses inside predictable bounds.

Every finding Claude returns is anchored to the source evidence item in D1, which means every claim in the final report is reproducible: the auditor can see the exact API response Claude reasoned over.
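One way to enforce a structured output contract is to validate the model's JSON before any of it reaches the report. The field names below (`controlId`, `severity`, `gap`, `remediation`, `evidenceId`) are hypothetical stand-ins; the actual schema is not published here.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM";

// Hypothetical finding shape; evidenceId points at the D1 evidence item.
interface Finding {
  controlId: string;
  severity: Severity;
  gap: string;
  remediation: string;
  evidenceId: string;
}

const SEVERITIES: readonly string[] = ["CRITICAL", "HIGH", "MEDIUM"];

// Type guard: true only when every required field has the expected type.
function isFinding(value: unknown): value is Finding {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const severity = v["severity"];
  return (
    typeof v["controlId"] === "string" &&
    typeof severity === "string" &&
    SEVERITIES.indexOf(severity) !== -1 &&
    typeof v["gap"] === "string" &&
    typeof v["remediation"] === "string" &&
    typeof v["evidenceId"] === "string"
  );
}

// Parses raw model output and rejects anything outside the contract.
function parseFindings(modelOutput: string): Finding[] {
  const parsed: unknown = JSON.parse(modelOutput);
  if (!Array.isArray(parsed)) throw new Error("Expected a JSON array of findings");
  const findings: Finding[] = [];
  parsed.forEach((item: unknown, index: number) => {
    if (!isFinding(item)) throw new Error(`Finding ${index} violates the output contract`);
    findings.push(item);
  });
  return findings;
}
```

Failing loudly on a malformed finding, rather than silently dropping it, keeps an out-of-contract response from turning into a missing line in the report.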

05 — Scoring

Gap Score (0–100)

Share of assessed control checkpoints meeting SOC 2 best-practice thresholds for each criterion, expressed on a 0–100 scale. Each failing checkpoint is weighted by audit-risk severity and deducted from a baseline of 100.

CRITICAL −25 · HIGH −15 · MEDIUM −5

80+: Low audit risk

60–80: Moderate · likely findings

<60: High · remediate first

A score of 80 or above indicates low audit risk for that control; a score from 60 up to 80 indicates moderate risk with likely auditor findings; a score below 60 indicates high audit risk that should be remediated before engaging an auditor.
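The deduction arithmetic can be sketched as below. The severity weights and band cut-offs are the ones listed above; clamping at 0 and placing a score of exactly 80 in the low-risk band are assumptions, since the document does not spell out either.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM";

// Points deducted per failing checkpoint, as listed above.
const DEDUCTION: Record<Severity, number> = { CRITICAL: 25, HIGH: 15, MEDIUM: 5 };

// Starts at 100 and subtracts one deduction per failing checkpoint.
// Clamping at 0 is an assumption to keep the score within 0 to 100.
function gapScore(failedCheckpoints: Severity[]): number {
  const total = failedCheckpoints.reduce((sum, s) => sum + DEDUCTION[s], 0);
  return Math.max(0, 100 - total);
}

type RiskBand = "low" | "moderate" | "high";

// Maps a score to its band; boundary scores go to the higher band (assumption).
function riskBand(score: number): RiskBand {
  if (score >= 80) return "low";
  if (score >= 60) return "moderate";
  return "high";
}
```

For example, one CRITICAL and one HIGH failure give 100 − 25 − 15 = 60, which lands at the bottom of the moderate band.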

Freshness Score (0–100)

Measures recency of security configurations against SOC 2 time-based thresholds. Scored independently from Gap Score — a control can be fully configured but use stale credentials, or actively monitored but misconfigured.

Inputs:

— IAM access key age relative to the 90-day rotation threshold
— Credential last-used dates for stale-account detection
— CloudTrail log delivery recency
— AWS Config rule last-evaluation timestamps

A score below 70 indicates configurations that auditors commonly flag for access-key management and monitoring gaps.
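The first input, access key age against the 90-day threshold, can be checked with simple date arithmetic. This sketch covers only that one input; how the four inputs combine into a 0–100 score is not described here and is deliberately not guessed at. The `AccessKey` shape and function names are assumptions, and treating a key of exactly 90 days as still compliant is also an assumption.

```typescript
const ROTATION_THRESHOLD_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between two instants.
function ageInDays(createdAt: Date, now: Date): number {
  return Math.floor((now.getTime() - createdAt.getTime()) / MS_PER_DAY);
}

// Hypothetical record for one IAM access key; createDate is ISO 8601.
interface AccessKey {
  userName: string;
  accessKeyId: string;
  createDate: string;
}

// Returns the keys older than the rotation threshold.
function keysPastRotation(keys: AccessKey[], now: Date): AccessKey[] {
  return keys.filter(
    (k) => ageInDays(new Date(k.createDate), now) > ROTATION_THRESHOLD_DAYS,
  );
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes the check repeatable against the scan timestamp recorded with the evidence.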

06 — Traceability

The whole point of the system. Every finding in the final report is anchored to:

— The exact AWS API endpoint called (e.g. iam.amazonaws.com/GetAccountSummary)
— The request timestamp in ISO 8601 UTC
— The raw response body (truncated but preserved)
— A SHA-256 hash of the evidence item for tamper-evidence
— The AWS region the call was issued against

Your auditor can take any finding from the report and re-run the exact API call themselves, from their own machine, with their own credentials. If the AWS response at their end doesn't match ours within a reasonable drift window, something's wrong — and they'll see it immediately. That's what verifiable means.
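The tamper-evidence part of that check can be sketched as recomputing the SHA-256 hash of a stored evidence body and comparing it with the recorded hash. The record shape and function names are assumptions, and `node:crypto` is used only to keep the sketch synchronous. Comparing against a fresh API response is a separate step that this sketch does not attempt, because AWS responses typically carry per-request fields such as request IDs.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stored evidence record, following the anchor list above.
interface EvidenceRecord {
  endpoint: string;  // for example "iam.amazonaws.com/GetAccountSummary"
  region: string;
  timestamp: string; // ISO 8601 UTC
  body: string;      // raw response body as stored
  sha256: string;    // hex digest recorded at collection time
}

// Hex-encoded SHA-256 of a UTF-8 string.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// True when the stored body still matches the hash recorded for it.
function isIntact(record: EvidenceRecord): boolean {
  return sha256Hex(record.body) === record.sha256.toLowerCase();
}
```

An auditor holding only the report and the evidence export could run this check offline, without any AWS credentials.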

— An honesty note

What we're not yet claiming.

The compliance-software category is noisy. Everyone claims 99% accuracy, trained-on-millions-of-audits, proprietary ML, the lot. Some of it is real; most of it is marketing. We'd rather tell you what we don't have than pretend.

No trained ML model. There is no proprietary transformer, no graph neural network, no anomaly detector trained on 10,000 audit cycles. The reasoning layer is Claude — a third-party model we send structured prompts to. That's more honest and, frankly, more accurate than anything a small team could train today.
No published accuracy number. We've seen "94% accurate" on competitor sites. Without a benchmark dataset and published methodology, that number means nothing. We'll publish one when we have pilots audited by independent firms who can validate it.
No zero-persistence claim. We store evidence in Cloudflare D1 during the 30-day retention window. "Zero data retention" on a compliance tool is usually cosmetic — the reports have to come from somewhere. We keep what's necessary; we delete it on schedule.
Not a Type II continuous-monitoring tool yet. Evidence Tracer (EVT) today is a point-in-time Type I scan. Continuous monitoring and Type II coverage are on the roadmap, not shipped.

We'd rather earn the bigger claim than make it.
