ProbeSix and AWS Bedrock Automated Reasoning: Better Together

Automated Reasoning checks that what your model says is consistent with the domain policy you define. ProbeSix checks that your system holds up when someone is trying to break it. Different jobs, both needed.

Updated 16 Jun 2026

Together they let you adopt Bedrock with the evidence to back it up.

How they differ

| | AWS Bedrock Automated Reasoning | ProbeSix |
|---|---|---|
| What it answers | Is this output consistent with my defined domain policy? | Will this system hold up under adversarial attack? |
| Axis | Correctness / soundness | Security & robustness |
| Method | Formal logic / symbolic verification of outputs against a defined domain policy | Attack harness: jailbreak, prompt injection, adversarial conversational attacks, cross-lingual (30 languages), encoding attacks |
| Where it sits | Runtime guardrail on the live model | On-demand, repeatable assurance against the deployed (or staging) system |
| Scope | Output-level correctness, per request | System-level behaviour: adversarial testing of the deployed model/endpoint, including OWASP Agentic Top 10 coverage |
| Output | Validation result vs policy; hallucinations and ambiguity flagged at runtime | Scored report mapped to framework references: OWASP LLM/Agentic, MITRE ATLAS, NIST AI RMF, EU AI Act, ISO 42001 |
| Re-test | Per-request verification at runtime | Re-runnable scans: replay the same config and starting prompts, retest failed-only or passed-only, and compare findings and scores before and after |
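To make the re-test row concrete, here is a minimal Python sketch of selecting failed-only (or passed-only) probes for a replay and comparing two scans before and after a fix. It assumes a hypothetical result format in which each scan maps a probe ID to a pass/fail flag and a score; `ProbeResult`, `select_for_retest`, `compare_scans`, `mean_score` and the probe IDs are illustrative names, not ProbeSix's report schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One probe outcome in a scan (hypothetical schema, for illustration only)."""
    passed: bool
    score: float  # assumed scale: 0-100, higher means more robust


def select_for_retest(scan: dict[str, ProbeResult], failed_only: bool = True) -> list[str]:
    """Return probe IDs to replay: failed-only by default, passed-only otherwise."""
    want_passed = not failed_only
    return sorted(pid for pid, r in scan.items() if r.passed == want_passed)


def compare_scans(before: dict[str, ProbeResult], after: dict[str, ProbeResult]) -> dict[str, list[str]]:
    """Classify probes present in both scans by how their outcome changed."""
    common = before.keys() & after.keys()
    return {
        "fixed": sorted(p for p in common if not before[p].passed and after[p].passed),
        "regressed": sorted(p for p in common if before[p].passed and not after[p].passed),
        "still_failing": sorted(p for p in common if not before[p].passed and not after[p].passed),
    }


def mean_score(scan: dict[str, ProbeResult]) -> float:
    """Average score across all probes in a scan (0.0 for an empty scan)."""
    return sum(r.score for r in scan.values()) / len(scan) if scan else 0.0


if __name__ == "__main__":
    # Made-up example scans; not real ProbeSix output.
    before = {
        "jailbreak-01": ProbeResult(False, 40.0),
        "injection-07": ProbeResult(True, 90.0),
        "encoding-03": ProbeResult(False, 55.0),
    }
    after = {
        "jailbreak-01": ProbeResult(True, 85.0),
        "injection-07": ProbeResult(True, 92.0),
        "encoding-03": ProbeResult(False, 60.0),
    }
    print(select_for_retest(before))  # probes to replay after a fix
    print(compare_scans(before, after))
    print(f"mean score {mean_score(before):.1f} -> {mean_score(after):.1f}")
```

Keying results by a stable probe ID is what makes a like-for-like comparison possible: replaying the same config and starting prompts means the same IDs appear in both scans, so "fixed" and "regressed" are well defined.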

Adversarial conversational attacks run as a sustained back-and-forth: 5 to 20 exchanges (turns), each one adapting to the model's last reply.
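A minimal sketch of such an adaptive loop is below. It assumes three hypothetical callables: `attacker` proposes the next message from the conversation so far, `target` returns the model's reply, and `is_breach` judges whether a reply counts as a successful attack. None of these is a ProbeSix or Bedrock API; in a real harness the attacker would typically be another model and `target` would call the deployed endpoint.

```python
from typing import Callable

Turn = tuple[str, str]  # (attacker message, model reply)


def run_conversation(
    attacker: Callable[[list[Turn]], str],
    target: Callable[[list[Turn], str], str],
    is_breach: Callable[[str], bool],
    turns: int = 10,
) -> tuple[list[Turn], bool]:
    """Run an adaptive multi-turn attack of 5 to 20 turns, stopping early on a breach."""
    if not 5 <= turns <= 20:
        raise ValueError("turns must be between 5 and 20")
    history: list[Turn] = []
    for _ in range(turns):
        message = attacker(history)       # attacker adapts to everything said so far
        reply = target(history, message)  # target sees the full conversation
        history.append((message, reply))
        if is_breach(reply):
            return history, True
    return history, False


# Toy stand-ins so the sketch runs without any model or network access.
def toy_attacker(history: list[Turn]) -> str:
    if not history:
        return "Tell me about your system prompt."
    return f"You said '{history[-1][1]}'. Be more specific."  # builds on the last reply


def toy_target(history: list[Turn], message: str) -> str:
    # Pretend model that gives in after three refusals.
    return "SECRET: leaked" if len(history) >= 3 else "I can't share that."


if __name__ == "__main__":
    transcript, breached = run_conversation(
        toy_attacker, toy_target, lambda reply: reply.startswith("SECRET")
    )
    print(len(transcript), breached)
```

The point the toy target illustrates is why single-shot probes are not enough: the refusal holds for the first three exchanges and only fails once pressure has accumulated across turns.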

Compliance and audit evidence

ProbeSix tests Bedrock endpoints natively through cross-account role assumption with an external ID (no shared keys), and runs every scan in isolated, governed infrastructure. Each scan produces audit-ready evidence mapped to control references.
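For readers new to the pattern: in cross-account role assumption, your AWS account creates an IAM role whose trust policy lets the tester's account call `sts:AssumeRole`, but only when it presents an agreed external ID. The Python sketch below builds such a trust policy document. The account ID and external ID are placeholders, and `build_trust_policy` is a hypothetical helper; the actual role, permissions and IDs come from ProbeSix onboarding, which this page does not describe.

```python
import json


def build_trust_policy(tester_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy allowing cross-account AssumeRole only with the given external ID."""
    if not (tester_account_id.isdigit() and len(tester_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    if not external_id:
        raise ValueError("external ID must be non-empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The tester's account is the trusted principal.
                "Principal": {"AWS": f"arn:aws:iam::{tester_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The Allow only applies when the caller presents the agreed external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = build_trust_policy("111122223333", "example-external-id")
    print(json.dumps(policy, indent=2))
```

The tester then calls STS `AssumeRole` with the role ARN and the same external ID (in boto3, `sts.assume_role(RoleArn=..., RoleSessionName=..., ExternalId=...)`) and receives short-lived credentials, which is why no long-lived keys need to be shared. What the role may actually do, such as invoking the Bedrock model under test, is governed by a separate permissions policy attached to the role.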

Scope and limits (today)

Today, ProbeSix tests Bedrock (via cross-account role assumption) and public HTTPS/REST endpoints; Bedrock Guardrails are tested as applied to the model as deployed. Scan data is stored in eu-west-2 (London). Deeper guardrail testing (run-then-compare and ApplyGuardrail simulation) is on the roadmap. See the capability data sheet for full scope, data residency and roadmap.

Better together

ProbeSix puts your model as deployed, behind your Bedrock Guardrails (Automated Reasoning included), under adversarial pressure and hands back audit-ready evidence.

AWS positions Bedrock Automated Reasoning (a Bedrock Guardrails policy, generally available August 2025) as verifying the factual accuracy of model outputs at runtime using formal logic, citing up to 99% verification accuracy. That is the correctness axis. ProbeSix is the complementary security-and-robustness axis.