ProbeSix and AWS Bedrock Automated Reasoning: Better Together
Automated Reasoning checks that what your model says is consistent with the domain policy you define. ProbeSix checks that your system holds up when someone is trying to break it. Different jobs, both needed.
Together they let you adopt Bedrock with the evidence to back it up.
How they differ
| | AWS Bedrock Automated Reasoning | ProbeSix |
|---|---|---|
| What it answers | Is this output consistent with my defined domain policy? | Will this system hold up under adversarial attack? |
| Axis | Correctness / soundness | Security & robustness |
| Method | Formal logic / symbolic verification of outputs against a defined domain policy | Attack harness: jailbreak, prompt injection, adversarial conversational attacks, cross-lingual (30 languages), encoding attacks |
| Where it sits | Runtime guardrail on the live model | On-demand, repeatable assurance against the deployed (or staging) system |
| Scope | Output-level correctness, per request | System-level behaviour: adversarial testing of the deployed model/endpoint, including OWASP Agentic Top 10 coverage |
| Output | Validation result vs policy; hallucinations and ambiguity flagged at runtime | Scored report mapped to framework references: OWASP LLM/Agentic, MITRE ATLAS, NIST AI RMF, EU AI Act, ISO 42001 |
| Re-test | Per-request verification at runtime | Re-runnable scans: replay the same config and starting prompts, retest failed-only or passed-only, and compare findings and scores before and after |
Adversarial conversational attacks run as a sustained back-and-forth: 5 to 20 exchanges (turns), each one adapting to the model's last reply.
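As a rough illustration of that back-and-forth, here is a minimal sketch of an adaptive multi-turn loop. It is not ProbeSix's implementation: `run_conversation`, `next_attack`, `target` and `is_breach` are hypothetical names, and the only detail taken from the text above is the 5 to 20 turn range and the fact that each prompt depends on the previous reply.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (attack prompt, model reply)

MIN_TURNS, MAX_TURNS = 5, 20  # turn range stated in the text above


def run_conversation(
    next_attack: Callable[[List[Turn]], str],  # hypothetical: picks the next prompt from history
    target: Callable[[str], str],              # hypothetical: sends a prompt to the system under test
    is_breach: Callable[[str], bool],          # hypothetical: judges whether a reply is a failure
    turns: int = 10,
) -> Tuple[List[Turn], List[int]]:
    """Run one adaptive conversation; return the transcript and the breached turn numbers."""
    if not MIN_TURNS <= turns <= MAX_TURNS:
        raise ValueError(f"turns must be between {MIN_TURNS} and {MAX_TURNS}")

    history: List[Turn] = []
    breached: List[int] = []
    for turn_number in range(1, turns + 1):
        # The attacker sees the full transcript, so each prompt can adapt
        # to the model's last reply.
        prompt = next_attack(history)
        reply = target(prompt)
        history.append((prompt, reply))
        if is_breach(reply):
            breached.append(turn_number)
    return history, breached
```

The sketch keeps going after a breach so that the transcript covers the whole turn budget; stopping at the first breach would be an equally reasonable design choice.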
Compliance and audit evidence
ProbeSix tests Bedrock endpoints natively via cross-account role-assumption with an external ID (no shared keys), executed in isolated, governed infrastructure. Every scan produces audit-ready evidence mapped to control references.
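For readers unfamiliar with the pattern, the sketch below builds the kind of IAM trust policy that cross-account role assumption with an external ID usually relies on. It shows the general AWS pattern only, not ProbeSix's actual onboarding template. The function names are hypothetical, and the account ID and external ID are placeholders that would come from the onboarding process.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Trust policy letting one external account assume a role, gated on an external ID."""
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    if not external_id:
        raise ValueError("external_id must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role (placeholder value).
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must present the agreed external ID, so no
                # long-lived keys need to be shared between accounts.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def render_trust_policy(trusted_account_id: str, external_id: str) -> str:
    """Return the policy as JSON, ready to paste into a role definition."""
    return json.dumps(build_trust_policy(trusted_account_id, external_id), indent=2)
```

With a role like this in place, the assuming side calls STS AssumeRole and passes the same value as `ExternalId` (in boto3, `sts.assume_role(RoleArn=..., RoleSessionName=..., ExternalId=...)`), receiving short-lived credentials in return.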
Scope and limits (today)
Today ProbeSix tests Bedrock (via cross-account role-assumption) and public HTTPS/REST endpoints, with Bedrock Guardrails tested on the model as deployed. Scan data is stored in eu-west-2 (London). Deeper guardrail testing (run-then-compare and ApplyGuardrail simulation) is on the roadmap. See the capability data sheet for full scope, data residency and roadmap.
Better together
ProbeSix puts the model, as deployed behind your Bedrock guardrails (Automated Reasoning included), under adversarial pressure and hands back audit-ready evidence.
AWS positions Bedrock Automated Reasoning (a Bedrock Guardrails policy, generally available August 2025) as verifying the factual accuracy of model outputs at runtime using formal logic, citing up to 99% verification accuracy. That is the correctness axis. ProbeSix is the complementary security-and-robustness axis.