Supports Bedrock, custom APIs, multiple models
Automated jailbreak and red-team attack simulation
Maps to ISO 42001, NIST AI Risk Management Framework, MITRE ATLAS, OWASP LLM Top 10 and EU AI Act frameworks
Actionable insights with clear remediation guidance
Connect your Custom API or AWS Bedrock endpoint in seconds
Choose from ISO 42001, NIST AI Risk Management Framework, MITRE ATLAS, OWASP LLM Top 10 and EU AI Act templates
Get detailed findings with clear remediation guidance
Explore a sample security assessment report. Toggle between free and full versions to see the difference.
Free tier: See vulnerability summary and risk scores. Detailed test examples are locked.
Acme Corporation • 2/3/2026
| Category | Test types | Severity | Failed |
|---|---|---|---|
| PII Leak | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | Critical | 36/210 |
| SSRF Enforcement | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | Critical | 32/70 |
| Imitation | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | Critical | 29/70 |
| SQL Injection | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | Critical | 35/70 |
| Hallucination | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | Critical | 20/70 |
| Misinformation & Disinformation (harmful lies and propaganda) | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | Critical | 33/70 |
| Shell Injection | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | High | 11/70 |
| Hate | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | High | 13/70 |
| Overreliance | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | High | 30/70 |
| Excessive Agency | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | High | 12/70 |
| Privacy violations | Baseline Testing, Multi-Vector Safety Bypass, Single-shot Optimisation | Medium | 3/70 |
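To make the Failed column easier to compare across categories, the counts can be turned into failure rates. The sketch below is illustrative only: it assumes each "x/y" entry means x failed tests out of y tests run, and the names `FINDINGS` and `failure_rates` are hypothetical, not part of any Probe Six API.

```python
# Rows transcribed from the sample report table: (category, severity, failed, total).
# Assumption: "36/210" means 36 failed tests out of 210 tests run.
FINDINGS = [
    ("PII Leak", "Critical", 36, 210),
    ("SSRF Enforcement", "Critical", 32, 70),
    ("Imitation", "Critical", 29, 70),
    ("SQL Injection", "Critical", 35, 70),
    ("Hallucination", "Critical", 20, 70),
    ("Misinformation & Disinformation", "Critical", 33, 70),
    ("Shell Injection", "High", 11, 70),
    ("Hate", "High", 13, 70),
    ("Overreliance", "High", 30, 70),
    ("Excessive Agency", "High", 12, 70),
    ("Privacy violations", "Medium", 3, 70),
]


def failure_rates(findings):
    """Return {category: failed / total} for each row of the findings table."""
    return {name: failed / total for name, _severity, failed, total in findings}


if __name__ == "__main__":
    rates = failure_rates(FINDINGS)
    severity = {name: sev for name, sev, _failed, _total in FINDINGS}
    # Print categories from highest to lowest failure rate.
    for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:<32} {severity[name]:<9} {rate:6.1%}")
```

Sorting by failure rate does not reproduce the severity ordering in the table. Overreliance fails 30 of 70 tests yet is rated High, while Hallucination fails 20 of 70 and is rated Critical, so the severity rating evidently depends on more than how often attacks succeed.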
This report uses two complementary severity measures. The vulnerability breakdown counts individual failed tests by their inherent threat level — how dangerous each specific attack type is regardless of how often it succeeded.
The security findings table rates each category using a calculated risk score (0–10) that factors in the attack success rate, impact, and exploit complexity. Each category carries a weighting based on the potential impact of a compromise in that area: for instance, a data exfiltration vulnerability is weighted more heavily than a minor content policy violation. A high failure rate combined with a high impact weighting elevates a category's risk score even when individual tests are low-severity, because the volume of successful attacks poses a significant cumulative risk.
Risk score thresholds: 7.5+ Critical, 5.0+ High, 2.5+ Medium, below 2.5 Low.
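The thresholds above amount to a simple banding rule. The following is a minimal sketch of that mapping, assuming scores are on the 0–10 scale described in the report; the function name `severity_label` is hypothetical, and the way the score itself is calculated is not published here, so it is not modelled.

```python
def severity_label(score: float) -> str:
    """Map a 0-10 risk score to the severity band stated in the report.

    Bands (from the report): 7.5+ Critical, 5.0+ High, 2.5+ Medium, below 2.5 Low.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"risk score must be between 0 and 10, got {score}")
    if score >= 7.5:
        return "Critical"
    if score >= 5.0:
        return "High"
    if score >= 2.5:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    # Boundary values fall into the higher band because each threshold is inclusive.
    for example in (9.1, 7.5, 5.0, 2.5, 1.0):
        print(example, severity_label(example))
```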
Automated security testing purpose-built for the age of large language models.
Organisations are integrating LLMs and AI-powered endpoints into production faster than ever, but most go live without any structured security testing. Traditional application security tools were never designed to catch AI-specific vulnerabilities like prompt injection, hallucination or unsafe content generation.
Probe Six fills that gap. We provide automated, evidence-based security assessments that test how models actually behave under adversarial conditions — not just whether they have the right configuration but whether they produce safe, accurate, and compliant outputs when challenged.
Every scan produces a structured report with real failed-test evidence, risk scores and actionable remediation guidance your team can act on immediately.
Probe Six is built by djinn six ltd, a London-based security consultancy and AWS Partner that combines compliance expertise with innovation. We specialise in securing AWS infrastructure, responsible AI and quantum readiness for regulated industries.
We built Probe Six because we saw first-hand how difficult it was for security teams to assess the risks introduced by LLM integrations using existing tooling. Our assessments map directly to the frameworks that matter: ISO 42001, NIST AI Risk Management Framework, OWASP LLM Top 10, MITRE ATLAS and the EU AI Act.
Whether you need to satisfy internal governance or demonstrate AI-specific compliance to regulators, Probe Six gives you the evidence. We believe security testing for AI should be rigorous, repeatable and accessible to every organisation — not just those with dedicated red teams.
Start free with 5 scans per month. Subscribe for more scans with full detailed reports included, or upgrade individual reports for £100 + VAT.
5 scans per month
Upgrade any report to full detail for £100 + VAT
15 scans per month
Need help understanding your findings? Book a 1-hour session with a djinn six security consultant directly from your report for £250 + VAT.
Free Forever: 5 scans per month · Unlimited endpoints · Summary report included