Red Team Assessment
CyberDoc's Red Team feature provides AI-powered autonomous penetration testing. A dual-model architecture pairs a PentestAgent (Claude Sonnet) for tool execution with an xAI Grok orchestrator for strategic reasoning. After the engagement completes, a multi-agent analysis team produces structured findings with CVSS-aligned severity, CVE research, attack chains, and threat intelligence.
How It Works
- Domain Verification — Users must verify ownership of the target domain via DNS TXT record or file upload before launching an engagement. Admins bypass this requirement.
- Launch Engagement — Select a target, playbook (Recon, Web, Network, or Full), and infrastructure mode. The request is proxied to the PentestAgent backend.
- AI Agent Loop — PentestAgent autonomously executes security tools, analyses results, and decides next steps via the MCP (Model Context Protocol) server. The agent runs until the task is complete or max iterations are reached.
- Multi-Agent Analysis — Raw pentest output is sent to an xAI Grok multi-agent team (4 or 16 agents) that produces structured findings with CVSS severity, recon data, attack chains, CVE research, and threat intelligence gathered from web and X/Twitter searches.
- Attack Chain Verification — Proposed attack chains can be executed directly against the target using automated Kali tool command mapping, with success validated via regex pattern matching.
- Artifacts & Report — All output files (exploit proofs, screenshots, Metasploit output) are collected as artifacts. A comprehensive HTML report is generated with executive summary, findings by severity, attack chains, and remediation recommendations.
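The AI Agent Loop step above can be pictured as a bounded loop: choose the next command, run it, record the output, and stop when the planner has nothing left to do or the iteration cap is reached. The following is a minimal sketch, not PentestAgent's actual code; `AgentState`, `run_agent_loop`, `decide_next`, and `run_tool` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentState:
    """Accumulated observations for one engagement (hypothetical structure)."""
    observations: list[str] = field(default_factory=list)
    iterations: int = 0
    finished: bool = False


def run_agent_loop(
    decide_next: Callable[[AgentState], Optional[str]],
    run_tool: Callable[[str], str],
    max_iterations: int,
) -> AgentState:
    """Run tools until the planner returns None or the iteration cap is hit."""
    state = AgentState()
    while state.iterations < max_iterations:
        command = decide_next(state)
        if command is None:
            # The planner considers the task complete.
            state.finished = True
            break
        state.observations.append(run_tool(command))
        state.iterations += 1
    return state
```

Injecting `decide_next` and `run_tool` as callables keeps the loop independent of any particular model or tool backend, so the stopping logic can be exercised with stubs.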
Agent Architecture
The Red Team system uses a dual-model architecture with specialised roles:
| Role | Model | Purpose |
|---|---|---|
| PentestAgent | Claude Sonnet (claude-sonnet-4-20250514) | Tool execution, shell commands, browser interaction, structured note-taking |
| Orchestrator | xAI Grok (grok-4-1-fast-reasoning) | Strategic reasoning, task planning, adaptive decision-making |
| Analysis Team | xAI Grok Multi-Agent (grok-4.20-multi-agent) | Post-engagement structured analysis with 4 or 16 parallel agents |
Infrastructure Modes
| Mode | Backend | Tools | Max Iterations | Access |
|---|---|---|---|---|
| Standard | Docker container (App Runner) | PentestAgent + ProjectDiscovery tools (nmap, nuclei, subfinder, httpx, ffuf, nikto) | 60 | Business, Enterprise |
| Advanced | Docker container (App Runner) | Standard + Kali tools (metasploit, hydra, john, sqlmap, wpscan) | 80 | Enterprise only |
| Expert | Dedicated EC2 instance (Kali Linux) | Full Kali arsenal + SecLists wordlists, privileged network access, Cloudflare Tunnel | 100 | Enterprise only |
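The table above can be read as a small lookup from mode to backend, iteration cap, and eligible plans. The sketch below encodes it that way; the identifiers (`MODES`, `can_use_mode`, the lowercase mode and plan strings, the backend labels) are assumptions for illustration, and the admin exemption is assumed to extend to mode selection as it does to the plan requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfrastructureMode:
    backend: str
    max_iterations: int
    plans: frozenset[str]


# Values taken from the Infrastructure Modes table; key names are hypothetical.
MODES = {
    "standard": InfrastructureMode("app-runner", 60, frozenset({"business", "enterprise"})),
    "advanced": InfrastructureMode("app-runner", 80, frozenset({"enterprise"})),
    "expert": InfrastructureMode("ec2-kali", 100, frozenset({"enterprise"})),
}


def can_use_mode(mode: str, plan: str, is_admin: bool = False) -> bool:
    """Return True if the plan (or admin status) permits the requested mode."""
    if mode not in MODES:
        raise ValueError(f"unknown infrastructure mode: {mode!r}")
    # Assumption: admins are exempt from plan checks for every mode.
    return is_admin or plan in MODES[mode].plans
```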
Playbooks
| Playbook | Focus | Typical Duration |
|---|---|---|
| Recon | Subdomain enumeration, port scanning, service fingerprinting, DNS configuration, exposed endpoints | 10–30 minutes |
| Web | OWASP Top 10, security headers, TLS config, cookie security, directory discovery, injection testing | 30–60 minutes |
| Network | Port/service enumeration, known CVEs, default credentials, network segmentation | 30–60 minutes |
| Full Red Team | All phases: Recon, Web, Network, then Report generation with remediation steps | 1–3 hours |
Agent Tools
The PentestAgent communicates via MCP (Model Context Protocol) and has access to these tool categories:
- terminal — Execute shell commands (nmap, nuclei, curl, sqlmap, metasploit, hydra, wpscan, ffuf, gobuster, etc.). Output truncated at 50K chars.
- browser — Playwright headless browser for web interaction (navigate, click, type, screenshot, extract links/forms).
- notes — Structured finding storage with category validation (credential, vulnerability, finding, artifact, recon, infrastructure, report). Persists across the engagement.
- web_search — Web search integration for OSINT and CVE lookup.
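Two of the behaviours listed above, terminal output truncation and note category validation, are simple enough to sketch. This is an illustration only: "50K" is assumed to mean 50,000 characters, and the truncation marker, function names, and note shape are hypothetical.

```python
# Categories as listed for the notes tool.
NOTE_CATEGORIES = frozenset({
    "credential", "vulnerability", "finding", "artifact",
    "recon", "infrastructure", "report",
})

MAX_OUTPUT_CHARS = 50_000  # assumption: "50K chars" means 50,000 characters
TRUNCATION_MARKER = "\n[output truncated]"  # hypothetical marker text


def truncate_output(output: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cap tool output at `limit` characters and flag the cut."""
    if len(output) <= limit:
        return output
    return output[:limit] + TRUNCATION_MARKER


def make_note(category: str, content: str) -> dict:
    """Build a note, rejecting categories outside the allowed set."""
    if category not in NOTE_CATEGORIES:
        raise ValueError(f"invalid note category: {category!r}")
    return {"category": category, "content": content}
```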
Multi-Agent Analysis
After the PentestAgent completes, raw output is analysed by an xAI Grok multi-agent team. The analysis produces a structured result with the following components:
| Component | Description |
|---|---|
| Findings | Vulnerabilities with CVSS-aligned severity, CWE classification, evidence, impact, and specific remediation steps |
| Recon Data | IP addresses, subdomains, open ports, and detected technologies |
| Attack Chains | Multi-step exploit paths with risk level, step-by-step actions, and overall impact assessment |
| CVE Research | Relevant CVEs with exploit-in-the-wild status and patch availability (via live web search) |
| Threat Intel | Recent threat discussions from web and X/Twitter searches about the target's technology stack |
| Positive Controls | Security measures the target has correctly implemented |
| Risk Rating | Overall risk level with justification |
Analysis can use either a 4-agent team (standard) or 16-agent team (deep analysis) and can be re-run on demand via the reanalyze endpoint.
Finding Severity
Findings use CVSS-aligned severity ratings assigned by the multi-agent analysis team:
| Severity | CVSS Range | Examples |
|---|---|---|
| Critical | 9.0–10.0 | RCE, auth bypass, credential exposure, actively exploited CVEs |
| High | 7.0–8.9 | SQLi, stored XSS, exposed SSH, SSRF, file upload vulnerabilities |
| Medium | 4.0–6.9 | Missing security headers, user enumeration, outdated software, info disclosure CVEs |
| Low | 0.1–3.9 | Verbose error messages, directory listings, minor misconfigurations |
| Info | — | Informational only, not a vulnerability |
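The ranges in the table map directly onto a threshold function. The sketch below is illustrative: in CyberDoc the analysis team assigns severity, so `severity_for_score` is a hypothetical helper, and treating a missing or 0.0 score as Info is an assumption.

```python
from typing import Optional


def severity_for_score(score: Optional[float]) -> str:
    """Map a CVSS base score to the severity labels used in the table."""
    # Assumption: no score (or 0.0) means an informational finding.
    if score is None or score == 0.0:
        return "Info"
    if not 0.0 < score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    rounded = round(score, 1)  # CVSS scores carry one decimal place
    if rounded >= 9.0:
        return "Critical"
    if rounded >= 7.0:
        return "High"
    if rounded >= 4.0:
        return "Medium"
    return "Low"
```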
Attack Chain Verification
Attack chains identified by the multi-agent analysis can be verified by executing them directly against the target. The chain command mapper translates high-level chain steps into concrete Kali tool commands.
Supported Attack Patterns
- WordPress — User enumeration (wpscan), plugin scanning, XML-RPC brute force, credential testing, shell upload
- SQL Injection — Automated sqlmap execution with database enumeration
- Directory Brute Forcing — ffuf, dirb, gobuster with custom wordlists
- SSH Brute Force — Hydra with configurable wordlists (smart, custom, top1000, top10000 modes)
- Exploitation — Metasploit framework integration
- Lateral Movement — Post-exploitation and privilege escalation
Chain Execution Modes
- Exploit Chain — Automated execution of analysis-identified chains with templated commands, timeout management, and regex-based success validation
- Custom Chain — User-defined chains with custom parameters (usernames, passwords, target overrides, brute force mode selection)
- Adaptive Chain — Dynamic chain execution that adapts based on results from previous steps
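Templated commands with regex-based success validation, as described for Exploit Chain mode, might look like the sketch below. The `ChainStep` structure, the example template, and the success pattern are assumptions for illustration and are not taken from CyberDoc's chain command mapper. Parameters are shell-quoted before substitution so that a crafted target value cannot inject extra commands.

```python
import re
import shlex
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class ChainStep:
    name: str
    command_template: str   # uses $placeholders, e.g. "$target"
    success_pattern: str    # regex searched for in the tool output
    timeout_seconds: int = 300  # hypothetical default


def render_command(step: ChainStep, params: dict[str, str]) -> str:
    """Fill the template, shell-quoting every parameter value."""
    quoted = {key: shlex.quote(value) for key, value in params.items()}
    return Template(step.command_template).substitute(quoted)


def step_succeeded(step: ChainStep, output: str) -> bool:
    """A step counts as successful if its pattern appears in the output."""
    flags = re.IGNORECASE | re.MULTILINE
    return re.search(step.success_pattern, output, flags) is not None
```

A step such as `ChainStep("wp-user-enum", "wpscan --url $target --enumerate u", r"User\(s\) Identified")` would then be rendered with `render_command(step, {"target": "https://example.com"})` and its output checked with `step_succeeded`.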
Artifacts
Engagement artifacts (exploit proofs, tool output, screenshots, loot) are automatically collected from the PentestAgent backend and stored for review:
- Fetched from the backend /rt/artifacts endpoint (bulk or individual)
- Stored in KV with 90-day retention
- Tracked in the engagement_artifacts database table with file metadata
- Downloadable via the artifacts API endpoints
Red Team Operator
The Red Team Operator is a voice and text AI agent interface for administrators to interactively manage engagements:
- Voice conversations via xAI voice API with real-time transcript
- Text-based chat with persistent conversation history
- Can programmatically create engagements, launch attack chains, and fetch findings
- Conversations are linked to engagements and stored in the redteam_conversations table
- Admin-only access with unrestricted pentest safeguards
Domain Verification
Before launching a red team engagement, you must verify ownership of the target domain. Two methods are supported:
- DNS TXT Record — Add a TXT record to the domain with a generated verification token. Checked via Google DNS.
- File Upload — Place a file at /.well-known/cyberdoc-verify.txt containing the token. Checked via HTTP fetch.
Once verified, the domain remains verified for future engagements. Admins bypass this requirement.
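Both checks can be sketched in a few lines. The example below assumes that "Google DNS" refers to Google's public DNS-over-HTTPS JSON endpoint (`https://dns.google/resolve`), that the TXT record value must equal the token exactly, and that the file is fetched over HTTPS; none of these details are confirmed by the text above, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

GOOGLE_DOH = "https://dns.google/resolve"  # assumption: DoH JSON API is used
TXT_RECORD_TYPE = 16                       # DNS record type number for TXT


def txt_records_contain(answer_json: dict, token: str) -> bool:
    """Check a DoH JSON response for a TXT record equal to the token."""
    for record in answer_json.get("Answer", []):
        if record.get("type") != TXT_RECORD_TYPE:
            continue
        # TXT data may be wrapped in double quotes; strip them first.
        if record.get("data", "").strip('"') == token:
            return True
    return False


def check_dns_txt(domain: str, token: str) -> bool:
    """Query the domain's TXT records and look for the token."""
    query = urllib.parse.urlencode({"name": domain, "type": "TXT"})
    with urllib.request.urlopen(f"{GOOGLE_DOH}?{query}", timeout=10) as resp:
        return txt_records_contain(json.load(resp), token)


def well_known_url(domain: str) -> str:
    """URL where the file-upload method expects the token file."""
    return f"https://{domain}/.well-known/cyberdoc-verify.txt"
```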
Engagement Lifecycle
- Launch — Create engagement with target, playbook, scope, and infrastructure mode
- Poll Status — Monitor progress (queued, running, complete, failed, cancelled)
- View Results — Retrieve structured findings, recon data, attack chains, and analysis
- Reanalyze — Re-run multi-agent analysis on existing results with updated prompts
- Chain Verification — Execute identified attack chains for proof-of-exploit
- Report — Generate branded HTML report for download or print
- Archive/Delete — Archive old engagements or permanently delete them
- Cancel — Abort a running engagement
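The launch and poll steps can be sketched as follows. The payload keys mirror the parameters named for the launch endpoint (target, playbook, scope, mode), but the exact key names, the lowercase playbook and mode values, the `status` response field, and the polling interval are assumptions. The status fetcher is injected so the loop works with any HTTP client.

```python
import time
from typing import Callable

PLAYBOOKS = {"recon", "web", "network", "full"}       # assumed identifiers
MODES = {"standard", "advanced", "expert"}            # assumed identifiers
TERMINAL_STATES = {"complete", "failed", "cancelled"}


def build_launch_payload(target: str, playbook: str, mode: str, scope: list[str]) -> dict:
    """Assemble a request body for POST /api/redteam/launch (field names assumed)."""
    if playbook not in PLAYBOOKS:
        raise ValueError(f"unknown playbook: {playbook!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"target": target, "playbook": playbook, "mode": mode, "scope": scope}


def wait_for_engagement(
    get_status: Callable[[str], dict],
    engagement_id: str,
    interval_seconds: float = 15.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the engagement reaches a terminal state, then return it."""
    for _ in range(max_polls):
        status = get_status(engagement_id)["status"]
        if status in TERMINAL_STATES:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"engagement {engagement_id} still running after {max_polls} polls")
```

In practice `get_status` would wrap an authenticated GET to `/api/redteam/status?id=`; passing it in as a callable keeps authentication details out of the polling logic.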
Security Guardrails
Tool output from untrusted sources is filtered through prompt injection guardrails adapted from the CAI framework:
- 40+ regex patterns detecting instruction overrides, hidden commands, encoding tricks
- Unicode homograph normalization (Cyrillic/Greek to Latin)
- Content sanitization with security delimiters
- Prevents target servers from hijacking the agent via crafted responses
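The three techniques listed above (homograph normalization, pattern detection, and delimiter wrapping) can be combined as in the sketch below. It is a heavily reduced illustration: the three patterns, the small homoglyph map, and the delimiter strings are assumptions and are not the CAI framework's or CyberDoc's actual rules.

```python
import re
import unicodedata

# Small illustrative subset of Cyrillic/Greek look-alikes mapped to Latin.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic e
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0456": "i",  # Cyrillic i
    "\u03bf": "o",  # Greek omicron
    "\u03b1": "a",  # Greek alpha
})

# Hypothetical patterns; a production list would be far longer.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\b", re.IGNORECASE),
    re.compile(r"system\s*prompt", re.IGNORECASE),
]


def normalize(text: str) -> str:
    """Fold compatibility forms, then map look-alike letters to Latin."""
    return unicodedata.normalize("NFKC", text).translate(HOMOGLYPHS)


def sanitize_tool_output(raw: str) -> tuple[str, bool]:
    """Wrap untrusted output in delimiters and report whether it looks hostile."""
    flagged = any(p.search(normalize(raw)) for p in INJECTION_PATTERNS)
    wrapped = f"<<<UNTRUSTED_TOOL_OUTPUT>>>\n{raw}\n<<<END_UNTRUSTED_TOOL_OUTPUT>>>"
    return wrapped, flagged
```

Matching against the normalized text while wrapping the raw text preserves the original evidence and still catches instructions disguised with look-alike characters.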
API Endpoints
All red team endpoints require authentication and are prefixed with /api/redteam. A Business or Enterprise plan is required (admins are exempt).
Engagements
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/redteam/launch | Start engagement (target, playbook, scope, mode) |
| GET | /api/redteam/status?id= | Poll engagement status and progress |
| GET | /api/redteam/result?id= | Get full results with structured findings and analysis |
| GET | /api/redteam/notes?id= | Get raw PentestAgent notes for an engagement |
| POST | /api/redteam/cancel | Cancel a running engagement |
| GET | /api/redteam/engagements | List workspace engagements |
| GET | /api/redteam/engagements/:id | Get single engagement details |
| POST | /api/redteam/engagement/:id/archive | Archive an engagement |
| POST | /api/redteam/engagement/:id/unarchive | Restore an archived engagement |
| DELETE | /api/redteam/engagement/:id | Permanently delete an engagement (admin only) |
Analysis & Chain Verification
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/redteam/reanalyze/:id | Re-run multi-agent analysis on existing results |
| POST | /api/redteam/exploit-chain/:id | Execute an attack chain with automated Kali commands |
| POST | /api/redteam/custom-chain/:id | Execute a custom chain with user-defined parameters |
Artifacts & Reports
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/redteam/artifacts/:id | List artifacts for an engagement |
| GET | /api/redteam/artifact/:id/:filename | Download a specific artifact file |
| GET | /api/redteam/report/:id | Generate branded HTML report |
Domain Verification
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/redteam/verify-domain | Request a domain verification token |
| POST | /api/redteam/check-verification | Check domain verification status |
| GET | /api/redteam/domains | List verified domains for the workspace |
Operator & Conversations
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/redteam/voice | Get ephemeral voice token for Red Team operator |
| POST | /api/redteam/conversation | Create or send message in an operator conversation |
| GET | /api/redteam/conversations | List operator conversations |
| DELETE | /api/redteam/conversation/:id | Delete a conversation |
Expert Instance Management
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/redteam/expert/:action | Start, stop, or check status of Expert EC2 instance |
| POST | /api/redteam/expert-health | Check Expert instance readiness before launching |
Metrics (Admin)
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/redteam/metrics | Engagement statistics and usage metrics (admin only) |