Diagnostic Agent System
Overview
The Diagnostic Agent System is a self-healing mechanism that prevents workflows from getting permanently stuck. When all tasks are complete but the workflow goal hasn't been achieved, a specialized diagnostic agent analyzes the situation and creates new tasks to push the workflow forward.
Purpose
In complex workflows, agents sometimes:
- Complete their individual tasks successfully but miss the bigger picture
- Fail to submit final results even though the work is done
- Get stuck in a particular phase when they should move to another
- Need to revisit earlier phases based on failures in later phases
The diagnostic agent serves as a "workflow doctor" that:
- Detects when the workflow is stuck
- Analyzes what's been accomplished
- Diagnoses what's missing
- Creates targeted tasks to achieve the workflow goal
When It Activates
The diagnostic agent triggers automatically when ALL of the following conditions are met:
- Active workflow exists: A workflow with phases is currently running
- Tasks exist: At least one task has been created in the workflow
- All tasks finished: No tasks have status pending, assigned, in_progress, under_review, or validation_in_progress
- No validated result: No WorkflowResult with status validated has been submitted
- Cooldown passed: At least diagnostic_cooldown_seconds (default: 60s) have passed since the last diagnostic agent was created
- Stuck long enough: At least diagnostic_min_stuck_time_seconds (default: 60s) have passed since the last task was created or completed
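The six conditions can be read as a single predicate. The following is a minimal sketch of that reading, not the actual Hephaestus implementation: WorkflowSnapshot and should_trigger_diagnostic are hypothetical names introduced for illustration, and the sketch assumes "at least N seconds have passed" means elapsed time >= N.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Statuses that mean a task is still unfinished (taken from the list above).
UNFINISHED_STATUSES = frozenset(
    {"pending", "assigned", "in_progress", "under_review", "validation_in_progress"}
)


@dataclass
class WorkflowSnapshot:
    """Hypothetical view of workflow state; not a Hephaestus class."""

    has_active_workflow: bool
    task_statuses: Sequence[str]
    has_validated_result: bool
    last_diagnostic_at: Optional[datetime]  # None if no diagnostic has run yet
    last_task_activity_at: datetime  # last task creation or completion


def should_trigger_diagnostic(
    snap: WorkflowSnapshot,
    now: datetime,
    cooldown_seconds: int = 60,
    min_stuck_time_seconds: int = 60,
) -> bool:
    """Return True only when all six trigger conditions hold."""
    # Conditions 1 and 2: an active workflow with at least one task.
    if not snap.has_active_workflow or not snap.task_statuses:
        return False
    # Condition 3: no task is still in an unfinished status.
    if any(status in UNFINISHED_STATUSES for status in snap.task_statuses):
        return False
    # Condition 4: no validated result has been submitted.
    if snap.has_validated_result:
        return False
    # Condition 5: the cooldown since the last diagnostic has elapsed.
    if snap.last_diagnostic_at is not None:
        if now - snap.last_diagnostic_at < timedelta(seconds=cooldown_seconds):
            return False
    # Condition 6: the workflow has been idle for long enough.
    stuck_for = now - snap.last_task_activity_at
    return stuck_for >= timedelta(seconds=min_stuck_time_seconds)
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.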
How It Works
1. Detection (MonitoringLoop)
Every monitoring cycle (default: 60 seconds), the MonitoringLoop._check_workflow_stuck_state() method runs the following check:
# Pseudo-code
if workflow_exists and has_tasks:
    if all_tasks_finished and no_validated_result:
        if cooldown_passed and stuck_long_enough:
            create_diagnostic_agent()
2. Context Gathering
When triggered, the system gathers comprehensive context:
Workflow Information:
- Workflow goal (from result_criteria)
- All phase definitions with their objectives
- Current phase statuses
Recent History:
- Last 15 completed/failed agents (configurable)
- Their task descriptions, statuses, and outcomes
- Completion notes and failure reasons
System Observations:
- Last 5 Conductor system analyses
- Duplicate work detections
- System coherence scores
Submitted Results:
- Any result submissions (even if rejected)
- Validation feedback explaining rejections
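As a rough illustration of how the four context groups might be bundled and trimmed to the configured limits, here is a small sketch. DiagnosticContext and gather_context are hypothetical names, the dictionaries stand in for whatever records the system actually stores, and the inputs are assumed to be ordered oldest-first.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass
class DiagnosticContext:
    """Hypothetical bundle of the four context groups described above."""

    workflow_goal: str
    phases: List[Dict[str, Any]]
    recent_agents: List[Dict[str, Any]]
    conductor_analyses: List[Dict[str, Any]]
    submitted_results: List[Dict[str, Any]]


def _most_recent(items: Sequence[Dict[str, Any]], limit: int) -> List[Dict[str, Any]]:
    """Keep the last `limit` items of an oldest-first sequence."""
    return list(items[-limit:]) if limit > 0 else []


def gather_context(
    goal: str,
    phases: Sequence[Dict[str, Any]],
    agents: Sequence[Dict[str, Any]],
    analyses: Sequence[Dict[str, Any]],
    results: Sequence[Dict[str, Any]],
    max_agents_to_analyze: int = 15,
    max_conductor_analyses: int = 5,
) -> DiagnosticContext:
    """Assemble the context, trimming history to the configured limits."""
    return DiagnosticContext(
        workflow_goal=goal,
        phases=list(phases),
        recent_agents=_most_recent(agents, max_agents_to_analyze),
        conductor_analyses=_most_recent(analyses, max_conductor_analyses),
        submitted_results=list(results),  # rejected submissions are kept too
    )
```

The default limits mirror the documented values of 15 agents and 5 Conductor analyses.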
3. Agent Creation
A diagnostic task and agent are created:
Task(
    description="DIAGNOSTIC: Analyze why workflow has stalled and create tasks to progress toward goal",
    done_definition="Created 1-5 new tasks with clear phase assignments",
    agent_type="diagnostic",
    phase_id=None,  # Diagnostic tasks span all phases
)
The diagnostic agent:
- Works in the main repository (no worktree isolation)
- Gets a specialized prompt with all gathered context
- Has access to all Hephaestus MCP tools
- Can create tasks in any phase
4. Diagnostic Process
The diagnostic agent follows a structured 4-step process:
Step 1: Understand the Goal
- Reads the workflow's result_criteria
- Identifies what "success" looks like
Step 2: Analyze Current State
- Reviews what agents have accomplished
- Examines which phases have progressed
- Checks what outputs have been created
- Analyzes any result submission failures
Step 3: Identify the Gap
- Diagnoses why the goal hasn't been achieved
- Identifies common stuck scenarios:
  - Missing evidence/documentation
  - Incomplete implementation
  - Wrong direction
  - Premature task completion
  - Phase misalignment
  - Validation failures
Step 4: Create Tasks
- Uses create_task MCP tool to create 1-5 tasks
- Assigns tasks to appropriate phases
- Defines concrete completion criteria
- Marks diagnostic task as done
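The constraints in Step 4 (1-5 tasks, each with a phase and concrete completion criteria) could be checked before tasks are submitted. The sketch below is illustrative only: validate_proposed_tasks is a hypothetical helper, and the field names are borrowed from the Task example earlier rather than from the real create_task tool signature, which this document does not specify.

```python
from typing import Any, Dict, Iterable, List


def validate_proposed_tasks(
    tasks: List[Dict[str, Any]],
    valid_phase_ids: Iterable[int],
    max_tasks_per_run: int = 5,
) -> List[str]:
    """Return a list of problems; an empty list means the proposal looks sane."""
    errors: List[str] = []
    # The diagnostic is expected to create between 1 and max_tasks_per_run tasks.
    if not 1 <= len(tasks) <= max_tasks_per_run:
        errors.append(f"expected 1-{max_tasks_per_run} tasks, got {len(tasks)}")
    allowed = set(valid_phase_ids)
    for index, task in enumerate(tasks, start=1):
        # Every new task needs a phase assignment that exists in the workflow.
        if task.get("phase_id") not in allowed:
            errors.append(f"task {index}: unknown or missing phase_id")
        # Completion criteria must be non-empty to count as concrete.
        if not str(task.get("done_definition") or "").strip():
            errors.append(f"task {index}: empty done_definition")
    return errors
```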
5. Workflow Progression
Once the diagnostic agent creates new tasks:
- Tasks are picked up by regular agents
- Workflow progresses toward the goal
- System continues monitoring
- Another diagnostic may trigger if needed (after cooldown)
Configuration
YAML Configuration (hephaestus_config.yaml)
diagnostic_agent:
  enabled: true                # Enable/disable diagnostic agents
  cooldown_seconds: 60         # Min time between diagnostics
  min_stuck_time_seconds: 60   # How long "stuck" before triggering
  max_agents_to_analyze: 15    # Number of recent agents in context
  max_conductor_analyses: 5    # Number of Conductor analyses in context
  max_tasks_per_run: 5         # Max tasks diagnostic can create
Environment Variables
DIAGNOSTIC_AGENT_ENABLED=true
DIAGNOSTIC_COOLDOWN_SECONDS=60
DIAGNOSTIC_MIN_STUCK_TIME=60
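For reference, the three variables above could be parsed along these lines. This is a sketch under assumptions: load_diagnostic_settings is a hypothetical helper, the accepted spellings for a true value are a guess, and the fallback of enabled=true is taken from the YAML example rather than from a documented default. How Hephaestus resolves precedence between YAML and environment variables is not covered here.

```python
import os
from typing import Dict, Mapping, Union

# Assumed spellings for a "true" flag; the real parser may differ.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def load_diagnostic_settings(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Union[bool, int]]:
    """Read the diagnostic settings from environment-style key/value pairs."""
    enabled_raw = env.get("DIAGNOSTIC_AGENT_ENABLED", "true")
    return {
        "enabled": enabled_raw.strip().lower() in _TRUE_VALUES,
        # int() raises ValueError on malformed input, which surfaces bad config early.
        "cooldown_seconds": int(env.get("DIAGNOSTIC_COOLDOWN_SECONDS", "60")),
        "min_stuck_time_seconds": int(env.get("DIAGNOSTIC_MIN_STUCK_TIME", "60")),
    }
```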
SDK Configuration
from hephaestus_sdk import HephaestusConfig

config = HephaestusConfig(
    diagnostic_agent_enabled=True,
    diagnostic_cooldown_seconds=60,
    diagnostic_min_stuck_time_seconds=60,
)
Database Schema
DiagnosticRun Table
Tracks each diagnostic agent execution:
CREATE TABLE diagnostic_runs (
    id TEXT PRIMARY KEY,
    workflow_id TEXT NOT NULL,
    diagnostic_agent_id TEXT,
    diagnostic_task_id TEXT,

    -- Trigger conditions
    triggered_at DATETIME NOT NULL,
    total_tasks_at_trigger INTEGER NOT NULL,
    done_tasks_at_trigger INTEGER NOT NULL,
    failed_tasks_at_trigger INTEGER NOT NULL,
    time_since_last_task_seconds INTEGER NOT NULL,

    -- Results
    tasks_created_count INTEGER DEFAULT 0,
    tasks_created_ids JSON,
    completed_at DATETIME,
    status TEXT CHECK(status IN ('created', 'running', 'completed', 'failed')),

    -- Analysis context
    workflow_goal TEXT,
    phases_analyzed JSON,
    agents_reviewed JSON,
    diagnosis TEXT,

    FOREIGN KEY (workflow_id) REFERENCES workflows(id),
    FOREIGN KEY (diagnostic_agent_id) REFERENCES agents(id),
    FOREIGN KEY (diagnostic_task_id) REFERENCES tasks(id)
);
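To show how a row in this table might move from created to completed, here is a small sketch using Python's built-in sqlite3 module. It assumes the database is SQLite (the column types suggest it, but this document does not say so), uses a trimmed copy of the schema without the foreign keys so it runs standalone, and introduces record_trigger and record_completion as hypothetical helpers.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import List, Sequence

# Trimmed copy of the table above (no foreign keys) so the sketch is self-contained.
SCHEMA = """
CREATE TABLE diagnostic_runs (
    id TEXT PRIMARY KEY,
    workflow_id TEXT NOT NULL,
    triggered_at DATETIME NOT NULL,
    total_tasks_at_trigger INTEGER NOT NULL,
    done_tasks_at_trigger INTEGER NOT NULL,
    failed_tasks_at_trigger INTEGER NOT NULL,
    time_since_last_task_seconds INTEGER NOT NULL,
    tasks_created_count INTEGER DEFAULT 0,
    tasks_created_ids JSON,
    completed_at DATETIME,
    status TEXT CHECK(status IN ('created', 'running', 'completed', 'failed'))
);
"""


def record_trigger(
    conn: sqlite3.Connection,
    workflow_id: str,
    task_statuses: Sequence[str],
    seconds_since_last_task: int,
) -> str:
    """Insert a run in status 'created' and return its generated id."""
    run_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO diagnostic_runs (id, workflow_id, triggered_at,"
        " total_tasks_at_trigger, done_tasks_at_trigger, failed_tasks_at_trigger,"
        " time_since_last_task_seconds, status)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, 'created')",
        (
            run_id,
            workflow_id,
            datetime.now(timezone.utc).isoformat(),
            len(task_statuses),
            sum(1 for s in task_statuses if s == "done"),
            sum(1 for s in task_statuses if s == "failed"),
            seconds_since_last_task,
        ),
    )
    return run_id


def record_completion(conn: sqlite3.Connection, run_id: str, created_task_ids: List[str]) -> None:
    """Mark a run completed and store the ids of the tasks it created as JSON text."""
    conn.execute(
        "UPDATE diagnostic_runs SET tasks_created_count = ?, tasks_created_ids = ?,"
        " completed_at = ?, status = 'completed' WHERE id = ?",
        (
            len(created_task_ids),
            json.dumps(created_task_ids),
            datetime.now(timezone.utc).isoformat(),
            run_id,
        ),
    )
```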
Agent Type Update
The agents.agent_type constraint now includes 'diagnostic':
agent_type TEXT CHECK(agent_type IN ('phase', 'validator', 'result_validator', 'monitor', 'diagnostic'))
Monitoring & Observability
Logs
Diagnostic agents produce distinctive log messages:
🚨 WORKFLOW STUCK DETECTED - 120s with no progress
🔍 Creating diagnostic agent for workflow abc12345
✅ Diagnostic agent def67890 created for workflow abc12345
Database Queries
View all diagnostic runs:
SELECT * FROM diagnostic_runs ORDER BY triggered_at DESC;
Check diagnostic effectiveness:
SELECT
    dr.id,
    dr.triggered_at,
    dr.tasks_created_count,
    dr.status,
    COUNT(t.id) AS tasks_completed
FROM diagnostic_runs dr
LEFT JOIN tasks t ON t.created_by_agent_id = dr.diagnostic_agent_id
    AND t.status = 'done'
GROUP BY dr.id;
See which phases diagnostics create tasks in:
SELECT
    p.name AS phase_name,
    COUNT(t.id) AS tasks_created
FROM tasks t
JOIN agents a ON t.created_by_agent_id = a.id
JOIN phases p ON t.phase_id = p.id
WHERE a.agent_type = 'diagnostic'
GROUP BY p.name;
Troubleshooting
Diagnostic Not Triggering
Symptoms: Workflow seems stuck but no diagnostic agent is created
Check:
- Is diagnostic_agent_enabled set to true?
- Are there any active tasks? (Check tasks table)
- Has cooldown period passed? (Check diagnostic_runs for last run)
- Has workflow been stuck long enough? (Check diagnostic_min_stuck_time_seconds)
Debug:
-- Check workflow status
SELECT workflow_id, status FROM tasks WHERE workflow_id = '<workflow_id>';
-- Check last diagnostic
SELECT * FROM diagnostic_runs
WHERE workflow_id = '<workflow_id>'
ORDER BY triggered_at DESC LIMIT 1;
Diagnostic Creating Wrong Tasks
Symptoms: Diagnostic creates tasks but they don't help
Possible causes:
- Insufficient context (increase max_agents_to_analyze)
- Poor workflow goal definition (review result_criteria)
- Diagnostic agent misunderstood situation
Solutions:
- Review diagnostic agent's output in tmux session
- Check diagnosis field in diagnostic_runs table
- Improve workflow phase done_definitions for clarity
Too Many Diagnostics
Symptoms: Diagnostics keep triggering in a loop
Causes:
- Cooldown too short
- Diagnostic creates tasks that immediately complete
Solutions:
diagnostic_agent:
  cooldown_seconds: 120         # Increase cooldown
  min_stuck_time_seconds: 120   # Require longer stuck time
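Before changing these values, it can help to confirm that diagnostics really are looping. One possible check, sketched below, counts how many runs were triggered inside a recent time window. is_diagnostic_looping, the 10-minute window, and the threshold of 3 runs are all illustrative assumptions, not part of Hephaestus; the timestamps would come from the triggered_at column of diagnostic_runs.

```python
from datetime import datetime, timedelta
from typing import Iterable


def is_diagnostic_looping(
    trigger_times: Iterable[datetime],
    now: datetime,
    window_seconds: int = 600,
    max_runs_in_window: int = 3,
) -> bool:
    """Return True if more than `max_runs_in_window` diagnostics fired recently."""
    window_start = now - timedelta(seconds=window_seconds)
    # Count only triggers that fall inside [window_start, now].
    recent_runs = sum(1 for t in trigger_times if window_start <= t <= now)
    return recent_runs > max_runs_in_window
```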
Diagnostic Agent Fails
Symptoms: Diagnostic task shows status failed
Check:
- Diagnostic agent logs in tmux
- failure_reason in tasks table
- MCP tool availability
Recovery:
- System will retry after cooldown period
- Investigate and fix underlying issue
- Manually create tasks if needed
Best Practices
1. Clear Workflow Goals
Define concrete, measurable result_criteria:
# ❌ Vague
result_criteria: "Complete the project"
# ✅ Specific
result_criteria: |
  Submit a result.md file containing:
  - The cracked password
  - Full methodology used
  - Execution outputs as proof
  - Use submit_result() tool to submit
2. Detailed Done Definitions
Help diagnostic agents understand what "done" means:
# ❌ Vague
Done_Definitions:
  - "Tests pass"
# ✅ Specific
Done_Definitions:
  - "All unit tests in tests/ directory pass with 0 failures"
  - "Integration tests in tests/integration/ execute successfully"
  - "Test results saved to test_results.txt with timestamps"
3. Completion Notes
Agents should provide detailed completion notes:
# Help diagnostic understand what was actually done
update_task_status(
    task_id="...",
    status="done",
    summary="Created test_password.go with 15 test cases. All tests pass. Output saved to test_output.txt"
)
4. Monitor Diagnostic Effectiveness
Regularly check:
-- Diagnostic success rate
SELECT
    COUNT(CASE WHEN tasks_created_count > 0 THEN 1 END) AS successful,
    COUNT(*) AS total,
    ROUND(100.0 * COUNT(CASE WHEN tasks_created_count > 0 THEN 1 END) / COUNT(*), 2) AS success_rate
FROM diagnostic_runs;
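The same query can be run from a script. The sketch below assumes a SQLite database and uses a two-column stand-in for diagnostic_runs so it runs on its own; diagnostic_success_rate is a hypothetical helper. One detail worth knowing: when the table is empty, SQLite evaluates the division by zero to NULL, so the rate comes back as None rather than raising an error.

```python
import sqlite3
from typing import Optional, Tuple

SUCCESS_RATE_SQL = """
SELECT
    COUNT(CASE WHEN tasks_created_count > 0 THEN 1 END) AS successful,
    COUNT(*) AS total,
    ROUND(100.0 * COUNT(CASE WHEN tasks_created_count > 0 THEN 1 END) / COUNT(*), 2) AS success_rate
FROM diagnostic_runs
"""


def diagnostic_success_rate(conn: sqlite3.Connection) -> Tuple[int, int, Optional[float]]:
    """Return (successful, total, success_rate); the rate is None when there are no runs."""
    successful, total, rate = conn.execute(SUCCESS_RATE_SQL).fetchone()
    return successful, total, rate


if __name__ == "__main__":
    # Minimal stand-in table with only the column the query needs.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE diagnostic_runs (id TEXT PRIMARY KEY, tasks_created_count INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO diagnostic_runs VALUES (?, ?)",
        [("a", 2), ("b", 0), ("c", 1), ("d", 5)],
    )
    print(diagnostic_success_rate(conn))
```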
Integration with Existing Systems
Guardian & Conductor
Diagnostic agents work alongside:
- Guardian: Monitors individual agent health
- Conductor: Detects system-wide issues (duplicates, coherence)
- Diagnostic: Handles workflow-level stuckness
They complement each other:
- Guardian/Conductor run every monitoring cycle
- Diagnostic only triggers when workflow is stuck
- All three share the same monitoring infrastructure
Validation System
Diagnostic agents respect the validation system:
- Won't trigger if workflow has validated result
- Considers validation feedback when analyzing
- May create validation tasks if results were rejected
Phase System
Diagnostic agents are phase-aware:
- Can create tasks in any phase (not just current)
- May recommend going back to earlier phases
- Understands phase dependencies and progression
Examples
Example 1: Missing Result Submission
Situation:
- All tests passed
- No result submitted
Diagnostic finds:
- Tasks show "tests pass" but no evidence file
- No submit_result calls in logs
Tasks created:
Phase 3: "Create evidence.md documenting all test outputs and execution steps"
Phase 3: "Submit result using submit_result() tool with evidence.md as proof"
Example 2: Implementation Incomplete
Situation:
- "Implementation" phase tasks all done
- "Testing" phase tasks failing
Diagnostic finds:
- Tests can't run - missing dependencies
- Implementation didn't include setup steps
Tasks created:
Phase 2: "Add dependency installation to setup.sh script"
Phase 2: "Document build prerequisites in BUILD.md"
Phase 3: "Re-run tests after dependencies are installed"
Example 3: Wrong Architectural Approach
Situation:
- Multiple implementation attempts failed
- All with similar errors
Diagnostic finds:
- Approach doesn't match codebase architecture
- Need to revisit planning phase
Tasks created:
Phase 1: "Analyze existing codebase architecture in detail"
Phase 1: "Design integration approach matching current patterns"
Phase 2: "Implement using new architectural approach from Phase 1"
Future Enhancements
Potential improvements to the diagnostic system:
- Learning from Past Diagnostics
  - Store successful diagnostic patterns
  - Use RAG to suggest similar solutions
- Multi-Agent Diagnostics
  - Create diagnostic teams for complex analysis
  - Parallel investigation of different hypotheses
- Proactive Diagnostics
  - Trigger before complete stuck state
  - Based on trajectory analysis
- User Notifications
  - Alert users when diagnostic triggers
  - Request human input for ambiguous situations
Support
For issues with the diagnostic agent system:
- Check logs in logs/monitor.log
- Query diagnostic_runs table for history
- Review diagnostic agent tmux sessions
- Open issue on GitHub with diagnostic run ID