Diagnostic Agent System

Overview

The Diagnostic Agent System is a self-healing mechanism that prevents workflows from getting permanently stuck. When every task has finished but the workflow goal has not been achieved, a specialized diagnostic agent analyzes the situation and creates new tasks to push the workflow forward.

Purpose

In complex workflows, agents sometimes:

Complete their individual tasks successfully but miss the bigger picture
Fail to submit final results even though the work is done
Get stuck in a particular phase when they should move to another
Need to revisit earlier phases based on failures in later phases

The diagnostic agent serves as a "workflow doctor" that:

Detects when the workflow is stuck
Analyzes what's been accomplished
Diagnoses what's missing
Creates targeted tasks to achieve the workflow goal

When It Activates

The diagnostic agent triggers automatically when ALL of the following conditions are met:

Active workflow exists: A workflow with phases is currently running
Tasks exist: At least one task has been created in the workflow
All tasks finished: No tasks have status pending, assigned, in_progress, under_review, or validation_in_progress
No validated result: No WorkflowResult with status validated has been submitted
Cooldown passed: At least diagnostic_cooldown_seconds (default: 60s) have passed since the last diagnostic agent was created
Stuck long enough: At least diagnostic_min_stuck_time_seconds (default: 60s) have passed since the last task was created or completed

How It Works

1. Detection (MonitoringLoop)

Every monitoring cycle (default: 60 seconds), the MonitoringLoop._check_workflow_stuck_state() method evaluates the trigger conditions:

# Pseudo-code
if workflow_exists and has_tasks:
    if all_tasks_finished and no_validated_result:
        if cooldown_passed and stuck_long_enough:
            create_diagnostic_agent()
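
The pseudo-code above can be expanded into a runnable predicate. This is a minimal sketch rather than the actual MonitoringLoop implementation: WorkflowSnapshot, should_trigger_diagnostic, and their field names are hypothetical, and treating "no previous diagnostic" as "cooldown passed" is an assumption. Only the status names and the 60-second defaults come from the conditions listed earlier.

```python
from dataclasses import dataclass
from typing import List, Optional

# Statuses that mean a task is still unfinished (from the trigger conditions).
ACTIVE_STATUSES = {
    "pending",
    "assigned",
    "in_progress",
    "under_review",
    "validation_in_progress",
}


@dataclass
class WorkflowSnapshot:
    """Hypothetical container for the facts the trigger conditions need."""

    has_active_workflow: bool
    task_statuses: List[str]
    has_validated_result: bool
    seconds_since_last_diagnostic: Optional[float]  # None: no diagnostic yet
    seconds_since_last_task_activity: float


def should_trigger_diagnostic(
    snap: WorkflowSnapshot,
    cooldown_seconds: float = 60.0,
    min_stuck_time_seconds: float = 60.0,
) -> bool:
    """Return True only when all six trigger conditions hold."""
    if not snap.has_active_workflow:
        return False
    if not snap.task_statuses:  # at least one task must exist
        return False
    if any(status in ACTIVE_STATUSES for status in snap.task_statuses):
        return False
    if snap.has_validated_result:
        return False
    # Assumption: with no earlier diagnostic, the cooldown counts as passed.
    last = snap.seconds_since_last_diagnostic
    if last is not None and last < cooldown_seconds:
        return False
    return snap.seconds_since_last_task_activity >= min_stuck_time_seconds
```

Ordering the cheap checks first means a workflow with any unfinished task is rejected before the timing conditions are consulted.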

2. Context Gathering

When triggered, the system gathers comprehensive context:

Workflow Information:

Workflow goal (from result_criteria)
All phase definitions with their objectives
Current phase statuses

Recent History:

Last 15 completed/failed agents (configurable)
Their task descriptions, statuses, and outcomes
Completion notes and failure reasons

System Observations:

Last 5 Conductor system analyses
Duplicate work detections
System coherence scores

Submitted Results:

Any result submissions (even if rejected)
Validation feedback explaining rejections
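
The gathered context can be pictured as one bundle in which the history lists are trimmed to their configured limits. The sketch below is illustrative only: DiagnosticContext and build_diagnostic_context are hypothetical names, the inputs are assumed to be ordered oldest-first, and the real system may structure this data differently. The limits of 15 and 5 are the defaults stated above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Record = Dict[str, Any]


@dataclass
class DiagnosticContext:
    """Hypothetical bundle handed to the diagnostic agent's prompt."""

    workflow_goal: str
    phases: List[Record]
    recent_agents: List[Record]
    conductor_analyses: List[Record]
    submitted_results: List[Record]


def _last(items: List[Record], limit: int) -> List[Record]:
    """Keep the newest `limit` items; items are assumed oldest-first."""
    return list(items[-limit:]) if limit > 0 else []


def build_diagnostic_context(
    workflow_goal: str,
    phases: List[Record],
    agents: List[Record],
    analyses: List[Record],
    results: List[Record],
    max_agents_to_analyze: int = 15,
    max_conductor_analyses: int = 5,
) -> DiagnosticContext:
    return DiagnosticContext(
        workflow_goal=workflow_goal,
        phases=list(phases),
        recent_agents=_last(agents, max_agents_to_analyze),
        conductor_analyses=_last(analyses, max_conductor_analyses),
        # Rejected submissions are kept: their feedback explains what is missing.
        submitted_results=list(results),
    )
```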

3. Agent Creation

A diagnostic task and agent are created:

Task(
    description="DIAGNOSTIC: Analyze why workflow has stalled and create tasks to progress toward goal",
    done_definition="Created 1-5 new tasks with clear phase assignments",
    agent_type="diagnostic",
    phase_id=None,  # Diagnostic tasks span all phases
)

The diagnostic agent:

Works in the main repository (no worktree isolation)
Gets a specialized prompt with all gathered context
Has access to all Hephaestus MCP tools
Can create tasks in any phase

4. Diagnostic Process

The diagnostic agent follows a structured 4-step process:

Step 1: Understand the Goal

Reads the workflow's result_criteria
Identifies what "success" looks like

Step 2: Analyze Current State

Reviews what agents have accomplished
Examines which phases have progressed
Checks what outputs have been created
Analyzes any result submission failures

Step 3: Identify the Gap

Diagnoses why the goal hasn't been achieved
Identifies common stuck scenarios:
- Missing evidence/documentation
- Incomplete implementation
- Wrong direction
- Premature task completion
- Phase misalignment
- Validation failures

Step 4: Create Tasks

Uses create_task MCP tool to create 1-5 tasks
Assigns tasks to appropriate phases
Defines concrete completion criteria
Marks diagnostic task as done
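
Before tasks are handed to create_task, the batch can be checked against the constraints described in this step. The following is a sketch under assumptions: validate_diagnostic_tasks and the dictionary keys description, done_definition, and phase_id are hypothetical and do not represent the create_task tool's actual parameters. The limit of five tasks is the documented default.

```python
from typing import Any, Dict, List

MAX_TASKS_PER_RUN = 5  # documented default for max_tasks_per_run

REQUIRED_KEYS = ("description", "done_definition", "phase_id")  # hypothetical keys


def validate_diagnostic_tasks(
    tasks: List[Dict[str, Any]], max_tasks: int = MAX_TASKS_PER_RUN
) -> List[str]:
    """Return a list of problems; an empty list means the batch looks acceptable."""
    problems: List[str] = []
    if not 1 <= len(tasks) <= max_tasks:
        problems.append(f"expected 1-{max_tasks} tasks, got {len(tasks)}")
    for index, task in enumerate(tasks, start=1):
        for key in REQUIRED_KEYS:
            value = task.get(key)
            if value is None or not str(value).strip():
                problems.append(f"task {index} is missing {key}")
    return problems
```

Returning a list of problems instead of raising on the first one lets the caller report every gap in a single pass.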

5. Workflow Progression

Once the diagnostic agent creates new tasks:

Tasks are picked up by regular agents
Workflow progresses toward the goal
System continues monitoring
Another diagnostic may trigger if needed (after cooldown)

Configuration

YAML Configuration (`hephaestus_config.yaml`)

diagnostic_agent:
  enabled: true  # Enable/disable diagnostic agents
  cooldown_seconds: 60  # Min time between diagnostics
  min_stuck_time_seconds: 60  # How long "stuck" before triggering
  max_agents_to_analyze: 15  # Number of recent agents in context
  max_conductor_analyses: 5  # Number of Conductor analyses in context
  max_tasks_per_run: 5  # Max tasks diagnostic can create

Environment Variables

DIAGNOSTIC_AGENT_ENABLED=true
DIAGNOSTIC_COOLDOWN_SECONDS=60
DIAGNOSTIC_MIN_STUCK_TIME=60
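
As an illustration of how these variables map onto settings, the sketch below reads them with the documented defaults. It is not Hephaestus's own loader: load_diagnostic_settings is a hypothetical helper, and the set of strings accepted as "true" is an assumption.

```python
import os
from typing import Any, Dict, Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted spellings of "enabled"


def load_diagnostic_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read the three diagnostic variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    enabled = env.get("DIAGNOSTIC_AGENT_ENABLED", "true").strip().lower() in TRUTHY
    return {
        "enabled": enabled,
        "cooldown_seconds": int(env.get("DIAGNOSTIC_COOLDOWN_SECONDS", "60")),
        "min_stuck_time_seconds": int(env.get("DIAGNOSTIC_MIN_STUCK_TIME", "60")),
    }
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise in tests.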

SDK Configuration

from hephaestus_sdk import HephaestusConfig

config = HephaestusConfig(
    diagnostic_agent_enabled=True,
    diagnostic_cooldown_seconds=60,
    diagnostic_min_stuck_time_seconds=60,
)

Database Schema

DiagnosticRun Table

Tracks each diagnostic agent execution:

CREATE TABLE diagnostic_runs (
    id TEXT PRIMARY KEY,
    workflow_id TEXT NOT NULL,
    diagnostic_agent_id TEXT,
    diagnostic_task_id TEXT,

    -- Trigger conditions
    triggered_at DATETIME NOT NULL,
    total_tasks_at_trigger INTEGER NOT NULL,
    done_tasks_at_trigger INTEGER NOT NULL,
    failed_tasks_at_trigger INTEGER NOT NULL,
    time_since_last_task_seconds INTEGER NOT NULL,

    -- Results
    tasks_created_count INTEGER DEFAULT 0,
    tasks_created_ids JSON,
    completed_at DATETIME,
    status TEXT CHECK(status IN ('created', 'running', 'completed', 'failed')),

    -- Analysis context
    workflow_goal TEXT,
    phases_analyzed JSON,
    agents_reviewed JSON,
    diagnosis TEXT,

    FOREIGN KEY (workflow_id) REFERENCES workflows(id),
    FOREIGN KEY (diagnostic_agent_id) REFERENCES agents(id),
    FOREIGN KEY (diagnostic_task_id) REFERENCES tasks(id)
);
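
To show how a row in this table might evolve over one diagnostic run, here is a small sketch using Python's built-in sqlite3 module. It assumes an SQLite database, omits the FOREIGN KEY clauses so it runs without the workflows, agents, and tasks tables, and uses two hypothetical helpers, record_trigger and record_completion. The direct move from 'created' to 'completed' is a simplification, not a description of the real lifecycle.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import List

# Same columns as the schema above, without the FOREIGN KEY clauses.
SCHEMA = """
CREATE TABLE diagnostic_runs (
    id TEXT PRIMARY KEY,
    workflow_id TEXT NOT NULL,
    diagnostic_agent_id TEXT,
    diagnostic_task_id TEXT,
    triggered_at DATETIME NOT NULL,
    total_tasks_at_trigger INTEGER NOT NULL,
    done_tasks_at_trigger INTEGER NOT NULL,
    failed_tasks_at_trigger INTEGER NOT NULL,
    time_since_last_task_seconds INTEGER NOT NULL,
    tasks_created_count INTEGER DEFAULT 0,
    tasks_created_ids JSON,
    completed_at DATETIME,
    status TEXT CHECK(status IN ('created', 'running', 'completed', 'failed')),
    workflow_goal TEXT,
    phases_analyzed JSON,
    agents_reviewed JSON,
    diagnosis TEXT
)
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def record_trigger(conn: sqlite3.Connection, workflow_id: str, total: int,
                   done: int, failed: int, stuck_seconds: int) -> str:
    """Insert a new run in the 'created' state and return its id."""
    run_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO diagnostic_runs (id, workflow_id, triggered_at,"
        " total_tasks_at_trigger, done_tasks_at_trigger, failed_tasks_at_trigger,"
        " time_since_last_task_seconds, status)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, 'created')",
        (run_id, workflow_id, _now(), total, done, failed, stuck_seconds),
    )
    return run_id


def record_completion(conn: sqlite3.Connection, run_id: str,
                      task_ids: List[str], diagnosis: str) -> None:
    """Store the outcome: created task ids (as JSON text) and the diagnosis."""
    conn.execute(
        "UPDATE diagnostic_runs SET status = 'completed', completed_at = ?,"
        " tasks_created_count = ?, tasks_created_ids = ?, diagnosis = ?"
        " WHERE id = ?",
        (_now(), len(task_ids), json.dumps(task_ids), diagnosis, run_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
run_id = record_trigger(conn, "wf-1", total=4, done=3, failed=1, stuck_seconds=120)
record_completion(conn, run_id, ["t-5", "t-6"], "No result was submitted")
```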

Agent Type Update

The agents.agent_type constraint now includes 'diagnostic':

agent_type TEXT CHECK(agent_type IN ('phase', 'validator', 'result_validator', 'monitor', 'diagnostic'))

Monitoring & Observability

Logs

Diagnostic agents produce distinctive log messages:

🚨 WORKFLOW STUCK DETECTED - 120s with no progress
🔍 Creating diagnostic agent for workflow abc12345
✅ Diagnostic agent def67890 created for workflow abc12345

Database Queries

View all diagnostic runs:

SELECT * FROM diagnostic_runs ORDER BY triggered_at DESC;

Check diagnostic effectiveness:

SELECT
    dr.id,
    dr.triggered_at,
    dr.tasks_created_count,
    dr.status,
    COUNT(t.id) as tasks_completed
FROM diagnostic_runs dr
LEFT JOIN tasks t ON t.created_by_agent_id = dr.diagnostic_agent_id
    AND t.status = 'done'
GROUP BY dr.id;

See which phases diagnostics create tasks in:

SELECT
    p.name as phase_name,
    COUNT(t.id) as tasks_created
FROM tasks t
JOIN agents a ON t.created_by_agent_id = a.id
JOIN phases p ON t.phase_id = p.id
WHERE a.agent_type = 'diagnostic'
GROUP BY p.name;

Troubleshooting

Diagnostic Not Triggering

Symptoms: Workflow seems stuck but no diagnostic agent is created

Check:

Is diagnostic_agent_enabled set to true?
Are any tasks still active? A single unfinished task blocks the trigger. (Check tasks table)
Has cooldown period passed? (Check diagnostic_runs for last run)
Has workflow been stuck long enough? (Check diagnostic_min_stuck_time_seconds)

Debug:

-- Check the status of every task in the workflow
SELECT id, status FROM tasks WHERE workflow_id = '<workflow_id>';

-- Check the most recent diagnostic run
SELECT * FROM diagnostic_runs
WHERE workflow_id = '<workflow_id>'
ORDER BY triggered_at DESC LIMIT 1;

Diagnostic Creating Wrong Tasks

Symptoms: Diagnostic creates tasks but they don't help

Possible causes:

Insufficient context (increase max_agents_to_analyze)
Poor workflow goal definition (review result_criteria)
Diagnostic agent misunderstood situation

Solutions:

Review diagnostic agent's output in tmux session
Check diagnosis field in diagnostic_runs table
Improve workflow phase done_definitions for clarity

Too Many Diagnostics

Symptoms: Diagnostics keep triggering in a loop

Causes:

Cooldown too short
Diagnostic creates tasks that immediately complete

Solutions:

diagnostic_agent:
  cooldown_seconds: 120  # Increase cooldown
  min_stuck_time_seconds: 120  # Require longer stuck time

Diagnostic Agent Fails

Symptoms: Diagnostic task shows status failed

Check:

Diagnostic agent logs in tmux
failure_reason in tasks table
MCP tool availability

Recovery:

System will retry after cooldown period
Investigate and fix underlying issue
Manually create tasks if needed

Best Practices

1. Clear Workflow Goals

Define concrete, measurable result_criteria:

# ❌ Vague
result_criteria: "Complete the project"

# ✅ Specific
result_criteria: |
  Submit a result.md file containing:
  - The cracked password
  - Full methodology used
  - Execution outputs as proof
  Use the submit_result() tool to submit it.

2. Detailed Done Definitions

Help diagnostic agents understand what "done" means:

# ❌ Vague
Done_Definitions:
  - "Tests pass"

# ✅ Specific
Done_Definitions:
  - "All unit tests in tests/ directory pass with 0 failures"
  - "Integration tests in tests/integration/ execute successfully"
  - "Test results saved to test_results.txt with timestamps"

3. Completion Notes

Agents should provide detailed completion notes:

# Help diagnostic understand what was actually done
update_task_status(
    task_id="...",
    status="done",
    summary="Created test_password.go with 15 test cases. All tests pass. Output saved to test_output.txt"
)

4. Monitor Diagnostic Effectiveness

Regularly check:

-- Diagnostic success rate
SELECT
    COUNT(CASE WHEN tasks_created_count > 0 THEN 1 END) as successful,
    COUNT(*) as total,
    ROUND(100.0 * COUNT(CASE WHEN tasks_created_count > 0 THEN 1 END) / COUNT(*), 2) as success_rate
FROM diagnostic_runs;

Integration with Existing Systems

Guardian & Conductor

Diagnostic agents work alongside:

Guardian: Monitors individual agent health
Conductor: Detects system-wide issues (duplicates, coherence)
Diagnostic: Recovers workflows that have stalled

They complement each other:

Guardian/Conductor run every monitoring cycle
Diagnostic only triggers when workflow is stuck
All three share the same monitoring infrastructure

Validation System

Diagnostic agents respect the validation system:

Won't trigger if workflow has validated result
Considers validation feedback when analyzing
May create validation tasks if results were rejected

Phase System

Diagnostic agents are phase-aware:

Can create tasks in any phase (not just current)
May recommend going back to earlier phases
Understands phase dependencies and progression

Examples

Example 1: Missing Result Submission

Situation:

All tests passed
No result submitted

Diagnostic finds:

Tasks show "tests pass" but no evidence file
No submit_result calls in logs

Tasks created:

Phase 3: "Create evidence.md documenting all test outputs and execution steps"
Phase 3: "Submit result using submit_result() tool with evidence.md as proof"

Example 2: Implementation Incomplete

Situation:

"Implementation" phase tasks all done
"Testing" phase tasks failing

Diagnostic finds:

Tests can't run - missing dependencies
Implementation didn't include setup steps

Tasks created:

Phase 2: "Add dependency installation to setup.sh script"
Phase 2: "Document build prerequisites in BUILD.md"
Phase 3: "Re-run tests after dependencies are installed"

Example 3: Wrong Architectural Approach

Situation:

Multiple implementation attempts failed
All with similar errors

Diagnostic finds:

Approach doesn't match codebase architecture
Need to revisit planning phase

Tasks created:

Phase 1: "Analyze existing codebase architecture in detail"
Phase 1: "Design integration approach matching current patterns"
Phase 2: "Implement using new architectural approach from Phase 1"

Future Enhancements

Potential improvements to the diagnostic system:

Learning from Past Diagnostics
- Store successful diagnostic patterns
- Use RAG to suggest similar solutions
Multi-Agent Diagnostics
- Create diagnostic teams for complex analysis
- Parallel investigation of different hypotheses
Proactive Diagnostics
- Trigger before complete stuck state
- Based on trajectory analysis
User Notifications
- Alert users when diagnostic triggers
- Request human input for ambiguous situations

Support

For issues with the diagnostic agent system:

Check logs in logs/monitor.log
Query diagnostic_runs table for history
Review diagnostic agent tmux sessions
Open issue on GitHub with diagnostic run ID
