Skip to content

AgentX Development Roadmap

This document tracks the development history and future direction of AgentX.

Progress Overview

Phase Status Description
Phase 1: Critical Fixes Complete Codebase baseline and dependency fixes
Phase 2: Wire Up Code Complete Connect scaffolded components
Phase 3: MCP Client Complete External tool server integration
Phase 4: Model Providers Complete Multi-provider LLM abstraction
Phase 5: Drafting Framework Complete Speculative decoding and pipelines
Phase 6: Reasoning Framework Complete CoT, ToT, ReAct, Reflection
Phase 7: Agent Core Complete Unified agent architecture
Phase 8: Client Updates Complete Tauri UI with cosmic theme
Phase 9: Security Complete Foundation security measures
Phase 10: Testing (Core) Complete 50 tests covering core functionality
Phase 11: Memory System 94% Persistent knowledge graphs
Phase 12: Documentation In Progress Comprehensive docs refresh
Phase 13: UI Implementation In Progress Chat and Agent interfaces

Completed Phases

Phase 1: Critical Fixes

Goal: Get the codebase into a working baseline state

1.1 Missing Dependencies

  • Added redis>=5.0.0, sqlalchemy>=2.0.0, mcp>=1.0.0 to pyproject.toml
  • Added openai>=1.0.0, anthropic>=0.20.0, httpx>=0.27.0 for model providers
  • Added networkx>=3.0 for reasoning graphs
  • Ran uv sync to regenerate lock file and verified imports

1.2 Taskfile Corrections

  • Fixed client:build: bunx next buildbun run build
  • Fixed client:start: bunx next startbun run preview
  • Fixed client:dev: bunx tauri run devbun run tauri dev
  • Removed Next.js references (project uses Vite)

1.3 OpenAPI Specification

  • Updated server URL port: 800012319
  • Documented POST /tools/translate endpoint with request/response schemas
  • Fixed language-detect path: /language-detect/tools/language-detect-20
  • Added proper response schemas and Error component schema

1.4 Environment Configuration

  • Created .env.example with all required variables
  • Updated docker-compose.yml to use environment variables
  • Ensured .env is in .gitignore
  • Added redis-commander to optional tools profile

1.5 Code Quality

  • Replaced print() statements with logging in views.py
  • Fixed language_detect to accept POST with text body (backwards compatible)
  • Fixed test import paths and URL paths to match new API structure
  • Updated CLAUDE.md with correct API endpoints

Phase 2: Wire Up Existing Code

Goal: Connect the scaffolded code that already exists

2.1 Language Detection API Fix

  • Modified views.language_detect() to accept POST
  • Removed hardcoded test string
  • Added input validation with structured response (language code + confidence)

2.2 Agent Memory System Connection

  • Renamed agent_memory.pymemory_utils.py to avoid naming conflict
  • Added database connection health check endpoint: GET /api/health
  • Created lazy-loading get_agent_memory() function
  • Made PostgreSQL connection lazy (PostgresConnection class)
  • Added check_memory_health() for connection status

2.3 Code Quality Improvements

  • Replaced all print() statements with logging
  • Added logging configuration to Django settings
  • Added psycopg2-binary for PostgreSQL connectivity
  • Added sentence-transformers for local embeddings
  • Added health check tests

Phase 3: MCP Client Integration

Goal: Enable AgentX to consume tools from external MCP servers

3.1 MCP Client Infrastructure

Created module structure:

api/agentx_ai/mcp/
├── __init__.py
├── client.py           # MCP client manager
├── server_registry.py  # Track connected MCP servers
├── tool_executor.py    # Execute tools on remote servers
└── transports/
    ├── __init__.py
    ├── stdio.py        # stdio transport (subprocess)
    └── sse.py          # SSE transport (HTTP)

Implemented MCPClientManager class: - connect_server(server_config) → async context manager with ServerConnection - list_tools(server_name?) → available tools - call_tool(server_name, tool_name, args) → ToolResult - list_resources(server_name?) → available resources - read_resource(server_name, uri) → content

3.2 Server Configuration

  • Created mcp_servers.json.example configuration format
  • Added Taskfile commands: task mcp:list-servers, task mcp:list-tools
  • Added API endpoints: /api/mcp/servers, /api/mcp/tools, /api/mcp/resources

3.3 Tool Discovery & Caching

  • Cache tool schemas on connection (in ToolExecutor)
  • Handle server disconnection gracefully (via context managers)

Phase 4: Model Provider Abstraction

Goal: Support multiple LLM backends with unified interface

4.1 Provider Interface

Created abstract ModelProvider interface:

class ModelProvider(ABC):
    @abstractmethod
    async def complete(self, messages, **kwargs) -> CompletionResult

    @abstractmethod
    async def stream(self, messages, **kwargs) -> AsyncIterator[StreamChunk]

    @abstractmethod
    def get_capabilities(self) -> ModelCapabilities

4.2 Provider Implementations

  • OpenAIProvider - GPT-4, GPT-4-turbo, GPT-3.5
  • AnthropicProvider - Claude 3 Opus/Sonnet/Haiku
  • OllamaProvider - Local models (Llama, Mistral, etc.)

4.3 Model Registry

  • Created models.yaml configuration
  • Model selection based on task requirements (via ProviderRegistry)

Phase 5: Drafting Models Framework

Goal: Implement multi-model generation strategies

5.1 Drafting Infrastructure

Created module structure:

api/agentx_ai/drafting/
├── __init__.py
├── base.py            # DraftingStrategy, DraftResult
├── speculative.py     # Speculative decoding
├── pipeline.py        # Multi-model pipelines
├── candidate.py       # N-best candidate generation
└── drafting_strategies.yaml

5.2 Speculative Decoding

Implemented SpeculativeDecoder: - Draft model generates N tokens quickly - Target model verifies/corrects - Configurable draft length and acceptance threshold - Support draft/target model pairs (configurable)

5.3 Multi-Model Pipeline

Implemented ModelPipeline: - Define pipeline stages (analyze → draft → refine → validate) - Route to different models per stage - Pass context between stages - Predefined stage roles (ANALYZE, DRAFT, REVIEW, REFINE, etc.)

5.4 Candidate Generation

Implemented CandidateGenerator: - Generate N candidates (same or different models) - Scoring/ranking strategies: - Self-consistency (majority vote) - Verifier model scoring - Heuristic scoring (length preference) - Best-of-N selection

5.5 Drafting Configuration

  • Created drafting_strategies.yaml with pre-configured strategies

Phase 6: Reasoning Framework

Goal: Implement flexible reasoning patterns for complex tasks

6.1 Reasoning Infrastructure

Created module structure:

api/agentx_ai/reasoning/
├── __init__.py
├── base.py              # ReasoningStrategy, ThoughtStep
├── chain_of_thought.py
├── tree_of_thought.py
├── react.py             # ReAct pattern
├── reflection.py        # Self-critique
└── orchestrator.py      # Combines strategies

6.2 Chain-of-Thought (CoT)

Implemented ChainOfThought: - Zero-shot CoT ("Let's think step by step...") - Few-shot CoT (with examples) - Auto-CoT mode - Step extraction and validation - Answer extraction

6.3 Tree-of-Thought (ToT)

Implemented TreeOfThought: - BFS exploration (breadth-first) - DFS exploration (depth-first) - Beam search (top-k branches) - State evaluation heuristics - Pruning strategies - Tree serialization for traces

6.4 ReAct (Reasoning + Acting)

Implemented ReActAgent: - Thought → Action → Observation loop - Tool execution framework - Max iterations / termination conditions - Action parsing and execution - Built-in tools (search, calculate)

6.5 Reflection & Self-Critique

Implemented ReflectiveReasoner: - Generate initial response - Self-critique: identify weaknesses - Revise based on critique - Configurable reflection depth

6.6 Reasoning Orchestrator

Implemented ReasoningOrchestrator: - Select reasoning strategy based on task - Task type classification - Fallback strategies on failure - Metrics: steps taken, tokens used, time elapsed


Phase 7: Agent Core

Goal: Unify MCP, drafting, and reasoning into coherent agent

7.1 Agent Architecture

Created module structure:

api/agentx_ai/agent/
├── __init__.py
├── core.py            # Main Agent class
├── planner.py         # Task decomposition
├── context.py         # Context management
└── session.py         # Conversation session

7.2 Agent Core Implementation

Implemented Agent class with: - Integration with reasoning orchestrator - Integration with drafting strategies - Tool registration - Status tracking - Task cancellation

7.3 Task Planning

Implemented TaskPlanner: - Decompose complex tasks into subtasks - Identify subtask types (research, analysis, generation, etc.) - Estimate complexity - Select reasoning strategy based on plan

7.4 Context Management

Implemented ContextManager: - Token estimation - Sliding window for long conversations - Summarization of old context - Memory injection support

7.5 Session Management

Implemented Session and SessionManager: - Message history tracking - Session creation/retrieval - Session cleanup for inactive sessions

7.6 Agent API Endpoints

  • POST /api/agent/run - Execute a task
  • POST /api/agent/chat - Conversational interaction
  • GET /api/agent/status - Check agent status

Phase 8: Client Updates

Goal: Update UI to support new agent capabilities

8.1 Agent Tab

  • Created AgentTab component
  • Task input with natural language
  • Real-time reasoning trace display
  • Tool usage visualization
  • Cancel/pause controls

8.2 Dashboard Updates

  • Connected MCP servers status
  • Model provider status (API keys valid, local models loaded)
  • Agent status widget
  • System health overview

8.3 Settings Tab

  • Multi-server configuration (per-server settings storage)
  • Model provider API key management
  • Drafting strategy selection
  • Reasoning preferences
  • Memory/storage info

8.4 Tools Tab Updates

  • MCP tool browser (all available tools from connected servers)
  • Tool search functionality
  • Tool testing interface (placeholder)
  • Server status display

8.5 Design System Updates

  • Dark cosmic theme (nebula gradients, star accents)
  • Lucide-react icons throughout
  • Glassmorphism effects
  • Glow animations
  • Updated color palette (purple/cyan/pink cosmic theme)

8.6 Infrastructure

  • Multi-server storage system (localStorage)
  • ServerContext for app-wide server state
  • Typed API client with React hooks
  • Per-server metadata and preferences

Phase 9: Security & Infrastructure

Goal: Secure and stabilize the platform (Foundation)

9.1 Tauri Security

  • Configured Content Security Policy in tauri.conf.json
  • Reviewed and restricted Tauri capabilities (already minimal)
  • Improved window defaults (1200x800, min size)

9.2 API Security

  • Foundation for rate limiting (settings placeholder, not enforced)
  • Foundation for API key authentication (settings placeholder, not enforced)
  • Input validation limits defined (AGENTX_MAX_TEXT_LENGTH, AGENTX_MAX_CHAT_LENGTH)
  • CORS configuration made environment-configurable
  • CORS permissive in DEBUG mode for development

Note: Security is intentionally permissive for private server development. Foundation is in place for future hardening.


Phase 10: Testing (Core)

Goal: Ensure reliability through automated testing Result: 50 tests, 49 pass, 1 skipped for Docker

10.1 Backend Tests

  • Fixed and unskipped test_language_detect (POST and GET both work)
  • Translation tests: multiple language pairs, error handling, long text
  • MCP tool tests: servers/tools/resources endpoints, registry operations
  • Reasoning framework tests: base classes, CoT, ToT, ReAct, Reflection
  • Drafting strategy tests: speculative, pipeline, candidate generation
  • Provider tests: registry, model config, Message/CompletionResult

10.2 Integration Tests

  • Full translation flow (via API endpoint tests)
  • Docker service dependencies (health check with skipUnless)

Phase 11: Memory System (94%)

Goal: Persistent knowledge graphs with extraction, consolidation, and intelligent recall

11.1–11.7: Core Memory Infrastructure

  • 4-type memory architecture: episodic (PostgreSQL), semantic (Neo4j + PostgreSQL), procedural (Neo4j), working (Redis)
  • AgentMemory unified interface with lazy-loaded database connections
  • Memory models: Turn, Entity, Fact, Goal, Strategy with embeddings and salience scoring
  • Channel-scoped memory with _global and _default channels
  • Memory API: 12 endpoints for recall, entities, facts, strategies, stats, settings

11.8: Testing & Security

  • 80+ memory tests covering integration, security, and edge cases
  • Tenant isolation (user_id filtering), audit logging, input validation
  • Graceful degradation when databases are unavailable

11.9: Extraction & Consolidation

  • ExtractionService: LLM-based entity/fact/relationship extraction in single call
  • Consolidation pipeline: scheduled jobs for extraction, entity linking, contradiction detection
  • JobRegistry with worker threads and configurable intervals

11.10: Memory Explorer UI — Pending

11.11–11.12: Advanced Recall & Intelligence

  • RecallLayer: 5 retrieval techniques (hybrid search, entity-centric, query expansion, HyDE, self-query)
  • Confidence calibration, temporal reasoning, reinforcement signals
  • Source attribution, user correction handling, contradiction detection

Phase 12: Documentation (In Progress)

Goal: Comprehensive backend documentation refresh

  • Rewrote architecture docs (overview, API layer, memory)
  • Expanded API reference: all 54 endpoints documented with request/response examples
  • Comprehensive API models reference (providers, agent, memory, prompts, MCP, SSE)
  • Created 5 new feature docs: reasoning, drafting, MCP, providers, prompts
  • Rewrote/expanded 3 existing feature docs: chat, translation, memory
  • Expanded development docs: contributing guide, setup guide
  • Updated configuration reference with all config layers
  • Updated quickstart with examples for streaming, MCP, memory, prompts
  • Mermaid diagrams throughout for architecture and data flow visualization

Decision Log

Date Decision Rationale
2026-01-21 MCP Client (not server) AgentX consumes tools from external MCP servers
2026-01-21 Multi-strategy drafting Support speculative, pipeline, and candidate generation
2026-01-21 Flexible reasoning Support CoT, ToT, ReAct, Reflection patterns
2026-01-21 Keep agent_memory architecture Well-designed, integrates with reasoning
2026-01-21 Use Todo.md for tracking Simple, version controlled
2026-01-31 Extensible memory over backwards-compat Memory system prioritizes easy extension; no backwards-compatibility shims
2026-01-31 Audit logging in PostgreSQL All memory operations traceable per session with configurable verbosity
2026-01-31 Tenant isolation required Semantic memory must filter by user_id to prevent cross-user data leakage
2026-01-31 Extraction via LLM preferred Entity/fact extraction uses model providers for quality; spaCy as lightweight fallback
2026-01-31 Memory channels over multiple databases Logical channel scoping (property on nodes/rows) vs multiple DBs
2026-01-31 _global as default channel User-wide memory (preferences, general facts) lives in _global
2026-01-31 Channels are traceable scopes, not isolation Retrieval merges active channel + _global
2026-01-31 Cross-channel promotion via consolidation Prominent project facts auto-promote to _global
2026-01-31 Audit logs partitioned by day Configurable retention with daily resolution
2026-01-31 LLM-only extraction, no spaCy Entity/fact extraction uses model providers exclusively
2026-02-13 Dual interface: Chat vs Agent Chat = minimal friction; Agent = full power for prompt engineering
2026-02-13 Agent Profiles as first-class concept Stored configurations that personalize agent behavior
2026-02-13 Conversation branching over linear history Power users need to explore alternatives
2026-02-13 Profile inheritance via extends Child profiles inherit parent settings with overrides
2026-02-13 Global prompt library with tags Templates shared across profiles
2026-02-22 Scheduler deferred, manual trigger priority Manual consolidation trigger for debugging
2026-02-23 LLM providers for consolidation stages Route through existing provider system
2026-02-23 Filter consolidation to user turns only Extract facts from user messages only
2026-02-23 Default memory channel is _default, not _global _global populated via promotion

Resolved Questions

Question Resolution
Which LLM providers to prioritize? OpenAI, Anthropic, Ollama implemented; LM-Studio preferred
Should agent memory require authentication? No, one server = one user. Multiple servers can exist on one system.
Target platforms for distribution? Linux and Windows for now
Reasoning trace storage format? Neo4j graph + PostgreSQL audit log
Entity extraction method? LLM-only via model providers; spaCy too constraining
Audit log retention policy? 30 days default, configurable with daily resolution
Memory retrieval sync or async? Blocking for now, may scale to concurrent for complex tasks
Cross-channel promotion thresholds? confidence>=0.85, access_count>=5, conversations>=2

Deferred Items

These items were identified during development but deferred to future phases:

From Phase 3 (MCP Client)

  • UI for managing MCP server connections
  • Test with standard MCP servers (filesystem, github, postgres, brave-search)
  • Document tested/supported servers
  • Custom MCP server templates
  • Tool refresh and search/filter functionality

From Phase 4 (Model Providers)

  • TogetherProvider - Together.ai API
  • LocalTransformersProvider - HuggingFace transformers
  • Cost tracking and budgeting

From Phase 6 (Reasoning)

  • DebateReasoner implementation

From Phase 9 (Security)

  • Secure storage for API keys (Keyring/OS keychain)
  • Encryption for sensitive config
  • Audit logging for sensitive operations

From Phase 10 (Testing)

  • React component tests
  • E2E tests with Playwright/Cypress