# AgentX Development Roadmap
This document tracks the development history and future direction of AgentX.
## Progress Overview
| Phase | Status | Description |
|---|---|---|
| Phase 1: Critical Fixes | Complete | Codebase baseline and dependency fixes |
| Phase 2: Wire Up Code | Complete | Connect scaffolded components |
| Phase 3: MCP Client | Complete | External tool server integration |
| Phase 4: Model Providers | Complete | Multi-provider LLM abstraction |
| Phase 5: Drafting Framework | Complete | Speculative decoding and pipelines |
| Phase 6: Reasoning Framework | Complete | CoT, ToT, ReAct, Reflection |
| Phase 7: Agent Core | Complete | Unified agent architecture |
| Phase 8: Client Updates | Complete | Tauri UI with cosmic theme |
| Phase 9: Security | Complete | Foundation security measures |
| Phase 10: Testing (Core) | Complete | 50 tests covering core functionality |
| Phase 11: Memory System | 94% | Persistent knowledge graphs |
| Phase 12: Documentation | In Progress | Comprehensive docs refresh |
| Phase 13: UI Implementation | In Progress | Chat and Agent interfaces |
## Completed Phases

### Phase 1: Critical Fixes

Goal: Get the codebase into a working baseline state

#### 1.1 Missing Dependencies

- Added `redis>=5.0.0`, `sqlalchemy>=2.0.0`, `mcp>=1.0.0` to `pyproject.toml`
- Added `openai>=1.0.0`, `anthropic>=0.20.0`, `httpx>=0.27.0` for model providers
- Added `networkx>=3.0` for reasoning graphs
- Ran `uv sync` to regenerate the lock file and verified imports
#### 1.2 Taskfile Corrections

- Fixed `client:build`: `bunx next build` → `bun run build`
- Fixed `client:start`: `bunx next start` → `bun run preview`
- Fixed `client:dev`: `bunx tauri run dev` → `bun run tauri dev`
- Removed Next.js references (project uses Vite)
#### 1.3 OpenAPI Specification

- Updated server URL port: `8000` → `12319`
- Documented the `POST /tools/translate` endpoint with request/response schemas
- Fixed language-detect path: `/language-detect` → `/tools/language-detect-20`
- Added proper response schemas and an Error component schema
#### 1.4 Environment Configuration

- Created `.env.example` with all required variables
- Updated `docker-compose.yml` to use environment variables
- Ensured `.env` is in `.gitignore`
- Added redis-commander to the optional `tools` profile
#### 1.5 Code Quality

- Replaced `print()` statements with `logging` in `views.py`
- Fixed `language_detect` to accept POST with a text body (backwards compatible)
- Fixed test import paths and URL paths to match the new API structure
- Updated CLAUDE.md with correct API endpoints
### Phase 2: Wire Up Existing Code

Goal: Connect the scaffolded code that already exists

#### 2.1 Language Detection API Fix

- Modified `views.language_detect()` to accept POST
- Removed hardcoded test string
- Added input validation with a structured response (language code + confidence)
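The core of the validated endpoint can be sketched framework-free. This is an illustrative stand-in, not the actual AgentX view: the field names (`text`, `language`, `confidence`), the length limit, and the pluggable `detector` callable are all assumptions.

```python
# Sketch of the validated POST handler's core logic (framework-agnostic).
# Field names and the max_length default are illustrative assumptions.

def detect_language(payload: dict, detector, max_length: int = 10_000) -> dict:
    """Validate the request body and return a structured result."""
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return {"error": "Field 'text' must be a non-empty string"}
    if len(text) > max_length:
        return {"error": f"Text exceeds {max_length} characters"}
    # detector: callable mapping text -> (language_code, confidence)
    code, confidence = detector(text)
    return {"language": code, "confidence": round(confidence, 3)}
```

With a stub detector, `detect_language({"text": "Bonjour"}, lambda t: ("fr", 0.99))` yields the structured shape the bullet describes.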
#### 2.2 Agent Memory System Connection

- Renamed `agent_memory.py` → `memory_utils.py` to avoid a naming conflict
- Added a database connection health check endpoint: `GET /api/health`
- Created a lazy-loading `get_agent_memory()` function
- Made the PostgreSQL connection lazy (`PostgresConnection` class)
- Added `check_memory_health()` for connection status
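The lazy-loading pattern behind `get_agent_memory()` can be shown in miniature. The `factory` parameter is an illustrative stand-in for the real function's internal construction of database connections; it is not the actual signature.

```python
# Minimal sketch of the lazy-loading pattern: the expensive object is
# built on first access and reused afterwards. The factory callable is
# a stand-in for AgentMemory's real connection setup.

_agent_memory = None

def get_agent_memory(factory):
    """Create the memory object on first use; return the cached one after."""
    global _agent_memory
    if _agent_memory is None:
        _agent_memory = factory()  # expensive connect happens here, once
    return _agent_memory
```

This keeps import time fast and means a missing database only fails the requests that actually touch memory.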
#### 2.3 Code Quality Improvements

- Replaced all `print()` statements with `logging`
- Added logging configuration to Django settings
- Added `psycopg2-binary` for PostgreSQL connectivity
- Added `sentence-transformers` for local embeddings
- Added health check tests
### Phase 3: MCP Client Integration

Goal: Enable AgentX to consume tools from external MCP servers

#### 3.1 MCP Client Infrastructure

Created module structure:

```
api/agentx_ai/mcp/
├── __init__.py
├── client.py           # MCP client manager
├── server_registry.py  # Track connected MCP servers
├── tool_executor.py    # Execute tools on remote servers
└── transports/
    ├── __init__.py
    ├── stdio.py        # stdio transport (subprocess)
    └── sse.py          # SSE transport (HTTP)
```
Implemented `MCPClientManager` class:

- `connect_server(server_config)` → async context manager yielding a `ServerConnection`
- `list_tools(server_name?)` → available tools
- `call_tool(server_name, tool_name, args)` → `ToolResult`
- `list_resources(server_name?)` → available resources
- `read_resource(server_name, uri)` → content
#### 3.2 Server Configuration

- Created `mcp_servers.json.example` configuration format
- Added Taskfile commands: `task mcp:list-servers`, `task mcp:list-tools`
- Added API endpoints: `/api/mcp/servers`, `/api/mcp/tools`, `/api/mcp/resources`

#### 3.3 Tool Discovery & Caching

- Cache tool schemas on connection (in `ToolExecutor`)
- Handle server disconnection gracefully (via context managers)
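The caching behavior can be sketched as a small cache keyed by server name. This is a stand-in for `ToolExecutor`'s real logic, which talks to live MCP servers; `fetch_schemas` is an assumed callable, not an actual API.

```python
# Hedged sketch of schema caching on connection: fetch once per server,
# serve from the cache thereafter, and invalidate on disconnect.

class ToolSchemaCache:
    def __init__(self, fetch_schemas):
        self._fetch = fetch_schemas  # callable: server_name -> {tool: schema}
        self._cache = {}

    def get(self, server_name: str) -> dict:
        if server_name not in self._cache:
            self._cache[server_name] = self._fetch(server_name)
        return self._cache[server_name]

    def invalidate(self, server_name: str) -> None:
        """Drop cached schemas, e.g. when a server disconnects."""
        self._cache.pop(server_name, None)
```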
### Phase 4: Model Provider Abstraction

Goal: Support multiple LLM backends with a unified interface

#### 4.1 Provider Interface

Created abstract `ModelProvider` interface:

```python
class ModelProvider(ABC):
    @abstractmethod
    async def complete(self, messages, **kwargs) -> CompletionResult: ...

    @abstractmethod
    async def stream(self, messages, **kwargs) -> AsyncIterator[StreamChunk]: ...

    @abstractmethod
    def get_capabilities(self) -> ModelCapabilities: ...
```
#### 4.2 Provider Implementations

- `OpenAIProvider` - GPT-4, GPT-4-turbo, GPT-3.5
- `AnthropicProvider` - Claude 3 Opus/Sonnet/Haiku
- `OllamaProvider` - Local models (Llama, Mistral, etc.)

#### 4.3 Model Registry

- Created `models.yaml` configuration
- Model selection based on task requirements (via `ProviderRegistry`)
### Phase 5: Drafting Models Framework

Goal: Implement multi-model generation strategies

#### 5.1 Drafting Infrastructure

Created module structure:

```
api/agentx_ai/drafting/
├── __init__.py
├── base.py         # DraftingStrategy, DraftResult
├── speculative.py  # Speculative decoding
├── pipeline.py     # Multi-model pipelines
├── candidate.py    # N-best candidate generation
└── drafting_strategies.yaml
```
#### 5.2 Speculative Decoding

Implemented `SpeculativeDecoder`:

- Draft model generates N tokens quickly
- Target model verifies/corrects
- Configurable draft length and acceptance threshold
- Supports configurable draft/target model pairs
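The draft/verify control flow can be illustrated with a greedy toy version. Real speculative decoding compares model probabilities against an acceptance threshold; here both "models" are plain next-token callables so only the loop structure is shown, and nothing below reflects `SpeculativeDecoder`'s actual API.

```python
# Toy illustration of the draft-then-verify loop (greedy variant):
# the draft model proposes a short run, the target model keeps the
# matching prefix and corrects the first mismatch.

def speculative_generate(draft_next, target_next, prompt, n_draft=4, max_tokens=8):
    """draft_next/target_next: callables mapping a token list to the next token."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_tokens:
        # 1. Draft model proposes n_draft tokens cheaply.
        proposed = []
        for _ in range(n_draft):
            proposed.append(draft_next(tokens + proposed))
        # 2. Target model verifies each proposal in order.
        accepted = []
        for tok in proposed:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # correct the mismatch and stop
                break
        tokens.extend(accepted)
    return tokens[len(prompt):][:max_tokens]
```

When the draft agrees with the target, whole runs are accepted per verification round; when it never agrees, the loop degrades to one corrected token per round, which is the worst case.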
#### 5.3 Multi-Model Pipeline

Implemented `ModelPipeline`:

- Define pipeline stages (analyze → draft → refine → validate)
- Route to different models per stage
- Pass context between stages
- Predefined stage roles (ANALYZE, DRAFT, REVIEW, REFINE, etc.)
#### 5.4 Candidate Generation

Implemented `CandidateGenerator`:

- Generate N candidates (same or different models)
- Scoring/ranking strategies:
  - Self-consistency (majority vote)
  - Verifier model scoring
  - Heuristic scoring (length preference)
- Best-of-N selection
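The self-consistency strategy is compact enough to sketch in full. `generate` is an assumed callable standing in for a sampled model call; the real `CandidateGenerator` routes through the provider layer.

```python
# Best-of-N selection by self-consistency: sample N candidates and
# return the most frequent answer (majority vote).

from collections import Counter

def best_of_n(generate, prompt: str, n: int = 5) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer
```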
#### 5.5 Drafting Configuration

- Created `drafting_strategies.yaml` with pre-configured strategies
### Phase 6: Reasoning Framework

Goal: Implement flexible reasoning patterns for complex tasks

#### 6.1 Reasoning Infrastructure

Created module structure:

```
api/agentx_ai/reasoning/
├── __init__.py
├── base.py              # ReasoningStrategy, ThoughtStep
├── chain_of_thought.py
├── tree_of_thought.py
├── react.py             # ReAct pattern
├── reflection.py        # Self-critique
└── orchestrator.py      # Combines strategies
```
#### 6.2 Chain-of-Thought (CoT)

Implemented `ChainOfThought`:

- Zero-shot CoT ("Let's think step by step...")
- Few-shot CoT (with examples)
- Auto-CoT mode
- Step extraction and validation
- Answer extraction
#### 6.3 Tree-of-Thought (ToT)

Implemented `TreeOfThought`:

- BFS exploration (breadth-first)
- DFS exploration (depth-first)
- Beam search (top-k branches)
- State evaluation heuristics
- Pruning strategies
- Tree serialization for traces
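Beam search with pruning fits in a few lines. `expand()` and `score()` are assumed callables (the real `TreeOfThought` uses LLM proposals and evaluation heuristics); only the top-k frontier survives each depth.

```python
# Compact beam-search sketch over thought states: expand the frontier,
# score the children, keep the top beam_width, repeat to a fixed depth.

def beam_search(root, expand, score, beam_width=2, depth=3):
    """Return the highest-scoring state reachable within `depth` expansions."""
    frontier = [root]
    for _ in range(depth):
        children = [child for state in frontier for child in expand(state)]
        if not children:
            break
        # Prune: keep only the top-k candidates by heuristic score.
        frontier = sorted(children, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)
```

Swapping the pruning line for a queue or stack turns the same skeleton into the BFS and DFS variants listed above.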
#### 6.4 ReAct (Reasoning + Acting)

Implemented `ReActAgent`:

- Thought → Action → Observation loop
- Tool execution framework
- Max iterations / termination conditions
- Action parsing and execution
- Built-in tools (search, calculate)
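The loop itself is simple once action parsing is factored out. Here `policy()` stands in for the LLM: given the transcript it returns either `("final", answer)` or `("tool", name, argument)`. The tool names and tuple protocol are illustrative, not `ReActAgent`'s actual interface.

```python
# Skeleton of the Thought -> Action -> Observation loop with a max
# iteration cap as the termination condition.

def react_loop(policy, tools: dict, task: str, max_iters: int = 5):
    transcript = [f"Task: {task}"]
    for _ in range(max_iters):
        step = policy(transcript)
        if step[0] == "final":
            return step[1]                  # termination condition met
        _, tool_name, arg = step
        observation = tools[tool_name](arg) # execute the chosen tool
        transcript.append(f"Action: {tool_name}({arg}) -> {observation}")
    return None                             # max iterations exhausted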
#### 6.5 Reflection & Self-Critique

Implemented `ReflectiveReasoner`:

- Generate initial response
- Self-critique: identify weaknesses
- Revise based on critique
- Configurable reflection depth
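The critique-and-revise cycle with configurable depth can be sketched as below; `generate`, `critique`, and `revise` are stand-ins for LLM calls, not `ReflectiveReasoner`'s real methods.

```python
# Sketch of the reflection loop: draft, critique, revise, up to `depth`
# rounds, stopping early when the critique finds nothing to fix.

def reflect(generate, critique, revise, prompt, depth=2):
    response = generate(prompt)
    for _ in range(depth):
        issues = critique(response)
        if not issues:                     # nothing left to fix: stop early
            break
        response = revise(response, issues)
    return response
```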
#### 6.6 Reasoning Orchestrator

Implemented `ReasoningOrchestrator`:

- Select reasoning strategy based on the task
- Task type classification
- Fallback strategies on failure
- Metrics: steps taken, tokens used, time elapsed
### Phase 7: Agent Core

Goal: Unify MCP, drafting, and reasoning into a coherent agent

#### 7.1 Agent Architecture

Created module structure:

```
api/agentx_ai/agent/
├── __init__.py
├── core.py     # Main Agent class
├── planner.py  # Task decomposition
├── context.py  # Context management
└── session.py  # Conversation session
```
#### 7.2 Agent Core Implementation

Implemented `Agent` class with:

- Integration with the reasoning orchestrator
- Integration with drafting strategies
- Tool registration
- Status tracking
- Task cancellation
#### 7.3 Task Planning

Implemented `TaskPlanner`:

- Decompose complex tasks into subtasks
- Identify subtask types (research, analysis, generation, etc.)
- Estimate complexity
- Select reasoning strategy based on the plan
#### 7.4 Context Management

Implemented `ContextManager`:

- Token estimation
- Sliding window for long conversations
- Summarization of old context
- Memory injection support
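Sliding-window trimming pairs naturally with a crude token estimate. The ~4 characters-per-token heuristic below is an assumption for illustration, not `ContextManager`'s actual tokenizer.

```python
# Sketch of sliding-window context trimming: walk the history newest
# first and keep messages until the token budget is spent.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude ~4 chars/token heuristic

def sliding_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):      # newest first
        cost = estimate_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order
```

Messages dropped from the window are candidates for the summarization step listed above, so their content is compressed rather than lost.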
#### 7.5 Session Management

Implemented `Session` and `SessionManager`:

- Message history tracking
- Session creation/retrieval
- Session cleanup for inactive sessions

#### 7.6 Agent API Endpoints

- `POST /api/agent/run` - Execute a task
- `POST /api/agent/chat` - Conversational interaction
- `GET /api/agent/status` - Check agent status
### Phase 8: Client Updates

Goal: Update the UI to support new agent capabilities

#### 8.1 Agent Tab

- Created `AgentTab` component
- Task input with natural language
- Real-time reasoning trace display
- Tool usage visualization
- Cancel/pause controls
#### 8.2 Dashboard Updates
- Connected MCP servers status
- Model provider status (API keys valid, local models loaded)
- Agent status widget
- System health overview
#### 8.3 Settings Tab
- Multi-server configuration (per-server settings storage)
- Model provider API key management
- Drafting strategy selection
- Reasoning preferences
- Memory/storage info
#### 8.4 Tools Tab Updates
- MCP tool browser (all available tools from connected servers)
- Tool search functionality
- Tool testing interface (placeholder)
- Server status display
#### 8.5 Design System Updates
- Dark cosmic theme (nebula gradients, star accents)
- Lucide-react icons throughout
- Glassmorphism effects
- Glow animations
- Updated color palette (purple/cyan/pink cosmic theme)
#### 8.6 Infrastructure
- Multi-server storage system (localStorage)
- ServerContext for app-wide server state
- Typed API client with React hooks
- Per-server metadata and preferences
### Phase 9: Security & Infrastructure

Goal: Secure and stabilize the platform (foundation)

#### 9.1 Tauri Security

- Configured Content Security Policy in `tauri.conf.json`
- Reviewed and restricted Tauri capabilities (already minimal)
- Improved window defaults (1200x800, minimum size)
#### 9.2 API Security

- Foundation for rate limiting (settings placeholder, not enforced)
- Foundation for API key authentication (settings placeholder, not enforced)
- Input validation limits defined (`AGENTX_MAX_TEXT_LENGTH`, `AGENTX_MAX_CHAT_LENGTH`)
- CORS configuration made environment-configurable
- CORS permissive in DEBUG mode for development

Note: Security is intentionally permissive for private server development. The foundation is in place for future hardening.
### Phase 10: Testing (Core)

Goal: Ensure reliability through automated testing

Result: 50 tests, 49 passing, 1 skipped (requires Docker)

#### 10.1 Backend Tests

- Fixed and unskipped `test_language_detect` (POST and GET both work)
- Translation tests: multiple language pairs, error handling, long text
- MCP tool tests: servers/tools/resources endpoints, registry operations
- Reasoning framework tests: base classes, CoT, ToT, ReAct, Reflection
- Drafting strategy tests: speculative, pipeline, candidate generation
- Provider tests: registry, model config, Message/CompletionResult

#### 10.2 Integration Tests

- Full translation flow (via API endpoint tests)
- Docker service dependencies (health check with `skipUnless`)
### Phase 11: Memory System (94%)

Goal: Persistent knowledge graphs with extraction, consolidation, and intelligent recall

#### 11.1–11.7: Core Memory Infrastructure

- 4-type memory architecture: episodic (PostgreSQL), semantic (Neo4j + PostgreSQL), procedural (Neo4j), working (Redis)
- `AgentMemory` unified interface with lazy-loaded database connections
- Memory models: Turn, Entity, Fact, Goal, Strategy with embeddings and salience scoring
- Channel-scoped memory with `_global` and `_default` channels
- Memory API: 12 endpoints for recall, entities, facts, strategies, stats, settings
#### 11.8: Testing & Security

- 80+ memory tests covering integration, security, and edge cases
- Tenant isolation (`user_id` filtering), audit logging, input validation
- Graceful degradation when databases are unavailable
#### 11.9: Extraction & Consolidation

- `ExtractionService`: LLM-based entity/fact/relationship extraction in a single call
- Consolidation pipeline: scheduled jobs for extraction, entity linking, contradiction detection
- `JobRegistry` with worker threads and configurable intervals
#### 11.10: Memory Explorer UI (Pending)
#### 11.11–11.12: Advanced Recall & Intelligence

- `RecallLayer`: 5 retrieval techniques (hybrid search, entity-centric, query expansion, HyDE, self-query)
- Confidence calibration, temporal reasoning, reinforcement signals
- Source attribution, user correction handling, contradiction detection
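Of the five techniques, hybrid search is the easiest to sketch: merge keyword and embedding scores per document with a weighted sum. The `alpha` weighting and dict-of-scores inputs are assumptions for illustration; `RecallLayer` combines more signals than this.

```python
# Toy hybrid retrieval: weighted sum of keyword and vector scores per
# document id, returning the top_k ids by combined score.

def hybrid_rank(keyword_scores: dict, vector_scores: dict,
                alpha: float = 0.5, top_k: int = 3) -> list:
    """alpha weights keyword match vs. embedding similarity."""
    doc_ids = set(keyword_scores) | set(vector_scores)
    combined = {
        doc: alpha * keyword_scores.get(doc, 0.0)
             + (1 - alpha) * vector_scores.get(doc, 0.0)
        for doc in doc_ids
    }
    return sorted(combined, key=combined.get, reverse=True)[:top_k]
```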
### Phase 12: Documentation (In Progress)
Goal: Comprehensive backend documentation refresh
- Rewrote architecture docs (overview, API layer, memory)
- Expanded API reference: all 54 endpoints documented with request/response examples
- Comprehensive API models reference (providers, agent, memory, prompts, MCP, SSE)
- Created 5 new feature docs: reasoning, drafting, MCP, providers, prompts
- Rewrote/expanded 3 existing feature docs: chat, translation, memory
- Expanded development docs: contributing guide, setup guide
- Updated configuration reference with all config layers
- Updated quickstart with examples for streaming, MCP, memory, prompts
- Mermaid diagrams throughout for architecture and data flow visualization
## Decision Log
| Date | Decision | Rationale |
|---|---|---|
| 2026-01-21 | MCP Client (not server) | AgentX consumes tools from external MCP servers |
| 2026-01-21 | Multi-strategy drafting | Support speculative, pipeline, and candidate generation |
| 2026-01-21 | Flexible reasoning | Support CoT, ToT, ReAct, Reflection patterns |
| 2026-01-21 | Keep agent_memory architecture | Well-designed, integrates with reasoning |
| 2026-01-21 | Use Todo.md for tracking | Simple, version controlled |
| 2026-01-31 | Extensible memory over backwards-compat | Memory system prioritizes easy extension; no backwards-compatibility shims |
| 2026-01-31 | Audit logging in PostgreSQL | All memory operations traceable per session with configurable verbosity |
| 2026-01-31 | Tenant isolation required | Semantic memory must filter by user_id to prevent cross-user data leakage |
| 2026-01-31 | Extraction via LLM preferred | Entity/fact extraction uses model providers for quality; spaCy as lightweight fallback |
| 2026-01-31 | Memory channels over multiple databases | Logical channel scoping (property on nodes/rows) vs multiple DBs |
| 2026-01-31 | `_global` as default channel | User-wide memory (preferences, general facts) lives in `_global` |
| 2026-01-31 | Channels are traceable scopes, not isolation | Retrieval merges active channel + _global |
| 2026-01-31 | Cross-channel promotion via consolidation | Prominent project facts auto-promote to _global |
| 2026-01-31 | Audit logs partitioned by day | Configurable retention with daily resolution |
| 2026-01-31 | LLM-only extraction, no spaCy | Entity/fact extraction uses model providers exclusively |
| 2026-02-13 | Dual interface: Chat vs Agent | Chat = minimal friction; Agent = full power for prompt engineering |
| 2026-02-13 | Agent Profiles as first-class concept | Stored configurations that personalize agent behavior |
| 2026-02-13 | Conversation branching over linear history | Power users need to explore alternatives |
| 2026-02-13 | Profile inheritance via `extends` | Child profiles inherit parent settings with overrides |
| 2026-02-13 | Global prompt library with tags | Templates shared across profiles |
| 2026-02-22 | Scheduler deferred, manual trigger priority | Manual consolidation trigger for debugging |
| 2026-02-23 | LLM providers for consolidation stages | Route through existing provider system |
| 2026-02-23 | Filter consolidation to user turns only | Extract facts from user messages only |
| 2026-02-23 | Default memory channel is _default, not _global | _global populated via promotion |
## Resolved Questions
| Question | Resolution |
|---|---|
| Which LLM providers to prioritize? | OpenAI, Anthropic, Ollama implemented; LM-Studio preferred |
| Should agent memory require authentication? | No, one server = one user. Multiple servers can exist on one system. |
| Target platforms for distribution? | Linux and Windows for now |
| Reasoning trace storage format? | Neo4j graph + PostgreSQL audit log |
| Entity extraction method? | LLM-only via model providers; spaCy too constraining |
| Audit log retention policy? | 30 days default, configurable with daily resolution |
| Memory retrieval sync or async? | Blocking for now, may scale to concurrent for complex tasks |
| Cross-channel promotion thresholds? | confidence>=0.85, access_count>=5, conversations>=2 |
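The resolved promotion thresholds are mechanical enough to express as a predicate. The field names on the fact record are illustrative assumptions; the threshold values match the resolution above (confidence >= 0.85, access_count >= 5, conversations >= 2).

```python
# The resolved cross-channel promotion thresholds as a predicate:
# a channel-scoped fact is promoted to _global only when all three hold.

def should_promote_to_global(fact: dict) -> bool:
    return (
        fact.get("confidence", 0.0) >= 0.85
        and fact.get("access_count", 0) >= 5
        and fact.get("conversations", 0) >= 2
    )
```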
## Deferred Items
These items were identified during development but deferred to future phases:
### From Phase 3 (MCP Client)
- UI for managing MCP server connections
- Test with standard MCP servers (filesystem, github, postgres, brave-search)
- Document tested/supported servers
- Custom MCP server templates
- Tool refresh and search/filter functionality
### From Phase 4 (Model Providers)

- `TogetherProvider` - Together.ai API
- `LocalTransformersProvider` - HuggingFace transformers
- Cost tracking and budgeting
### From Phase 6 (Reasoning)
- DebateReasoner implementation
### From Phase 9 (Security)
- Secure storage for API keys (Keyring/OS keychain)
- Encryption for sensitive config
- Audit logging for sensitive operations
### From Phase 10 (Testing)
- React component tests
- E2E tests with Playwright/Cypress