
Apex Free Tier — What You Get With the Server

The free tier isn't a crippled demo. It's the full AI agent platform, open source, self-hosted. All conversations stay on your machine. Zero third-party data flow.


Multi-Model Unified Chat (Use Your Existing Subscriptions)

Users connect their existing subscriptions and get one unified interface across every model. No separate API billing for Claude or Codex — you're using the subscriptions you already pay for.

Claude (Your Claude Code / Pro / Max Subscription)

Runs through the Claude Agent SDK, which uses your existing Claude subscription — the same one you use for Claude Code CLI or claude.ai. Not a separate API key. Full Agent SDK integration with persistent sessions that survive across messages (no subprocess respawning). Complete tool ecosystem: Read, Write, Edit, Grep, Glob, Bash. Up to 1M context with Opus 4.6. Also supports Sonnet 4.6 and Haiku 4.5. The SDK maintains conversation history server-side, so context carries across the entire session.

Codex (Your ChatGPT Plus / Pro Subscription)

Runs through the Codex CLI, which uses your existing OpenAI subscription — the same one you use for ChatGPT. Not a separate API key. GPT-5.4, GPT-5.3, o3, and o4-mini. Supports one-shot and multi-turn resume modes. Can run in the background as a parallel agent via the /delegate skill. Sandbox permission controls for network and disk access.

Grok (xAI API — Pay-Per-Use)

The one model that requires a separate API key. xAI doesn't offer a subscription product to hook into, so Grok runs through the xAI API with a pay-per-use key. 2M context window. Native web search and X/Twitter search built into the model. Configurable thinking levels (off through extra-high). Runs through the same unified interface — same memory injection, same skills, same conversation history.

Local Models (Ollama / MLX — Free, No Account Needed)

Run Qwen 3.5, Gemma 3, or any Ollama-compatible model on your own hardware. MLX support for optimized Mac inference (128K context). Full tool-calling loop included — local models can read files, run commands, and search code, just like Claude. Genuinely free inference with no API charges and no account required.

Per-Chat Model Routing

Each conversation is its own channel targeting a specific backend. Run Claude for complex coding, Grok for web research, and a local model for quick questions — all simultaneously, all in the same app. The server routes each chat independently.
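
A minimal sketch of what per-chat routing can look like, assuming a simple chat-to-backend map. The `Route` type, chat IDs, and model names here are illustrative, not Apex's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Route:
    backend: str  # "claude", "codex", "grok", or "ollama"
    model: str

# Each chat is pinned to one backend; chats run side by side.
ROUTES: dict[str, Route] = {
    "chat-coding":   Route("claude", "opus-4.6"),
    "chat-research": Route("grok", "grok-4"),
    "chat-quick":    Route("ollama", "qwen3.5"),
}

def dispatch(chat_id: str, message: str) -> str:
    """Resolve the chat's backend and hand the message off to it."""
    route = ROUTES[chat_id]
    return f"[{route.backend}/{route.model}] {message}"
```

The point of the design is that routing is per-conversation state, so switching models never means switching apps.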


The Memory System (Subconscious)

This is what separates Apex from every other chat wrapper. The AI remembers you across sessions, across models, across restarts.

APEX.md + MEMORY.md Injection

Project instructions (APEX.md) and accumulated knowledge (MEMORY.md) are automatically loaded into every new session. Every model — Claude, Grok, local — sees the same instructions and memory. You define your rules once; every model follows them.

Embedding Index (Gemini Embedding 2)

All memory files and conversation transcripts are indexed with 3072-dimensional vectors using Google's Gemini Embedding 2. The index updates incrementally — only changed files get re-embedded. Hybrid recall combines BM25 keyword matching with cosine similarity for semantic search. This powers both the /recall skill and the whisper system.
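
Hybrid recall of this kind is typically a weighted blend of the two scores. A minimal sketch, assuming both scores are normalized to [0, 1] and a hypothetical `alpha` weight (not Apex's actual ranking code):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(bm25: float, semantic: float, alpha: float = 0.5) -> float:
    """Blend keyword relevance (BM25) with semantic similarity.

    alpha is an assumed mixing weight; real systems often tune it
    or use rank fusion instead of a raw score blend.
    """
    return alpha * bm25 + (1 - alpha) * semantic
```

Keyword matching catches exact terms ("Plan M"); the embedding side catches paraphrases that share no words with the query.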

Whisper Injection (Automatic Context Recall)

On each message you send, the server silently searches the embedding index for memories relevant to your current topic. Matching memories are injected into the prompt before your message reaches the model. The AI "remembers" conversations from weeks ago without you asking. You might ask about Plan M and the AI already knows your risk parameters, DTE preferences, and the last screener results — because the whisper system found and injected those memories.
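
The injection step itself can be sketched as prepending matched memories to the outgoing prompt. The tag format and the `index_search` hook below are assumptions for illustration, not Apex's actual prompt format:

```python
def whisper_inject(user_message: str, index_search, k: int = 3) -> str:
    """Prepend the top-k relevant memories to the outgoing prompt.

    `index_search` stands in for the embedding-index lookup and is
    expected to return a relevance-ranked list of memory snippets.
    """
    memories = index_search(user_message)[:k]
    if not memories:
        return user_message  # nothing relevant: pass the message through
    block = "\n".join(f"- {m}" for m in memories)
    return f"<relevant-memories>\n{block}\n</relevant-memories>\n\n{user_message}"
```

The model never sees the search happen; it just receives a prompt that already contains the right context.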

Session Recovery (Continuity Across Restarts)

When a session hits the context limit and compacts, or when the server restarts, the system generates a two-layer recovery block:

  • Recovery Briefing — A structured summary generated by a fast model (Grok → Haiku → Ollama fallback chain). Covers: task in progress, status, last action, pending items, key decisions, and extracted guidance (corrections, enforcements, decisions to persist). Gives the AI topic-level orientation.
  • Prior Session Transcript — The raw tail (~1,500 characters) of the conversation immediately before the reset, pulled directly from the database. This gives the AI moment-of-pause precision — not just what you were working on, but exactly where you stopped.

Both layers are injected together into the next message. The summary tells the AI what the project is about; the transcript tail tells it where to pick up. No "I don't have context from our previous conversation" — it just continues.
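
Assembling the two layers might look like the following sketch; the section markers and `briefing` fields are illustrative, not Apex's actual recovery format:

```python
TAIL_CHARS = 1500  # raw transcript tail, per the description above

def build_recovery_block(briefing: dict[str, str], transcript: str) -> str:
    """Combine the structured briefing with the raw transcript tail."""
    summary = "\n".join(f"{key}: {value}" for key, value in briefing.items())
    tail = transcript[-TAIL_CHARS:]  # the last ~1,500 characters before reset
    return (
        "=== Recovery Briefing ===\n" + summary + "\n"
        "=== Prior Session Transcript (tail) ===\n" + tail
    )
```

The briefing carries the what; the verbatim tail carries the exactly-where, which a summary alone tends to lose.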

Recent Exchange Context

For brand-new sessions (no SDK history), the last 2 Q&A pairs from the database are injected so the AI has immediate conversational continuity even before the embedding system kicks in.
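
A sketch of that injection, assuming Q&A pairs come back from the database as (question, answer) tuples:

```python
def recent_exchanges(rows: list[tuple[str, str]], n: int = 2) -> str:
    """Format the last n Q&A pairs for injection into a fresh session."""
    parts = []
    for question, answer in rows[-n:]:
        parts.append(f"Q: {question}\nA: {answer}")
    return "\n\n".join(parts)
```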

Transcript Export & Search

Every conversation is automatically exported to searchable JSONL transcripts. These feed the embedding index and the /recall skill. Your conversation history becomes a searchable knowledge base.
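
JSONL keeps each exchange as one JSON object per line, which is what makes the transcripts trivially appendable and greppable. A minimal sketch (the field names are illustrative):

```python
import json

def export_jsonl(path: str, exchanges: list[dict]) -> None:
    """Append each exchange as one JSON object per line (JSONL)."""
    with open(path, "a", encoding="utf-8") as f:
        for exchange in exchanges:
            f.write(json.dumps(exchange, ensure_ascii=False) + "\n")

def search_jsonl(path: str, needle: str) -> list[dict]:
    """Naive full-text scan; a real /recall skill layers ranking on top."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if needle.lower() in json.dumps(record).lower():
                hits.append(record)
    return hits
```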

The Key Insight: This Works Across All Models

Claude gets memory injection via the Agent SDK system prompt. Grok gets the same context injected into its prompt. Local models get it too. Switch from Claude to Grok mid-project — Grok already knows what you were working on because it receives the same memory, the same project instructions, the same whisper injections. The continuity layer is model-agnostic.


Skills System (18+ Server-Side Skills)

Slash commands that the server intercepts and executes before the model sees them. The model gets the results as context.

Search & Memory

  • /recall — Full-text search across all conversation transcripts (170+ sessions). Find what you discussed, when, and what was decided.
  • /embedding — Semantic search via the Gemini Embedding 2 index. Finds relevant context even when you don't use exact keywords.

Multi-Agent Delegation

  • /codex — Delegate a task to OpenAI Codex (GPT-5.4) as a parallel background agent. It runs independently and returns structured results. Supports sandbox permissions, multi-turn resume.
  • /grok — Web research via xAI. Live web search, X/Twitter search, bookmark injection, configurable thinking levels. For questions that need current information.
  • /ask-claude — Query Claude Code locally with full workspace context. Read-only, 15-60 second responses.
  • /delegate — Dispatch tasks to Codex agents in parallel. Handles prompt construction, flag selection, background execution, and convergence checking.

Content & Analysis

  • /evaluate-repo — Sandbox-assess any GitHub repo. Clones to /tmp (never your workspace), reads code, dependencies, license. Structured risk/relevance report.
  • /youtube — Fetch and analyze YouTube transcripts without web scraping.
  • /x-post — Fetch X/Twitter post threads via CLI.
  • /scrapling — Adaptive web scraping with anti-bot bypass and Cloudflare support.

Thinking Skills (Agent Follows Instructions)

  • /first-principles — 4-layer deep analysis: strip assumptions to bedrock truth, Feynman test (explain to a 12-year-old), self-challenge (attack your own model), zero-base test (rebuild from scratch).
  • /stop-slop — Scores prose on 5 dimensions and rewrites AI-sounding text to sound human. Identifies 8 core violations (throat-clearing, binary contrasts, passive voice, etc.).
  • /improve — Analyzes skill usage metrics and feedback logs, proposes concrete improvements with structured diffs and risk assessment.

Utilities

  • /check-logs — Read and search application logs with level filtering.
  • /portfolio-manager — Log trades to Google Sheets (specific to trading use case, would be removed in OSS extraction).

Admin Dashboard

Full web-based server management portal at /admin. 61 REST API endpoints — usable by humans (web UI) and AI agents (JSON API).

Health & Monitoring

  • Server status, uptime, active client count
  • Per-provider model reachability (green/red dots: Is Claude API responding? Is Ollama running? Is Grok reachable?)
  • Database stats (size, tables, WAL usage)
  • TLS certificate validity and expiration warnings

Configuration

  • Server settings (host, port, debug mode)
  • Default model selection
  • Compaction thresholds and SDK timeouts
  • Whisper and skill dispatch toggles
  • Ollama/MLX server URLs

Credential Management

  • Set/update API keys with password-masked input
  • Format validation (prefix checks for sk-ant-, xai-, etc.)
  • Rate limiting (5s cooldown per provider)
  • Audit logging (masked key + remote IP on every change)
  • Atomic writes (temp file + rename, no partial state on crash)
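
The atomic-write pattern mentioned here is worth spelling out: write to a temp file in the same directory, then rename over the target, so a crash mid-write never leaves a half-written .env behind. A sketch in Python:

```python
import os
import tempfile

def atomic_write(path: str, content: str) -> None:
    """Write content to path atomically via temp file + rename.

    os.replace is atomic on both POSIX and Windows when source and
    destination are on the same filesystem, which is why the temp
    file is created in the target's own directory.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".env.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # force bytes to disk before the rename
        os.replace(tmp, path)     # atomic swap: readers see old or new, never partial
    except BaseException:
        os.unlink(tmp)            # clean up the temp file on failure
        raise
```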

TLS Certificate Management

  • Generate CA, server certificates, client certificates
  • Download .p12 bundles for mobile devices
  • QR codes for easy mobile cert provisioning
  • Revoke client certificates
  • Update Subject Alternative Names (SANs)

Workspace & Skills

  • Edit APEX.md directly from the dashboard (with backup)
  • List and edit memory files
  • Enable/disable skills per chat
  • Manage guardrail whitelists

Session Management

  • List all active SDK sessions with context token usage
  • Force compaction on specific chats
  • Kill/disconnect sessions
  • View session recovery briefings

Logs & Diagnostics

  • Tail server logs with search and level filtering
  • SSE live log streaming
  • Database VACUUM
  • Full backup/restore (tarball export)


Alert System

Multi-channel alert delivery for when the AI (or your external systems) needs to reach you:

  • In-app (WebSocket) — Real-time toast notifications with badge counters. Long-poll fallback for when the WebSocket isn't connected.
  • Telegram — Bot delivery for mobile push notifications when you're away from the app.
  • REST API — Authenticated POST endpoint so any external script, cron job, or trading system can push alerts into Apex.
  • Persistent inbox — All alerts stored in the database with ack/unack tracking, searchable by category, severity, time range.
  • Alert categories — Custom categorization (system, trading, health, etc.) with severity levels (info, warning, critical).
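
Pushing an alert from an external script reduces to one authenticated POST. The sketch below only constructs the request; the endpoint path, auth scheme, and field names are assumptions for illustration, so check your server's API for the real schema:

```python
import json
import urllib.request

def build_alert_request(base_url: str, token: str, message: str,
                        category: str = "system", severity: str = "info"):
    """Construct (but don't send) an authenticated alert POST."""
    payload = json.dumps({
        "message": message,
        "category": category,
        "severity": severity,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/alerts",           # assumed endpoint path
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)` from any cron job or trading script.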

Security (SecureClaw)

Production-grade security from day one. This is what differentiates Apex from tools that treat security as an afterthought.

Already Shipped

  • mTLS — client certificate required for ALL connections (dashboard + chat). No passwords to steal, no session tokens to hijack.
  • Password-masked credential input with autocomplete disabled
  • API key format validation (prefix checks, length bounds, control character rejection)
  • Rate limiting on credential updates (5s cooldown per provider)
  • Audit logging on all credential changes (masked key + remote IP)
  • Atomic .env writes (temp file + rename — no partial state on crash)
  • Whitespace stripping and input sanitization (newline/null/control char rejection)
  • Secrets never stored in the database — only in the .env file

Planned for OSS Release

  • Encrypted-at-rest secrets (macOS Keychain / Linux keyring / Windows DPAPI)
  • CSRF tokens on all state-changing endpoints
  • Content Security Policy headers
  • Secrets redaction in log output


What You Need to Get Started

  • Machine (Mac/Linux/Windows): your existing hardware. Python 3.14+; runs on anything.
  • Claude subscription: $20-100/mo (you probably already have this). Claude Code / Pro / Max; uses your existing sub via the Agent SDK.
  • OpenAI subscription: $20-200/mo (you probably already have this). ChatGPT Plus / Pro; uses your existing sub via the Codex CLI.
  • xAI API key (Grok): pay-per-use. The only model that requires a separate API key.
  • Google API key (embeddings): free tier is sufficient. Powers semantic search.
  • Ollama or MLX: free. Optional; zero-cost local inference, no account needed.
  • Telegram bot token: free. Optional; for mobile alert delivery.

Use any combination. Already paying for Claude Pro? You can use Claude through Apex at no extra cost. Already paying for ChatGPT Plus? Same — Codex works through your existing subscription. Only Grok requires a separate API key. Want zero-cost local only? Just run Ollama, no accounts needed.


What the Free Tier Does NOT Include

  • Group channels, multi-agent orchestration, custom personas — Free through September 30, 2026 (no license required). After that, these features require a license key ($29.99/mo, $249/yr, or $499 lifetime for the first 500).
  • iOS app connectivity — Free through September 30, 2026. After that, the iOS app (free download) requires your server to have a valid license for it to connect.
  • Multi-tenant isolation — Enterprise tier (separate DB/workspace/credentials per user)
  • SSO/OIDC — Enterprise tier (Okta, Google Workspace, Azure AD)
  • RBAC — Enterprise tier (admin vs viewer roles)
  • Audit trail API — Enterprise tier (queryable admin action history)
  • Key rotation workflows — Enterprise tier (scheduled rotation, zero-downtime)
  • Compliance reporting — Enterprise tier (SOC 2 / GDPR evidence generation)

The webapp (browser access to your server) is free and unlimited. One server license key unlocks premium features and native app connectivity.