Nous Ergon — Alpha Engine

Intelligence at work

A harness for rigorous AI/ML experiments in finance.

An equity research-and-trading system — multi-agent research, ML prediction, risk-gated execution, weekly self-tuning — instrumented end-to-end[1].

End-to-end measurement

Every signal, prediction, fill, and dollar of P&L instrumented and traceable. The console is a view, not a measurement layer.

Multi-agent research

Six sector teams, a portfolio decision agent, and a macro layer on LangGraph + Claude. Structured outputs, LLM-as-judge.

Machine-learning overlay

Stacked ensemble of gradient-boosted and linear models. 21-day market-relative return predictions with confidence-driven veto.
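
A confidence-driven veto could look roughly like the sketch below. It assumes the ensemble emits a predicted 21-day market-relative return plus a confidence score in [0, 1]. The `Prediction` type, the `apply_veto` helper, and the 0.6 threshold are hypothetical illustrations, not the system's actual interface or parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    ticker: str
    predicted_alpha_21d: float  # predicted 21-day return relative to the market
    confidence: float           # assumed to lie in [0, 1]


def apply_veto(candidates, predictions, min_confidence=0.6):
    """Drop candidates the model confidently expects to underperform.

    Assumption: a veto fires only when the prediction is negative AND the
    confidence clears the threshold; low-confidence calls never veto.
    """
    by_ticker = {p.ticker: p for p in predictions}
    kept = []
    for ticker in candidates:
        p = by_ticker.get(ticker)
        vetoed = (
            p is not None
            and p.predicted_alpha_21d < 0
            and p.confidence >= min_confidence
        )
        if not vetoed:
            kept.append(ticker)
    return kept
```

In this reading, a ticker with no prediction passes through untouched, so the overlay can only remove names and never adds them.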

Self-improvement loop

Weekly evaluation writes optimized parameters back to four S3 configs. Downstream modules read them on cold-start.
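
The write-back pattern can be sketched with a local directory standing in for the S3 bucket, which keeps the example runnable without cloud credentials. The config name, parameter keys, and default values are hypothetical, and a real deployment would swap the file calls for S3 reads and writes.

```python
import json
from pathlib import Path

# Hypothetical defaults used when no tuned config has been written yet.
DEFAULTS = {"min_confidence": 0.6, "max_position_pct": 0.05}


def write_config(root: Path, name: str, params: dict) -> Path:
    """Weekly evaluation side: persist tuned parameters as JSON."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(params, sort_keys=True))
    tmp.replace(path)  # rename so readers never see a half-written file
    return path


def load_config(root: Path, name: str) -> dict:
    """Downstream side: read once at cold start, falling back to defaults."""
    path = root / f"{name}.json"
    if not path.exists():
        return dict(DEFAULTS)
    return {**DEFAULTS, **json.loads(path.read_text())}
```

Merging over defaults means a tuned config only needs to carry the parameters the evaluation actually changed.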

Current phase

Every aspect of the system reliable and measurable — every experiment decided on data, not vibes.

Phase 1 ✓ | Completeness | KPI: Coverage
Phase 2 ▶ | Reliability + Measurability | KPI: Uptime + Coverage
Phase 3 · | Performance (paper) | KPI: Alpha vs SPY
Phase 4 · | Performance (live) | KPI: NAV
Phase 1 · Completeness
  • Seven modules wired end-to-end via S3 — data, research, prediction, execution, backtesting, evaluation, dashboard.
  • Multi-agent research, stacked meta-ensemble, risk-gated executor, weekly backtester.
  • Three Step Functions running unattended (Saturday weekly + weekday morning + EOD).
Phase 2 · Reliability + Measurability
  • Step Functions reliable end-to-end with drift detection and runtime trend alarms.
  • Every decision point measurable — agent calls, predictor verdicts, fills, P&L attribution, risk events.
  • Closed feedback loop — backtester writing four optimized configs to S3 weekly.
Phase 3 · Performance (paper)
  • Runs featured experiments against pre-committed bars on the Phase-2 substrate.
  • Broader feature set at inference (currently 21 features, expanding to a ~50-feature ArcticDB store).
  • Gated on a ≥99% Step Functions success rate over 8 weeks and a complete transparency inventory.
Phase 4 · Performance (live)
  • Paper → live capital with progressive sizing.
  • Portfolio-level risk overlays beyond per-position gates.
  • Gated on sustained positive alpha vs SPY over a 12-week Phase 3 window.
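
The Phase 3 and Phase 4 gates above reduce to simple checks over run history. The sketch below is one possible reading: the success rate is taken over every Step Functions run in the trailing 8 weeks, and "sustained positive alpha" is interpreted as the compounded portfolio return beating the compounded SPY return over 12 weekly observations. Both function names and that interpretation are assumptions.

```python
from math import prod


def phase3_gate(run_succeeded, inventory_complete, min_rate=0.99):
    """run_succeeded: booleans for every Step Functions run in the last 8 weeks."""
    if not run_succeeded:
        return False
    rate = sum(run_succeeded) / len(run_succeeded)
    return bool(rate >= min_rate and inventory_complete)


def phase4_gate(weekly_portfolio_returns, weekly_spy_returns, weeks=12):
    """Assumed reading of 'sustained positive alpha': compounded portfolio
    return beats compounded SPY return over the trailing window."""
    if len(weekly_portfolio_returns) < weeks or len(weekly_spy_returns) < weeks:
        return False
    port = prod(1 + r for r in weekly_portfolio_returns[-weeks:]) - 1
    spy = prod(1 + r for r in weekly_spy_returns[-weeks:]) - 1
    return port - spy > 0
```

A stricter reading of "sustained" (for example, positive alpha in most individual weeks) would need a different check; the text does not say which is intended.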

Instrumented end-to-end[1]

Every layer of the pipeline is observable and auditable:

Decision artifacts

Research-agent LLM calls capture prompt, response, tool calls, and structured metadata to s3://alpha-engine-research/decision_artifacts/, with LLM-as-judge rubric scores attached on every load-bearing agent type (sector quant, qual, peer-review, thesis-update, macro economist, CIO).
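
One way to picture a decision artifact is as a single JSON record per agent call. The field names, the `DecisionArtifact` class, and the key layout below the `decision_artifacts/` prefix are all hypothetical; only the captured items (prompt, response, tool calls, metadata, judge scores) come from the description above.

```python
import json
from dataclasses import asdict, dataclass, field

PREFIX = "decision_artifacts/"  # prefix from the text; layout below it is assumed


@dataclass
class DecisionArtifact:
    run_id: str
    agent_type: str            # e.g. "sector_quant", "cio"
    prompt: str
    response: str
    tool_calls: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    judge_scores: dict = field(default_factory=dict)  # rubric item -> score

    def key(self) -> str:
        # Hypothetical key layout: one JSON object per agent call.
        return f"{PREFIX}{self.run_id}/{self.agent_type}.json"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the judge scores on the same record as the prompt and response lets a later audit read one object instead of joining across stores.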

Trade audit log

Every order, fill, and exit decision is recorded with per-trade realized_pnl and, where applicable, a rationale. This is the audit trail that surfaced the PFE short-sell retro.

Performance metrics

Signal accuracy at 10d / 30d, predictor rolling 30d IC, NAV vs SPY daily returns, per-trade realized P&L, daily portfolio-level attribution (position P&L + interest + unattributed residual).

Cost telemetry

Per-call cost tracked at LLM call time and aggregated to a weekly cost parquet.
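
A minimal sketch of that flow: price each call from its token counts, then roll calls up by ISO week. The per-token prices and the row schema are placeholders, and the final parquet write is left out so the example needs only the standard library.

```python
from collections import defaultdict
from datetime import date

# Hypothetical USD prices per million tokens; real rates depend on the model.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def call_cost(input_tokens, output_tokens, prices=PRICE_PER_MTOK):
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


def weekly_costs(calls):
    """calls: iterable of (date, agent, input_tokens, output_tokens)."""
    totals = defaultdict(float)
    for day, agent, tokens_in, tokens_out in calls:
        iso = day.isocalendar()
        totals[(iso[0], iso[1], agent)] += call_cost(tokens_in, tokens_out)
    # One row per (ISO year, ISO week, agent), ready for a columnar writer.
    return [
        {"iso_year": y, "iso_week": w, "agent": a, "cost_usd": round(c, 6)}
        for (y, w, a), c in sorted(totals.items())
    ]
```

The returned rows map directly onto a table, so a columnar writer can persist them without further reshaping.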

LangSmith tracing

Trace ID + token counts on every production LLM call.

Parity replay

The backtester replays the morning-planner stage of historical runs as an observational diff against current code.
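
In outline, a parity replay feeds stored inputs through the current planner and reports where the output differs from what was recorded at the time. The sketch below assumes a plan is a mapping of ticker to target weight and that history is available as (run_id, inputs, recorded_plan) tuples; both shapes and the function names are hypothetical.

```python
def diff_plans(recorded, replayed, tol=1e-9):
    """Compare two {ticker: target_weight} plans; return only the differences."""
    diffs = {}
    for ticker in sorted(set(recorded) | set(replayed)):
        old, new = recorded.get(ticker), replayed.get(ticker)
        if old is None or new is None or abs(old - new) > tol:
            diffs[ticker] = {"recorded": old, "replayed": new}
    return diffs


def parity_replay(history, planner):
    """history: list of (run_id, inputs, recorded_plan); planner: current code.

    Observational only: differences are reported, never raised.
    """
    report = {}
    for run_id, inputs, recorded_plan in history:
        diffs = diff_plans(recorded_plan, planner(inputs))
        if diffs:
            report[run_id] = diffs
    return report
```

An empty report means the current code reproduces every recorded plan within tolerance; a non-empty one is a prompt for review, not a failure.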