learn-trading/quant_scaffold.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0229f5ae",
   "metadata": {},
   "source": [
    "# Quant Trading Scaffold\n",
    "## Data Ingestion → Indicators → Walk-Forward ML → Backtesting → Tearsheet\n",
    "\n",
    "Pipeline:\n",
    "1. **Ingest** OHLCV data via yfinance\n",
    "2. **Engineer features** — momentum, trend, volatility, volume indicators\n",
    "3. **Label** — binary classification (next-N-day return > 0)\n",
    "4. **Walk-forward split** with purging (no leakage)\n",
    "5. **Train** XGBoost classifier per fold\n",
    "6. **Evaluate** with quantstats tearsheet"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43b4d162",
   "metadata": {},
   "source": [
    "## 1. Config & Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "aab1cebb",
   "metadata": {},
   "outputs": [
   ],
   "source": [
    "from __future__ import annotations\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\", category=FutureWarning)\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import pandas_ta as ta\n",
    "import yfinance as yf\n",
    "import plotly.graph_objects as go\n",
    "from plotly.subplots import make_subplots\n",
    "from sklearn.model_selection import TimeSeriesSplit\n",
    "from sklearn.metrics import accuracy_score, classification_report\n",
    "from xgboost import XGBClassifier\n",
    "import quantstats as qs\n",
    "\n",
    "# ── Config ──────────────────────────────────────────────────────\n",
    "TICKER = \"SPY\"\n",
    "START = \"2015-01-01\"\n",
    "END = \"2025-12-31\"\n",
    "HORIZON = 5          # predict N-day forward return\n",
    "PURGE_GAP = 5        # gap between train/test to prevent leakage\n",
    "N_SPLITS = 5         # walk-forward folds\n",
    "TRAIN_MIN = 504      # ~2 years minimum training window\n",
    "\n",
    "print(f\"Config: {TICKER} | {START}→{END} | horizon={HORIZON}d | {N_SPLITS} folds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28af2cae",
   "metadata": {},
   "source": [
    "## 2. Data Ingestion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4d755da",
   "metadata": {},
   "outputs": [],
   "source": [
    "raw = yf.download(TICKER, start=START, end=END, auto_adjust=True)\n",
    "# yfinance may return MultiIndex columns for single ticker — flatten\n",
    "if isinstance(raw.columns, pd.MultiIndex):\n",
    "    raw.columns = raw.columns.droplevel(\"Ticker\")\n",
    "raw.index = pd.DatetimeIndex(raw.index)\n",
    "df = raw.copy()\n",
    "print(f\"Downloaded {len(df)} bars: {df.index[0].date()} → {df.index[-1].date()}\")\n",
    "df.tail(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9b1bad5",
   "metadata": {},
   "source": [
    "## 3. Feature Engineering — Technical Indicators\n",
    "\n",
    "We compute features across 4 categories:\n",
    "- **Momentum**: RSI, MACD, Stochastic, Williams %R, ROC\n",
    "- **Trend**: SMA/EMA crossovers, ADX, Ichimoku\n",
    "- **Volatility**: Bollinger Bands, ATR, Keltner Channels\n",
    "- **Volume**: OBV, MFI, Accumulation/Distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a83bf612",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ── Momentum ────────────────────────────────────────────────────\n",
    "df[\"rsi_14\"] = ta.rsi(df[\"Close\"], length=14)\n",
    "df[\"rsi_7\"] = ta.rsi(df[\"Close\"], length=7)\n",
    "\n",
    "macd = ta.macd(df[\"Close\"], fast=12, slow=26, signal=9)\n",
    "df[\"macd\"] = macd.iloc[:, 0]          # MACD line\n",
    "df[\"macd_signal\"] = macd.iloc[:, 1]   # signal line\n",
    "df[\"macd_hist\"] = macd.iloc[:, 2]     # histogram\n",
    "\n",
    "stoch = ta.stoch(df[\"High\"], df[\"Low\"], df[\"Close\"])\n",
    "df[\"stoch_k\"] = stoch.iloc[:, 0]\n",
    "df[\"stoch_d\"] = stoch.iloc[:, 1]\n",
    "\n",
    "df[\"willr_14\"] = ta.willr(df[\"High\"], df[\"Low\"], df[\"Close\"], length=14)\n",
    "df[\"roc_10\"] = ta.roc(df[\"Close\"], length=10)\n",
    "df[\"roc_21\"] = ta.roc(df[\"Close\"], length=21)\n",
    "df[\"mom_10\"] = ta.mom(df[\"Close\"], length=10)\n",
    "\n",
    "# ── Trend ───────────────────────────────────────────────────────\n",
    "df[\"sma_20\"] = ta.sma(df[\"Close\"], length=20)\n",
    "df[\"sma_50\"] = ta.sma(df[\"Close\"], length=50)\n",
    "df[\"sma_200\"] = ta.sma(df[\"Close\"], length=200)\n",
    "df[\"ema_12\"] = ta.ema(df[\"Close\"], length=12)\n",
    "df[\"ema_26\"] = ta.ema(df[\"Close\"], length=26)\n",
    "\n",
    "# crossover features (price relative to MAs)\n",
    "df[\"close_over_sma20\"] = (df[\"Close\"] / df[\"sma_20\"]) - 1\n",
    "df[\"close_over_sma50\"] = (df[\"Close\"] / df[\"sma_50\"]) - 1\n",
    "df[\"close_over_sma200\"] = (df[\"Close\"] / df[\"sma_200\"]) - 1\n",
    "df[\"sma20_over_sma50\"] = (df[\"sma_20\"] / df[\"sma_50\"]) - 1\n",
    "df[\"sma50_over_sma200\"] = (df[\"sma_50\"] / df[\"sma_200\"]) - 1\n",
    "\n",
    "adx = ta.adx(df[\"High\"], df[\"Low\"], df[\"Close\"], length=14)\n",
    "df[\"adx\"] = adx.iloc[:, 0]\n",
    "df[\"di_plus\"] = adx.iloc[:, 1]\n",
    "df[\"di_minus\"] = adx.iloc[:, 2]\n",
    "\n",
    "# ── Volatility ──────────────────────────────────────────────────\n",
    "bbands = ta.bbands(df[\"Close\"], length=20, std=2)\n",
    "df[\"bb_upper\"] = bbands.iloc[:, 0]\n",
    "df[\"bb_mid\"] = bbands.iloc[:, 1]\n",
    "df[\"bb_lower\"] = bbands.iloc[:, 2]\n",
    "df[\"bb_width\"] = bbands.iloc[:, 3]\n",
    "df[\"bb_pctb\"] = bbands.iloc[:, 4]   # %B: where price is within bands\n",
    "\n",
    "df[\"atr_14\"] = ta.atr(df[\"High\"], df[\"Low\"], df[\"Close\"], length=14)\n",
    "df[\"atr_pct\"] = df[\"atr_14\"] / df[\"Close\"]  # normalized ATR\n",
    "\n",
    "kc = ta.kc(df[\"High\"], df[\"Low\"], df[\"Close\"], length=20)\n",
    "df[\"kc_upper\"] = kc.iloc[:, 0]\n",
    "df[\"kc_lower\"] = kc.iloc[:, 1]\n",
    "\n",
    "# volatility: rolling std of returns\n",
    "df[\"vol_10\"] = df[\"Close\"].pct_change().rolling(10).std()\n",
    "df[\"vol_21\"] = df[\"Close\"].pct_change().rolling(21).std()\n",
    "\n",
    "# ── Volume ──────────────────────────────────────────────────────\n",
    "df[\"obv\"] = ta.obv(df[\"Close\"], df[\"Volume\"])\n",
    "df[\"obv_sma20\"] = ta.sma(df[\"obv\"], length=20)\n",
    "df[\"mfi_14\"] = ta.mfi(df[\"High\"], df[\"Low\"], df[\"Close\"], df[\"Volume\"], length=14)\n",
    "ad = ta.ad(df[\"High\"], df[\"Low\"], df[\"Close\"], df[\"Volume\"])\n",
    "df[\"ad_line\"] = ad\n",
    "\n",
    "# volume relative to average\n",
    "df[\"vol_ratio_20\"] = df[\"Volume\"] / df[\"Volume\"].rolling(20).mean()\n",
    "\n",
    "# ── Returns features ────────────────────────────────────────────\n",
    "df[\"ret_1d\"] = df[\"Close\"].pct_change(1)\n",
    "df[\"ret_5d\"] = df[\"Close\"].pct_change(5)\n",
    "df[\"ret_10d\"] = df[\"Close\"].pct_change(10)\n",
    "df[\"ret_21d\"] = df[\"Close\"].pct_change(21)\n",
    "\n",
    "print(f\"Total columns after feature engineering: {len(df.columns)}\")\n",
    "df.tail(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "907e377c",
   "metadata": {},
   "source": [
    "## 4. Labeling — Forward Return Classification\n",
    "\n",
    "Target: is the N-day forward return positive? (buy signal = 1, sell/hold signal = 0)"
   ]
  },
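  {
   "cell_type": "markdown",
   "id": "b7e2d401",
   "metadata": {},
   "source": [
    "The forward-return label can be traced by hand on a toy series. The sketch below uses made-up closing prices and a hypothetical `forward_labels` helper (neither is part of the pipeline) to show which rows get a label and why the last `horizon` rows are left undefined.\n",
    "\n",
    "```python\n",
    "def forward_labels(closes, horizon):\n",
    "    # Returns (fwd_ret, labels); rows without `horizon` future bars get None.\n",
    "    fwd_ret, labels = [], []\n",
    "    for t in range(len(closes)):\n",
    "        if t + horizon < len(closes):\n",
    "            r = closes[t + horizon] / closes[t] - 1\n",
    "            fwd_ret.append(r)\n",
    "            labels.append(1 if r > 0 else 0)\n",
    "        else:\n",
    "            fwd_ret.append(None)\n",
    "            labels.append(None)\n",
    "    return fwd_ret, labels\n",
    "\n",
    "\n",
    "toy_closes = [100, 102, 99, 101, 98, 103]  # made-up prices\n",
    "toy_fwd_ret, toy_labels = forward_labels(toy_closes, horizon=2)\n",
    "print(toy_labels)  # [0, 0, 0, 1, None, None]\n",
    "```\n",
    "\n",
    "In the notebook, the rows with no forward return are removed by `dropna()` on `fwd_ret`, so they never reach the model."
   ]
  },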
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81daaa5f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# forward return (what we're predicting)\n",
    "df[\"fwd_ret\"] = df[\"Close\"].pct_change(HORIZON).shift(-HORIZON)\n",
    "df[\"label\"] = (df[\"fwd_ret\"] > 0).astype(int)\n",
    "\n",
    "# ── Define feature columns (exclude raw OHLCV, target, and non-stationary cols)\n",
    "EXCLUDE = {\n",
    "    \"Open\", \"High\", \"Low\", \"Close\", \"Volume\",\n",
    "    \"fwd_ret\", \"label\",\n",
    "    \"sma_20\", \"sma_50\", \"sma_200\", \"ema_12\", \"ema_26\",  # non-stationary\n",
    "    \"bb_upper\", \"bb_mid\", \"bb_lower\",                     # non-stationary\n",
    "    \"kc_upper\", \"kc_lower\",                                # non-stationary\n",
    "    \"obv\", \"obv_sma20\", \"ad_line\",                         # non-stationary\n",
    "}\n",
    "FEATURES = [c for c in df.columns if c not in EXCLUDE]\n",
    "\n",
    "# drop rows with NaN (from indicator warm-up + forward label)\n",
    "model_df = df[FEATURES + [\"label\", \"fwd_ret\"]].dropna()\n",
    "\n",
    "print(f\"Features: {len(FEATURES)}\")\n",
    "print(f\"Usable rows: {len(model_df)} ({model_df.index[0].date()} → {model_df.index[-1].date()})\")\n",
    "print(f\"Label balance: {model_df['label'].value_counts(normalize=True).to_dict()}\")\n",
    "print(f\"\\nFeature list:\\n{FEATURES}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28769141",
   "metadata": {},
   "source": [
    "## 5. Walk-Forward Split with Purge Gap\n",
    "\n",
    "Time series data **cannot** use random k-fold — future data would leak into training.\n",
    "\n",
    "We use **expanding-window walk-forward** with a **purge gap** between train/test:\n",
    "\n",
    "```\n",
    "Fold 1: [====TRAIN====]--gap--[TEST]\n",
    "Fold 2: [========TRAIN========]--gap--[TEST]\n",
    "Fold 3: [============TRAIN============]--gap--[TEST]\n",
    "```\n",
    "\n",
    "The gap prevents label leakage from overlapping forward-return windows."
   ]
  },
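  {
   "cell_type": "markdown",
   "id": "c91f3a56",
   "metadata": {},
   "source": [
    "The size of the gap can be checked with simple index arithmetic. This is a minimal sketch, assuming training rows occupy `[0, train_end)` and each row is labeled with the close `horizon` bars later. The helper name `label_reaches_test` and the fold boundary are made up for illustration, and the check is deliberately conservative (it flags a label that needs the very first test bar).\n",
    "\n",
    "```python\n",
    "def label_reaches_test(train_end, test_start, horizon):\n",
    "    # The last training row is train_end - 1; its label needs the close\n",
    "    # `horizon` bars later. Flag it if that bar is at or after test_start.\n",
    "    last_train_row = train_end - 1\n",
    "    return last_train_row + horizon >= test_start\n",
    "\n",
    "\n",
    "horizon, test_start = 5, 1000  # made-up fold boundary\n",
    "leak_by_gap = {}\n",
    "for gap in (0, 3, 5):\n",
    "    leak_by_gap[gap] = label_reaches_test(test_start - gap, test_start, horizon)\n",
    "print(leak_by_gap)  # {0: True, 3: True, 5: False}\n",
    "```\n",
    "\n",
    "Under these assumptions a gap shorter than the horizon still lets training labels depend on test-period prices, while a gap equal to the horizon does not."
   ]
  },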
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60594682",
   "metadata": {},
   "outputs": [],
   "source": [
    "def walk_forward_splits(n_samples: int, n_splits: int, test_size: int = 126,\n",
    "                        purge_gap: int = 5, min_train: int = 504):\n",
    "    \"\"\"\n",
    "    Expanding-window walk-forward with purge gap.\n",
    "    \n",
    "    Yields (train_idx, test_idx) index arrays.\n",
    "    test_size: ~6 months of trading days\n",
    "    min_train: ~2 years of trading days\n",
    "    purge_gap: days between train end and test start\n",
    "    \"\"\"\n",
    "    total_test = n_splits * test_size\n",
    "    if min_train + total_test + n_splits * purge_gap > n_samples:\n",
    "        raise ValueError(f\"Not enough data for {n_splits} splits. \"\n",
    "                         f\"Need {min_train + total_test + n_splits * purge_gap}, have {n_samples}\")\n",
    "    \n",
    "    for i in range(n_splits):\n",
    "        test_end = n_samples - (n_splits - 1 - i) * test_size\n",
    "        test_start = test_end - test_size\n",
    "        train_end = test_start - purge_gap\n",
    "        train_start = 0  # expanding window (use max(0, train_end - fixed_window) for sliding)\n",
    "        \n",
    "        train_idx = np.arange(train_start, train_end)\n",
    "        test_idx = np.arange(test_start, test_end)\n",
    "        yield train_idx, test_idx\n",
    "\n",
    "\n",
    "# ── Visualize the splits ────────────────────────────────────────\n",
    "X = model_df[FEATURES].values\n",
    "y = model_df[\"label\"].values\n",
    "dates = model_df.index\n",
    "\n",
    "fig = go.Figure()\n",
    "for fold, (tr_idx, te_idx) in enumerate(walk_forward_splits(len(X), N_SPLITS, purge_gap=PURGE_GAP, min_train=TRAIN_MIN)):\n",
    "    fig.add_trace(go.Scatter(\n",
    "        x=[dates[tr_idx[0]], dates[tr_idx[-1]]], y=[fold, fold],\n",
    "        mode=\"lines\", line=dict(color=\"steelblue\", width=8),\n",
    "        name=f\"Train {fold}\" if fold == 0 else None, showlegend=(fold == 0),\n",
    "    ))\n",
    "    fig.add_trace(go.Scatter(\n",
    "        x=[dates[te_idx[0]], dates[te_idx[-1]]], y=[fold, fold],\n",
    "        mode=\"lines\", line=dict(color=\"coral\", width=8),\n",
    "        name=f\"Test {fold}\" if fold == 0 else None, showlegend=(fold == 0),\n",
    "    ))\n",
    "    print(f\"Fold {fold}: train {dates[tr_idx[0]].date()}→{dates[tr_idx[-1]].date()} \"\n",
    "          f\"({len(tr_idx)}d) | test {dates[te_idx[0]].date()}→{dates[te_idx[-1]].date()} ({len(te_idx)}d)\")\n",
    "\n",
    "fig.update_layout(title=\"Walk-Forward Splits\", yaxis_title=\"Fold\", height=300)\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a80d23c9",
   "metadata": {},
   "source": [
    "## 6. Train XGBoost per Fold — Walk-Forward\n",
    "\n",
    "Train on expanding window, predict test fold, collect out-of-sample predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca9b91e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "oos_preds = []   # out-of-sample predictions\n",
    "oos_proba = []   # predicted probabilities\n",
    "oos_labels = []\n",
    "oos_dates = []\n",
    "oos_fwd_ret = []\n",
    "fold_metrics = []\n",
    "\n",
    "for fold, (tr_idx, te_idx) in enumerate(walk_forward_splits(len(X), N_SPLITS, purge_gap=PURGE_GAP, min_train=TRAIN_MIN)):\n",
    "    X_train, y_train = X[tr_idx], y[tr_idx]\n",
    "    X_test, y_test = X[te_idx], y[te_idx]\n",
    "    \n",
    "    model = XGBClassifier(\n",
    "        n_estimators=300,\n",
    "        max_depth=4,\n",
    "        learning_rate=0.05,\n",
    "        subsample=0.8,\n",
    "        colsample_bytree=0.8,\n",
    "        reg_alpha=0.1,\n",
    "        reg_lambda=1.0,\n",
    "        random_state=42,\n",
    "        eval_metric=\"logloss\",\n",
    "        early_stopping_rounds=30,\n",
    "    )\n",
    "    model.fit(\n",
    "        X_train, y_train,\n",
    "        eval_set=[(X_test, y_test)],\n",
    "        verbose=False,\n",
    "    )\n",
    "    \n",
    "    preds = model.predict(X_test)\n",
    "    proba = model.predict_proba(X_test)[:, 1]\n",
    "    acc = accuracy_score(y_test, preds)\n",
    "    \n",
    "    oos_preds.extend(preds)\n",
    "    oos_proba.extend(proba)\n",
    "    oos_labels.extend(y_test)\n",
    "    oos_dates.extend(dates[te_idx])\n",
    "    oos_fwd_ret.extend(model_df[\"fwd_ret\"].values[te_idx])\n",
    "    \n",
    "    fold_metrics.append({\"fold\": fold, \"accuracy\": acc, \"train_size\": len(tr_idx), \"test_size\": len(te_idx)})\n",
    "    print(f\"Fold {fold}: acc={acc:.3f} | train={len(tr_idx)} | test={len(te_idx)}\")\n",
    "\n",
    "print(f\"\\nOverall OOS accuracy: {accuracy_score(oos_labels, oos_preds):.3f}\")\n",
    "print(classification_report(oos_labels, oos_preds, target_names=[\"SELL/HOLD\", \"BUY\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea7d30fb",
   "metadata": {},
   "source": [
    "## 7. Feature Importance (Last Fold)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06f941b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "imp = pd.Series(model.feature_importances_, index=FEATURES).sort_values(ascending=True)\n",
    "fig = go.Figure(go.Bar(x=imp.tail(20), y=imp.tail(20).index, orientation=\"h\"))\n",
    "fig.update_layout(title=\"Top 20 Feature Importances (last fold)\", height=500, margin=dict(l=150))\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1112fdda",
   "metadata": {},
   "source": [
    "## 8. Strategy Simulation — Signal → Returns\n",
    "\n",
    "Convert model predictions to a strategy equity curve:\n",
    "- **Signal = 1 (BUY)**: go long (earn the market return)\n",
    "- **Signal = 0 (SELL/HOLD)**: stay in cash (earn 0)\n",
    "\n",
    "Compare against buy-and-hold benchmark."
   ]
  },
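  {
   "cell_type": "markdown",
   "id": "d4a80e17",
   "metadata": {},
   "source": [
    "A signal computed from day t's close can only be acted on from day t+1, so a simulation typically lags the signal by one bar before multiplying by daily returns. A minimal sketch with made-up numbers (the lists below are illustrative, not pipeline output):\n",
    "\n",
    "```python\n",
    "daily_ret = [0.01, -0.02, 0.03, 0.01]  # made-up close-to-close returns\n",
    "signal = [1, 0, 1, 1]                  # made-up model output at each close\n",
    "\n",
    "# The position held over day t is the signal from day t-1; flat on the first day.\n",
    "position = [0] + signal[:-1]\n",
    "strat_ret = [p * r for p, r in zip(position, daily_ret)]\n",
    "\n",
    "equity = 1.0\n",
    "for r in strat_ret:\n",
    "    equity *= 1 + r\n",
    "print(position, round(equity, 4))  # [0, 1, 0, 1] 0.9898\n",
    "```\n",
    "\n",
    "Multiplying the unlagged signal by the same day's return would credit the strategy with a move that had already happened when the signal was computed."
   ]
  },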
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0893ddb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build strategy returns series from OOS predictions\n",
    "strat = pd.DataFrame({\n",
    "    \"date\": oos_dates,\n",
    "    \"signal\": oos_preds,\n",
    "    \"proba\": oos_proba,\n",
    "    \"fwd_ret\": oos_fwd_ret,\n",
    "}).set_index(\"date\")\n",
    "\n",
    "# daily returns: we use daily close-to-close returns, masked by signal\n",
    "# align with actual daily returns (not forward returns) for proper equity curve\n",
    "daily_ret = df[\"Close\"].pct_change().reindex(strat.index)\n",
    "\n",
    "# strategy return: market return when signal=1, 0 when signal=0\n",
    "strat[\"strat_ret\"] = daily_ret * strat[\"signal\"]\n",
    "strat[\"bench_ret\"] = daily_ret\n",
    "\n",
    "# cumulative\n",
    "strat[\"strat_equity\"] = (1 + strat[\"strat_ret\"]).cumprod()\n",
    "strat[\"bench_equity\"] = (1 + strat[\"bench_ret\"]).cumprod()\n",
    "\n",
    "# plot\n",
    "fig = go.Figure()\n",
    "fig.add_trace(go.Scatter(x=strat.index, y=strat[\"strat_equity\"], name=\"Strategy\", line=dict(color=\"steelblue\")))\n",
    "fig.add_trace(go.Scatter(x=strat.index, y=strat[\"bench_equity\"], name=\"Buy & Hold\", line=dict(color=\"gray\", dash=\"dot\")))\n",
    "\n",
    "# shade buy signals\n",
    "in_market = strat[\"signal\"] == 1\n",
    "changes = in_market.astype(int).diff().fillna(0)\n",
    "entries = strat.index[changes == 1]\n",
    "exits = strat.index[changes == -1]\n",
    "# align: if first signal is 1, start from beginning\n",
    "if in_market.iloc[0]:\n",
    "    entries = entries.insert(0, strat.index[0])\n",
    "if in_market.iloc[-1]:\n",
    "    exits = exits.append(pd.DatetimeIndex([strat.index[-1]]))\n",
    "for ent, ext in zip(entries, exits):\n",
    "    fig.add_vrect(x0=ent, x1=ext, fillcolor=\"green\", opacity=0.07, line_width=0)\n",
    "\n",
    "fig.update_layout(\n",
    "    title=\"Strategy vs Buy & Hold (OOS)\",\n",
    "    yaxis_title=\"Equity ($1 start)\", height=450,\n",
    ")\n",
    "fig.show()\n",
    "\n",
    "print(f\"Strategy final: ${strat['strat_equity'].iloc[-1]:.2f}\")\n",
    "print(f\"Benchmark final: ${strat['bench_equity'].iloc[-1]:.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d757116a",
   "metadata": {},
   "source": [
    "## 9. QuantStats Tearsheet\n",
    "\n",
    "Full performance report: Sharpe, Sortino, max drawdown, rolling metrics, monthly heatmap."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34fdc588",
   "metadata": {},
   "outputs": [],
   "source": [
    "# quantstats expects a returns series with datetime index\n",
    "strategy_returns = strat[\"strat_ret\"].copy()\n",
    "strategy_returns.index = pd.DatetimeIndex(strategy_returns.index)\n",
    "benchmark_returns = strat[\"bench_ret\"].copy()\n",
    "benchmark_returns.index = pd.DatetimeIndex(benchmark_returns.index)\n",
    "\n",
    "qs.extend_pandas()\n",
    "\n",
    "# key metrics\n",
    "print(\"=\" * 50)\n",
    "print(\"STRATEGY METRICS (out-of-sample)\")\n",
    "print(\"=\" * 50)\n",
    "print(f\"Sharpe:       {qs.stats.sharpe(strategy_returns):.2f}\")\n",
    "print(f\"Sortino:      {qs.stats.sortino(strategy_returns):.2f}\")\n",
    "print(f\"Max Drawdown: {qs.stats.max_drawdown(strategy_returns):.2%}\")\n",
    "print(f\"CAGR:         {qs.stats.cagr(strategy_returns):.2%}\")\n",
    "print(f\"Calmar:       {qs.stats.calmar(strategy_returns):.2f}\")\n",
    "print(f\"Win Rate:     {qs.stats.win_rate(strategy_returns):.2%}\")\n",
    "print(f\"Volatility:   {qs.stats.volatility(strategy_returns):.2%}\")\n",
    "print(f\"Avg Win:      {qs.stats.avg_win(strategy_returns):.4f}\")\n",
    "print(f\"Avg Loss:     {qs.stats.avg_loss(strategy_returns):.4f}\")\n",
    "print(f\"Profit Factor:{qs.stats.profit_factor(strategy_returns):.2f}\")\n",
    "print(\"=\" * 50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6799c588",
   "metadata": {},
   "outputs": [],
   "source": [
    "# full HTML tearsheet — saved to file + displayed inline\n",
    "qs.reports.html(strategy_returns, benchmark=benchmark_returns,\n",
    "                title=f\"{TICKER} ML Signal Strategy (OOS Walk-Forward)\",\n",
    "                output=\"tearsheet.html\")\n",
    "print(\"Tearsheet saved to tearsheet.html\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bb838bb",
   "metadata": {},
   "source": [
    "## 10. Signal Dashboard — Price + Indicators + Buy/Sell Signals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67cae2a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# show last fold's test period with signals overlaid on price\n",
    "last_test_dates = strat.index[-126:]  # last ~6 months\n",
    "viz = df.loc[last_test_dates].copy()\n",
    "sig = strat.loc[last_test_dates]\n",
    "\n",
    "fig = make_subplots(\n",
    "    rows=4, cols=1, shared_xaxes=True,\n",
    "    row_heights=[0.4, 0.2, 0.2, 0.2],\n",
    "    vertical_spacing=0.03,\n",
    "    subplot_titles=[\"Price + Bollinger Bands + Signals\", \"RSI(14)\", \"MACD\", \"Volume\"]\n",
    ")\n",
    "\n",
    "# Row 1: Candlestick + BB + signals\n",
    "fig.add_trace(go.Candlestick(\n",
    "    x=viz.index, open=viz[\"Open\"], high=viz[\"High\"], low=viz[\"Low\"], close=viz[\"Close\"],\n",
    "    name=\"OHLC\", increasing_line_color=\"steelblue\", decreasing_line_color=\"salmon\",\n",
    "), row=1, col=1)\n",
    "fig.add_trace(go.Scatter(x=viz.index, y=viz[\"bb_upper\"], line=dict(color=\"gray\", width=1, dash=\"dot\"), name=\"BB Upper\"), row=1, col=1)\n",
    "fig.add_trace(go.Scatter(x=viz.index, y=viz[\"bb_lower\"], line=dict(color=\"gray\", width=1, dash=\"dot\"), name=\"BB Lower\", fill=\"tonexty\", fillcolor=\"rgba(128,128,128,0.05)\"), row=1, col=1)\n",
    "fig.add_trace(go.Scatter(x=viz.index, y=viz[\"sma_50\"], line=dict(color=\"orange\", width=1), name=\"SMA 50\"), row=1, col=1)\n",
    "\n",
    "# buy/sell markers\n",
    "buy_mask = sig[\"signal\"] == 1\n",
    "changes = buy_mask.astype(int).diff()\n",
    "buy_entries = sig.index[changes == 1]\n",
    "sell_entries = sig.index[changes == -1]\n",
    "if len(buy_entries):\n",
    "    fig.add_trace(go.Scatter(x=buy_entries, y=viz.loc[buy_entries, \"Low\"] * 0.995,\n",
    "        mode=\"markers\", marker=dict(symbol=\"triangle-up\", size=10, color=\"green\"), name=\"BUY\"), row=1, col=1)\n",
    "if len(sell_entries):\n",
    "    fig.add_trace(go.Scatter(x=sell_entries, y=viz.loc[sell_entries, \"High\"] * 1.005,\n",
    "        mode=\"markers\", marker=dict(symbol=\"triangle-down\", size=10, color=\"red\"), name=\"SELL\"), row=1, col=1)\n",
    "\n",
    "# Row 2: RSI\n",
    "fig.add_trace(go.Scatter(x=viz.index, y=viz[\"rsi_14\"], line=dict(color=\"purple\", width=1.5), name=\"RSI 14\"), row=2, col=1)\n",
    "fig.add_hline(y=70, line_dash=\"dash\", line_color=\"red\", opacity=0.5, row=2, col=1)\n",
    "fig.add_hline(y=30, line_dash=\"dash\", line_color=\"green\", opacity=0.5, row=2, col=1)\n",
    "\n",
    "# Row 3: MACD\n",
    "fig.add_trace(go.Scatter(x=viz.index, y=viz[\"macd\"], line=dict(color=\"blue\", width=1.5), name=\"MACD\"), row=3, col=1)\n",
    "fig.add_trace(go.Scatter(x=viz.index, y=viz[\"macd_signal\"], line=dict(color=\"orange\", width=1), name=\"Signal\"), row=3, col=1)\n",
    "colors = [\"green\" if v >= 0 else \"red\" for v in viz[\"macd_hist\"]]\n",
    "fig.add_trace(go.Bar(x=viz.index, y=viz[\"macd_hist\"], marker_color=colors, name=\"Hist\", opacity=0.5), row=3, col=1)\n",
    "\n",
    "# Row 4: Volume\n",
    "fig.add_trace(go.Bar(x=viz.index, y=viz[\"Volume\"], marker_color=\"steelblue\", name=\"Volume\", opacity=0.5), row=4, col=1)\n",
    "fig.add_trace(go.Scatter(x=viz.index, y=viz[\"Volume\"].rolling(20).mean(), line=dict(color=\"orange\", width=1), name=\"Vol SMA20\"), row=4, col=1)\n",
    "\n",
    "fig.update_layout(height=900, title=f\"{TICKER} — Last Test Fold Signal Dashboard\", xaxis_rangeslider_visible=False, showlegend=False)\n",
    "fig.update_xaxes(rangeslider_visible=False)\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b25b6c4",
   "metadata": {},
   "source": [
    "## Next Steps\n",
    "\n",
    "Things to iterate on from here:\n",
    "\n",
    "1. **Multi-asset**: swap `TICKER` to BTC-USD, QQQ, GLD, etc. or loop over a universe\n",
    "2. **Probability threshold**: instead of binary 0/1, use `proba > 0.6` for higher-conviction signals\n",
    "3. **Position sizing**: Kelly criterion via `PyPortfolioOpt` based on predicted probability\n",
    "4. **Regime filter**: add ADX/volatility regime detection — only trade in trending regimes\n",
    "5. **Transaction costs**: subtract realistic slippage (e.g., 5bps per trade) from returns\n",
    "6. **Alternative splitters you have installed**:\n",
    "   - `from tscv import GapWalkForward` — sklearn-compatible, handles gap + purge natively\n",
    "   - `from sktime.split import ExpandingWindowSplitter, SlidingWindowSplitter`\n",
    "   - `from sklearn.model_selection import TimeSeriesSplit` — basic but solid\n",
    "7. **LightGBM**: drop-in replacement for XGBoost, often faster on large feature sets\n",
    "8. **Meta-labeling** (Lopez de Prado): train a secondary model on whether the primary model's signals are correct"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}