b5bf689e72
- Add yfinance.org and defeatbeta-api.org reference docs - Fix defeatbeta_mapping.org: deprecated yfinance property names (quarterly_financials→quarterly_income_stmt, financials→income_stmt), longName vs longBusinessSummary conceptual mismatch, cashflow note typo - Add Mapping Limitations section with live verification results (AAPL): DuckDB 1.4.3 incompatibility, format differences, coverage gaps - Add docs/test_mapping.py as runnable mapping verification script - Add offline.py, persistent_cache.py, download_data.py, warmup_cache.py for offline/cached defeatbeta usage - Add aapl_yfinance.py exploration script and quant.py scaffold - Add .envrc (uv layout) and update pyproject.toml + uv.lock Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
637 lines
27 KiB
Plaintext
637 lines
27 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "0229f5ae",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Quant Trading Scaffold\n",
|
|
"## Data Ingestion → Indicators → Walk-Forward ML → Backtesting → Tearsheet\n",
|
|
"\n",
|
|
"Pipeline:\n",
|
|
"1. **Ingest** OHLCV data via yfinance\n",
|
|
"2. **Engineer features** — momentum, trend, volatility, volume indicators\n",
|
|
"3. **Label** — binary classification (next-N-day return > 0)\n",
|
|
"4. **Walk-forward split** with purging (no leakage)\n",
|
|
"5. **Train** XGBoost classifier per fold\n",
|
|
"6. **Evaluate** with quantstats tearsheet"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "43b4d162",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 1. Config & Imports"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"id": "aab1cebb",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"ename": "ModuleNotFoundError",
|
|
"evalue": "No module named 'numpy'",
|
|
"output_type": "error",
|
|
"traceback": [
|
|
"\u001b[31m---------------------------------------------------------------------------\u001b[39m",
|
|
"\u001b[31mModuleNotFoundError\u001b[39m Traceback (most recent call last)",
|
|
"\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 6\u001b[39m\n\u001b[32m 3\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mwarnings\u001b[39;00m\n\u001b[32m 4\u001b[39m warnings.filterwarnings(\u001b[33m\"\u001b[39m\u001b[33mignore\u001b[39m\u001b[33m\"\u001b[39m, category=\u001b[38;5;167;01mFutureWarning\u001b[39;00m)\n\u001b[32m----> \u001b[39m\u001b[32m6\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mnumpy\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mnp\u001b[39;00m\n\u001b[32m 7\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mpandas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mpd\u001b[39;00m\n\u001b[32m 8\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mpandas_ta\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mta\u001b[39;00m\n",
|
|
"\u001b[31mModuleNotFoundError\u001b[39m: No module named 'numpy'"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from __future__ import annotations\n",
|
|
"\n",
|
|
"import warnings\n",
|
|
"warnings.filterwarnings(\"ignore\", category=FutureWarning)\n",
|
|
"\n",
|
|
"import numpy as np\n",
|
|
"import pandas as pd\n",
|
|
"import pandas_ta as ta\n",
|
|
"import yfinance as yf\n",
|
|
"import plotly.graph_objects as go\n",
|
|
"from plotly.subplots import make_subplots\n",
|
|
"from sklearn.model_selection import TimeSeriesSplit\n",
|
|
"from sklearn.metrics import accuracy_score, classification_report\n",
|
|
"from xgboost import XGBClassifier\n",
|
|
"import quantstats as qs\n",
|
|
"\n",
|
|
"# ── Config ──────────────────────────────────────────────────────\n",
|
|
"TICKER = \"SPY\"\n",
|
|
"START = \"2015-01-01\"\n",
|
|
"END = \"2025-12-31\"\n",
|
|
"HORIZON = 5 # predict N-day forward return\n",
|
|
"PURGE_GAP = 5 # gap between train/test to prevent leakage\n",
|
|
"N_SPLITS = 5 # walk-forward folds\n",
|
|
"TRAIN_MIN = 504 # ~2 years minimum training window\n",
|
|
"\n",
|
|
"print(f\"Config: {TICKER} | {START}→{END} | horizon={HORIZON}d | {N_SPLITS} folds\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "28af2cae",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 2. Data Ingestion"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "b4d755da",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"raw = yf.download(TICKER, start=START, end=END, auto_adjust=True)\n",
|
|
"# yfinance may return MultiIndex columns for single ticker — flatten\n",
|
|
"if isinstance(raw.columns, pd.MultiIndex):\n",
|
|
" raw.columns = raw.columns.droplevel(\"Ticker\")\n",
|
|
"raw.index = pd.DatetimeIndex(raw.index)\n",
|
|
"df = raw.copy()\n",
|
|
"print(f\"Downloaded {len(df)} bars: {df.index[0].date()} → {df.index[-1].date()}\")\n",
|
|
"df.tail(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e9b1bad5",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 3. Feature Engineering — Technical Indicators\n",
|
|
"\n",
|
|
"We compute features across 4 categories:\n",
|
|
"- **Momentum**: RSI, MACD, Stochastic, Williams %R, ROC\n",
|
|
"- **Trend**: SMA/EMA crossovers, ADX, Ichimoku\n",
|
|
"- **Volatility**: Bollinger Bands, ATR, Keltner Channels\n",
|
|
"- **Volume**: OBV, MFI, Accumulation/Distribution"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "a83bf612",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# ── Momentum ────────────────────────────────────────────────────\n",
|
|
"df[\"rsi_14\"] = ta.rsi(df[\"Close\"], length=14)\n",
|
|
"df[\"rsi_7\"] = ta.rsi(df[\"Close\"], length=7)\n",
|
|
"\n",
|
|
"macd = ta.macd(df[\"Close\"], fast=12, slow=26, signal=9)\n",
|
|
"df[\"macd\"] = macd.iloc[:, 0] # MACD line\n",
|
|
"df[\"macd_signal\"] = macd.iloc[:, 1] # signal line\n",
|
|
"df[\"macd_hist\"] = macd.iloc[:, 2] # histogram\n",
|
|
"\n",
|
|
"stoch = ta.stoch(df[\"High\"], df[\"Low\"], df[\"Close\"])\n",
|
|
"df[\"stoch_k\"] = stoch.iloc[:, 0]\n",
|
|
"df[\"stoch_d\"] = stoch.iloc[:, 1]\n",
|
|
"\n",
|
|
"df[\"willr_14\"] = ta.willr(df[\"High\"], df[\"Low\"], df[\"Close\"], length=14)\n",
|
|
"df[\"roc_10\"] = ta.roc(df[\"Close\"], length=10)\n",
|
|
"df[\"roc_21\"] = ta.roc(df[\"Close\"], length=21)\n",
|
|
"df[\"mom_10\"] = ta.mom(df[\"Close\"], length=10)\n",
|
|
"\n",
|
|
"# ── Trend ───────────────────────────────────────────────────────\n",
|
|
"df[\"sma_20\"] = ta.sma(df[\"Close\"], length=20)\n",
|
|
"df[\"sma_50\"] = ta.sma(df[\"Close\"], length=50)\n",
|
|
"df[\"sma_200\"] = ta.sma(df[\"Close\"], length=200)\n",
|
|
"df[\"ema_12\"] = ta.ema(df[\"Close\"], length=12)\n",
|
|
"df[\"ema_26\"] = ta.ema(df[\"Close\"], length=26)\n",
|
|
"\n",
|
|
"# crossover features (price relative to MAs)\n",
|
|
"df[\"close_over_sma20\"] = (df[\"Close\"] / df[\"sma_20\"]) - 1\n",
|
|
"df[\"close_over_sma50\"] = (df[\"Close\"] / df[\"sma_50\"]) - 1\n",
|
|
"df[\"close_over_sma200\"] = (df[\"Close\"] / df[\"sma_200\"]) - 1\n",
|
|
"df[\"sma20_over_sma50\"] = (df[\"sma_20\"] / df[\"sma_50\"]) - 1\n",
|
|
"df[\"sma50_over_sma200\"] = (df[\"sma_50\"] / df[\"sma_200\"]) - 1\n",
|
|
"\n",
|
|
"adx = ta.adx(df[\"High\"], df[\"Low\"], df[\"Close\"], length=14)\n",
|
|
"df[\"adx\"] = adx.iloc[:, 0]\n",
|
|
"df[\"di_plus\"] = adx.iloc[:, 1]\n",
|
|
"df[\"di_minus\"] = adx.iloc[:, 2]\n",
|
|
"\n",
|
|
"# ── Volatility ──────────────────────────────────────────────────\n",
|
|
"bbands = ta.bbands(df[\"Close\"], length=20, std=2)\n",
|
|
"df[\"bb_upper\"] = bbands.iloc[:, 0]\n",
|
|
"df[\"bb_mid\"] = bbands.iloc[:, 1]\n",
|
|
"df[\"bb_lower\"] = bbands.iloc[:, 2]\n",
|
|
"df[\"bb_width\"] = bbands.iloc[:, 3]\n",
|
|
"df[\"bb_pctb\"] = bbands.iloc[:, 4] # %B: where price is within bands\n",
|
|
"\n",
|
|
"df[\"atr_14\"] = ta.atr(df[\"High\"], df[\"Low\"], df[\"Close\"], length=14)\n",
|
|
"df[\"atr_pct\"] = df[\"atr_14\"] / df[\"Close\"] # normalized ATR\n",
|
|
"\n",
|
|
"kc = ta.kc(df[\"High\"], df[\"Low\"], df[\"Close\"], length=20)\n",
|
|
"df[\"kc_upper\"] = kc.iloc[:, 0]\n",
|
|
"df[\"kc_lower\"] = kc.iloc[:, 1]\n",
|
|
"\n",
|
|
"# volatility: rolling std of returns\n",
|
|
"df[\"vol_10\"] = df[\"Close\"].pct_change().rolling(10).std()\n",
|
|
"df[\"vol_21\"] = df[\"Close\"].pct_change().rolling(21).std()\n",
|
|
"\n",
|
|
"# ── Volume ──────────────────────────────────────────────────────\n",
|
|
"df[\"obv\"] = ta.obv(df[\"Close\"], df[\"Volume\"])\n",
|
|
"df[\"obv_sma20\"] = ta.sma(df[\"obv\"], length=20)\n",
|
|
"df[\"mfi_14\"] = ta.mfi(df[\"High\"], df[\"Low\"], df[\"Close\"], df[\"Volume\"], length=14)\n",
|
|
"ad = ta.ad(df[\"High\"], df[\"Low\"], df[\"Close\"], df[\"Volume\"])\n",
|
|
"df[\"ad_line\"] = ad\n",
|
|
"\n",
|
|
"# volume relative to average\n",
|
|
"df[\"vol_ratio_20\"] = df[\"Volume\"] / df[\"Volume\"].rolling(20).mean()\n",
|
|
"\n",
|
|
"# ── Returns features ────────────────────────────────────────────\n",
|
|
"df[\"ret_1d\"] = df[\"Close\"].pct_change(1)\n",
|
|
"df[\"ret_5d\"] = df[\"Close\"].pct_change(5)\n",
|
|
"df[\"ret_10d\"] = df[\"Close\"].pct_change(10)\n",
|
|
"df[\"ret_21d\"] = df[\"Close\"].pct_change(21)\n",
|
|
"\n",
|
|
"print(f\"Total columns after feature engineering: {len(df.columns)}\")\n",
|
|
"df.tail(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "907e377c",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 4. Labeling — Forward Return Classification\n",
|
|
"\n",
|
|
"Target: is the N-day forward return positive? (buy signal = 1, sell/hold signal = 0)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "81daaa5f",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# forward return (what we're predicting)\n",
|
|
"df[\"fwd_ret\"] = df[\"Close\"].pct_change(HORIZON).shift(-HORIZON)\n",
|
|
"df[\"label\"] = (df[\"fwd_ret\"] > 0).astype(int)\n",
|
|
"\n",
|
|
"# ── Define feature columns (exclude raw OHLCV, target, and non-stationary cols)\n",
|
|
"EXCLUDE = {\n",
|
|
" \"Open\", \"High\", \"Low\", \"Close\", \"Volume\",\n",
|
|
" \"fwd_ret\", \"label\",\n",
|
|
" \"sma_20\", \"sma_50\", \"sma_200\", \"ema_12\", \"ema_26\", # non-stationary\n",
|
|
" \"bb_upper\", \"bb_mid\", \"bb_lower\", # non-stationary\n",
|
|
" \"kc_upper\", \"kc_lower\", # non-stationary\n",
|
|
" \"obv\", \"obv_sma20\", \"ad_line\", # non-stationary\n",
|
|
"}\n",
|
|
"FEATURES = [c for c in df.columns if c not in EXCLUDE]\n",
|
|
"\n",
|
|
"# drop rows with NaN (from indicator warm-up + forward label)\n",
|
|
"model_df = df[FEATURES + [\"label\", \"fwd_ret\"]].dropna()\n",
|
|
"\n",
|
|
"print(f\"Features: {len(FEATURES)}\")\n",
|
|
"print(f\"Usable rows: {len(model_df)} ({model_df.index[0].date()} → {model_df.index[-1].date()})\")\n",
|
|
"print(f\"Label balance: {model_df['label'].value_counts(normalize=True).to_dict()}\")\n",
|
|
"print(f\"\\nFeature list:\\n{FEATURES}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "28769141",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 5. Walk-Forward Split with Purge Gap\n",
|
|
"\n",
|
|
"Time series data **cannot** use random k-fold — future data would leak into training.\n",
|
|
"\n",
|
|
"We use **expanding-window walk-forward** with a **purge gap** between train/test:\n",
|
|
"\n",
|
|
"```\n",
|
|
"Fold 1: [====TRAIN====]--gap--[TEST]\n",
|
|
"Fold 2: [========TRAIN========]--gap--[TEST]\n",
|
|
"Fold 3: [============TRAIN============]--gap--[TEST]\n",
|
|
"```\n",
|
|
"\n",
|
|
"The gap prevents label leakage from overlapping forward-return windows."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "60594682",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def walk_forward_splits(n_samples: int, n_splits: int, test_size: int = 126,\n",
|
|
" purge_gap: int = 5, min_train: int = 504):\n",
|
|
" \"\"\"\n",
|
|
" Expanding-window walk-forward with purge gap.\n",
|
|
" \n",
|
|
" Yields (train_idx, test_idx) index arrays.\n",
|
|
" test_size: ~6 months of trading days\n",
|
|
" min_train: ~2 years of trading days\n",
|
|
" purge_gap: days between train end and test start\n",
|
|
" \"\"\"\n",
|
|
" total_test = n_splits * test_size\n",
|
|
" if min_train + total_test + n_splits * purge_gap > n_samples:\n",
|
|
" raise ValueError(f\"Not enough data for {n_splits} splits. \"\n",
|
|
" f\"Need {min_train + total_test + n_splits * purge_gap}, have {n_samples}\")\n",
|
|
" \n",
|
|
" for i in range(n_splits):\n",
|
|
" test_end = n_samples - (n_splits - 1 - i) * test_size\n",
|
|
" test_start = test_end - test_size\n",
|
|
" train_end = test_start - purge_gap\n",
|
|
" train_start = 0 # expanding window (use max(0, train_end - fixed_window) for sliding)\n",
|
|
" \n",
|
|
" train_idx = np.arange(train_start, train_end)\n",
|
|
" test_idx = np.arange(test_start, test_end)\n",
|
|
" yield train_idx, test_idx\n",
|
|
"\n",
|
|
"\n",
|
|
"# ── Visualize the splits ────────────────────────────────────────\n",
|
|
"X = model_df[FEATURES].values\n",
|
|
"y = model_df[\"label\"].values\n",
|
|
"dates = model_df.index\n",
|
|
"\n",
|
|
"fig = go.Figure()\n",
|
|
"for fold, (tr_idx, te_idx) in enumerate(walk_forward_splits(len(X), N_SPLITS, purge_gap=PURGE_GAP, min_train=TRAIN_MIN)):\n",
|
|
" fig.add_trace(go.Scatter(\n",
|
|
" x=[dates[tr_idx[0]], dates[tr_idx[-1]]], y=[fold, fold],\n",
|
|
" mode=\"lines\", line=dict(color=\"steelblue\", width=8),\n",
|
|
" name=f\"Train {fold}\" if fold == 0 else None, showlegend=(fold == 0),\n",
|
|
" ))\n",
|
|
" fig.add_trace(go.Scatter(\n",
|
|
" x=[dates[te_idx[0]], dates[te_idx[-1]]], y=[fold, fold],\n",
|
|
" mode=\"lines\", line=dict(color=\"coral\", width=8),\n",
|
|
" name=f\"Test {fold}\" if fold == 0 else None, showlegend=(fold == 0),\n",
|
|
" ))\n",
|
|
" print(f\"Fold {fold}: train {dates[tr_idx[0]].date()}→{dates[tr_idx[-1]].date()} \"\n",
|
|
" f\"({len(tr_idx)}d) | test {dates[te_idx[0]].date()}→{dates[te_idx[-1]].date()} ({len(te_idx)}d)\")\n",
|
|
"\n",
|
|
"fig.update_layout(title=\"Walk-Forward Splits\", yaxis_title=\"Fold\", height=300)\n",
|
|
"fig.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a80d23c9",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 6. Train XGBoost per Fold — Walk-Forward\n",
|
|
"\n",
|
|
"Train on expanding window, predict test fold, collect out-of-sample predictions."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "ca9b91e6",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"oos_preds = [] # out-of-sample predictions\n",
|
|
"oos_proba = [] # predicted probabilities\n",
|
|
"oos_labels = []\n",
|
|
"oos_dates = []\n",
|
|
"oos_fwd_ret = []\n",
|
|
"fold_metrics = []\n",
|
|
"\n",
|
|
"for fold, (tr_idx, te_idx) in enumerate(walk_forward_splits(len(X), N_SPLITS, purge_gap=PURGE_GAP, min_train=TRAIN_MIN)):\n",
|
|
" X_train, y_train = X[tr_idx], y[tr_idx]\n",
|
|
" X_test, y_test = X[te_idx], y[te_idx]\n",
|
|
" \n",
|
|
" model = XGBClassifier(\n",
|
|
" n_estimators=300,\n",
|
|
" max_depth=4,\n",
|
|
" learning_rate=0.05,\n",
|
|
" subsample=0.8,\n",
|
|
" colsample_bytree=0.8,\n",
|
|
" reg_alpha=0.1,\n",
|
|
" reg_lambda=1.0,\n",
|
|
" random_state=42,\n",
|
|
" eval_metric=\"logloss\",\n",
|
|
" early_stopping_rounds=30,\n",
|
|
" )\n",
|
|
" model.fit(\n",
|
|
" X_train, y_train,\n",
|
|
" eval_set=[(X_test, y_test)],\n",
|
|
" verbose=False,\n",
|
|
" )\n",
|
|
" \n",
|
|
" preds = model.predict(X_test)\n",
|
|
" proba = model.predict_proba(X_test)[:, 1]\n",
|
|
" acc = accuracy_score(y_test, preds)\n",
|
|
" \n",
|
|
" oos_preds.extend(preds)\n",
|
|
" oos_proba.extend(proba)\n",
|
|
" oos_labels.extend(y_test)\n",
|
|
" oos_dates.extend(dates[te_idx])\n",
|
|
" oos_fwd_ret.extend(model_df[\"fwd_ret\"].values[te_idx])\n",
|
|
" \n",
|
|
" fold_metrics.append({\"fold\": fold, \"accuracy\": acc, \"train_size\": len(tr_idx), \"test_size\": len(te_idx)})\n",
|
|
" print(f\"Fold {fold}: acc={acc:.3f} | train={len(tr_idx)} | test={len(te_idx)}\")\n",
|
|
"\n",
|
|
"print(f\"\\nOverall OOS accuracy: {accuracy_score(oos_labels, oos_preds):.3f}\")\n",
|
|
"print(classification_report(oos_labels, oos_preds, target_names=[\"SELL/HOLD\", \"BUY\"]))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ea7d30fb",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 7. Feature Importance (Last Fold)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "06f941b8",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"imp = pd.Series(model.feature_importances_, index=FEATURES).sort_values(ascending=True)\n",
|
|
"fig = go.Figure(go.Bar(x=imp.tail(20), y=imp.tail(20).index, orientation=\"h\"))\n",
|
|
"fig.update_layout(title=\"Top 20 Feature Importances (last fold)\", height=500, margin=dict(l=150))\n",
|
|
"fig.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "1112fdda",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 8. Strategy Simulation — Signal → Returns\n",
|
|
"\n",
|
|
"Convert model predictions to a strategy equity curve:\n",
|
|
"- **Signal = 1 (BUY)**: go long (earn the market return)\n",
|
|
"- **Signal = 0 (SELL/HOLD)**: stay in cash (earn 0)\n",
|
|
"\n",
|
|
"Compare against buy-and-hold benchmark."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "0893ddb0",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Build strategy returns series from OOS predictions\n",
|
|
"strat = pd.DataFrame({\n",
|
|
" \"date\": oos_dates,\n",
|
|
" \"signal\": oos_preds,\n",
|
|
" \"proba\": oos_proba,\n",
|
|
" \"fwd_ret\": oos_fwd_ret,\n",
|
|
"}).set_index(\"date\")\n",
|
|
"\n",
|
|
"# daily returns: we use daily close-to-close returns, masked by signal\n",
|
|
"# align with actual daily returns (not forward returns) for proper equity curve\n",
|
|
"daily_ret = df[\"Close\"].pct_change().reindex(strat.index)\n",
|
|
"\n",
|
|
"# strategy return: market return when signal=1, 0 when signal=0\n",
|
|
"strat[\"strat_ret\"] = daily_ret * strat[\"signal\"]\n",
|
|
"strat[\"bench_ret\"] = daily_ret\n",
|
|
"\n",
|
|
"# cumulative\n",
|
|
"strat[\"strat_equity\"] = (1 + strat[\"strat_ret\"]).cumprod()\n",
|
|
"strat[\"bench_equity\"] = (1 + strat[\"bench_ret\"]).cumprod()\n",
|
|
"\n",
|
|
"# plot\n",
|
|
"fig = go.Figure()\n",
|
|
"fig.add_trace(go.Scatter(x=strat.index, y=strat[\"strat_equity\"], name=\"Strategy\", line=dict(color=\"steelblue\")))\n",
|
|
"fig.add_trace(go.Scatter(x=strat.index, y=strat[\"bench_equity\"], name=\"Buy & Hold\", line=dict(color=\"gray\", dash=\"dot\")))\n",
|
|
"\n",
|
|
"# shade buy signals\n",
|
|
"in_market = strat[\"signal\"] == 1\n",
|
|
"changes = in_market.astype(int).diff().fillna(0)\n",
|
|
"entries = strat.index[changes == 1]\n",
|
|
"exits = strat.index[changes == -1]\n",
|
|
"# align: if first signal is 1, start from beginning\n",
|
|
"if in_market.iloc[0]:\n",
|
|
" entries = entries.insert(0, strat.index[0])\n",
|
|
"if in_market.iloc[-1]:\n",
|
|
" exits = exits.append(pd.DatetimeIndex([strat.index[-1]]))\n",
|
|
"for ent, ext in zip(entries, exits):\n",
|
|
" fig.add_vrect(x0=ent, x1=ext, fillcolor=\"green\", opacity=0.07, line_width=0)\n",
|
|
"\n",
|
|
"fig.update_layout(\n",
|
|
" title=\"Strategy vs Buy & Hold (OOS)\",\n",
|
|
" yaxis_title=\"Equity ($1 start)\", height=450,\n",
|
|
")\n",
|
|
"fig.show()\n",
|
|
"\n",
|
|
"print(f\"Strategy final: ${strat['strat_equity'].iloc[-1]:.2f}\")\n",
|
|
"print(f\"Benchmark final: ${strat['bench_equity'].iloc[-1]:.2f}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d757116a",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 9. QuantStats Tearsheet\n",
|
|
"\n",
|
|
"Full performance report: Sharpe, Sortino, max drawdown, rolling metrics, monthly heatmap."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "34fdc588",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# quantstats expects a returns series with datetime index\n",
|
|
"strategy_returns = strat[\"strat_ret\"].copy()\n",
|
|
"strategy_returns.index = pd.DatetimeIndex(strategy_returns.index)\n",
|
|
"benchmark_returns = strat[\"bench_ret\"].copy()\n",
|
|
"benchmark_returns.index = pd.DatetimeIndex(benchmark_returns.index)\n",
|
|
"\n",
|
|
"qs.extend_pandas()\n",
|
|
"\n",
|
|
"# key metrics\n",
|
|
"print(\"=\" * 50)\n",
|
|
"print(\"STRATEGY METRICS (out-of-sample)\")\n",
|
|
"print(\"=\" * 50)\n",
|
|
"print(f\"Sharpe: {qs.stats.sharpe(strategy_returns):.2f}\")\n",
|
|
"print(f\"Sortino: {qs.stats.sortino(strategy_returns):.2f}\")\n",
|
|
"print(f\"Max Drawdown: {qs.stats.max_drawdown(strategy_returns):.2%}\")\n",
|
|
"print(f\"CAGR: {qs.stats.cagr(strategy_returns):.2%}\")\n",
|
|
"print(f\"Calmar: {qs.stats.calmar(strategy_returns):.2f}\")\n",
|
|
"print(f\"Win Rate: {qs.stats.win_rate(strategy_returns):.2%}\")\n",
|
|
"print(f\"Volatility: {qs.stats.volatility(strategy_returns):.2%}\")\n",
|
|
"print(f\"Avg Win: {qs.stats.avg_win(strategy_returns):.4f}\")\n",
|
|
"print(f\"Avg Loss: {qs.stats.avg_loss(strategy_returns):.4f}\")\n",
|
|
"print(f\"Profit Factor:{qs.stats.profit_factor(strategy_returns):.2f}\")\n",
|
|
"print(\"=\" * 50)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "6799c588",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# full HTML tearsheet — saved to file + displayed inline\n",
|
|
"qs.reports.html(strategy_returns, benchmark=benchmark_returns,\n",
|
|
" title=f\"{TICKER} ML Signal Strategy (OOS Walk-Forward)\",\n",
|
|
" output=\"tearsheet.html\")\n",
|
|
"print(\"Tearsheet saved to tearsheet.html\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "4bb838bb",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 10. Signal Dashboard — Price + Indicators + Buy/Sell Signals"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "67cae2a4",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# show last fold's test period with signals overlaid on price\n",
|
|
"last_test_dates = strat.index[-126:] # last ~6 months\n",
|
|
"viz = df.loc[last_test_dates].copy()\n",
|
|
"sig = strat.loc[last_test_dates]\n",
|
|
"\n",
|
|
"fig = make_subplots(\n",
|
|
" rows=4, cols=1, shared_xaxes=True,\n",
|
|
" row_heights=[0.4, 0.2, 0.2, 0.2],\n",
|
|
" vertical_spacing=0.03,\n",
|
|
" subplot_titles=[\"Price + Bollinger Bands + Signals\", \"RSI(14)\", \"MACD\", \"Volume\"]\n",
|
|
")\n",
|
|
"\n",
|
|
"# Row 1: Candlestick + BB + signals\n",
|
|
"fig.add_trace(go.Candlestick(\n",
|
|
" x=viz.index, open=viz[\"Open\"], high=viz[\"High\"], low=viz[\"Low\"], close=viz[\"Close\"],\n",
|
|
" name=\"OHLC\", increasing_line_color=\"steelblue\", decreasing_line_color=\"salmon\",\n",
|
|
"), row=1, col=1)\n",
|
|
"fig.add_trace(go.Scatter(x=viz.index, y=viz[\"bb_upper\"], line=dict(color=\"gray\", width=1, dash=\"dot\"), name=\"BB Upper\"), row=1, col=1)\n",
|
|
"fig.add_trace(go.Scatter(x=viz.index, y=viz[\"bb_lower\"], line=dict(color=\"gray\", width=1, dash=\"dot\"), name=\"BB Lower\", fill=\"tonexty\", fillcolor=\"rgba(128,128,128,0.05)\"), row=1, col=1)\n",
|
|
"fig.add_trace(go.Scatter(x=viz.index, y=viz[\"sma_50\"], line=dict(color=\"orange\", width=1), name=\"SMA 50\"), row=1, col=1)\n",
|
|
"\n",
|
|
"# buy/sell markers\n",
|
|
"buy_mask = sig[\"signal\"] == 1\n",
|
|
"changes = buy_mask.astype(int).diff()\n",
|
|
"buy_entries = sig.index[changes == 1]\n",
|
|
"sell_entries = sig.index[changes == -1]\n",
|
|
"if len(buy_entries):\n",
|
|
" fig.add_trace(go.Scatter(x=buy_entries, y=viz.loc[buy_entries, \"Low\"] * 0.995,\n",
|
|
" mode=\"markers\", marker=dict(symbol=\"triangle-up\", size=10, color=\"green\"), name=\"BUY\"), row=1, col=1)\n",
|
|
"if len(sell_entries):\n",
|
|
" fig.add_trace(go.Scatter(x=sell_entries, y=viz.loc[sell_entries, \"High\"] * 1.005,\n",
|
|
" mode=\"markers\", marker=dict(symbol=\"triangle-down\", size=10, color=\"red\"), name=\"SELL\"), row=1, col=1)\n",
|
|
"\n",
|
|
"# Row 2: RSI\n",
|
|
"fig.add_trace(go.Scatter(x=viz.index, y=viz[\"rsi_14\"], line=dict(color=\"purple\", width=1.5), name=\"RSI 14\"), row=2, col=1)\n",
|
|
"fig.add_hline(y=70, line_dash=\"dash\", line_color=\"red\", opacity=0.5, row=2, col=1)\n",
|
|
"fig.add_hline(y=30, line_dash=\"dash\", line_color=\"green\", opacity=0.5, row=2, col=1)\n",
|
|
"\n",
|
|
"# Row 3: MACD\n",
|
|
"fig.add_trace(go.Scatter(x=viz.index, y=viz[\"macd\"], line=dict(color=\"blue\", width=1.5), name=\"MACD\"), row=3, col=1)\n",
|
|
"fig.add_trace(go.Scatter(x=viz.index, y=viz[\"macd_signal\"], line=dict(color=\"orange\", width=1), name=\"Signal\"), row=3, col=1)\n",
|
|
"colors = [\"green\" if v >= 0 else \"red\" for v in viz[\"macd_hist\"]]\n",
|
|
"fig.add_trace(go.Bar(x=viz.index, y=viz[\"macd_hist\"], marker_color=colors, name=\"Hist\", opacity=0.5), row=3, col=1)\n",
|
|
"\n",
|
|
"# Row 4: Volume\n",
|
|
"fig.add_trace(go.Bar(x=viz.index, y=viz[\"Volume\"], marker_color=\"steelblue\", name=\"Volume\", opacity=0.5), row=4, col=1)\n",
|
|
"fig.add_trace(go.Scatter(x=viz.index, y=viz[\"Volume\"].rolling(20).mean(), line=dict(color=\"orange\", width=1), name=\"Vol SMA20\"), row=4, col=1)\n",
|
|
"\n",
|
|
"fig.update_layout(height=900, title=f\"{TICKER} — Last Test Fold Signal Dashboard\", xaxis_rangeslider_visible=False, showlegend=False)\n",
|
|
"fig.update_xaxes(rangeslider_visible=False)\n",
|
|
"fig.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5b25b6c4",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Next Steps\n",
|
|
"\n",
|
|
"Things to iterate on from here:\n",
|
|
"\n",
|
|
"1. **Multi-asset**: swap `TICKER` to BTC-USD, QQQ, GLD, etc. or loop over a universe\n",
|
|
"2. **Probability threshold**: instead of binary 0/1, use `proba > 0.6` for higher-conviction signals\n",
|
|
"3. **Position sizing**: Kelly criterion via `PyPortfolioOpt` based on predicted probability\n",
|
|
"4. **Regime filter**: add ADX/volatility regime detection — only trade in trending regimes\n",
|
|
"5. **Transaction costs**: subtract realistic slippage (e.g., 5bps per trade) from returns\n",
|
|
"6. **Alternative splitters you have installed**:\n",
|
|
" - `from tscv import GapWalkForward` — sklearn-compatible, handles gap + purge natively\n",
|
|
" - `from sktime.split import ExpandingWindowSplitter, SlidingWindowSplitter`\n",
|
|
" - `from sklearn.model_selection import TimeSeriesSplit` — basic but solid\n",
|
|
"7. **LightGBM**: drop-in replacement for XGBoost, often faster on large feature sets\n",
|
|
"8. **Meta-labeling** (Lopez de Prado): train a secondary model on whether the primary model's signals are correct"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.12.12"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|