docs: add API references, mapping corrections, and verification script

- Add yfinance.org and defeatbeta-api.org reference docs
- Fix defeatbeta_mapping.org: deprecated yfinance property names
  (quarterly_financials→quarterly_income_stmt, financials→income_stmt),
  longName vs longBusinessSummary conceptual mismatch, cashflow note typo
- Add Mapping Limitations section with live verification results (AAPL):
  DuckDB 1.4.3 incompatibility, format differences, coverage gaps
- Add docs/test_mapping.py as runnable mapping verification script
- Add offline.py, persistent_cache.py, download_data.py, warmup_cache.py
  for offline/cached defeatbeta usage
- Add aapl_yfinance.py exploration script and quant.py scaffold
- Add .envrc (uv layout) and update pyproject.toml + uv.lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 15:33:21 +08:00
parent b71a8e77b0
commit b5bf689e72
16 changed files with 3650 additions and 141 deletions
@@ -0,0 +1,203 @@
#+TITLE: defeatbeta-api Reference
#+AUTHOR: Wong Ding Feng
#+DATE: 2026-04-25
* How Data Retrieval Works
** NOT a full download
The library uses *DuckDB with the ~cache_httpfs~ extension* to query *remote Parquet files on HuggingFace*
(~defeatbeta/yahoo-finance-data~). Every query runs SQL directly against the remote files:
#+begin_src sql
SELECT * FROM 'https://huggingface.co/.../stock_prices.parquet' WHERE symbol = 'AAPL'
#+end_src
Because Parquet is columnar and DuckDB applies *predicate pushdown*, only the row groups that
may contain your ticker are fetched, via HTTP range requests, rather than the full 3-4 GB file.
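To make the pruning step concrete, here is a minimal pure-Python sketch of how per-row-group min/max statistics let a reader skip row groups. The ~RowGroup~ class, the three-group layout, and the byte sizes are all hypothetical. The sketch also assumes rows are clustered by symbol; without that, every row group's range could overlap the ticker and little would be skipped.
#+begin_src python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Per-row-group statistics of the kind a Parquet footer stores."""
    min_symbol: str
    max_symbol: str
    num_bytes: int


def groups_to_fetch(groups, symbol):
    """Keep only row groups whose [min, max] range could contain symbol."""
    return [g for g in groups if g.min_symbol <= symbol <= g.max_symbol]


# Hypothetical file layout: rows sorted by symbol, split into three row groups.
groups = [
    RowGroup("A", "AAON", 120_000_000),
    RowGroup("AAPL", "MSFT", 130_000_000),
    RowGroup("MU", "ZTS", 125_000_000),
]

needed = groups_to_fetch(groups, "AAPL")
fetched = sum(g.num_bytes for g in needed)
total = sum(g.num_bytes for g in groups)
print(f"fetch {len(needed)} of {len(groups)} row groups, {fetched} of {total} bytes")
#+end_src
Only the byte ranges of the surviving row groups would need to be requested over HTTP. Note that ~SELECT *~ still reads every column within those groups.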
** On-disk cache
- Default 1 GB cache at ~~/.defeatbeta/cache/~
- Stores fetched blocks so repeated queries are fast
- On startup: checks ~spec.json~ on HuggingFace, clears stale cache if dataset was updated
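The stale-cache check can be sketched as follows. This is an illustration of the idea only, not the library's implementation: the function name, the local marker file, and the ~update_time~ field are assumptions, and the demo runs in a temporary directory rather than the real cache path.
#+begin_src python
import json
import shutil
import tempfile
from pathlib import Path


def refresh_cache_if_stale(cache_dir: Path, remote_spec: dict) -> bool:
    """Reset cache_dir when the remote spec differs from the saved copy.

    Returns True if the cache was reset (including on first run).
    """
    marker = cache_dir / "spec.json"
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_spec = json.loads(marker.read_text()) if marker.exists() else None
    if local_spec == remote_spec:
        return False  # dataset unchanged, keep the cached blocks
    # Dataset changed or first run: drop cached blocks, then record the new spec.
    shutil.rmtree(cache_dir)
    cache_dir.mkdir(parents=True)
    marker.write_text(json.dumps(remote_spec))
    return True


# Demo; "update_time" is a hypothetical field name.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    first = refresh_cache_if_stale(cache, {"update_time": "2026-04-24"})
    (cache / "block_0.bin").write_bytes(b"cached bytes")
    second = refresh_cache_if_stale(cache, {"update_time": "2026-04-24"})
    third = refresh_cache_if_stale(cache, {"update_time": "2026-04-25"})
    block_survived = (cache / "block_0.bin").exists()
#+end_src
The second call leaves the cached block in place because the spec is unchanged. The third call sees a new spec and removes the block.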
* Getting All Available Tickers
#+begin_src python
from defeatbeta_api.data.company_meta import CompanyMeta
meta = CompanyMeta()
all_tickers = meta.get_all_tickers() # List[str]
all_companies = meta.get_all_companies_info() # List[dict]: symbol, name, cik, currency
#+end_src
This reads ~company_tickers.json~ from HuggingFace, a small JSON file, not the large Parquet files.
* Single Ticker API — ~Ticker("AAPL")~
#+begin_src python
from defeatbeta_api.data.ticker import Ticker
t = Ticker("AAPL")
#+end_src
** Company Info
| Method | Returns | What it gives |
|----------------------------+-------------+---------------------------------------------------------|
| ~info()~ | DataFrame | Profile: name, sector, industry, description, headcount |
| ~officers()~ | DataFrame | Executive officers |
| ~sec_filing()~ | DataFrame | SEC filings list |
| ~news()~ | ~News~ object | Latest news articles |
| ~earning_call_transcripts()~ | ~Transcripts~ | Earnings call transcripts |
| ~calendar()~ | DataFrame | Upcoming earnings dates |
** Prices & Basic Finance
| Method | Returns | What it gives |
|------------------------------------+-----------+------------------------------|
| ~price()~ | DataFrame | Historical OHLCV prices |
| ~splits()~ | DataFrame | Stock split events |
| ~dividends()~ | DataFrame | Dividend payment history |
| ~shares()~ | DataFrame | Shares outstanding over time |
| ~beta(period="5y", benchmark="SPY")~ | DataFrame | Calculated beta vs benchmark |
| ~currency(symbol)~ | DataFrame | Exchange rate history |
| ~ttm_eps()~ | DataFrame | Trailing 12-month EPS |
** Financial Statements
| Method | Returns | What it gives |
|------------------------------+-----------+-------------------------|
| ~quarterly_income_statement()~ | ~Statement~ | Quarterly P&L |
| ~annual_income_statement()~ | ~Statement~ | Annual P&L |
| ~quarterly_balance_sheet()~ | ~Statement~ | Quarterly balance sheet |
| ~annual_balance_sheet()~ | ~Statement~ | Annual balance sheet |
| ~quarterly_cash_flow()~ | ~Statement~ | Quarterly cash flow |
| ~annual_cash_flow()~ | ~Statement~ | Annual cash flow |
** TTM Aggregates
| Method | Returns | What it gives |
|----------------------------------------+-----------+----------------------------------|
| ~ttm_revenue()~ | DataFrame | Trailing 12-month revenue |
| ~ttm_fcf()~ | DataFrame | Trailing 12-month free cash flow |
| ~ttm_ebitda()~ | DataFrame | Trailing 12-month EBITDA |
| ~ttm_net_income_common_stockholders()~ | DataFrame | TTM net income (common holders)  |
| ~ttm_pe()~ | DataFrame | Trailing P/E (price / ttm_eps) |
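As a worked example of the arithmetic behind these aggregates, the sketch below assumes a TTM figure is the sum of the four most recent quarters and applies the ~price / ttm_eps~ relation from the table. The helper ~ttm_sum~ and all figures are made up for illustration; how the library itself aligns quarters and prices is not documented here.
#+begin_src python
def ttm_sum(quarterly_values):
    """Sum the four most recent quarters (list ordered oldest to newest)."""
    if len(quarterly_values) < 4:
        raise ValueError("need at least four quarters")
    return sum(quarterly_values[-4:])


# Made-up quarterly EPS figures, oldest first.
quarterly_eps = [1.20, 1.50, 1.30, 1.40, 2.00]
price = 155.0  # made-up closing price

ttm_eps = ttm_sum(quarterly_eps)  # 1.50 + 1.30 + 1.40 + 2.00
ttm_pe = price / ttm_eps
print(f"TTM EPS = {ttm_eps:.2f}, TTM P/E = {ttm_pe:.1f}")
#+end_src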
** Revenue Breakdown
| Method | Returns | What it gives |
|------------------------+-----------+-----------------------------|
| ~revenue_by_segment()~ | DataFrame | Revenue by business segment |
| ~revenue_by_geography()~ | DataFrame | Revenue by region |
| ~revenue_by_product()~ | DataFrame | Revenue by product line |
** Valuation Multiples
| Method | Returns | What it gives |
|-------------------------+-----------+------------------------------|
| ~market_capitalization()~ | DataFrame | Historical market cap |
| ~ps_ratio()~ | DataFrame | Price/Sales ratio |
| ~pb_ratio()~ | DataFrame | Price/Book ratio |
| ~peg_ratio()~ | DataFrame | PEG ratio |
| ~enterprise_value()~ | DataFrame | Enterprise value |
| ~enterprise_to_revenue()~ | DataFrame | EV/Revenue |
| ~enterprise_to_ebitda()~ | DataFrame | EV/EBITDA |
| ~debt_to_equity()~ | DataFrame | D/E ratio |
| ~net_debt_ttm()~ | DataFrame | Net debt (TTM) |
| ~wacc()~ | DataFrame | Weighted avg cost of capital |
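For orientation, the enterprise-value multiples follow from a few inputs. The sketch below uses the simplified textbook definition (market cap plus debt minus cash) with made-up numbers. It ignores items such as preferred equity and minority interest, and the exact inputs the library uses are an open assumption.
#+begin_src python
# Made-up figures, all in the same currency unit.
market_cap = 3000.0
total_debt = 110.0
cash_and_equivalents = 60.0
ttm_revenue = 400.0
ttm_ebitda = 140.0

net_debt = total_debt - cash_and_equivalents    # 50.0
enterprise_value = market_cap + net_debt        # 3050.0
ev_to_revenue = enterprise_value / ttm_revenue  # 7.625
ev_to_ebitda = enterprise_value / ttm_ebitda    # about 21.79

print(f"EV = {enterprise_value:.0f}, EV/Revenue = {ev_to_revenue:.2f}, "
      f"EV/EBITDA = {ev_to_ebitda:.2f}")
#+end_src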
** Profitability Returns
| Method | Returns | What it gives |
|---------------------+-----------+------------------------------------|
| ~roe()~ | DataFrame | Return on equity |
| ~roa()~ | DataFrame | Return on assets |
| ~roic()~ | DataFrame | Return on invested capital |
| ~roce()~ | DataFrame | Return on capital employed |
| ~equity_multiplier()~ | DataFrame | Financial leverage (assets/equity) |
| ~asset_turnover()~ | DataFrame | Revenue/assets efficiency |
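Several of these ratios are linked by the DuPont identity: ROE = net margin × asset turnover × equity multiplier. The sketch below checks the identity on made-up figures using period-end balances. The library may use averaged balances or TTM inputs, so its values need not match this simplified form.
#+begin_src python
# Made-up figures, all in the same currency unit.
net_income = 20.0
revenue = 100.0
total_assets = 250.0
equity = 125.0

net_margin = net_income / revenue          # 0.20
asset_turnover = revenue / total_assets    # 0.40
equity_multiplier = total_assets / equity  # 2.00

roe_direct = net_income / equity
roe_dupont = net_margin * asset_turnover * equity_multiplier

print(f"ROE direct = {roe_direct:.2%}, via DuPont = {roe_dupont:.2%}")
#+end_src
The decomposition shows whether a given ROE comes from profitability, asset efficiency, or leverage.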
** Margins
| Method | Returns | What it gives |
|------------------------------+-----------+--------------------|
| ~quarterly_gross_margin()~ | DataFrame | Gross margin % |
| ~annual_gross_margin()~ | DataFrame | Gross margin % |
| ~quarterly_operating_margin()~ | DataFrame | Operating margin % |
| ~annual_operating_margin()~ | DataFrame | Operating margin % |
| ~quarterly_net_margin()~ | DataFrame | Net margin % |
| ~annual_net_margin()~ | DataFrame | Net margin % |
| ~quarterly_ebitda_margin()~ | DataFrame | EBITDA margin % |
| ~annual_ebitda_margin()~ | DataFrame | EBITDA margin % |
| ~quarterly_fcf_margin()~ | DataFrame | FCF margin % |
| ~annual_fcf_margin()~ | DataFrame | FCF margin % |
** YoY Growth
| Method | Returns | What it gives |
|-----------------------------------------+-----------+---------------------|
| ~quarterly_revenue_yoy_growth()~ | DataFrame | Revenue growth % |
| ~annual_revenue_yoy_growth()~ | DataFrame | Revenue growth % |
| ~quarterly_operating_income_yoy_growth()~ | DataFrame | Op. income growth % |
| ~annual_operating_income_yoy_growth()~ | DataFrame | Op. income growth % |
| ~quarterly_ebitda_yoy_growth()~ | DataFrame | EBITDA growth % |
| ~annual_ebitda_yoy_growth()~ | DataFrame | EBITDA growth % |
| ~quarterly_net_income_yoy_growth()~ | DataFrame | Net income growth % |
| ~annual_net_income_yoy_growth()~ | DataFrame | Net income growth % |
| ~quarterly_fcf_yoy_growth()~ | DataFrame | FCF growth % |
| ~annual_fcf_yoy_growth()~ | DataFrame | FCF growth % |
| ~quarterly_eps_yoy_growth()~ | DataFrame | EPS growth % |
| ~quarterly_ttm_eps_yoy_growth()~ | DataFrame | TTM EPS growth % |
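A year-over-year figure compares a period with the same period one year earlier. The sketch below assumes the common definition, change divided by the absolute value of the base, on made-up quarterly revenue. The helper ~yoy_growth~ is hypothetical, and the library's handling of zero or negative bases is not specified here.
#+begin_src python
def yoy_growth(current, year_ago):
    """Percentage change versus the same period one year earlier.

    Returns None when the base is zero, since the ratio is undefined.
    """
    if year_ago == 0:
        return None
    return (current - year_ago) / abs(year_ago) * 100.0


# Made-up quarterly revenue, oldest first; index i and i - 4 are the same fiscal quarter.
revenue = [100.0, 110.0, 105.0, 120.0, 112.0, 121.0]
growth = [yoy_growth(revenue[i], revenue[i - 4]) for i in range(4, len(revenue))]
print([round(g, 1) for g in growth])
#+end_src
Dividing by the absolute value keeps the sign meaningful when the base period is negative, for example a loss that turns into a profit.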
** Industry Comparisons
Uses the ticker's own industry to benchmark against peers.
| Method | Returns | What it gives |
|------------------------------------+-----------+--------------------------|
| ~industry_ttm_pe()~ | DataFrame | Avg P/E across industry |
| ~industry_ps_ratio()~ | DataFrame | Industry P/S |
| ~industry_pb_ratio()~ | DataFrame | Industry P/B |
| ~industry_roe()~ | DataFrame | Industry ROE |
| ~industry_roa()~ | DataFrame | Industry ROA |
| ~industry_roic()~ | DataFrame | Industry ROIC |
| ~industry_equity_multiplier()~ | DataFrame | Industry leverage |
| ~industry_asset_turnover()~ | DataFrame | Industry efficiency |
| ~industry_quarterly_gross_margin()~ | DataFrame | Industry gross margin % |
| ~industry_quarterly_ebitda_margin()~ | DataFrame | Industry EBITDA margin % |
| ~industry_quarterly_net_margin()~ | DataFrame | Industry net margin % |
** DCF / Advanced
| Method | Returns | What it gives |
|-----------------------------+---------+----------------------------------------|
| ~dcf_data()~ | dict | All raw inputs for a DCF model |
| ~dcf()~ | dict | Full DCF valuation + exports ~.xlsx~ |
| ~download_data_performance()~ | str | Timing summary of data fetch durations |
* Multi-Ticker API — ~Tickers(["AAPL", "NVDA"])~
#+begin_src python
from defeatbeta_api.data.tickers import Tickers
t = Tickers(["AAPL", "NVDA"])
t = Tickers(["AAPL", "NVDA"], max_workers=2) # limit parallelism
#+end_src
Wraps all ~Ticker~ methods, running them in *parallel threads*.
- Methods returning simple data → *combined DataFrame* (all tickers in one table)
- Methods returning complex objects (statements, news, transcripts) → ~{symbol: result}~ dict
Method names match ~Ticker~. Industry comparison methods run once per unique
industry represented in the list.
#+begin_src python
t.info() # → DataFrame (combined)
t.price() # → DataFrame (combined)
t.annual_income_statement() # → {'AAPL': Statement(...), 'NVDA': Statement(...)}
t.news() # → {'AAPL': News(...), 'NVDA': News(...)}
t.earning_call_transcripts() # → {'AAPL': Transcripts(...), 'NVDA': Transcripts(...)}
t.industry_roe() # → DataFrame (one row per unique industry)
#+end_src
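The fan-out and the two result shapes can be pictured with the standard library alone. This is a sketch of the pattern, not the library's code: ~fetch_rows~, ~fetch_object~, and ~run_parallel~ are hypothetical stand-ins, and plain lists of dicts stand in for DataFrames.
#+begin_src python
from concurrent.futures import ThreadPoolExecutor


def fetch_rows(symbol):
    """Stand-in for a per-ticker call that returns tabular rows."""
    return [{"symbol": symbol, "value": len(symbol)}]


def fetch_object(symbol):
    """Stand-in for a call that returns a complex, non-tabular object."""
    return {"kind": "Statement", "for": symbol}


def run_parallel(symbols, fn, max_workers=2):
    """Call fn for each symbol in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(symbols, pool.map(fn, symbols)))


symbols = ["AAPL", "NVDA"]

# Tabular results: flatten every ticker's rows into one combined table.
combined = [row for rows in run_parallel(symbols, fetch_rows).values() for row in rows]

# Complex results: keep one object per symbol.
by_symbol = run_parallel(symbols, fetch_object)

print(combined)
print(by_symbol)
#+end_src
Threads suit this workload because each call spends most of its time waiting on network I/O rather than computing.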