docs: README.org and setup notes in docs/setup.org
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+123
@@ -0,0 +1,123 @@
#+title: RAGs: Private Learning Tool
#+author: df
#+date: 2026-04-19

* Overview

Two local RAG systems for a private learning tool with Anki export.

| Project  | Purpose                                                   |
|----------+-----------------------------------------------------------|
| LightRAG | Graph-based RAG: ingest docs, query concepts              |
| Graphiti | Temporal knowledge graph: track what you learned and when |

Both run fully local via Ollama. No cloud, no API keys.

* Prerequisites

** Ollama

Install Ollama and pull the required models:

#+begin_src sh
ollama pull qwen2.5:7b
ollama pull nomic-embed-text
#+end_src

Ollama must be running before starting either service.
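
One way to confirm that both models are present is to filter the output of =ollama list=. This is a minimal sketch: =have_models= is a hypothetical helper, not part of this repo, and it assumes the model name is the first column of the listing.

#+begin_src sh
# Hypothetical helper: succeed only if every model named in the arguments
# appears in the first column of `ollama list`-style output on stdin.
# A bare name such as nomic-embed-text also matches nomic-embed-text:latest.
have_models() {
  listing=$(cat)
  for model in "$@"; do
    printf '%s\n' "$listing" | awk -v m="$model" '
      { split($1, parts, ":") }
      $1 == m || parts[1] == m { found = 1 }
      END { exit !found }
    ' || { echo "missing model: $model" >&2; return 1; }
  done
}

# Usage (needs a running Ollama daemon):
#   ollama list | have_models qwen2.5:7b nomic-embed-text
#+end_src

A non-zero exit status means at least one model still has to be pulled.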

** Nix

Flakes must be enabled. Add this line to =~/.config/nix/nix.conf= (on NixOS,
set the equivalent =nix.settings.experimental-features= option instead):

#+begin_src conf
experimental-features = nix-command flakes
#+end_src
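
To check whether a given =nix.conf= already enables both features, something like the following could be used. It is a sketch only: =flakes_enabled= is a hypothetical helper, and it inspects a single file, so settings made elsewhere (for example in =/etc/nix/nix.conf= on NixOS) are not seen.

#+begin_src sh
# Hypothetical helper: succeed if the given nix.conf enables both
# nix-command and flakes. Defaults to the per-user config file.
flakes_enabled() {
  conf=${1:-$HOME/.config/nix/nix.conf}
  settings=$(grep -E '^[[:space:]]*(extra-)?experimental-features[[:space:]]*=' "$conf" 2>/dev/null) || return 1
  case $settings in *nix-command*) ;; *) return 1 ;; esac
  case $settings in *flakes*) ;; *) return 1 ;; esac
}

# Usage:
#   flakes_enabled || echo "enable flakes in ~/.config/nix/nix.conf first"
#+end_src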

* Usage

** LightRAG

Ingest documents and query them as a knowledge graph.

#+begin_src sh
nix develop .#lightrag
lightrag-server
#+end_src

The server runs at =http://localhost:9621=.

Configure it in =.env.lightrag=. Default storage is =./lightrag/rag_storage/=,
relative to the repository root.
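
The server takes a moment to come up, so scripts that talk to it may want to wait first. A minimal sketch, assuming =curl= is available (the devShells provide it) and that the server answers =GET /health=; that path is an assumption here, and =wait_for_http= is a hypothetical helper.

#+begin_src sh
# Hypothetical helper: poll a URL once per second until it answers with an
# HTTP success status, giving up after the given number of attempts.
wait_for_http() {
  url=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    curl -fsS -o /dev/null --max-time 2 "$url" 2>/dev/null && return 0
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep 1
  done
  return 1
}

# Usage (the /health path is an assumption):
#   wait_for_http http://localhost:9621/health && echo "LightRAG is up"
#+end_src

The same helper works for the Neo4j web UI on port 7474.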

** Graphiti

Temporal memory graph that tracks concepts and when you learned them.

Start Neo4j first (in a separate terminal):

#+begin_src sh
nix run .#neo4j-start
#+end_src

Then enter the shell:

#+begin_src sh
nix develop .#graphiti
#+end_src

Configure it in =.env.graphiti=.

** Neo4j Management

#+begin_src sh
nix run .#neo4j-start   # start daemon
nix run .#neo4j-stop    # stop daemon
#+end_src

Data persists in =./data/neo4j/=. The web UI is at =http://localhost:7474=.

* Configuration

** .env.lightrag

| Variable          | Default            | Notes                      |
|-------------------+--------------------+----------------------------|
| =LLM_BINDING=     | =ollama=           |                            |
| =LLM_MODEL=       | =qwen2.5:7b=       | Change to any Ollama model |
| =EMBEDDING_MODEL= | =nomic-embed-text= |                            |
| =EMBEDDING_DIM=   | =768=              | Must match model           |
| =RAG_DIR=         | =./rag_storage=    | Where graph data lives     |
| =PORT=            | =9621=             |                            |

** .env.graphiti

| Variable          | Default                     | Notes                        |
|-------------------+-----------------------------+------------------------------|
| =NEO4J_URI=       | =bolt://localhost:7687=     |                              |
| =OPENAI_BASE_URL= | =http://localhost:11434/v1= | Ollama OpenAI-compatible API |
| =OPENAI_API_KEY=  | =ollama=                    | Dummy value, required by SDK |
| =MODEL_NAME=      | =qwen2.5:7b=                |                              |
| =EMBEDDING_MODEL= | =nomic-embed-text=          |                              |
| =EMBEDDING_DIM=   | =768=                       | Must match model             |

* Structure

#+begin_example
rags/
├── flake.nix        Nix devShells and neo4j apps
├── flake.lock
├── .env.lightrag    LightRAG runtime config
├── .env.graphiti    Graphiti runtime config
├── lightrag/        submodule: hkuds/lightrag
├── graphiti/        submodule: getzep/graphiti
├── data/
│   └── neo4j/       Neo4j data (gitignored)
└── docs/
    └── setup.org    How this was set up
#+end_example

* Submodules

After cloning, fetch the two submodules:

#+begin_src sh
git submodule update --init --recursive
#+end_src
+151
@@ -0,0 +1,151 @@
#+title: Setup Notes
#+date: 2026-04-19

* What We're Building and Why

Private learning tool. Ingest study materials → query concepts → export to Anki.

Five RAG frameworks were considered: LightRAG, Graphiti, Morphik, R2R, and Agentset.

** Why LightRAG

Graph-based RAG: it builds a knowledge graph from your documents, not just a
flat vector index. Queries traverse relationships between concepts, which maps
naturally to Anki's card/tag structure. It also uses file-based storage, has
minimal dependencies, and works with Ollama.

** Why Graphiti

Temporal knowledge graph designed for agent memory. It tracks *when* facts were
learned and how they change over time. It complements LightRAG: LightRAG indexes
your source material, while Graphiti tracks your evolving understanding of it.

** What Was Skipped and Why

| Project  | Reason skipped                                             |
|----------+------------------------------------------------------------|
| Morphik  | Multimodal (ColPali); only useful if materials have images |
| R2R      | 6+ services (MinIO, RabbitMQ, Hatchet, 2x Postgres)        |
| Agentset | Bun/TypeScript monorepo, needs Supabase + Trigger.dev      |

* Project Structure

Git repo with two submodules:

#+begin_src sh
git init
git submodule add https://github.com/hkuds/lightrag lightrag
git submodule add https://github.com/getzep/graphiti graphiti
#+end_src

* Nix Flake Design

** Goal: impure but reproducible shells

Packaging Python with Nix properly (=buildPythonPackage=, wheels in the nix
store) is slow and often breaks on native extensions. The tradeoff chosen:

- Nix provides the runtime: Python 3.12, uv, Neo4j, curl
- =uv sync= installs PyPI deps into a =.venv= outside the nix store at shell entry
- =.venv= dirs are gitignored, recreated on first =nix develop=

This is impure, since the =.venv= contents aren't pinned by Nix, but =uv.lock= in
each submodule pins the exact PyPI versions, so installs are repeatable in
practice.

** Two devShells

#+begin_src nix
devShells.${system} = {
  lightrag = pkgs.mkShell { ... };
  graphiti = pkgs.mkShell { ... };
};
#+end_src

Each shell:
1. Sets =UV_PYTHON= to the nix-provided Python 3.12 binary
2. Sets =UV_PROJECT_ENVIRONMENT= so uv puts the venv in the project dir
3. Sets =LD_LIBRARY_PATH= for native wheel compatibility (see below)
4. Runs =uv sync= on first entry
5. Sources =.env.<project>= for runtime config
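
The five steps can be sketched as a plain shell function. This is an illustration under stated assumptions, not the flake's actual shellHook: =enter_project=, =PYTHON_BIN=, and =LIB_PATH= are hypothetical names standing in for values that Nix would interpolate, and the env files are assumed to sit in the repository root.

#+begin_src sh
# Hypothetical helper mirroring the five steps. PYTHON_BIN and LIB_PATH
# stand in for the store paths that Nix interpolates in the real shellHook.
enter_project() {
  project=$1                                   # lightrag or graphiti
  root=$PWD
  export UV_PYTHON="${PYTHON_BIN:-python3.12}"                          # step 1
  export UV_PROJECT_ENVIRONMENT="$root/$project/.venv"                  # step 2
  export LD_LIBRARY_PATH="${LIB_PATH:+$LIB_PATH:}${LD_LIBRARY_PATH:-}"  # step 3
  if [ ! -d "$UV_PROJECT_ENVIRONMENT" ]; then                           # step 4
    (cd "$root/$project" && uv sync) || return 1
  fi
  if [ -f "$root/.env.$project" ]; then                                 # step 5
    set -a                       # export everything the env file defines
    . "$root/.env.$project"
    set +a
  fi
}

# Usage, from the repository root:
#   enter_project lightrag
#+end_src

Running the sync only when =.venv= is missing keeps later shell entries fast; the cost is that a changed =uv.lock= is not picked up until the venv is removed or =uv sync= is run by hand.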

** Neo4j as nix apps

#+begin_src nix
apps.${system} = {
  neo4j-start = { type = "app"; program = "${startNeo4j}"; };
  neo4j-stop  = { type = "app"; program = "${stopNeo4j}"; };
};
#+end_src

=pkgs.neo4j= (version 2026.02.2) is in nixpkgs. The startup script writes a
=neo4j.conf= to =./data/neo4j/conf/= at runtime and sets =NEO4J_CONF= to point
there. Neo4j respects =NEO4J_CONF= as a directory containing =neo4j.conf=.

Auth is disabled (=dbms.security.auth_enabled=false=) for local dev.
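
What the startup script does can be approximated as below. This is a sketch, not the script from =flake.nix=: =write_neo4j_conf= is a hypothetical name, the =server.*= keys follow Neo4j 5-style naming, and binding to =127.0.0.1= is an assumption chosen here because auth is off.

#+begin_src sh
# Hypothetical helper: write a minimal neo4j.conf under <data_root>/conf
# and point NEO4J_CONF at that directory.
write_neo4j_conf() {
  data_root=$1
  mkdir -p "$data_root/conf" "$data_root/data" "$data_root/logs"
  cat > "$data_root/conf/neo4j.conf" <<EOF
server.directories.data=$data_root/data
server.directories.logs=$data_root/logs
server.http.listen_address=127.0.0.1:7474
server.bolt.listen_address=127.0.0.1:7687
dbms.security.auth_enabled=false
EOF
  export NEO4J_CONF="$data_root/conf"
}

# Usage, from the repository root:
#   write_neo4j_conf "$PWD/data/neo4j" && neo4j start
#+end_src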

* Problems Solved

** Wrong Python version (3.14 instead of 3.12)

The system Python on this machine is 3.14. =uv= was picking it up instead of
the nix-provided =python312=. Fix: pin =UV_PYTHON= explicitly in the shellHook:

#+begin_src nix
# shell code inside shellHook = '' ... ''; Nix interpolates ${pkgs.python312}
export UV_PYTHON="${pkgs.python312}/bin/python3.12"
#+end_src

** libstdc++.so.6 not found

PyPI wheels for numpy and other native extensions link against =libstdc++.so.6=.
On NixOS this library isn't in standard paths. Fix: add to =LD_LIBRARY_PATH= in
the shellHook:

#+begin_src nix
# shell code inside shellHook; Nix expands ${...}, the shell expands $LD_LIBRARY_PATH
export LD_LIBRARY_PATH="${pkgs.lib.makeLibraryPath [
  pkgs.stdenv.cc.cc
  pkgs.zlib
]}:$LD_LIBRARY_PATH"
#+end_src

** LightRAG server missing fastapi

=uv sync= alone doesn't install the API server deps; they're behind an optional
extra. Fix: use =uv sync --extra api= in the lightrag shellHook.

** Runtime paths in shellHook

=builtins.toString ./.= in a Nix flake evaluates to the flake's path in the
*nix store*, not the user's working directory. Using it for =cd= and venv paths
would point into =/nix/store/...=. Fix: use =$PWD= (the directory where the
user runs =nix develop=) for all runtime paths:

#+begin_src bash
RAGS_ROOT="$PWD"
export VIRTUAL_ENV="$RAGS_ROOT/lightrag/.venv"
cd "$RAGS_ROOT/lightrag"
#+end_src
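
=$PWD= is only the repository root when =nix develop= is started from there. A possible refinement, offered as an assumption rather than something the flake does, is to walk up from the current directory until =flake.nix= is found. =find_flake_root= is a hypothetical helper.

#+begin_src sh
# Hypothetical helper: print the nearest ancestor directory (starting at
# $PWD) that contains flake.nix; fail if none is found.
find_flake_root() {
  dir=$PWD
  while [ "$dir" != "/" ]; do
    if [ -f "$dir/flake.nix" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1
}

# Usage:
#   RAGS_ROOT=$(find_flake_root) || { echo "run inside the rags repo" >&2; exit 1; }
#+end_src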

* Graphiti + Ollama

Graphiti's LLM and embedder clients are OpenAI SDK wrappers. Ollama exposes an
OpenAI-compatible API at =http://localhost:11434/v1=, so Graphiti can use Ollama
by setting:

#+begin_src sh
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama  # SDK requires a non-empty value; Ollama ignores it
#+end_src

The embedder uses =nomic-embed-text= (768 dimensions). =EMBEDDING_DIM= must be
set to match; otherwise Graphiti creates its vector index with the wrong size.
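
The dimension can be checked against the running model instead of being trusted from memory. A rough sketch, assuming Ollama's native =/api/embeddings= endpoint returns a JSON object with a flat =embedding= array; =embedding_dim= is a hypothetical helper and does only naive text processing, not real JSON parsing.

#+begin_src sh
# Hypothetical helper: count the numbers in the "embedding" array of a JSON
# response read from stdin. Assumes a single flat array of numbers.
embedding_dim() {
  tr -d ' \n' |
    sed -n 's/.*"embedding":\[\([^]]*\)].*/\1/p' |
    tr ',' '\n' |
    grep -c .
}

# Usage (needs a running Ollama daemon); compare the result with EMBEDDING_DIM:
#   curl -fsS http://localhost:11434/api/embeddings \
#     -d '{"model": "nomic-embed-text", "prompt": "hello"}' | embedding_dim
#+end_src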

* Testing Done

| Test                                  | Result |
|---------------------------------------+--------|
| =import lightrag= in Python 3.12      | ok     |
| =lightrag-server= starts, binds port  | ok     |
| =import graphiti_core= in Python 3.12 | ok     |
| Neo4j starts, responds on port 7474   | ok     |
| Graphiti connects to Neo4j via bolt   | ok     |
| Neo4j stops cleanly                   | ok     |
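
Checks like these could be scripted so they are easy to repeat. A minimal sketch: =check= and =failures= are hypothetical names, and the commands in the usage comments are assumptions about how each row was tested.

#+begin_src sh
# Hypothetical smoke-test helper: run a command quietly, print one result
# row, and count failures.
failures=0
check() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf '%-40s ok\n' "$name"
  else
    printf '%-40s FAIL\n' "$name"
    failures=$((failures + 1))
  fi
}

# Usage, inside the matching devShell:
#   check 'import lightrag'        python -c 'import lightrag'
#   check 'import graphiti_core'   python -c 'import graphiti_core'
#   check 'Neo4j responds on 7474' curl -fsS http://localhost:7474
#   [ "$failures" -eq 0 ]
#+end_src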