From 6b4d54bfac00f4024b751e50564f0c2465f902a5 Mon Sep 17 00:00:00 2001
From: Wong Ding Feng
Date: Sun, 19 Apr 2026 12:22:46 +0800
Subject: [PATCH] docs: README.org and setup notes in docs/setup.org

Co-Authored-By: Claude Sonnet 4.6
---
 README.org     | 123 ++++++++++++++++++++++++++++++++++++++++
 docs/setup.org | 151 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 274 insertions(+)
 create mode 100644 README.org
 create mode 100644 docs/setup.org

diff --git a/README.org b/README.org
new file mode 100644
index 0000000..dba5aaf
--- /dev/null
+++ b/README.org
@@ -0,0 +1,123 @@
+#+title: RAGs — Private Learning Tool
+#+author: df
+#+date: 2026-04-19
+
+* Overview
+
+Two local RAG systems for a private learning tool with Anki export.
+
+| Project  | Purpose                                          |
+|----------+--------------------------------------------------|
+| LightRAG | Graph-based RAG — ingest docs, query concepts    |
+| Graphiti | Temporal knowledge graph — track what you learned and when |
+
+Both run fully local via Ollama. No cloud, no API keys.
+
+* Prerequisites
+
+** Ollama
+
+Install Ollama and pull the required models:
+
+#+begin_src sh
+ollama pull qwen2.5:7b
+ollama pull nomic-embed-text
+#+end_src
+
+Ollama must be running before starting either service.
+
+** Nix
+
+Flakes must be enabled. Add to your NixOS config or =~/.config/nix/nix.conf=:
+
+#+begin_src
+experimental-features = nix-command flakes
+#+end_src
+
+* Usage
+
+** LightRAG
+
+Ingest documents and query them as a knowledge graph.
+
+#+begin_src sh
+nix develop .#lightrag
+lightrag-server
+#+end_src
+
+Server runs at =http://localhost:9621=.
+
+Configure in =.env.lightrag=. Default storage is =./lightrag/rag_storage/=.
+
+** Graphiti
+
+Temporal memory graph — tracks concepts and when you learned them.
+
+Start Neo4j first (in a separate terminal):
+
+#+begin_src sh
+nix run .#neo4j-start
+#+end_src
+
+Then enter the shell:
+
+#+begin_src sh
+nix develop .#graphiti
+#+end_src
+
+Configure in =.env.graphiti=.
+
+** Neo4j Management
+
+#+begin_src sh
+nix run .#neo4j-start   # start daemon
+nix run .#neo4j-stop    # stop daemon
+#+end_src
+
+Data persists in =./data/neo4j/=. Web UI at =http://localhost:7474=.
+
+* Configuration
+
+** .env.lightrag
+
+| Variable             | Default              | Notes                    |
+|----------------------+----------------------+--------------------------|
+| =LLM_BINDING=        | =ollama=             |                          |
+| =LLM_MODEL=          | =qwen2.5:7b=         | Change to any Ollama model |
+| =EMBEDDING_MODEL=    | =nomic-embed-text=   |                          |
+| =EMBEDDING_DIM=      | =768=                | Must match model         |
+| =RAG_DIR=            | =./rag_storage=      | Where graph data lives   |
+| =PORT=               | =9621=               |                          |
+
+** .env.graphiti
+
+| Variable          | Default                      | Notes                         |
+|-------------------+------------------------------+-------------------------------|
+| =NEO4J_URI=       | =bolt://localhost:7687=      |                               |
+| =OPENAI_BASE_URL= | =http://localhost:11434/v1=  | Ollama OpenAI-compatible API  |
+| =OPENAI_API_KEY=  | =ollama=                     | Dummy value, required by SDK  |
+| =MODEL_NAME=      | =qwen2.5:7b=                 |                               |
+| =EMBEDDING_MODEL= | =nomic-embed-text=           |                               |
+| =EMBEDDING_DIM=   | =768=                        | Must match model              |
+
+* Structure
+
+#+begin_src
+rags/
+├── flake.nix          — Nix devShells and neo4j apps
+├── flake.lock
+├── .env.lightrag      — LightRAG runtime config
+├── .env.graphiti      — Graphiti runtime config
+├── lightrag/          — submodule: hkuds/lightrag
+├── graphiti/          — submodule: getzep/graphiti
+├── data/
+│   └── neo4j/         — Neo4j data (gitignored)
+└── docs/
+    └── setup.org      — How this was set up
+#+end_src
+
+* Submodules
+
+#+begin_src sh
+git submodule update --init --recursive
+#+end_src
diff --git a/docs/setup.org b/docs/setup.org
new file mode 100644
index 0000000..00e6c71
--- /dev/null
+++ b/docs/setup.org
@@ -0,0 +1,151 @@
+#+title: Setup Notes
+#+date: 2026-04-19
+
+* What We're Building and Why
+
+Private learning tool. Ingest study materials → query concepts → export to Anki.
+
+Five RAG frameworks were considered: LightRAG, Graphiti, Morphik, R2R, Agentset.
+
+** Why LightRAG
+
+Graph-based RAG — it builds a knowledge graph from your documents, not just a
+flat vector index. Queries traverse relationships between concepts, which maps
+naturally to Anki's card/tag structure. File-based storage, minimal deps, works
+with Ollama.
+
+** Why Graphiti
+
+Temporal knowledge graph designed for agent memory. Tracks *when* facts were
+learned and how they change over time. Complements LightRAG: LightRAG indexes
+your source material, Graphiti tracks your evolving understanding of it.
+
+** What Was Skipped and Why
+
+| Project  | Reason skipped                                              |
+|----------+-------------------------------------------------------------|
+| Morphik  | Multimodal (ColPali) — only useful if materials have images |
+| R2R      | 6+ services (MinIO, RabbitMQ, Hatchet, 2x Postgres)         |
+| Agentset | Bun/TypeScript monorepo, needs Supabase + Trigger.dev       |
+
+* Project Structure
+
+Git repo with two submodules:
+
+#+begin_src sh
+git init
+git submodule add https://github.com/hkuds/lightrag lightrag
+git submodule add https://github.com/getzep/graphiti graphiti
+#+end_src
+
+* Nix Flake Design
+
+** Goal: impure but reproducible shells
+
+Packaging Python with Nix properly (=buildPythonPackage=, wheels in the nix
+store) is slow and often breaks on native extensions. The tradeoff chosen:
+
+- Nix provides the runtime: Python 3.12, uv, Neo4j, curl
+- =uv sync= installs PyPI deps into a =.venv= outside the nix store at shell entry
+- =.venv= dirs are gitignored, recreated on first =nix develop=
+
+This is impure — the =.venv= contents aren't pinned by Nix — but =uv.lock= in
+each submodule pins the exact PyPI versions, so it's reproducible enough.
+
+** Two devShells
+
+#+begin_src nix
+devShells.${system} = {
+  lightrag = pkgs.mkShell { ... };
+  graphiti = pkgs.mkShell { ... };
+};
+#+end_src
+
+Each shell:
+1. Sets =UV_PYTHON= to the nix-provided Python 3.12 binary
+2. Sets =UV_PROJECT_ENVIRONMENT= so uv puts the venv in the project dir
+3. Sets =LD_LIBRARY_PATH= for native wheel compatibility (see below)
+4. Runs =uv sync= on first entry
+5. Sources =.env.= for runtime config
+
+** Neo4j as nix apps
+
+#+begin_src nix
+apps.${system} = {
+  neo4j-start = { type = "app"; program = "${startNeo4j}"; };
+  neo4j-stop  = { type = "app"; program = "${stopNeo4j}"; };
+};
+#+end_src
+
+=pkgs.neo4j= (version 2026.02.2) is in nixpkgs. The startup script writes a
+=neo4j.conf= to =./data/neo4j/conf/= at runtime and sets =NEO4J_CONF= to point
+there. Neo4j respects =NEO4J_CONF= as a directory containing =neo4j.conf=.
+
+Auth is disabled (=dbms.security.auth_enabled=false=) for local dev.
+
+* Problems Solved
+
+** Wrong Python version (3.14 instead of 3.12)
+
+The system Python on this machine is 3.14. =uv= was picking it up instead of
+the nix-provided =python312=. Fix: pin =UV_PYTHON= explicitly in the shellHook:
+
+#+begin_src nix
+export UV_PYTHON="${pkgs.python312}/bin/python3.12"
+#+end_src
+
+** libstdc++.so.6 not found
+
+PyPI wheels for numpy and other native extensions link against =libstdc++.so.6=.
+On NixOS this library isn't in standard paths. Fix: add to =LD_LIBRARY_PATH= in
+the shellHook:
+
+#+begin_src nix
+export LD_LIBRARY_PATH="${pkgs.lib.makeLibraryPath [
+  pkgs.stdenv.cc.cc
+  pkgs.zlib
+]}:$LD_LIBRARY_PATH"
+#+end_src
+
+** LightRAG server missing fastapi
+
+=uv sync= alone doesn't install the API server deps — they're behind an optional
+extra. Fix: use =uv sync --extra api= in the lightrag shellHook.
+
+** Runtime paths in shellHook
+
+=builtins.toString ./.= in a Nix flake evaluates to the flake's path in the
+*nix store*, not the user's working directory. Using it for =cd= and venv paths
+would point into =/nix/store/...=. Fix: use =$PWD= (the directory where the
+user runs =nix develop=) for all runtime paths:
+
+#+begin_src bash
+RAGS_ROOT="$PWD"
+export VIRTUAL_ENV="$RAGS_ROOT/lightrag/.venv"
+cd "$RAGS_ROOT/lightrag"
+#+end_src
+
+* Graphiti + Ollama
+
+Graphiti's LLM and embedder clients are OpenAI SDK wrappers. Ollama exposes an
+OpenAI-compatible API at =http://localhost:11434/v1=. So Graphiti can use Ollama
+by setting:
+
+#+begin_src sh
+OPENAI_BASE_URL=http://localhost:11434/v1
+OPENAI_API_KEY=ollama  # SDK requires a non-empty value; Ollama ignores it
+#+end_src
+
+The embedder uses =nomic-embed-text= (768 dimensions). =EMBEDDING_DIM= must be
+set to match or Graphiti's index creation will use the wrong size.
+
+* Testing Done
+
+| Test                                    | Result |
+|-----------------------------------------+--------|
+| =import lightrag= in Python 3.12        | ok     |
+| =lightrag-server= starts, binds port    | ok     |
+| =import graphiti_core= in Python 3.12   | ok     |
+| Neo4j starts, responds on port 7474     | ok     |
+| Graphiti connects to Neo4j via bolt     | ok     |
+| Neo4j stops cleanly                     | ok     |
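
The setup notes say =EMBEDDING_DIM= must match the embedding model, and both env files carry the same =EMBEDDING_MODEL= / =EMBEDDING_DIM= pair. One possible way to catch drift between them is a small shell check like the sketch below. It is not part of this patch; the helper names =env_value= and =check_embedding_config= are hypothetical, and it assumes the env files hold plain =KEY=value= lines, optionally followed by a =# comment=.

```sh
#!/bin/sh
# Sketch only: confirm two env files agree on the embedding settings.
# Assumes plain KEY=value lines (no "export" prefix, no quoting rules).
# env_value and check_embedding_config are hypothetical helper names.

# env_value FILE KEY: print the last value assigned to KEY in FILE,
# with any trailing " # comment" and trailing whitespace removed.
env_value() {
  sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 |
    sed -e 's/[[:space:]]#.*$//' -e 's/[[:space:]]*$//'
}

# check_embedding_config FILE_A FILE_B: succeed only if EMBEDDING_MODEL
# and EMBEDDING_DIM are set in both files and have identical values.
check_embedding_config() {
  for key in EMBEDDING_MODEL EMBEDDING_DIM; do
    a=$(env_value "$1" "$key")
    b=$(env_value "$2" "$key")
    if [ -z "$a" ] || [ "$a" != "$b" ]; then
      echo "mismatch: $key ('$a' vs '$b')" >&2
      return 1
    fi
  done
  echo "embedding config consistent"
}
```

From the repo root this could be run as =check_embedding_config .env.lightrag .env.graphiti=. It only compares the two files with each other; it does not ask Ollama what dimension the model actually produces.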