rags/docs/setup.org

#+title: Setup Notes
#+date: 2026-04-19

* What We're Building and Why

Private learning tool. Ingest study materials → query concepts → export to Anki.

Five RAG frameworks were considered: LightRAG, Graphiti, Morphik, R2R, Agentset.

** Why LightRAG

Graph-based RAG — it builds a knowledge graph from your documents, not just a
flat vector index. Queries traverse relationships between concepts, which maps
naturally to Anki's card/tag structure. File-based storage, minimal deps, works
with Ollama.

** Why Graphiti

Temporal knowledge graph designed for agent memory. Tracks *when* facts were
learned and how they change over time. Complements LightRAG: LightRAG indexes
your source material, Graphiti tracks your evolving understanding of it.

** What Was Skipped and Why

| Project  | Reason skipped                                              |
|----------+-------------------------------------------------------------|
| Morphik  | Multimodal (ColPali) — only useful if materials have images |
| R2R      | 6+ services (MinIO, RabbitMQ, Hatchet, 2x Postgres)         |
| Agentset | Bun/TypeScript monorepo, needs Supabase + Trigger.dev       |

* Project Structure

Git repo with two submodules:

#+begin_src sh
git init
git submodule add https://github.com/hkuds/lightrag lightrag
git submodule add https://github.com/getzep/graphiti graphiti
#+end_src
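
A fresh clone of this repo leaves both submodule directories empty until they are initialised. A minimal sketch of a helper for that, using only standard git commands (the function name =init_submodules= is made up for illustration):

#+begin_src sh
# Hypothetical helper: fetch and check out every submodule recorded in .gitmodules.
# $1 is the path to the superproject checkout (defaults to the current directory).
init_submodules() {
  git -C "${1:-.}" submodule update --init --recursive
}

# Usage, from the repo root:
#   init_submodules .
#+end_src

Cloning with =git clone --recurse-submodules= achieves the same in one step.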

* Nix Flake Design

** Goal: impure but reproducible shells

Packaging Python with Nix properly (=buildPythonPackage=, wheels in the nix
store) is slow and often breaks on native extensions. The tradeoff chosen:

- Nix provides the runtime: Python 3.12, uv, Neo4j, curl
- =uv sync= installs PyPI deps into a =.venv= outside the nix store at shell entry
- =.venv= dirs are gitignored, recreated on first =nix develop=

This is impure — the =.venv= contents aren't pinned by Nix — but =uv.lock= in
each submodule pins the exact PyPI versions, so it's reproducible enough.

** Two devShells

#+begin_src nix
devShells.${system} = {
  lightrag = pkgs.mkShell { ... };
  graphiti = pkgs.mkShell { ... };
};
#+end_src

Each shell:
1. Sets =UV_PYTHON= to the nix-provided Python 3.12 binary
2. Sets =UV_PROJECT_ENVIRONMENT= so uv puts the venv in the project dir
3. Sets =LD_LIBRARY_PATH= for native wheel compatibility (see below)
4. Runs =uv sync= on first entry
5. Sources =.env.<project>= for runtime config
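
Steps 2 and 5 can be sketched as a small shell function. This is an illustration under assumptions, not the flake's actual shellHook: the name =load_project_env= is hypothetical, and it assumes the =.env.<project>= files sit in the repo root.

#+begin_src sh
# Hypothetical helper; the function name and the location of .env.<project> are assumptions.
load_project_env() {
  project="$1"          # "lightrag" or "graphiti"
  root="${2:-$PWD}"     # repo root; defaults to where the shell was entered

  # Step 2: tell uv to keep the venv inside the project directory.
  export UV_PROJECT_ENVIRONMENT="$root/$project/.venv"

  # Step 5: source runtime config if the file exists.
  # set -a marks every variable assigned while it is active for export.
  if [ -f "$root/.env.$project" ]; then
    set -a
    . "$root/.env.$project"
    set +a
  fi
}
#+end_src

Using =set -a= means plain =KEY=value= lines in the env file reach child processes such as =lightrag-server= without each line needing its own =export=.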

** Neo4j as nix apps

#+begin_src nix
apps.${system} = {
  neo4j-start = { type = "app"; program = "${startNeo4j}"; };
  neo4j-stop  = { type = "app"; program = "${stopNeo4j}"; };
};
#+end_src

=pkgs.neo4j= (version 2026.02.2) is in nixpkgs. The startup script writes a
=neo4j.conf= to =./data/neo4j/conf/= at runtime and sets =NEO4J_CONF= to point
there. Neo4j respects =NEO4J_CONF= as a directory containing =neo4j.conf=.

Auth is disabled (=dbms.security.auth_enabled=false=) for local dev.
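
The config-writing step of the startup script might look roughly like this. It is a sketch, not the script behind =startNeo4j=: the function name is hypothetical, and only the auth setting is taken from these notes.

#+begin_src sh
# Hypothetical helper: write a minimal neo4j.conf and point NEO4J_CONF at its directory.
write_neo4j_conf() {
  conf_dir="${1:-$PWD/data/neo4j/conf}"
  mkdir -p "$conf_dir"
  # Local dev only: disable authentication.
  printf '%s\n' 'dbms.security.auth_enabled=false' > "$conf_dir/neo4j.conf"
  # NEO4J_CONF names the directory that contains neo4j.conf, not the file itself.
  export NEO4J_CONF="$conf_dir"
}
#+end_src

A real script would presumably also point Neo4j's data and log directories at writable locations under =./data/neo4j/=, since the nix store is read-only.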

* Problems Solved

** Wrong Python version (3.14 instead of 3.12)

The system Python on this machine is 3.14. =uv= was picking it up instead of
the nix-provided =python312=. Fix: pin =UV_PYTHON= explicitly in the shellHook:

#+begin_src nix
# Runs as bash inside shellHook, so no spaces around = and no trailing semicolon.
export UV_PYTHON="${pkgs.python312}/bin/python3.12"
#+end_src

** libstdc++.so.6 not found

PyPI wheels for numpy and other native extensions link against =libstdc++.so.6=.
NixOS has no global =/usr/lib=, so the dynamic loader can't find the library on
its own. Fix: add it to =LD_LIBRARY_PATH= in the shellHook:

#+begin_src nix
# Runs as bash inside shellHook; Nix expands the makeLibraryPath call first.
export LD_LIBRARY_PATH="${pkgs.lib.makeLibraryPath [
  pkgs.stdenv.cc.cc
  pkgs.zlib
]}:$LD_LIBRARY_PATH"
#+end_src
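
One caveat with appending =:$LD_LIBRARY_PATH= unconditionally: if the variable was empty, the result ends in a colon, and the dynamic loader treats an empty entry as the current directory. A minimal sketch of a prepend that avoids this (the function name is hypothetical):

#+begin_src sh
# Hypothetical helper: prepend $1 to LD_LIBRARY_PATH without leaving an empty entry.
prepend_ld_library_path() {
  # ${VAR:+text} expands to text only when VAR is set and non-empty,
  # so the ":" separator is added only when there is something to separate.
  export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
#+end_src

Written directly inside a Nix =''...''= string, the =${= in this expansion has to be escaped as =''${= so that Nix leaves it for bash.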

** LightRAG server missing fastapi

=uv sync= alone doesn't install the API server deps — they're behind an optional
extra. Fix: use =uv sync --extra api= in the lightrag shellHook.

** Runtime paths in shellHook

=builtins.toString ./.= in a Nix flake evaluates to the flake's path in the
*nix store*, not the user's working directory. Using it for =cd= and venv paths
would point into =/nix/store/...=. Fix: use =$PWD= (the directory where the
user runs =nix develop=) for all runtime paths:

#+begin_src bash
RAGS_ROOT="$PWD"
export VIRTUAL_ENV="$RAGS_ROOT/lightrag/.venv"
cd "$RAGS_ROOT/lightrag"
#+end_src
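
=$PWD= is only the repo root when =nix develop= is run from there. A sketch of a more forgiving lookup that walks up from the current directory until it finds =flake.nix= (the function name is hypothetical, and this is not part of the flake described above):

#+begin_src sh
# Hypothetical helper: print the nearest directory, starting at $1 (default $PWD)
# and moving upward, that contains flake.nix; return 1 if none is found.
find_flake_root() {
  dir="${1:-$PWD}"
  while :; do
    if [ -f "$dir/flake.nix" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Stop at the filesystem root (or "." for a relative start path).
    case "$dir" in
      /|.) return 1 ;;
    esac
    dir="$(dirname "$dir")"
  done
}

# Usage: RAGS_ROOT="$(find_flake_root)" || echo "run this inside the repo" >&2
#+end_src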

* Graphiti + Ollama

Graphiti's default LLM and embedder clients wrap the OpenAI SDK, and Ollama
exposes an OpenAI-compatible API at =http://localhost:11434/v1=, so Graphiti can
talk to Ollama by setting:

#+begin_src sh
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama  # SDK requires a non-empty value; Ollama ignores it
#+end_src

The embedder uses =nomic-embed-text= (768 dimensions). =EMBEDDING_DIM= must be
set to match or Graphiti's index creation will use the wrong size.
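
Put together, a =.env.graphiti= for this setup might look like the fragment below. The file name follows the =.env.<project>= convention from the devShell section; treat the exact contents as a sketch rather than the file actually in use.

#+begin_src sh
# Sketch of .env.graphiti; values taken from the notes above.
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama      # placeholder; Ollama ignores it
EMBEDDING_DIM=768          # must match nomic-embed-text's output size
#+end_src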

* Testing Done

| Test                                    | Result |
|-----------------------------------------+--------|
| =import lightrag= in Python 3.12        | ok     |
| =lightrag-server= starts, binds port    | ok     |
| =import graphiti_core= in Python 3.12   | ok     |
| Neo4j starts, responds on port 7474     | ok     |
| Graphiti connects to Neo4j via bolt     | ok     |
| Neo4j stops cleanly                     | ok     |