feat: add lightrag-mcp MCP server + agent tooling
- Add AGENTS.md with repo guidelines - Add lightrag-mcp: FastMCP server exposing insert_documents() + query_documents() to LLM agents via stdio transport, talks to LightRAG REST API - Add scripts/patch-vllm-cpu.py for CPU inference patching - Add .env.vllm for vLLM configuration - Update flake.nix with expanded dev shell - Update .env.lightrag - Remove CLAUDE.md (replaced by AGENTS.md)
This commit is contained in:
+9
-8
@@ -1,18 +1,19 @@
|
||||
# LLM via Ollama
|
||||
LLM_BINDING=ollama
|
||||
LLM_MODEL=qwen3:0.6b
|
||||
LLM_BINDING_HOST=http://localhost:11434
|
||||
LLM_BINDING=openai
|
||||
LLM_MODEL=minimax/minimax-m2.7
|
||||
LLM_BINDING_HOST=https://openrouter.ai/api/v1
|
||||
LLM_BINDING_API_KEY=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
|
||||
|
||||
# Embeddings via Ollama
|
||||
# Embeddings via Ollama (port 11434)
|
||||
EMBEDDING_BINDING=ollama
|
||||
EMBEDDING_MODEL=qwen3-embedding:0.6b
|
||||
EMBEDDING_MODEL=qwen3-embedding:4b
|
||||
EMBEDDING_BINDING_HOST=http://localhost:11434
|
||||
EMBEDDING_DIM=1024
|
||||
EMBEDDING_API_KEY=
|
||||
EMBEDDING_DIM=2560
|
||||
|
||||
# Storage (local files)
|
||||
RAG_DIR=./rag_storage
|
||||
|
||||
# Timeouts (in seconds) — increase for large local models
|
||||
# Timeouts (in seconds)
|
||||
EMBEDDING_TIMEOUT=60
|
||||
TIMEOUT=60
|
||||
|
||||
|
||||
@@ -0,0 +1,13 @@
|
||||
# vllm server configuration
|
||||
# Used by: nix run .#vllm-start-llm and nix run .#vllm-start-embed
|
||||
|
||||
# Force CPU backend — no CUDA/ROCm GPU on this machine
|
||||
VLLM_TARGET_DEVICE=cpu
|
||||
|
||||
VLLM_LLM_MODEL=Qwen/Qwen3-0.6B
|
||||
VLLM_LLM_PORT=8000
|
||||
# VLLM_LLM_EXTRA_ARGS=--dtype bfloat16 --max-model-len 4096
|
||||
|
||||
VLLM_EMBED_MODEL=Qwen/Qwen3-Embedding-0.6B
|
||||
VLLM_EMBED_PORT=8001
|
||||
# VLLM_EMBED_EXTRA_ARGS=--dtype bfloat16
|
||||
@@ -0,0 +1,66 @@
|
||||
# RAGS
|
||||
|
||||
Private learning tool. Ingest study materials → knowledge graph → query → export Anki flashcards.
|
||||
|
||||
Two systems:
|
||||
- **LightRAG** (`lightrag/`) — graph-based RAG server (primary interface)
|
||||
- **Graphiti** (`graphiti/`) — temporal knowledge graph library (Python library only, needs Neo4j)
|
||||
|
||||
## Quick Start
|
||||
|
||||
```sh
|
||||
# Ollama must be running first on :11434 with:
|
||||
# qwen3:0.6b (LLM)
|
||||
# qwen3-embedding:0.6b (embeddings)
|
||||
|
||||
# Start LightRAG only (LLM + embeddings handled externally by Ollama)
|
||||
nix run .#start
|
||||
# → http://localhost:9621/webui (React frontend)
|
||||
# → http://localhost:9621/docs (Swagger API)
|
||||
|
||||
# Graphiti needs Neo4j running first
|
||||
nix run .#neo4j-start # separate terminal
|
||||
nix develop .#graphiti
|
||||
```
|
||||
|
||||
**Always enter via `nix develop` from repo root** — never activate venvs directly. The shellHook sources `.env.lightrag` and sets `LD_LIBRARY_PATH`.
|
||||
|
||||
## Configuration
|
||||
|
||||
### `.env.lightrag`
|
||||
**Restart LightRAG after changes.**
|
||||
|
||||
| Var | Value |
|
||||
|-----|-------|
|
||||
| `LLM_BINDING` | `ollama` |
|
||||
| `LLM_MODEL` | `qwen3:0.6b` |
|
||||
| `LLM_BINDING_HOST` | `http://localhost:11434` |
|
||||
| `EMBEDDING_BINDING` | `ollama` |
|
||||
| `EMBEDDING_MODEL` | `qwen3-embedding:0.6b` |
|
||||
| `EMBEDDING_DIM` | `1024` |
|
||||
|
||||
Verify embedding works:
|
||||
```sh
|
||||
curl -s http://localhost:11434/api/embed \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"model":"qwen3-embedding:0.6b","input":"test"}'
|
||||
```
|
||||
|
||||
**Critical:** If `EMBEDDING_DIM` changes, delete `rag_storage/` before restarting — old vectors are incompatible.
|
||||
|
||||
## LightRAG Storage
|
||||
File-based by default (`JsonKVStorage`, `NanoVectorDBStorage`, `NetworkXStorage`). All data in `rag_storage/` (gitignored). Safe to delete to reset.
|
||||
|
||||
## Nix / NixOS Notes
|
||||
- `UV_PYTHON` pinned to nix-provided Python 3.12 (system has 3.14)
|
||||
- `LD_LIBRARY_PATH` set in shellHook for native wheels
|
||||
- LightRAG installs with `--extra api --extra offline-llm`
|
||||
- WebUI (React/Bun) built on first shell entry if `lightrag/lightrag/api/webui/` missing
|
||||
|
||||
## Known Issue: Pipeline Stuck
|
||||
|
||||
After config changes, pipeline may show `busy: true` with pending async locks. Symptoms:
|
||||
- `GET /documents/pipeline_status` returns `busy: true`, `request_pending: true`
|
||||
- New inserts stay at `status: pending`
|
||||
|
||||
Fix: delete `rag_storage/`, restart. Or `POST /documents/cancel_pipeline`.
|
||||
@@ -1,80 +0,0 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
## Purpose
|
||||
|
||||
Private learning tool. Ingest study materials → build a knowledge graph → query concepts → export flashcards to Anki.
|
||||
|
||||
Two systems:
|
||||
- **LightRAG** (`lightrag/` submodule) — graph-based RAG server. Ingests documents, builds a knowledge graph, answers queries. Primary interface.
|
||||
- **Graphiti** (`graphiti/` submodule) — temporal knowledge graph library. Tracks *when* concepts were learned and how understanding evolves. Used as a Python library, not a server.
|
||||
|
||||
Both run fully local via Ollama. No cloud dependencies.
|
||||
|
||||
## Running Things
|
||||
|
||||
**Always enter via `nix develop` from the repo root — never activate the venv directly.** The shellHook sources `.env.lightrag` / `.env.graphiti` and sets `LD_LIBRARY_PATH` needed for native wheels on NixOS.
|
||||
|
||||
```sh
|
||||
# LightRAG server (API + WebUI)
|
||||
nix develop .#lightrag
|
||||
lightrag-server
|
||||
# → http://localhost:9621/webui (React frontend)
|
||||
# → http://localhost:9621/docs (Swagger API)
|
||||
|
||||
# Graphiti (library, no server)
|
||||
nix run .#neo4j-start # required first, separate terminal
|
||||
nix develop .#graphiti
|
||||
|
||||
# Neo4j management
|
||||
nix run .#neo4j-start
|
||||
nix run .#neo4j-stop
|
||||
```
|
||||
|
||||
## Current Models (Ollama)
|
||||
|
||||
| Role | Model | Dim |
|
||||
|------|-------|-----|
|
||||
| LLM | `qwen3:0.6b` | — |
|
||||
| Embeddings | `qwen3-embedding:0.6b` | 1024 |
|
||||
|
||||
**Critical:** if the embedding model or `EMBEDDING_DIM` changes, `rag_storage/` must be deleted before restarting — old vectors are incompatible.
|
||||
|
||||
## Configuration
|
||||
|
||||
`.env.lightrag` is sourced by the shellHook and read by `lightrag-server` at startup. **Changes require a server restart** — the server does not hot-reload env vars.
|
||||
|
||||
Key vars:
|
||||
- `LLM_MODEL` / `EMBEDDING_MODEL` — Ollama model tags
|
||||
- `EMBEDDING_DIM` — must exactly match what the embedding model outputs (verify with `curl -s http://localhost:11434/api/embed -d '{"model":"<name>","input":"test"}' | python3 -c "import sys,json; d=json.load(sys.stdin); print(len(d['embeddings'][0]))"`)
|
||||
- `EMBEDDING_TIMEOUT` / `TIMEOUT` — in seconds; worker execution timeout is `2× EMBEDDING_TIMEOUT`
|
||||
- `RAG_DIR` — resolved relative to where `lightrag-server` is invoked (inside `lightrag/` subdir due to shellHook `cd`)
|
||||
|
||||
## Infrastructure Notes
|
||||
|
||||
### Nix / NixOS
|
||||
- Impure devShells: Nix provides Python 3.12 + uv; `uv sync` installs PyPI deps into `lightrag/.venv` or `graphiti/.venv` at shell entry.
|
||||
- `LD_LIBRARY_PATH` is set in shellHook for `libstdc++.so.6` — required for numpy and other native wheels on NixOS.
|
||||
- `UV_PYTHON` is pinned to the nix-provided Python 3.12 binary to prevent uv from picking up the system Python (3.14 on this machine).
|
||||
- LightRAG installs with `--extra api --extra offline-llm` (the `ollama` Python package lives in `offline-llm`, not `api`).
|
||||
- WebUI (React/Bun) is built on first shell entry if `lightrag/lightrag/api/webui/` doesn't exist.
|
||||
|
||||
### Ollama
|
||||
- Configured in `~/nix-config/machines/n1n1/services/ollama.nix`
|
||||
- Uses `pkgs.ollama-rocm` (AMD ROCm) — iGPU is detected and used by default
|
||||
- `OLLAMA_NUM_GPU=0` is set in NixOS config to force CPU-only mode (iGPU was consuming shared RAM)
|
||||
- Ollama CORS origin includes `http://127.0.0.1:8080` (open-webui) and `https://ollama.jibai.dev`
|
||||
|
||||
### LightRAG Storage
|
||||
File-based by default (`JsonKVStorage`, `NanoVectorDBStorage`, `NetworkXStorage`). All data lives in `rag_storage/` (gitignored). Safe to delete entirely to reset.
|
||||
|
||||
## Known Issues / Active Debugging
|
||||
|
||||
**LightRAG pipeline getting stuck**: After a server restart following config changes, the pipeline shows `busy: true` with pending async locks but doesn't process documents. Symptoms:
|
||||
- `GET /documents/pipeline_status` returns `busy: true`, `request_pending: true`
|
||||
- `keyed_locks.pending_async_cleanup` > 0
|
||||
- New inserts stay at `status: pending` indefinitely
|
||||
- `POST /documents/cancel_pipeline` may be needed to unblock
|
||||
|
||||
The root cause is not yet determined. Suspicion: stale lock state inherited from previous failed runs persisted in `rag_storage/` JSON files. Try deleting `rag_storage/` and restarting the server fresh.
|
||||
@@ -3,11 +3,17 @@
|
||||
|
||||
inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
|
||||
|
||||
outputs = { self, nixpkgs }:
|
||||
outputs =
|
||||
{ self, nixpkgs }:
|
||||
let
|
||||
system = "x86_64-linux";
|
||||
pkgs = nixpkgs.legacyPackages.${system};
|
||||
|
||||
stdLibs = pkgs.lib.makeLibraryPath [
|
||||
pkgs.stdenv.cc.cc
|
||||
pkgs.zlib
|
||||
];
|
||||
|
||||
startNeo4j = pkgs.writeShellScript "start-neo4j" ''
|
||||
set -e
|
||||
: "''${RAGS_ROOT:=$PWD}"
|
||||
@@ -41,18 +47,77 @@
|
||||
${pkgs.neo4j}/bin/neo4j stop
|
||||
'';
|
||||
|
||||
in {
|
||||
startAll = pkgs.writeShellScript "start-all" ''
|
||||
set -e
|
||||
: "''${RAGS_ROOT:=$PWD}"
|
||||
|
||||
if [ -f "$RAGS_ROOT/.env.lightrag" ]; then
|
||||
set -a; source "$RAGS_ROOT/.env.lightrag"; set +a
|
||||
fi
|
||||
|
||||
LIGHTRAG_BIN="$RAGS_ROOT/lightrag/.venv/bin/lightrag-server"
|
||||
LOG_DIR="$RAGS_ROOT/logs"
|
||||
mkdir -p "$LOG_DIR"
|
||||
|
||||
LIGHTRAG_PID=""
|
||||
cleanup() {
|
||||
echo ""
|
||||
echo "Shutting down..."
|
||||
[ -n "$LIGHTRAG_PID" ] && kill "$LIGHTRAG_PID" 2>/dev/null || true
|
||||
wait 2>/dev/null || true
|
||||
}
|
||||
trap cleanup EXIT INT TERM
|
||||
|
||||
echo "Starting LightRAG server..."
|
||||
"$LIGHTRAG_BIN" >> "$LOG_DIR/lightrag.log" 2>&1 &
|
||||
LIGHTRAG_PID=$!
|
||||
|
||||
wait_for() {
|
||||
local label=$1 url=$2 tries=0
|
||||
printf " Waiting for %s" "$label"
|
||||
while ! ${pkgs.curl}/bin/curl -so /dev/null --max-time 2 "$url" 2>/dev/null; do
|
||||
tries=$((tries+1))
|
||||
[ $tries -ge 300 ] && { echo " TIMEOUT — check logs/$label.log"; exit 1; }
|
||||
printf "."
|
||||
sleep 1
|
||||
done
|
||||
echo " ready"
|
||||
}
|
||||
|
||||
wait_for "lightrag" "http://localhost:9621/docs"
|
||||
|
||||
echo ""
|
||||
echo "All services up:"
|
||||
echo " LightRAG webui: http://localhost:9621/webui"
|
||||
echo " LightRAG API: http://localhost:9621/docs"
|
||||
echo " Ollama LLM: http://localhost:11434 (external)"
|
||||
echo " Ollama embed: http://localhost:11434/api/embed (external)"
|
||||
echo " logs: $LOG_DIR/"
|
||||
echo ""
|
||||
echo "Ctrl+C to stop everything."
|
||||
echo ""
|
||||
|
||||
tail -f "$LOG_DIR/lightrag.log"
|
||||
'';
|
||||
|
||||
in
|
||||
{
|
||||
devShells.${system} = {
|
||||
|
||||
lightrag = pkgs.mkShell {
|
||||
packages = [ pkgs.uv pkgs.python312 pkgs.curl pkgs.bun ];
|
||||
packages = [
|
||||
pkgs.uv
|
||||
pkgs.python312
|
||||
pkgs.curl
|
||||
pkgs.bun
|
||||
];
|
||||
|
||||
shellHook = ''
|
||||
RAGS_ROOT="$PWD"
|
||||
export VIRTUAL_ENV="$RAGS_ROOT/lightrag/.venv"
|
||||
export UV_PROJECT_ENVIRONMENT="$VIRTUAL_ENV"
|
||||
export UV_PYTHON="${pkgs.python312}/bin/python3.12"
|
||||
export LD_LIBRARY_PATH="${pkgs.lib.makeLibraryPath [ pkgs.stdenv.cc.cc pkgs.zlib ]}:$LD_LIBRARY_PATH"
|
||||
export LD_LIBRARY_PATH="${stdLibs}:$LD_LIBRARY_PATH"
|
||||
|
||||
echo "Syncing lightrag venv..."
|
||||
(cd "$RAGS_ROOT/lightrag" && uv sync --extra api --extra offline-llm --quiet)
|
||||
@@ -69,22 +134,27 @@
|
||||
|
||||
echo ""
|
||||
echo "LightRAG shell ready."
|
||||
echo " start: lightrag-server"
|
||||
echo " start server: lightrag-server"
|
||||
echo " start all: nix run .#start"
|
||||
echo " config: $RAGS_ROOT/.env.lightrag"
|
||||
echo " needs: ollama with qwen3:0.6b + qwen3-embedding:0.6b"
|
||||
echo ""
|
||||
'';
|
||||
};
|
||||
|
||||
graphiti = pkgs.mkShell {
|
||||
packages = [ pkgs.uv pkgs.python312 pkgs.neo4j pkgs.curl ];
|
||||
packages = [
|
||||
pkgs.uv
|
||||
pkgs.python312
|
||||
pkgs.neo4j
|
||||
pkgs.curl
|
||||
];
|
||||
|
||||
shellHook = ''
|
||||
RAGS_ROOT="$PWD"
|
||||
export VIRTUAL_ENV="$RAGS_ROOT/graphiti/.venv"
|
||||
export UV_PROJECT_ENVIRONMENT="$VIRTUAL_ENV"
|
||||
export UV_PYTHON="${pkgs.python312}/bin/python3.12"
|
||||
export LD_LIBRARY_PATH="${pkgs.lib.makeLibraryPath [ pkgs.stdenv.cc.cc pkgs.zlib ]}:$LD_LIBRARY_PATH"
|
||||
export LD_LIBRARY_PATH="${stdLibs}:$LD_LIBRARY_PATH"
|
||||
cd "$RAGS_ROOT/graphiti"
|
||||
|
||||
echo "Syncing graphiti venv..."
|
||||
@@ -99,7 +169,6 @@
|
||||
echo "Graphiti shell ready."
|
||||
echo " neo4j: nix run .#neo4j-start (in another terminal, run first)"
|
||||
echo " config: $RAGS_ROOT/.env.graphiti"
|
||||
echo " needs: ollama with qwen3:0.6b + qwen3-embedding:0.6b"
|
||||
echo ""
|
||||
'';
|
||||
};
|
||||
@@ -107,8 +176,18 @@
|
||||
};
|
||||
|
||||
apps.${system} = {
|
||||
neo4j-start = { type = "app"; program = "${startNeo4j}"; };
|
||||
neo4j-stop = { type = "app"; program = "${stopNeo4j}"; };
|
||||
start = {
|
||||
type = "app";
|
||||
program = "${startAll}";
|
||||
};
|
||||
neo4j-start = {
|
||||
type = "app";
|
||||
program = "${startNeo4j}";
|
||||
};
|
||||
neo4j-stop = {
|
||||
type = "app";
|
||||
program = "${stopNeo4j}";
|
||||
};
|
||||
};
|
||||
};
|
||||
}
|
||||
|
||||
@@ -0,0 +1,3 @@
|
||||
OPENAI_API_KEY=your-openai-api-key-here
|
||||
LIGHTRAG_WORKING_DIR=./lightrag_workspace
|
||||
LIGHTRAG_EMBEDDING_MODEL=text-embedding-3-small
|
||||
@@ -0,0 +1 @@
|
||||
3.10
|
||||
@@ -0,0 +1,57 @@
|
||||
import os
|
||||
import httpx
|
||||
from fastmcp import FastMCP
|
||||
|
||||
LIGHTRAG_URL = os.getenv("LIGHTRAG_URL", "http://localhost:9621")
|
||||
|
||||
mcp = FastMCP("LightRAG")
|
||||
|
||||
|
||||
@mcp.tool
|
||||
async def insert_documents(documents: list[str]) -> str:
|
||||
"""Insert text documents into LightRAG for indexing.
|
||||
|
||||
Args:
|
||||
documents: List of document strings to index. Each string is treated as a separate document.
|
||||
|
||||
Returns:
|
||||
Tracking ID for the insertion operation.
|
||||
"""
|
||||
async with httpx.AsyncClient(timeout=120.0) as client:
|
||||
r = await client.post(
|
||||
f"{LIGHTRAG_URL}/documents/texts",
|
||||
json={"texts": documents},
|
||||
)
|
||||
r.raise_for_status()
|
||||
data = r.json()
|
||||
return data.get("track_id", data.get("message", "unknown"))
|
||||
|
||||
|
||||
@mcp.tool
|
||||
async def query_documents(query: str, mode: str = "mix", top_k: int = 60) -> dict:
|
||||
"""Query LightRAG and retrieve relevant context without LLM generation.
|
||||
|
||||
Args:
|
||||
query: The search query string.
|
||||
mode: Retrieval mode - "local", "global", "hybrid", "naive", "mix" (default: "mix").
|
||||
top_k: Number of top results to retrieve (default: 60).
|
||||
|
||||
Returns:
|
||||
Structured retrieval data including entities, relationships, and text chunks.
|
||||
"""
|
||||
async with httpx.AsyncClient(timeout=120.0) as client:
|
||||
r = await client.post(
|
||||
f"{LIGHTRAG_URL}/query/data",
|
||||
json={
|
||||
"query": query,
|
||||
"mode": mode,
|
||||
"only_need_context": True,
|
||||
"top_k": top_k,
|
||||
},
|
||||
)
|
||||
r.raise_for_status()
|
||||
return r.json()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
mcp.run()
|
||||
@@ -0,0 +1,11 @@
|
||||
[project]
|
||||
name = "lightrag-mcp"
|
||||
version = "0.1.0"
|
||||
description = "Add your description here"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.10"
|
||||
dependencies = [
|
||||
"fastmcp>=3.2.4",
|
||||
"httpx>=0.28.1",
|
||||
"lightrag-hku>=1.4.15",
|
||||
]
|
||||
@@ -0,0 +1,83 @@
|
||||
import asyncio
|
||||
import os
|
||||
from mcp import ClientSession, StdioServerParameters
|
||||
from mcp.client.stdio import stdio_client
|
||||
|
||||
|
||||
async def main():
|
||||
server = StdioServerParameters(
|
||||
command="uv",
|
||||
args=[
|
||||
"run",
|
||||
"--directory",
|
||||
"/home/df/projects/rags/lightrag-mcp",
|
||||
"python",
|
||||
"main.py",
|
||||
],
|
||||
)
|
||||
|
||||
async with stdio_client(server) as (read, write):
|
||||
async with ClientSession(read, write) as session:
|
||||
await session.initialize()
|
||||
|
||||
print("--- INSERT ---")
|
||||
result = await session.call_tool(
|
||||
"insert_documents",
|
||||
arguments={
|
||||
"documents": [
|
||||
"Python is a high-level programming language known for its simplicity and readability.",
|
||||
"JavaScript was created in 1995 by Brendan Eich at Netscape.",
|
||||
"Machine learning is a subset of artificial intelligence that enables systems to learn from data.",
|
||||
]
|
||||
},
|
||||
)
|
||||
print(f"Insert result: {result.content[0].text[:200]}")
|
||||
|
||||
print("\n--- QUERY (mix) ---")
|
||||
result = await session.call_tool(
|
||||
"query_documents",
|
||||
arguments={
|
||||
"query": "Tell me about programming languages",
|
||||
"mode": "mix",
|
||||
"top_k": 60,
|
||||
},
|
||||
)
|
||||
import json
|
||||
|
||||
data = json.loads(result.content[0].text)
|
||||
d = data.get("data", {})
|
||||
print(f"Entities: {len(d.get('entities', []))}")
|
||||
print(f"Relationships: {len(d.get('relationships', []))}")
|
||||
print(f"Chunks: {len(d.get('chunks', []))}")
|
||||
for c in d.get("chunks", [])[:2]:
|
||||
print(f" - {c.get('content', '')[:100]}")
|
||||
|
||||
print("\n--- QUERY (local) ---")
|
||||
result = await session.call_tool(
|
||||
"query_documents",
|
||||
arguments={"query": "What is Python?", "mode": "local", "top_k": 60},
|
||||
)
|
||||
data = json.loads(result.content[0].text)
|
||||
d = data.get("data", {})
|
||||
print(f"Entities: {len(d.get('entities', []))}")
|
||||
print(f"Chunks: {len(d.get('chunks', []))}")
|
||||
|
||||
print("\n--- QUERY (global) ---")
|
||||
result = await session.call_tool(
|
||||
"query_documents",
|
||||
arguments={
|
||||
"query": "What topics are covered?",
|
||||
"mode": "global",
|
||||
"top_k": 60,
|
||||
},
|
||||
)
|
||||
data = json.loads(result.content[0].text)
|
||||
d = data.get("data", {})
|
||||
print(f"Entities: {len(d.get('entities', []))}")
|
||||
print(f"Relationships: {len(d.get('relationships', []))}")
|
||||
|
||||
print("\nDone!")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@@ -0,0 +1,77 @@
|
||||
import httpx
|
||||
import asyncio
|
||||
|
||||
|
||||
async def main():
|
||||
base_url = "http://localhost:9621"
|
||||
|
||||
async with httpx.AsyncClient(timeout=120.0) as client:
|
||||
print("--- INSERT ---")
|
||||
docs = [
|
||||
"Python is a high-level programming language known for its simplicity and readability.",
|
||||
"JavaScript was created in 1995 by Brendan Eich at Netscape.",
|
||||
"Machine learning is a subset of artificial intelligence that enables systems to learn from data.",
|
||||
"LightRAG combines knowledge graph and vector retrieval for enhanced RAG applications.",
|
||||
"FastMCP is a framework for building MCP servers in Python.",
|
||||
]
|
||||
r = await client.post(f"{base_url}/documents/texts", json={"texts": docs})
|
||||
r.raise_for_status()
|
||||
print(f"Inserted: {r.json()}")
|
||||
|
||||
print("\n--- QUERY (mix mode) ---")
|
||||
r = await client.post(
|
||||
f"{base_url}/query/data",
|
||||
json={
|
||||
"query": "Tell me about programming languages",
|
||||
"mode": "mix",
|
||||
"only_need_context": True,
|
||||
"top_k": 60,
|
||||
},
|
||||
)
|
||||
r.raise_for_status()
|
||||
result = r.json()
|
||||
print(f"mode=mix keys: {list(result.keys())}")
|
||||
if "chunks" in result:
|
||||
print(f" chunks: {len(result['chunks'])} returned")
|
||||
for c in result["chunks"][:2]:
|
||||
print(f" - {c.get('content', '')[:100]}...")
|
||||
|
||||
print("\n--- QUERY (local mode) ---")
|
||||
r = await client.post(
|
||||
f"{base_url}/query/data",
|
||||
json={
|
||||
"query": "What is Python?",
|
||||
"mode": "local",
|
||||
"only_need_context": True,
|
||||
"top_k": 60,
|
||||
},
|
||||
)
|
||||
r.raise_for_status()
|
||||
result = r.json()
|
||||
print(f"mode=local keys: {list(result.keys())}")
|
||||
if "chunks" in result:
|
||||
print(f" chunks: {len(result['chunks'])} returned")
|
||||
|
||||
print("\n--- QUERY (global mode) ---")
|
||||
r = await client.post(
|
||||
f"{base_url}/query/data",
|
||||
json={
|
||||
"query": "What topics are covered?",
|
||||
"mode": "global",
|
||||
"only_need_context": True,
|
||||
"top_k": 60,
|
||||
},
|
||||
)
|
||||
r.raise_for_status()
|
||||
result = r.json()
|
||||
print(f"mode=global keys: {list(result.keys())}")
|
||||
if "entities" in result:
|
||||
print(f" entities: {len(result['entities'])} returned")
|
||||
if "relationships" in result:
|
||||
print(f" relationships: {len(result['relationships'])} returned")
|
||||
|
||||
print("\nDone!")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
Generated
+3048
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,51 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Patch vllm's cpu_platform_plugin to respect VLLM_TARGET_DEVICE=cpu.
|
||||
|
||||
The upstream CUDA build only activates the CPU platform on macOS or when
|
||||
the version string contains "cpu" (source builds). This patch adds a third
|
||||
condition: if VLLM_TARGET_DEVICE=cpu is set in the environment.
|
||||
|
||||
Run after every `uv pip install vllm` — idempotent.
|
||||
"""
|
||||
import pathlib
|
||||
import sys
|
||||
|
||||
venv = pathlib.Path(__file__).parent.parent / "vllm" / ".venv"
|
||||
target = venv / "lib" / "python3.12" / "site-packages" / "vllm" / "platforms" / "__init__.py"
|
||||
|
||||
if not target.exists():
|
||||
print(f"vllm not installed at {target}, skipping patch")
|
||||
sys.exit(0)
|
||||
|
||||
content = target.read_text()
|
||||
|
||||
if "VLLM_TARGET_DEVICE" in content:
|
||||
print("patch already applied")
|
||||
sys.exit(0)
|
||||
|
||||
old = '''\
|
||||
if not is_cpu:
|
||||
import sys
|
||||
|
||||
is_cpu = sys.platform.startswith("darwin")
|
||||
if is_cpu:
|
||||
logger.debug(
|
||||
"Confirmed CPU platform is available because the machine is MacOS."
|
||||
)'''
|
||||
|
||||
new = old + '''
|
||||
|
||||
if not is_cpu:
|
||||
is_cpu = os.environ.get("VLLM_TARGET_DEVICE", "").lower() == "cpu"
|
||||
if is_cpu:
|
||||
logger.debug(
|
||||
"Confirmed CPU platform is available because VLLM_TARGET_DEVICE=cpu."
|
||||
)'''
|
||||
|
||||
if old not in content:
|
||||
print("ERROR: patch target not found — vllm version may have changed", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
target.write_text(content.replace(old, new, 1))
|
||||
print("patched cpu_platform_plugin to respect VLLM_TARGET_DEVICE=cpu")
|
||||
Reference in New Issue
Block a user