feat: add UFDS flashcards and project learnings infrastructure

2026-05-04 09:01:28 +08:00
parent 69676a84be
commit 3ab8ba001d
4 changed files with 498 additions and 0 deletions
@@ -31,3 +31,12 @@ code

 ## Self-Improvement
 Periodically review this file and suggest improvements to the user if you notice gaps, inconsistencies, or missing conventions.
+
+## Active Context
+<!-- AI assistant maintains this section. Keep under 20 lines. -->
+<!-- Updated automatically by /self-improve. Remove stale entries. -->
+- Branch: `master`, 1 commit ahead of origin (unpushed)
+- Untracked files: `org/cpp/dsa/` and `org/cpp/ufds.org` (not yet committed)
+- Current work: UFDS flashcards (402-line proper card set) + DSA subdirectory
+- Inbox items: binary search, `using` keyword — need cards created
+- Possible cleanup: `org/cpp/dsa/udfs.org` may be a superseded draft of `org/cpp/ufds.org`
@@ -0,0 +1,46 @@
+# Project Learnings
+
+> Auto-maintained by the self-improve skill. Read at session start, updated at session end.
+
+## Patterns That Work
+<!-- Approaches that produced good results -->
+
+## Mistakes to Avoid
+<!-- Failed approaches and why they failed -->
+
+## Codebase Conventions
+
+**[2026-05-04] — File organization**
+- Observation: Most flashcard files are flat in `org/cpp/`, but subdirectories exist (`tricks/`, `dsa/`) for topic grouping. AGENTS.md only documents the flat convention.
+- Action: When creating new cards, use flat `org/cpp/topic.org` for STL/language features; subdirectories for broader categories (DSA, tricks). Propose updating AGENTS.md if this solidifies.
+- Confidence: medium
+
+**[2026-05-04] — Card format variance**
+- Observation: `org/cpp/dsa/udfs.org` uses raw `#+title:` + code blocks without ANKI properties. `org/cpp/ufds.org` follows the proper Anki card format. The proper format (with ANKI_NOTE_TYPE, Front/Back sections) is what gets exported.
+- Action: Always use the full Anki card format from AGENTS.md when creating flashcards. Raw code files in dsa/ may be scratch/reference, not export-ready cards.
+- Confidence: medium
+
+**[2026-05-04] — Naming: UFDS not UDFS**
+- Observation: `org/cpp/dsa/udfs.org` is a typo — the data structure is "Union-Find Disjoint Set" = UFDS. The properly-formatted file `org/cpp/ufds.org` uses the correct name.
+- Action: Use "ufds" spelling. The dsa/udfs.org appears to be an earlier draft.
+- Confidence: high
+
+## Environment & Config
+
+**[2026-05-04] — Git state**
+- Observation: Single branch `master` with remote `origin/master`. No branching workflow — commits go directly to master.
+- Action: Commit directly to master. Push when work is complete.
+- Confidence: high
+
+## Business Context
+
+**[2026-05-04] — Study focus**
+- Observation: Recent commits cover STL containers (deque, array, set, map, iterators) and DSA (UFDS). Inbox has LeetCode solutions (two sum, max consecutive ones) with notes to learn binary search and `using` keyword.
+- Action: Current study trajectory is STL containers + competitive programming DSA. Prioritize cards for these topics.
+- Confidence: high
+
+## Open Questions
+
+**[2026-05-04] — Duplicate UFDS files**
+- Question: Are both `org/cpp/ufds.org` (402 lines, proper format) and `org/cpp/dsa/udfs.org` (41 lines, raw code) needed? The former seems to supersede the latter.
+- Action: Ask user if `dsa/udfs.org` should be removed or merged.
@@ -0,0 +1,41 @@
+#+title: Udfs
+* impl
+#+begin_src cpp
+#include <vector>
+#include <numeric>
+#include <algorithm>
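+class Ufds {
+  private:
+  std::vector<int> p;  // parent of each element
+  std::vector<int> s;  // size of each set
+  std::vector<int> r;  // rank of each set
+  int numSets;         // number of disjoint sets
+  public:
+  Ufds(int n) {
+    p.resize(n);
+    std::iota(p.begin(), p.end(), 0);
+    s.assign(n, 1); // card idea: compare assign, fill, and iota and explain the differences
+    r.assign(n, 0); // std::fill over the still-empty r would write nothing
+    numSets = n;
+  }
+  ~Ufds() = default; // alternative spelling: ~Ufds() {}
+  int find(int x) { // concise version with path compression
+    if (p[x] == x) return x;
+    return p[x] = find(p[x]);
+  }
+  int findVerbose(int x) { // longer equivalent, renamed so both versions compile
+    int px = p[x];
+    if (px != x) {
+      px = findVerbose(px);
+    }
+    p[x] = px;
+    return px;
+  }
+};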
+#+end_src
+
+Should ask: what does new return? What does delete do, and why are we deleting? Is the memory heap-allocated?
+Should show the equivalent version using std::array instead of std::vector.
+Should ask: what does resize do?
+
+What about a truly dynamic version where entries are created on demand when find() is called?
@@ -0,0 +1,402 @@
+* Ufds: Union-Find Disjoint Set :cpp:datastructure:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Write a minimal Ufds class with a ~std::vector<int>~ parent array and a constructor that makes each element its own parent
+** Back
+#+begin_src c++
+#include <vector>
+#include <numeric>
+
+class Ufds {
+private:
+  std::vector<int> parent;
+public:
+  Ufds(int n) : parent(n) {
+    std::iota(parent.begin(), parent.end(), 0);
+  }
+};
+#+end_src
+
+* Ufds find() with path compression :cpp:datastructure:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Write the ~find()~ method for Ufds with path compression
+** Back
+#+begin_src c++
+int find(int x) {
+    if (parent[x] != x) {
+        parent[x] = find(parent[x]);
+    }
+    return parent[x];
+}
+#+end_src
+
+* Concise Ufds find() :cpp:datastructure:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Is this correct? What does it do?
+#+begin_src c++
+int find(int x) {
+    if (parent[x] == x) return x;
+    return parent[x] = find(parent[x]);
+}
+#+end_src
+** Back
+Yes, correct and equivalent to the longer version.
+
+- Base case: if ~x~ is its own parent, return ~x~
+- Recursive case: find root of parent, then assign and return
+
+The assignment ~parent[x] = find(parent[x])~ returns the result while compressing the path.
+
+* Ufds constructor: initializer list vs body :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+What is the difference between these two Ufds constructors?
+#+begin_src c++
+// Initializer list
+Ufds(int n) : parent(n) {
+  std::iota(parent.begin(), parent.end(), 0);
+}
+
+// Body style
+Ufds(int n) {
+  parent.resize(n);
+  std::iota(parent.begin(), parent.end(), 0);
+}
+#+end_src
+** Back
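+Initializer list: directly constructs ~parent~ with ~n~ elements (one step, one allocation)
+
+Body style: default-constructs an empty ~parent~ first, then ~resize(n)~ allocates. A default-constructed vector typically allocates nothing, so the cost is an extra construction step rather than a second allocation.
+
+Both produce the same result; the initializer list is the idiomatic choice and is never less efficient.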
+
+* What does std::vector::resize do? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+What does ~resize(n)~ do on a ~std::vector<int>~?
+** Back
+Resizes the vector to have ~n~ elements:
+
+- If ~n~ > current size: appends value-initialized elements (~0~ for ~int~)
+- If ~n~ < current size: truncates the vector
+- Reallocates if capacity < n
+
+#+begin_src c++
+std::vector<int> v;
+v.resize(5);  // v = {0, 0, 0, 0, 0}
+#+end_src
+
+* Ufds destructor: when to omit :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
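+Should Ufds have a destructor? ~~Ufds() {}~ vs ~~Ufds() = default~
+** Back
+Omit it entirely if the only members are containers such as ~std::vector~, which clean up after themselves.
+
+If you must write one, prefer ~= default~: it states the intent explicitly and keeps the destructor compiler-generated.
+
+An empty ~{}~ body also works, but it makes the destructor user-provided and is less clear for future maintenance.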
+
+* Correct Ufds constructor: is this correct? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Is this correct?
+#+begin_src c++
+Ufds(int n) {
+  p.resize(n);
+  std::iota(p.begin(), p.end(), 0);
+}
+#+end_src
+** Back
+Yes, correct. This is the body-style initialization.
+
+One minor inefficiency: ~p~ is default-constructed (empty) first, and ~resize~ then performs the allocation.
+
+Prefer the initializer list, which constructs ~p~ at the right size directly: ~Ufds(int n) : p(n)~
+
+* Incorrect Ufds constructor: correct or wrong? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Is this correct?
+#+begin_src c++
+std::vector<int> p;
+Ufds(int n) {
+  p = new std::vector<int>(n);
+}
+#+end_src
+** Back
+Wrong.
+
+~new std::vector<int>(n)~ returns a ~vector<int>*~ (pointer), but ~p~ is a ~vector<int>~ (not a pointer).
+
+Can't assign a pointer to a vector.
+
+* What does new return? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+What does ~new Type~ return? And ~new Type[n]~?
+** Back
+- ~new Type~ returns a pointer to the new object: ~Type*~
+- ~new Type[n]~ returns a pointer to the first element of the new array: also ~Type*~
+
+#+begin_src c++
+int* p1 = new int;      // single int
+int* p2 = new int[10];  // array of 10 ints
+#+end_src
+
+* What does delete do? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+What does ~delete~ do? ~delete[]~?
+** Back
+- ~delete~ frees a single object allocated with ~new~
+- ~delete[]~ frees an array allocated with ~new[]~
+
+#+begin_src c++
+int* p1 = new int;
+int* p2 = new int[10];
+delete p1;   // free single object
+delete[] p2; // free array
+#+end_src
+
+Mismatching them (~delete~ on memory from ~new[]~, or ~delete[]~ on memory from ~new~) causes undefined behavior.
+
+* Is this correct Ufds with C-style array? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Is this correct?
+#+begin_src c++
+class Ufds {
+private:
+  int* parent;
+public:
+  Ufds(int n) : parent(new int[n]) {}
+  ~Ufds() { delete[] parent; }
+};
+#+end_src
+** Back
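+Correct as far as allocation and cleanup go: the heap-allocated array is managed manually.
+
+- ~new int[n]~ allocates an array of ~n~ ints on the heap
+- ~delete[] parent~ in the destructor frees it
+
+Caveats: the elements are left uninitialized, and copying a ~Ufds~ would make two objects ~delete[]~ the same array. Manual cleanup is error-prone compared to ~std::vector~.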
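The two gaps in the raw-array version (uninitialized parents, unsafe copying) can be closed without switching containers. The following is a minimal sketch, not the card's answer; the class name ~RawUfds~, the stored count ~n~, and the ~size()~ accessor are assumptions introduced for illustration.

```cpp
#include <numeric>

// Sketch: same raw-array design, with parents initialized and copying disabled.
// RawUfds, n, and size() are hypothetical names, not part of the original card.
class RawUfds {
private:
  int* parent;  // heap array owned by this object
  int n;        // number of elements in the array
public:
  explicit RawUfds(int count) : parent(new int[count]), n(count) {
    std::iota(parent, parent + n, 0);  // each element starts as its own parent
  }
  ~RawUfds() { delete[] parent; }

  // Copying would leave two objects deleting the same array, so forbid it.
  RawUfds(const RawUfds&) = delete;
  RawUfds& operator=(const RawUfds&) = delete;

  int find(int x) {
    if (parent[x] != x) {
      parent[x] = find(parent[x]);  // path compression
    }
    return parent[x];
  }

  int size() const { return n; }
};
```

With ~std::vector~ none of this bookkeeping is needed, which is the point the card makes.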
+
+* Equivalent std::array version: correct or wrong? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Is this correct?
+#+begin_src c++
+#include <array>
+#include <numeric>
+
+class Ufds {
+private:
+  std::array<int, 10> parent;
+public:
+  Ufds() {
+    std::iota(parent.begin(), parent.end(), 0);
+  }
+};
+#+end_src
+** Back
+Correct for compile-time fixed size.
+
+But size ~10~ is hardcoded — not parameterized.
+
+~std::array<T, N>~ requires ~N~ be a compile-time constant.
+
+For runtime size, must use ~std::vector~.
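If the size is known at compile time but should not be hardcoded, one option is to make it a template parameter. A minimal sketch, assuming the hypothetical names ~FixedUfds~ and ~size()~, which do not appear in the original cards:

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Sketch: N is still a compile-time constant, but each use site picks its own.
// FixedUfds and size() are hypothetical names used only for illustration.
template <std::size_t N>
class FixedUfds {
private:
  std::array<int, N> parent;
public:
  FixedUfds() {
    std::iota(parent.begin(), parent.end(), 0);  // parent[i] = i
  }

  int find(int x) {
    if (parent[x] != x) {
      parent[x] = find(parent[x]);  // path compression
    }
    return parent[x];
  }

  constexpr std::size_t size() const { return N; }
};
```

Usage would look like ~FixedUfds<10> u;~. A size read from input at runtime still requires ~std::vector~.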
+
+* std::array vs vector vs C-style array :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Compare ~std::array<T, N>~, ~std::vector<T>~, and ~T*~ for Ufds parent array
+** Back
+| Type             | Size         | Cleanup           | Use when                   |
+|------------------+--------------+-------------------+----------------------------|
+| ~std::array<T, N>~ | compile-time | automatic         | size known at compile time |
+| ~std::vector<T>~   | runtime      | automatic         | size known at runtime      |
+| ~T* p = new T[n]~  | runtime      | manual (~delete[]~) | legacy code only           |
+
+* Vector vs Array: is vector backed by array? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Is ~std::vector~ backed by an array? How does it support variable length?
+** Back
+Yes, ~std::vector~ allocates a contiguous memory block (like a heap array).
+
+Typical implementation: three pointers
+- ~start~: pointer to the first element
+- ~finish~: pointer one past the last element
+- ~end_of_storage~: pointer one past the end of the allocated capacity
+
+#+begin_src c++
+std::vector<int> v(5);
+v.push_back(1);  // may trigger reallocation if capacity is exceeded
+#+end_src
+
+When capacity is exceeded, the vector allocates a larger block, moves or copies the elements into it, and deallocates the old block.
+
+* Vector size vs capacity :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+What is the difference between ~size()~ and ~capacity()~ in ~std::vector~?
+** Back
+- ~size()~: number of elements currently stored
+- ~capacity()~: number of elements the allocated storage can hold before reallocating
+
+#+begin_src c++
+std::vector<int> v(3);  // size=3, capacity>=3 (typically 3)
+v.push_back(1);         // size=4, capacity>=4 (growth factor is implementation-defined)
+v.push_back(1);         // size=5, capacity>=5
+#+end_src
+
+~reserve(n)~ pre-allocates capacity for at least ~n~ elements without changing ~size()~.
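The difference between ~reserve~ and ~resize~ is easiest to see by comparing ~size()~ and ~capacity()~ after each call. A small sketch; ~SizeInfo~, ~afterReserve~, and ~afterResize~ are hypothetical helpers written for this illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of a vector's size and capacity at one moment.
struct SizeInfo {
  std::size_t size;
  std::size_t capacity;
};

// reserve(n) grows capacity only: no elements are created.
SizeInfo afterReserve(std::size_t n) {
  std::vector<int> v;
  v.reserve(n);
  return {v.size(), v.capacity()};  // size == 0, capacity >= n
}

// resize(n) changes size: n value-initialized ints (all 0) now exist.
SizeInfo afterResize(std::size_t n) {
  std::vector<int> v;
  v.resize(n);
  return {v.size(), v.capacity()};  // size == n, capacity >= n
}
```

For a Ufds parent array, ~resize~ (or the sizing constructor) is the one that applies, because ~std::iota~ needs real elements to write into.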
+
+* Comparison: which Ufds storage to use? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+When would you choose ~std::array~, ~std::vector~, or a raw ~new T[n]~ array for the Ufds parent array?
+** Back
+~std::array<T, N>~ — only if N is known at compile time
+
+~std::vector<T>~ — if size is determined at runtime (typical Ufds case)
+
+~new T[n]~ (held in a ~T*~): legacy code only; ~vector~ is safer and offers equivalent performance
+
+For Ufds: ~std::vector~ is the idiomatic choice because size is runtime-determined.
+
+* C-style array key properties :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+What are key properties of C-style arrays (raw arrays)?
+** Back
+#+begin_src c++
+int arr[5];         // fixed size, stack-allocated
+int* p = new int[n]; // dynamic size, heap-allocated
+#+end_src
+
+- Decay to a pointer when passed to a function (size information is lost)
+- No bounds checking
+- ~sizeof(arr)~ gives bytes, not element count
+- Must manage lifetime manually for heap arrays
+
+C-style arrays in C++ are generally avoided in favor of ~std::array~/~std::vector~.
+
+* std::iota for Ufds initialization :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+How to use ~std::iota~ to initialize Ufds parent array to ~[0, 1, 2, ..., n-1]~
+** Back
+#+begin_src c++
+#include <numeric>
+
+std::vector<int> parent(n);
+std::iota(parent.begin(), parent.end(), 0);
+#+end_src
+
+~std::iota~ fills the range with consecutive values, starting from the given value and incrementing by one for each element.
+
+* Path compression in find() :cpp:datastructure:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+What does "path compression" mean in ~find()~?
+** Back
+After finding the root, point each visited node directly to the root:
+
+#+begin_src c++
+parent[x] = find(parent[x]);  // compresses path
+#+end_src
+
+This flattens the tree, so later ~find()~ calls on those nodes reach the root in one step. Combined with union by rank or size, operations run in amortized O(α(n)) ≈ O(1) time.
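Path compression is usually paired with a union operation that keeps trees shallow. The sketch below puts the two together; ~unite()~, ~sameSet()~, and the ~setSize~ vector are assumptions added for illustration and are not defined anywhere in the cards above.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Sketch only: unite(), sameSet(), and setSize are hypothetical additions.
class Ufds {
private:
  std::vector<int> parent;
  std::vector<int> setSize;  // setSize[r] is meaningful only when r is a root
public:
  explicit Ufds(int n) : parent(n), setSize(n, 1) {
    std::iota(parent.begin(), parent.end(), 0);
  }

  int find(int x) {
    if (parent[x] != x) {
      parent[x] = find(parent[x]);  // path compression
    }
    return parent[x];
  }

  // Union by size: attach the smaller tree under the larger root.
  // Returns false if a and b were already in the same set.
  bool unite(int a, int b) {
    int ra = find(a);
    int rb = find(b);
    if (ra == rb) return false;
    if (setSize[ra] < setSize[rb]) std::swap(ra, rb);
    parent[rb] = ra;
    setSize[ra] += setSize[rb];
    return true;
  }

  bool sameSet(int a, int b) { return find(a) == find(b); }
};
```

Union by rank would work equally well here; size was chosen only because it doubles as a set-size query.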
+
+* Original Ufds mistakes: is this correct? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+Is this correct?
+#+begin_src c++
+class Udfs {
+  int parent[];
+  vector<int> p;
+public:
+  Ufds(int n) {
+    p = new vector<int>;
+  }
+  ~Ufds() {
+    p ? free p;
+  }
+};
+#+end_src
+** Back
+Wrong. Multiple errors:
+
+1. ~int parent[]~: an array member without a size is not valid standard C++
+
+2. ~Ufds(int n)~ is written as the constructor, but the class is named ~Udfs~ (typo)
+
+3. ~p = new vector<int>~: can't assign a pointer to a vector
+
+4. ~p ? free p;~: not valid syntax, and ~p~ is a vector, not a pointer that could be freed
+
+5. The destructor is named ~Ufds~, but the class is named ~Udfs~ (typo)
+
+* Heap allocation: what and why? :cpp:cpp:
+:PROPERTIES:
+:ANKI_NOTE_TYPE: Basic
+:END:
+** Front
+What does "heap allocated" mean? Why use it?
+** Back
+A heap allocation made with ~new~ lives until ~delete~ is called or the program ends.
+
+A stack allocation (a local variable) is freed automatically when it goes out of scope.
+
+#+begin_src c++
+int arr[10];           // stack — freed when function returns
+int* p = new int[10]; // heap — lives until delete[]
+#+end_src
+
+Heap needed when: size unknown at compile time, or object must outlive scope.