feat: add UFDS flashcards and project learnings infrastructure

2026-05-04 09:01:28 +08:00
parent 69676a84be
commit 3ab8ba001d
4 changed files with 498 additions and 0 deletions
@@ -31,3 +31,12 @@ code
 ## Self-Improvement
 Periodically review this file and suggest improvements to the user if you notice gaps, inconsistencies, or missing conventions.
 ## Active Context
 <!-- AI assistant maintains this section. Keep under 20 lines. -->
 <!-- Updated automatically by /self-improve. Remove stale entries. -->
 - Branch: `master`, 1 commit ahead of origin (unpushed)
 - Untracked files: `org/cpp/dsa/` and `org/cpp/ufds.org` (not yet committed)
 - Current work: UFDS flashcards (402-line proper card set) + DSA subdirectory
 - Inbox items: binary search, `using` keyword — need cards created
 - Possible cleanup: `org/cpp/dsa/udfs.org` may be a superseded draft of `org/cpp/ufds.org`
@@ -0,0 +1,46 @@
 # Project Learnings
 > Auto-maintained by the self-improve skill. Read at session start, updated at session end.
 ## Patterns That Work
 <!-- Approaches that produced good results -->
 ## Mistakes to Avoid
 <!-- Failed approaches and why they failed -->
 ## Codebase Conventions
 **[2026-05-04] — File organization**
 - Observation: Most flashcard files are flat in `org/cpp/`, but subdirectories exist (`tricks/`, `dsa/`) for topic grouping. AGENTS.md only documents the flat convention.
 - Action: When creating new cards, use flat `org/cpp/topic.org` for STL/language features; subdirectories for broader categories (DSA, tricks). Propose updating AGENTS.md if this solidifies.
 - Confidence: medium
 **[2026-05-04] — Card format variance**
 - Observation: `org/cpp/dsa/udfs.org` uses raw `#+title:` + code blocks without ANKI properties. `org/cpp/ufds.org` follows the proper Anki card format. The proper format (with ANKI_NOTE_TYPE, Front/Back sections) is what gets exported.
 - Action: Always use the full Anki card format from AGENTS.md when creating flashcards. Raw code files in dsa/ may be scratch/reference, not export-ready cards.
 - Confidence: medium
 **[2026-05-04] — Naming: UFDS not UDFS**
 - Observation: `org/cpp/dsa/udfs.org` is a typo — the data structure is "Union-Find Disjoint Set" = UFDS. The properly-formatted file `org/cpp/ufds.org` uses the correct name.
 - Action: Use "ufds" spelling. The dsa/udfs.org appears to be an earlier draft.
 - Confidence: high
 ## Environment & Config
 **[2026-05-04] — Git state**
 - Observation: Single branch `master` with remote `origin/master`. No branching workflow — commits go directly to master.
 - Action: Commit directly to master. Push when work is complete.
 - Confidence: high
 ## Business Context
 **[2026-05-04] — Study focus**
 - Observation: Recent commits cover STL containers (deque, array, set, map, iterators) and DSA (UFDS). Inbox has LeetCode solutions (two sum, max consecutive ones) with notes to learn binary search and `using` keyword.
 - Action: Current study trajectory is STL containers + competitive programming DSA. Prioritize cards for these topics.
 - Confidence: high
 ## Open Questions
 **[2026-05-04] — Duplicate UFDS files**
 - Question: Are both `org/cpp/ufds.org` (402 lines, proper format) and `org/cpp/dsa/udfs.org` (41 lines, raw code) needed? The former seems to supersede the latter.
 - Action: Ask user if `dsa/udfs.org` should be removed or merged.
@@ -0,0 +1,41 @@
 #+title: Udfs
 * impl
 #+begin_src cpp
 #include <vector>
 #include <numeric>
 #include <algorithm>
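 class Ufds {
  private:
  std::vector<int> p; // parent links
  std::vector<int> s; // presumably set sizes (initialized to 1 below)
  std::vector<int> r; // presumably ranks (initialized to 0 below)
  int numSets;        // was assigned in the constructor but never declared
  public:
  Ufds(int n) {
    p.resize(n);
    std::iota(p.begin(), p.end(), 0);
    s.assign(n, 1); // test on equality with the assign fill and iota explain the difference
    r.resize(n);    // r must be sized first; fill over an empty range writes nothing
    std::fill(r.begin(),r.end(),0);
    numSets = n;
  }
  // Destructor: only one may be declared. Alternative spelling: ~Ufds() {}
  ~Ufds() = default;
  // Concise find with path compression
  int find(int x) {
    if (p[x] == x) return x;
    return p[x] = find(p[x]);
  }
  // Equivalent longer form, renamed so that both versions can coexist
  int findVerbose(int x) {
    int px = p[x];
    if (px != x) {
      px = findVerbose(px);
    }
    p[x] = px;
    return px;
  }
 };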
 #+end_src
 should ask what new returns? delete? what happens? why are we deleting it ? heap allocated?
 should show equivalent version of using std::array to vector
 should ask what resize does?
 what about a true dynamic version where we create it upon calling find()
@@ -0,0 +1,402 @@
 * Ufds: Union-Find Disjoint Set :cpp:datastructure:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Write a minimal Ufds class with a ~std::vector<int>~ parent array and a constructor that makes every element its own parent
 ** Back
 #+begin_src c++
 #include <vector>
 #include <numeric>
 class Ufds {
 private:
  std::vector<int> parent;
 public:
  Ufds(int n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }
 };
 #+end_src
 * Ufds find() with path compression :cpp:datastructure:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Write the ~find()~ method for Ufds with path compression
 ** Back
 #+begin_src c++
 int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]);
    }
    return parent[x];
 }
 #+end_src
 * Concise Ufds find() :cpp:datastructure:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Is this correct? What does it do?
 #+begin_src c++
 int find(int x) {
    if (parent[x] == x) return x;
    return parent[x] = find(parent[x]);
 }
 #+end_src
 ** Back
 Yes, correct and equivalent to the longer version.
 - Base case: if ~x~ is its own parent, return ~x~
 - Recursive case: find root of parent, then assign and return
 The assignment ~parent[x] = find(parent[x])~ returns the result while compressing the path.
 * Ufds constructor: initializer list vs body :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 What is the difference between these two Ufds constructors?
 #+begin_src c++
 // Initializer list
 Ufds(int n) : parent(n) {
  std::iota(parent.begin(), parent.end(), 0);
 }
 // Body style
 Ufds(int n) {
  parent.resize(n);
  std::iota(parent.begin(), parent.end(), 0);
 }
 #+end_src
 ** Back
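 Initializer list: constructs ~parent~ directly with size ~n~.
 Body style: default-constructs an empty ~parent~ (typically no allocation), then ~resize~ allocates.
 Both produce the same result. For a ~std::vector~ the cost difference is usually negligible, but the initializer list is idiomatic and is required for ~const~ or reference members.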
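A minimal sketch to check that the two constructor styles end in the same state. The struct names ~UfdsInit~ and ~UfdsBody~ are made up for this comparison, and ~parent~ is public only so it can be inspected.

```cpp
#include <numeric>
#include <vector>

// Hypothetical names for illustration; not part of the card set.
struct UfdsInit {
    std::vector<int> parent;
    // Member initializer list: parent is constructed directly with n elements.
    explicit UfdsInit(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
};

struct UfdsBody {
    std::vector<int> parent;
    // Body style: parent is default-constructed (empty) first, then resized.
    explicit UfdsBody(int n) {
        parent.resize(n);
        std::iota(parent.begin(), parent.end(), 0);
    }
};
```

Comparing ~UfdsInit(5).parent~ with ~UfdsBody(5).parent~ using ~==~ should report equal vectors.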
 * What does std::vector::resize do? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 What does ~resize(n)~ do on a ~std::vector<int>~?
 ** Back
 Resizes the vector to have ~n~ elements:
 - If ~n~ > current size: adds elements (value-initialized, so 0 for ~int~)
 - If ~n~ < current size: truncates the vector
 - Reallocates if capacity < n
 #+begin_src c++
 std::vector<int> v;
 v.resize(5);  // v = {0, 0, 0, 0, 0}
 #+end_src
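A small sketch of both directions of ~resize~, assuming a hypothetical helper ~resizeDemo~ and report struct ~ResizeReport~ that exist only for this note. It also records that shrinking leaves ~capacity()~ untouched.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical report type: what was observed after growing and shrinking.
struct ResizeReport {
    std::size_t grownSize;
    bool grownAllZero;
    std::size_t shrunkSize;
    bool capacityKept;
};

inline ResizeReport resizeDemo() {
    std::vector<int> v;
    v.resize(5);  // grows: five value-initialized ints
    bool allZero = true;
    for (int x : v) allZero = allZero && (x == 0);
    const std::size_t grown = v.size();
    const std::size_t capBefore = v.capacity();
    v.resize(2);  // shrinks: drops the last three elements, keeps the storage
    return {grown, allZero, v.size(), v.capacity() == capBefore};
}
```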
 * Ufds destructor: when to omit :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Should Ufds have a destructor? =~Ufds() {}= vs =~Ufds() = default=
 ** Back
 Omit entirely if only using ~std::vector~ (auto-cleanup)
 If you must write one, prefer ~= default~ to explicitly show intent
 Empty ~{}~ works but ~= default~ is clearer intent for future maintenance
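One concrete reason to omit the destructor, sketched under the assumption of two hypothetical structs ~NoDtor~ and ~WithDtor~: declaring a destructor, even as ~= default~, stops the compiler from generating the implicit move constructor, so "moving" such an object silently copies instead.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical names for illustration.
struct NoDtor {
    std::vector<int> parent;
    explicit NoDtor(int n) : parent(n) {}
    // No destructor declared: copy and move operations are all implicit.
};

struct WithDtor {
    std::vector<int> parent;
    explicit WithDtor(int n) : parent(n) {}
    ~WithDtor() = default;  // user-declared: no implicit move constructor
};

// Returns true if constructing from std::move(a) emptied the source vector,
// i.e. the vector was really moved rather than copied.
template <typename T>
bool moveEmptiesSource(int n) {
    T a(n);
    T b(std::move(a));
    return a.parent.empty() && b.parent.size() == static_cast<std::size_t>(n);
}
```

With these definitions, ~moveEmptiesSource<NoDtor>(4)~ is expected to be true and ~moveEmptiesSource<WithDtor>(4)~ false, because the second falls back to the copy constructor.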
 * Correct Ufds constructor: is this correct? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Is this correct?
 #+begin_src c++
 Ufds(int n) {
  p.resize(n);
  std::iota(p.begin(), p.end(), 0);
 }
 #+end_src
 ** Back
 Yes, correct. This is the body-style initialization.
 Minor point: ~p~ is default-constructed (empty) first, and ~resize~ then does the allocation.
 The idiomatic alternative is the member initializer list: ~Ufds(int n) : p(n)~
 * Incorrect Ufds constructor: correct or wrong? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Is this correct?
 #+begin_src c++
 std::vector<int> p;
 Ufds(int n) {
  p = new std::vector<int>(n);
 }
 #+end_src
 ** Back
 Wrong.
 ~new std::vector<int>(n)~ returns a ~vector<int>*~ (pointer), but ~p~ is a ~vector<int>~ (not a pointer).
 Can't assign a pointer to a vector.
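Two possible repairs, as a sketch rather than the one right answer: keep ~p~ a value and size it in the initializer list, or make the member an owning pointer so the type matches what ~new~ returns. The names ~UfdsValue~ and ~UfdsOwned~ are invented here, and ~std::unique_ptr~ is swapped in for a raw pointer so cleanup stays automatic.

```cpp
#include <memory>
#include <vector>

// Fix 1: p stays a value; no new is needed at all.
struct UfdsValue {
    std::vector<int> p;
    explicit UfdsValue(int n) : p(n) {}
};

// Fix 2: the member really is a pointer, so the result of new has a matching type.
// std::unique_ptr deletes the vector automatically when UfdsOwned is destroyed.
struct UfdsOwned {
    std::unique_ptr<std::vector<int>> p;
    explicit UfdsOwned(int n) : p(new std::vector<int>(n)) {}
};
```

Fix 1 is the usual choice; Fix 2 only adds an extra indirection and is shown to make the type mismatch visible.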
 * What does new return? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 What does ~new Type~ return? And ~new Type[n]~?
 ** Back
 ~new Type~ returns a pointer to that type: ~Type*~
 ~new Type[n]~ returns a pointer to the first element of the array, also of type ~Type*~
 #+begin_src c++
 int* p1 = new int;      // single int
 int* p2 = new int[10];  // array of 10 ints
 #+end_src
 * What does delete do? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 What does ~delete~ do? ~delete[]~?
 ** Back
 ~delete~ frees a single object allocated with ~new~
 ~delete[]~ frees an array allocated with ~new[]~
 #+begin_src c++
 int* p1 = new int;
 int* p2 = new int[10];
 delete p1;   // free single object
 delete[] p2; // free array
 #+end_src
 Mismatching ~delete~ with ~new[]~ causes undefined behavior.
 * Is this correct Ufds with C-style array? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Is this correct?
 #+begin_src c++
 class Ufds {
 private:
  int* parent;
 public:
  Ufds(int n) : parent(new int[n]) {}
  ~Ufds() { delete[] parent; }
 };
 #+end_src
 ** Back
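 Correct as far as allocation and cleanup go. This manually manages the heap-allocated array.
 - ~new int[n]~ allocates an array of ~n~ ints on the heap (elements are left uninitialized)
 - ~delete[] parent~ in the destructor frees it
 Caveat: the class is still copyable, and a copy would share the pointer and free it twice (Rule of Three).
 This works but requires manual cleanup, which is error-prone compared to ~std::vector~.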
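A sketch of what "manual cleanup" ends up requiring in practice, using a hypothetical class ~RawUfds~. Beyond the destructor, the copy operations are deleted so two objects can never ~delete[]~ the same array, and the elements are initialized explicitly because ~new int[n]~ does not do it.

```cpp
#include <cstddef>
#include <numeric>

// Hypothetical variant of the raw-pointer class with the copy hazard closed off.
class RawUfds {
    int* parent;
    std::size_t n;
public:
    explicit RawUfds(std::size_t count) : parent(new int[count]), n(count) {
        std::iota(parent, parent + n, 0);  // new int[count] leaves elements uninitialized
    }
    ~RawUfds() { delete[] parent; }

    // Without these, an implicit copy would duplicate the pointer and
    // both objects would call delete[] on the same array.
    RawUfds(const RawUfds&) = delete;
    RawUfds& operator=(const RawUfds&) = delete;

    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }
    std::size_t size() const { return n; }
};
```

All of this bookkeeping disappears when the member is a ~std::vector<int>~.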
 * Equivalent std::array version: correct or wrong? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Is this correct?
 #+begin_src c++
 #include <array>
 #include <numeric>
 class Ufds {
 private:
  std::array<int, 10> parent;
 public:
  Ufds() {
    std::iota(parent.begin(), parent.end(), 0);
  }
 };
 #+end_src
 ** Back
 Correct for compile-time fixed size.
 But size ~10~ is hardcoded — not parameterized.
 ~std::array<T, N>~ requires ~N~ be a compile-time constant.
 For runtime size, must use ~std::vector~.
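If the hardcoded ~10~ is the only objection, the size can be lifted into a template parameter. This is a sketch with a hypothetical name ~FixedUfds~; ~N~ is still a compile-time constant, so it does not help when the size is only known at runtime.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical name; N is a template parameter, so it must be known at compile time.
template <std::size_t N>
class FixedUfds {
    std::array<int, N> parent;
public:
    FixedUfds() { std::iota(parent.begin(), parent.end(), 0); }

    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }
    static constexpr std::size_t size() { return N; }
};
```

Usage would look like ~FixedUfds<10> u;~ with each distinct ~N~ producing a distinct type.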
 * std::array vs vector vs C-style array :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Compare ~std::array<T, N>~, ~std::vector<T>~, and ~T*~ for Ufds parent array
 ** Back
 | Type             | Size         | Cleanup           | Use when                   |
 |------------------+--------------+-------------------+----------------------------|
 | ~std::array<T, N>~ | compile-time | automatic         | size known at compile time |
 | ~std::vector<T>~   | runtime      | automatic         | size known at runtime      |
 | ~T* p = new T[n]~  | runtime      | manual (~delete[]~) | legacy code only           |
 * Vector vs Array: is vector backed by array? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Is ~std::vector~ backed by an array? How does it support variable length?
 ** Back
 Yes, ~std::vector~ allocates a contiguous memory block (like a heap array).
 Typical implementation: three pointers
 - ~start~: pointer to the first element
 - ~finish~: pointer one past the last element
 - ~end_of_storage~: pointer to the end of the allocated capacity
 #+begin_src c++
 std::vector<int> v(5);
 v.push_back(1);  // may trigger reallocation if capacity exceeded
 #+end_src
 When capacity is exceeded, the vector allocates a larger block, moves or copies the elements over, and deallocates the old block.
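To see how rarely that reallocation happens, here is a sketch with a hypothetical helper ~countReallocations~ that counts capacity changes across ~n~ calls to ~push_back~. The growth factor is implementation-defined, so the exact count differs between standard libraries.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pushes n ints and counts how often capacity() changed,
// which corresponds to how often the vector moved to a larger block.
inline std::size_t countReallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

Because capacity grows geometrically on common implementations, the count for 1000 pushes should be far below 1000.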
 * Vector size vs capacity :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 What is the difference between ~size()~ and ~capacity()~ in ~std::vector~?
 ** Back
 - ~size()~ — number of actual elements stored
 - ~capacity()~ — allocated storage space
 #+begin_src c++
 std::vector<int> v(3);  // size=3, capacity>=3 (typically exactly 3)
 v.push_back(1);        // size=4, capacity>=4 (exact value is implementation-defined)
 v.push_back(1);        // size=5, capacity>=5
 #+end_src
 ~reserve(n)~ pre-allocates capacity for at least ~n~ elements without changing ~size()~.
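A sketch of ~reserve~ in isolation, using a hypothetical helper ~reserveDemo~ and report struct ~ReserveReport~. It records size and capacity right after the call, then again after pushing exactly ~n~ elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical report type for this demo.
struct ReserveReport {
    std::size_t sizeAfterReserve;
    std::size_t capacityAfterReserve;
    std::size_t sizeAfterPushes;
    std::size_t capacityAfterPushes;
};

inline ReserveReport reserveDemo(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);  // room for at least n elements, none created
    const std::size_t s0 = v.size();
    const std::size_t c0 = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));  // stays within the reserved capacity
    }
    return {s0, c0, v.size(), v.capacity()};
}
```

The capacity is the same before and after the pushes, since no reallocation is needed while ~size()~ stays within it.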
 * Comparison: which Ufds storage to use? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 When would you choose ~std::array~, ~std::vector~, or a raw ~new T[n]~ array for the Ufds parent array?
 ** Back
 ~std::array<T, N>~: only if N is known at compile time
 ~std::vector<T>~: if size is determined at runtime (typical Ufds case)
 ~T* p = new T[n]~: legacy code only; ~vector~ is safer and has equivalent performance
 For Ufds: ~std::vector~ is the idiomatic choice because size is runtime-determined.
 * C-style array key properties :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 What are key properties of C-style arrays (raw arrays)?
 ** Back
 #+begin_src c++
 int arr[5];         // fixed size, stack-allocated
 int* p = new int[n]; // dynamic size, heap-allocated
 #+end_src
 - Decay to a pointer when passed to a function (size info is lost)
 - No bounds checking
 - ~sizeof(arr)~ gives bytes, not element count
 - Must manage lifetime manually for heap arrays
 C-style arrays in C++ are generally avoided in favor of ~std::array~/~std::vector~.
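A sketch of the decay and ~sizeof~ points, with hypothetical function names. Passing by reference keeps the element count, passing by pointer loses it, and the bytes-per-element division works only on a true array.

```cpp
#include <cstddef>

// Takes the array by reference, so the element count N is deduced and preserved.
template <std::size_t N>
constexpr std::size_t countByReference(const int (&)[N]) {
    return N;
}

// Takes a pointer: an array argument decays, and only the pointer size is visible here.
inline std::size_t sizeSeenThroughPointer(const int* p) {
    return sizeof(p);  // size of a pointer, unrelated to the array length
}

// Classic idiom for a true array (not a pointer): total bytes / bytes per element.
inline std::size_t countBySizeof() {
    int arr[5] = {0, 1, 2, 3, 4};
    return sizeof(arr) / sizeof(arr[0]);
}
```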
 * std::iota for Ufds initialization :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 How to use ~std::iota~ to initialize Ufds parent array to ~[0, 1, 2, ..., n-1]~
 ** Back
 #+begin_src c++
 #include <numeric>
 std::vector<int> parent(n);
 std::iota(parent.begin(), parent.end(), 0);
 #+end_src
 ~iota~ fills the range with consecutive values starting from given start value
 * Path compression in find() :cpp:datastructure:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 What does "path compression" mean in ~find()~?
 ** Back
 After finding the root, point each visited node directly to the root:
 #+begin_src c++
 parent[x] = find(parent[x]);  // compresses path
 #+end_src
 This flattens the tree structure. Path compression alone gives amortized O(log n) per ~find()~; combined with union by rank or size it becomes amortized O(α(n)) ≈ O(1)
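The flattening can be observed directly. This sketch uses a hypothetical struct ~ChainUfds~ with a public ~parent~ and a helper ~makeChain~ that wires up the worst case by hand; neither is part of the card set.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical demo type; parent is public only so the effect can be inspected.
struct ChainUfds {
    std::vector<int> parent;

    explicit ChainUfds(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }

    // Builds a single long chain by hand: 0 <- 1 <- 2 <- ... <- n-1.
    void makeChain() {
        for (std::size_t i = 1; i < parent.size(); ++i) {
            parent[i] = static_cast<int>(i) - 1;
        }
    }

    int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]);  // every node on the path now points at the root
        }
        return parent[x];
    }
};
```

After ~makeChain()~ on four elements, ~parent[3]~ is 2; one call to ~find(3)~ returns 0 and leaves nodes 1, 2, and 3 all pointing straight at 0.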
 * Original Ufds mistakes: is this correct? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 Is this correct?
 #+begin_src c++
 class Udfs {
  int parent[];
  vector<int> p;
 public:
  Ufds(int n) {
    p = new vector<int>;
  }
  ~Ufds() {
    p ? free p;
  }
 };
 #+end_src
 ** Back
 Wrong. Multiple errors:
 1. ~int parent[]~ — illegal incomplete array type
 2. ~Ufds(int n)~ constructor but class is ~Udfs~ (typo)
 3. ~p = new vector<int>~ — can't assign pointer to vector
 4. ~free p~ — can't free a vector, and ~p~ is not a pointer
 5. ~Ufds~ destructor but class is ~Udfs~ (typo)
 * Heap allocation: what and why? :cpp:cpp:
 :PROPERTIES:
 :ANKI_NOTE_TYPE: Basic
 :END:
 ** Front
 What does "heap allocated" mean? Why use it?
 ** Back
 Heap allocation with ~new~ lives until ~delete~ is called or program ends.
 Stack allocation (local variables) is automatically freed when it goes out of scope.
 #+begin_src c++
 int arr[10];           // stack — freed when function returns
 int* p = new int[10]; // heap — lives until delete[]
 #+end_src
 Heap needed when: size unknown at compile time, or object must outlive scope.
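A sketch of the "must outlive scope" case, with a hypothetical helper ~makeIdentityParents~. It swaps the raw ~new~/~delete[]~ pair for ~std::unique_ptr<int[]>~, so the array survives the function that created it and is still freed automatically.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>

// Hypothetical helper: the array is created inside the function but outlives it,
// because ownership is handed to the caller through the returned smart pointer.
inline std::unique_ptr<int[]> makeIdentityParents(std::size_t n) {
    std::unique_ptr<int[]> parents(new int[n]);  // runtime size, heap-allocated
    std::iota(parents.get(), parents.get() + n, 0);
    return parents;  // delete[] runs when the caller's pointer is destroyed
}
```

A stack array declared inside the function could not be returned this way, since it is destroyed when the function returns.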