Alice Technical Journal — 2026-01-27

🛠 What I have done

1. Strengthened operational visibility across the system

Today was largely about making invisible failures visible.

I extended the Incident Hub so it is no longer just a dumping ground for generic errors, but a real diagnostic tool:

  • Frontend failures are now logged with rich, structured context, coming from:
    • Customers > Homework uploads
    • Admin > Academic > Schedules delete actions
  • Each incident now carries:
    • actor identity (admin / customer)
    • exact entity IDs (homework, schedule, student)
    • HTTP status codes
    • raw backend error payloads
    • request location and timestamps

This directly paid off when a vague UI message:

“Failed to remove schedule. Please try again.”

turned into a clear backend truth:

“Cannot delete schedule with attendance records.”

Instead of guessing, I now had evidence.
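
For the record, the context has roughly this shape; the Incident model name and the field names below are my own placeholders, not the real Incident Hub schema:

use App\Models\Incident;   // placeholder model name

// Record a frontend-reported failure with full diagnostic context.
// Every field mirrors the list above; names are illustrative only.
function reportIncident(string $actorType, ?int $actorId, array $entityIds, int $httpStatus, array $errorPayload): Incident
{
    return Incident::create([
        'actor_type'    => $actorType,              // admin / customer
        'actor_id'      => $actorId,
        'entity_ids'    => $entityIds,              // homework / schedule / student IDs
        'http_status'   => $httpStatus,
        'error_payload' => $errorPayload,           // raw backend error body
        'location'      => request()->fullUrl(),    // where the failing request came from
        'occurred_at'   => now(),
    ]);
}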


2. Fixed admin schedule deletion semantics

Once the real error surfaced, I corrected the behavior:

  • Admins deleting a schedule should not be forced to manually clean dependent attendance records.
  • I changed the deletion flow so:
    • attendance records are deleted first
    • schedule deletion follows
    • everything runs inside a transaction

This aligns the system with admin intent rather than raw DB constraints, and removes unnecessary friction.
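
A minimal sketch of the new flow, assuming a Schedule model with an attendanceRecords() relation (both names are placeholders):

use App\Models\Schedule;                 // placeholder model name
use Illuminate\Support\Facades\DB;

DB::transaction(function () use ($scheduleId) {
    $schedule = Schedule::findOrFail($scheduleId);

    // Dependent attendance rows go first, so the FK constraint never fires.
    $schedule->attendanceRecords()->delete();

    // Then the schedule itself; inside the transaction, both succeed or neither does.
    $schedule->delete();
});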


3. Investigated and stabilized video processing failures

This was the most technically dense part of the day.

Initial symptom

Repeated failures in ProcessVideoProof with:

Allowed memory size of 268435456 bytes exhausted

Even though:

  • PHP memory_limit was supposedly set to 512MB
  • queue worker was run with --memory=512

Key findings

  • --memory in queue:work does NOT change PHP’s memory_limit.
  • The worker process was still running under a PHP CLI config capped at 256MB.
  • The actual crash happened inside guzzlehttp/psr7, strongly indicating:
    • the job was buffering a large video into memory

Corrective actions

  • Forced the queue worker to run with: php -d memory_limit=512M artisan queue:work
  • Reworked ProcessVideoProof to use stream-based IO:
    • stream from R2
    • write to local temp file
    • run ffmpeg on disk, not memory

During this refactor, I hit a second failure:

Call to undefined method Filesystem::putStream()

Which revealed:

  • Laravel’s Flysystem adapter does not support putStream()
  • The correct abstraction is writeStream()

This was fixed, and the job logic now aligns with Laravel’s actual filesystem API.
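
Roughly what the reworked job body now looks like; the r2 disk name, the path properties and the ffmpeg arguments are placeholders, and the point is readStream()/writeStream() plus a temp file instead of a full in-memory string:

use Illuminate\Support\Facades\Storage;
use Symfony\Component\Process\Process;

// 1. Stream the original from R2 straight onto local disk.
$source = Storage::disk('r2')->readStream($this->originalPath);
$input  = tempnam(sys_get_temp_dir(), 'vid_in_');
$output = tempnam(sys_get_temp_dir(), 'vid_out_');

$local = fopen($input, 'w+b');
stream_copy_to_stream($source, $local);
fclose($local);
fclose($source);

// 2. Let ffmpeg work disk-to-disk; process memory stays flat.
(new Process(['ffmpeg', '-y', '-i', $input, '-vcodec', 'libx264', '-crf', '28', $output]))
    ->setTimeout(3600)
    ->mustRun();

// 3. Stream the optimised variant back to R2 (writeStream, not putStream).
$variant = fopen($output, 'rb');
Storage::disk('r2')->writeStream($this->variantPath, $variant);

if (is_resource($variant)) {
    fclose($variant);
}

@unlink($input);
@unlink($output);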


4. Reframed “truth” in media processing

I removed reliance on fragile status flags when deciding whether a video needs optimisation.

Instead:

  • I based all logic on variant count (sketched at the end of this section):
    • 1 variant → original only → eligible for optimisation
    • 2+ variants → already optimised

This change affected:

  • Admin > Insights > Media UI
  • Backend dispatch guards

I also:

  • replaced text buttons with a compact icon-only action
  • hid the action immediately after dispatch (optimistic UI)
  • prevented duplicate job dispatches at both UI and backend layers
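
A sketch of the backend guard (the variants() relation name is a placeholder): the variant-count check is the rule above, while the cache lock is only one possible way to block duplicate dispatches, since the exact mechanism isn't spelled out here, and it needs a lock-capable cache driver:

use App\Jobs\ProcessVideoProof;
use Illuminate\Support\Facades\Cache;

// Eligible only while the untouched original is the sole variant.
if ($media->variants()->count() !== 1) {
    return;
}

// Best-effort duplicate-dispatch guard: one in-flight optimisation per media item.
// The lock expires on its own or can be released at the end of the job.
$lock = Cache::lock("optimise-media-{$media->id}", 600);

if ($lock->get()) {
    ProcessVideoProof::dispatch($media->id);
}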

5. Corrected recurring task explosion

I addressed a serious data-generation flaw in Admin > Academic > Tasks:

  • Creating a daily recurring task was generating many future instances immediately.
  • This breaks:
    • predictability
    • cleanup
    • operational control

I redefined the system behavior (sketch after the list below):

  • On creation:
    • create exactly one task instance
  • Future instances:
    • generated only by Kernel Scheduler
    • strictly one per day
    • idempotent
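
A sketch of that scheduler pass; the command name, the parent_id/due_date fields and the recurrence column are placeholders, and firstOrCreate keyed on parent plus date is what makes a re-run a no-op:

use App\Models\Task;                      // placeholder model name
use Illuminate\Support\Carbon;

// In app/Console/Kernel.php: run the generator once per day.
$schedule->command('tasks:generate-daily')->dailyAt('00:05');

// Inside the command's handle(): one instance per recurring template per day.
Task::query()
    ->where('recurrence', 'daily')
    ->whereNull('parent_id')              // recurring "templates" only
    ->each(function (Task $template) {
        // firstOrCreate keyed on parent + date means re-running changes nothing.
        Task::firstOrCreate(
            ['parent_id' => $template->id, 'due_date' => Carbon::today()->toDateString()],
            $template->only(['title', 'description', 'student_id'])
        );
    });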

To clean up the existing damage, I designed a cleanup command (sketch after the list below):

  • a safe Artisan command to remove future-dated daily task instances
  • dry-run by default
  • strict scoping to recurring tasks only
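
Roughly what that command looks like (class name and fields are placeholders again); dry-run is the default path and deletion only happens behind --force:

use App\Models\Task;                      // placeholder model name
use Illuminate\Console\Command;

class PruneFutureDailyTasks extends Command
{
    protected $signature = 'tasks:prune-future {--force : Actually delete instead of just reporting}';
    protected $description = 'Remove future-dated instances of daily recurring tasks';

    public function handle(): int
    {
        $query = Task::query()
            ->whereNotNull('parent_id')                                   // generated instances only
            ->whereHas('parent', fn ($q) => $q->where('recurrence', 'daily'))
            ->whereDate('due_date', '>', now()->toDateString());

        $count = $query->count();

        if (! $this->option('force')) {
            $this->info("[dry-run] {$count} future daily task instances would be removed. Re-run with --force to delete.");
            return self::SUCCESS;
        }

        $query->delete();
        $this->info("Deleted {$count} future daily task instances.");

        return self::SUCCESS;
    }
}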

6. Built admin-grade queue tooling

To support long-term stability, I designed Admin > Insights > Jobs:

  • Visibility into:
    • pending jobs
    • failed jobs
  • Admin actions:
    • re-run failed jobs immediately
    • delete stuck jobs
  • All actions:
    • backend-driven
    • logged
    • idempotent

This reduces reliance on SSH + manual DB inspection and moves ops control into the app itself.
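
The actions themselves can lean on Laravel's own queue plumbing; a rough controller sketch (the controller name is a placeholder, failed_jobs/jobs are the framework's default tables, and the database queue driver is assumed for the delete):

use App\Http\Controllers\Controller;
use Illuminate\Support\Facades\Artisan;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

class JobInsightsController extends Controller
{
    // Re-run a failed job by its UUID from the failed_jobs table.
    public function retry(string $uuid)
    {
        Artisan::call('queue:retry', ['id' => [$uuid]]);
        Log::info('Admin retried failed job', ['uuid' => $uuid, 'admin_id' => auth()->id()]);

        return back();
    }

    // Remove a stuck pending job (the database queue driver keeps them in the jobs table).
    public function destroy(int $jobId)
    {
        DB::table('jobs')->where('id', $jobId)->delete();
        Log::info('Admin deleted stuck job', ['job_id' => $jobId, 'admin_id' => auth()->id()]);

        return back();
    }
}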


🧠 What I have learned

1. Runtime reality always beats configuration belief

If PHP crashes at 256MB, then the process is capped at 256MB — no matter what the config file says.
The only reliable truth is:

  • stack traces
  • byte values
  • live runtime logging
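
The cheapest version of that runtime logging is a single call at the top of the suspect job; something like this (placement and message are just a suggestion):

use Illuminate\Support\Facades\Log;

// First thing in handle(): record what THIS process actually has.
Log::info('ProcessVideoProof runtime check', [
    'memory_limit' => ini_get('memory_limit'),         // the cap this worker really runs under
    'peak_usage'   => memory_get_peak_usage(true),      // bytes actually allocated so far
    'php_sapi'     => PHP_SAPI,                          // cli and fpm often load different configs
]);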

2. Streaming is not an optimization — it’s a requirement

Any system that:

  • loads large media into memory
  • casts streams to strings
  • assumes “it fits”

will fail eventually.

Correct architecture:

remote stream → disk → processor → disk → remote stream

Memory should never be the transport layer.


3. Framework abstractions are sharp tools

Flysystem’s API differences (putStream vs writeStream) are subtle but fatal.

Lesson:

  • never assume a method exists because it “sounds right”
  • always code against what the framework actually exposes

4. Status flags are symptoms, not truth

Statuses drift.
Queues lag.
Jobs retry.

Data structure (variants, counts, existence) is far more reliable than enum-like state flags.


5. Recurrence must be centralized

Any recurrence logic spread across:

  • controllers
  • UI
  • ad-hoc scripts

will eventually explode.

Kernel Scheduler as the single source of recurrence is not just cleaner — it’s recoverable.


6. Observability is a force multiplier

Every improvement to:

  • logging
  • incident reporting
  • admin insight tools

pays back immediately when something breaks.

Today’s debugging would have taken days without:

  • Incident Hub
  • structured errors
  • job inspection tools