🛠 What I have done
1. Strengthened operational visibility across the system
Today was largely about making invisible failures visible.
I extended the Incident Hub so it is no longer just a dumping ground for generic errors, but a real diagnostic tool:
- Frontend failures from two flows are now logged with rich, structured context:
  - Customers > Homework uploads
  - Admin > Academic > Schedules delete actions
- Each incident now carries:
  - actor identity (admin / customer)
  - exact entity IDs (homework, schedule, student)
  - HTTP status codes
  - raw backend error payloads
  - request location and timestamps
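As a sketch, this is the kind of structured context now attached to an incident (channel and field names are illustrative, not the actual Incident Hub schema):

```php
use Illuminate\Support\Facades\Log;

// Illustrative payload; the real Incident Hub schema may differ.
Log::channel('incidents')->error('schedule.delete.failed', [
    'actor_type'  => 'admin',
    'actor_id'    => $admin->id,
    'schedule_id' => $schedule->id,
    'http_status' => 409,                        // status returned by the backend
    'payload'     => $response->json(),          // raw backend error payload
    'location'    => 'Admin > Academic > Schedules',
    'occurred_at' => now()->toIso8601String(),
]);
```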
This directly paid off when a vague UI message:

“Failed to remove schedule. Please try again.”

turned into a clear backend truth:

“Cannot delete schedule with attendance records.”

Instead of guessing, I now had evidence.
2. Fixed admin schedule deletion semantics
Once the real error surfaced, I corrected the behavior:
- Admins deleting a schedule should not be forced to manually clean dependent attendance records.
- I changed the deletion flow so that:
  - attendance records are deleted first
  - schedule deletion follows
  - everything runs inside a transaction
This aligns the system with admin intent rather than raw DB constraints, and removes unnecessary friction.
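A minimal sketch of the corrected flow, assuming an `attendances()` relationship on the schedule model (names are illustrative):

```php
use Illuminate\Support\Facades\DB;

// Delete dependents first, then the schedule itself. The transaction
// guarantees we never end up with orphaned attendance rows or a
// half-deleted schedule.
DB::transaction(function () use ($schedule) {
    $schedule->attendances()->delete();   // assumes an attendances() relationship
    $schedule->delete();
});
```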
3. Investigated and stabilized video processing failures
This was the most technically dense part of the day.
Initial symptom

Repeated failures in `ProcessVideoProof` with:

`Allowed memory size of 268435456 bytes exhausted`

Even though:
- PHP `memory_limit` was supposedly set to 512MB
- the queue worker was run with `--memory=512`
Key findings
- `--memory` in `queue:work` does NOT change PHP's `memory_limit`.
- The worker process was still running under a PHP CLI config capped at 256MB.
- The actual crash happened inside `guzzlehttp/psr7`, strongly indicating the job was buffering a large video into memory.
Corrective actions
- Forced the queue worker to run with:
  `php -d memory_limit=512M artisan queue:work`
- Reworked `ProcessVideoProof` to use stream-based IO:
  - stream from R2
  - write to a local temp file
  - run ffmpeg on disk, not in memory
During this refactor, I hit a second failure:
`Call to undefined method Filesystem::putStream()`

which revealed:
- Laravel's Flysystem adapter does not support `putStream()`
- the correct abstraction is `writeStream()`

This was fixed, and the job logic now aligns with Laravel's actual filesystem API.
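A minimal sketch of the resulting flow, assuming a disk named `r2` and illustrative variable names (not the exact job code):

```php
use Illuminate\Support\Facades\Storage;
use Symfony\Component\Process\Process;

// 1. Stream the original from R2 into a local temp file; at no point
//    is the whole video held in PHP memory.
$in  = tempnam(sys_get_temp_dir(), 'video_in_');
$out = $in . '.optimised.mp4'; // ffmpeg infers the container from the extension

$remote = Storage::disk('r2')->readStream($videoPath);
$local  = fopen($in, 'w+b');
stream_copy_to_stream($remote, $local);
fclose($remote);
fclose($local);

// 2. Run ffmpeg against files on disk, not in-memory buffers.
(new Process(['ffmpeg', '-y', '-i', $in, '-c:v', 'libx264', '-crf', '28', $out]))
    ->setTimeout(600)
    ->mustRun();

// 3. Push the optimised variant back to R2 with writeStream(), the
//    method the filesystem actually exposes (putStream() does not exist).
$stream = fopen($out, 'rb');
Storage::disk('r2')->writeStream($variantPath, $stream);
fclose($stream);

@unlink($in);
@unlink($out);
```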
4. Reframed “truth” in media processing
I removed reliance on fragile status flags when deciding whether a video needs optimisation.
Instead:
- I based all logic on variant count:
  - 1 variant → original only → eligible for optimisation
  - >1 variants → already optimised
This change affected:
- Admin > Insights > Media UI
- Backend dispatch guards
I also:
- replaced text buttons with a compact icon-only action
- hid the action immediately after dispatch (optimistic UI)
- prevented duplicate job dispatches at both UI and backend layers
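A minimal sketch of the variant-count guard and the backend dispatch protection (model, job, and method names are illustrative):

```php
use App\Jobs\OptimiseVideo; // illustrative job name

// On the Media model: eligibility derives from variant count, not a flag.
public function needsOptimisation(): bool
{
    // Exactly 1 variant = original only; more than 1 = already optimised.
    return $this->variants()->count() === 1;
}

// In the controller: the backend guard that mirrors the UI guard, so a
// double click or a stale page cannot dispatch a duplicate job.
public function optimise(Media $media)
{
    abort_unless($media->needsOptimisation(), 409, 'Already optimised.');

    OptimiseVideo::dispatch($media);

    return back();
}
```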
5. Corrected recurring task explosion
I addressed a serious data-generation flaw in Admin > Academic > Tasks:
- Creating a daily recurring task was generating many future instances immediately.
- This breaks:
- predictability
- cleanup
- operational control
I redefined the system behavior:
- On creation:
  - create exactly one task instance
- Future instances:
  - generated only by Kernel Scheduler
  - strictly one per day
  - idempotent
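A minimal sketch of that contract, assuming an `is_recurring` flag and a `recurring_parent_id` column (both illustrative), with `firstOrCreate()` providing the idempotency:

```php
use Illuminate\Console\Scheduling\Schedule;
use App\Models\Task; // illustrative model

// app/Console/Kernel.php: the single place recurrence is allowed to live.
protected function schedule(Schedule $schedule): void
{
    $schedule->command('tasks:generate-recurring')->dailyAt('00:05');
}

// Inside the command: firstOrCreate() makes re-runs and double scheduler
// ticks a no-op, so at most one instance exists per task per day.
foreach (Task::where('is_recurring', true)->get() as $task) {
    Task::firstOrCreate([
        'recurring_parent_id' => $task->id,
        'due_date'            => today()->toDateString(),
    ], [
        'title' => $task->title,
    ]);
}
```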
To clean up the existing damage, I designed a safe Artisan command to remove future-dated daily task instances:
- dry-run by default
- strictly scoped to recurring tasks only
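A minimal sketch of such a command (model, column, and signature names are illustrative):

```php
use App\Models\Task; // illustrative model
use Illuminate\Console\Command;

class PruneFutureRecurringTasks extends Command
{
    // Destructive behaviour is opt-in: without --force this only reports.
    protected $signature = 'tasks:prune-future
                            {--force : Actually delete instead of reporting}';

    protected $description = 'Remove future-dated instances of daily recurring tasks';

    public function handle(): int
    {
        // Strictly scoped: only recurring instances, only future-dated.
        $query = Task::whereNotNull('recurring_parent_id')
            ->whereDate('due_date', '>', today());

        $count = $query->count();

        if (! $this->option('force')) {
            $this->info("Dry run: {$count} future task instances would be deleted.");

            return self::SUCCESS;
        }

        $query->delete();
        $this->info("Deleted {$count} future task instances.");

        return self::SUCCESS;
    }
}
```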
6. Built admin-grade queue tooling
To support long-term stability, I designed Admin > Insights > Jobs:
- Visibility into:
  - pending jobs
  - failed jobs
- Admin actions:
  - re-run failed jobs immediately
  - delete stuck jobs
- All actions:
  - backend-driven
  - logged
  - idempotent
This reduces reliance on SSH + manual DB inspection and moves ops control into the app itself.
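A minimal sketch of the backend side of those actions, delegating the retry to Laravel's own `queue:retry` command (controller shape and log keys are illustrative):

```php
use Illuminate\Support\Facades\Artisan;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

// Re-run a failed job by delegating to Laravel's own retry command,
// and log who did it. Retrying an unknown UUID is a harmless no-op.
public function retry(string $uuid)
{
    Artisan::call('queue:retry', ['id' => [$uuid]]);
    Log::info('admin.queue.retry', ['uuid' => $uuid, 'admin_id' => auth()->id()]);

    return back();
}

// Delete a stuck pending job straight from the jobs table; deleting an
// already-removed row is equally a no-op, which keeps the action idempotent.
public function destroy(int $jobId)
{
    DB::table('jobs')->where('id', $jobId)->delete();
    Log::info('admin.queue.delete', ['job_id' => $jobId, 'admin_id' => auth()->id()]);

    return back();
}
```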
🧠 What I have learned
1. Runtime reality always beats configuration belief
If PHP crashes at 256MB, then the process is capped at 256MB — no matter what the config file says.
The only reliable truth is:
- stack traces
- byte values
- live runtime logging
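For example, a worker can log its own runtime reality rather than trusting `php.ini` (a sketch; placement is illustrative):

```php
use Illuminate\Support\Facades\Log;

// Log what this process actually runs with, not what a config file claims.
Log::info('runtime.memory', [
    'memory_limit' => ini_get('memory_limit'),     // the cap this process obeys
    'peak_bytes'   => memory_get_peak_usage(true), // real bytes used at peak
]);
```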
2. Streaming is not an optimization — it’s a requirement
Any system that:
- loads large media into memory
- casts streams to strings
- assumes “it fits”
will fail eventually.
Correct architecture:
remote stream → disk → processor → disk → remote stream
Memory should never be the transport layer.
3. Framework abstractions are sharp tools
Flysystem's API differences (`putStream` vs `writeStream`) are subtle but fatal.
Lesson:
- never assume a method exists because it “sounds right”
- always code against what the framework actually exposes
4. Status flags are symptoms, not truth
Statuses drift.
Queues lag.
Jobs retry.
Data structure (variants, counts, existence) is far more reliable than enum-like state flags.
5. Recurrence must be centralized
Any recurrence logic spread across:
- controllers
- UI
- ad-hoc scripts
will eventually explode.
Kernel Scheduler as the single source of recurrence is not just cleaner — it’s recoverable.
6. Observability is a force multiplier
Every improvement to:
- logging
- incident reporting
- admin insight tools
pays back immediately when something breaks.
Today’s debugging would have taken days without:
- Incident Hub
- structured errors
- job inspection tools
