Technical Journal — Files System (08/02)

Focus: EXIF re-extraction, colour profile normalization, and making the system observable + consistent across CLI, jobs, and APIs.

What I did

Shipped a bulk EXIF re-extract API
- Built /api/v1/exif/reextract-bulk to accept a list of image UUIDs and re-run EXIF extraction at scale.
- Ensured it reuses the same extraction pipeline as the CLI command instead of duplicating logic.
- Made the endpoint behave like other v1 APIs (removed “internal-only” restrictions, aligned auth/middleware).
Hardened colour profile handling
- Fixed a real production edge case:
  - When ICC-header.ColorSpaceData exists and trims to “RGB”, it must override ExifIFD.ColorSpace (even if EXIF says sRGB).
- In that case, the normalized colour profile now correctly reports:
  - space: “RGB”
  - source: “icc_header”
- Raw EXIF remains untouched; only derived data is affected.
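
The override rule above could be sketched roughly as follows. This is an illustration in Python, not the actual implementation: the function name, the flat dict keyed by "ICC-header.ColorSpaceData" / "ExifIFD.ColorSpace", the "exif" source label, and the fallback behaviour are all assumptions. Only the "RGB" override and the "icc_header" source come from the notes above.

```python
def normalize_colour_profile(raw: dict) -> dict:
    """Derive a colour profile from raw metadata without modifying it."""
    icc_space = raw.get("ICC-header.ColorSpaceData")
    if isinstance(icc_space, str) and icc_space.strip() == "RGB":
        # Documented rule: an ICC header value that trims to "RGB"
        # overrides whatever ExifIFD.ColorSpace says.
        return {"space": "RGB", "source": "icc_header"}

    exif_space = raw.get("ExifIFD.ColorSpace")
    if exif_space is not None:
        # Assumption: otherwise fall back to the EXIF tag, labelled "exif".
        return {"space": str(exif_space), "source": "exif"}

    # Assumption: no usable colour space data yields an empty profile.
    return {"space": None, "source": None}
```

The function only reads from `raw` and returns a new dict, which mirrors the "raw EXIF remains untouched" guarantee.
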
Fixed a production TypeError
- Found and resolved a strict return-type bug in ColorProfileFromMetadata.
- The extractor previously assumed colour space always exists — reality disagreed.
- Updated logic to safely handle missing ICC + EXIF colour space without throwing.
- Result: no more 500s when images legitimately have no colour space data.
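
One way to express "missing is a valid outcome" in the type itself is an optional return. The sketch below is in Python for illustration; `read_colour_space` is a hypothetical helper, not the real ColorProfileFromMetadata code, and the flat metadata dict is an assumption.

```python
from typing import Optional


def read_colour_space(metadata: dict, key: str) -> Optional[str]:
    """Return the trimmed colour space stored under `key`, or None if absent."""
    value = metadata.get(key)
    if value is None:
        # A missing tag is a legitimate state, not an error.
        return None
    text = str(value).strip()
    # Treat blank or whitespace-only values the same as missing ones.
    return text or None


# Hypothetical image with neither ICC nor EXIF colour space data.
metadata = {"Make": "ExampleCam"}
icc = read_colour_space(metadata, "ICC-header.ColorSpaceData")
exif = read_colour_space(metadata, "ExifIFD.ColorSpace")
```

Both lookups come back as `None` instead of raising, so the caller decides what an empty profile looks like.
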
Unified the main EXIF extractor job
- Updated the existing EXIF extraction job to use the same shared service as:
  - CLI re-extract
  - Bulk re-extract API
- This removed silent divergence between “initial ingest” and “manual re-extract”.
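
The shape of that unification, as a rough Python sketch: every entry point is a thin wrapper around one service. The class name, function names, and result shape are hypothetical; the real service and its storage are not shown in these notes.

```python
from typing import Callable, Iterable, List


class ExifExtractionService:
    """Single place where extraction logic lives (hypothetical name)."""

    def __init__(self, read_metadata: Callable[[str], dict]) -> None:
        # The metadata reader is injected so the sketch stays self-contained.
        self._read_metadata = read_metadata

    def extract(self, image_uuid: str) -> dict:
        return {"uuid": image_uuid, "exif": self._read_metadata(image_uuid)}


def run_ingest_job(service: ExifExtractionService, image_uuid: str) -> dict:
    # Initial ingest: delegates, adds no extraction logic of its own.
    return service.extract(image_uuid)


def cli_reextract(service: ExifExtractionService, image_uuid: str) -> dict:
    # Manual CLI re-extract: same call path as the job.
    return service.extract(image_uuid)


def api_reextract_bulk(
    service: ExifExtractionService, image_uuids: Iterable[str]
) -> List[dict]:
    # Bulk API: loops over the same single-image call.
    return [service.extract(uuid) for uuid in image_uuids]
```

Because none of the wrappers contain extraction logic, a fix in `extract` reaches ingest, CLI, and API at the same time.
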
Added a read-only bulk EXIF fetch endpoint
- Implemented a new endpoint with the same payload as reextract-bulk, but read-only.
- Purpose: fetch persisted EXIF + normalized colour profile directly from DB.
- Useful for inspection, debugging, and client-side validation without mutation.
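
A minimal sketch of what "read-only" means here, using a plain dict as a stand-in for the DB. The function name and the found/missing response shape are assumptions; only the "list of image UUIDs in, persisted data out, no mutation" behaviour comes from the notes above.

```python
import copy
from typing import Dict, List


def fetch_exif_bulk(store: Dict[str, dict], image_uuids: List[str]) -> dict:
    """Return persisted records for the given UUIDs without touching the store."""
    found: Dict[str, dict] = {}
    missing: List[str] = []
    for image_uuid in image_uuids:
        record = store.get(image_uuid)
        if record is None:
            missing.append(image_uuid)
        else:
            # Deep copy so callers cannot mutate persisted state by accident.
            found[image_uuid] = copy.deepcopy(record)
    return {"found": found, "missing": missing}
```

Reporting unknown UUIDs separately, rather than failing the whole request, keeps the endpoint useful for debugging partial batches.
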
Improved observability without log pollution
- Continued using bordered-investigation.log for:
  - EXIF edge cases
  - missing colour space diagnostics
  - local path / metadata investigations
- Kept production logs clean and meaningful.
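
For illustration, a dedicated investigation log can be set up with Python's standard logging module roughly like this. The logger name, format string, and helper function are assumptions; only the bordered-investigation.log file name comes from the notes above.

```python
import logging


def build_investigation_logger(
    path: str = "bordered-investigation.log",
) -> logging.Logger:
    """Logger that writes only to the investigation file."""
    logger = logging.getLogger("investigation")
    logger.setLevel(logging.DEBUG)
    # Stop records from bubbling up to root (production) handlers.
    logger.propagate = False
    if not logger.handlers:
        # Guard against attaching duplicate handlers on repeated calls.
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Setting `propagate` to False is what keeps investigation noise out of the production logs while still capturing it in full.
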

What I learned

Type systems don’t protect you from reality
- Strict return types are only correct if your data model matches the real world.
- EXIF data is messy, optional, and inconsistent — code must reflect that truth.
Single source of truth matters more than speed
- The biggest long-term win was forcing CLI, job, and API to share the same extractor.
- Any duplication here would silently rot over time.
“Read” endpoints are as important as “write” endpoints
- Being able to fetch raw EXIF + derived state from DB is critical for debugging and trust.
- Mutation-only APIs make systems opaque and stressful to operate.
Observability needs intent, not volume
- A dedicated investigation log is far more valuable than spamming production logs.
- Knowing where to log is just as important as knowing what to log.
Colour management is full of traps
- ICC headers can be more authoritative than EXIF tags.
- Normalization rules must be explicit, documented, and tested — assumptions will fail.

Overall:

Today was about turning EXIF handling from a “best effort” feature into a reliable, inspectable, and repeatable system. Less magic, more truth.