Focus: EXIF re-extraction, colour profile normalization, and making the system observable + consistent across CLI, jobs, and APIs.
What I did
- Shipped a bulk EXIF re-extract API
  - Built /api/v1/exif/reextract-bulk to accept a list of image UUIDs and re-run EXIF extraction at scale.
  - Ensured it reuses the same extraction pipeline as the CLI command instead of duplicating logic.
  - Made the endpoint behave like other v1 APIs (removed “internal-only” restrictions, aligned auth/middleware).
- Hardened colour profile handling
  - Fixed a real production edge case:
    - When ICC-header.ColorSpaceData exists and trims to “RGB”, it must override ExifIFD.ColorSpace (even if EXIF says sRGB).
    - Normalized colour profile now correctly reports:
      - space: “RGB”
      - source: “icc_header”
    - Raw EXIF remains untouched; only derived data is affected.
- Fixed a production TypeError
  - Found and resolved a strict return-type bug in ColorProfileFromMetadata.
  - The extractor previously assumed a colour space always exists. Reality disagreed.
  - Updated the logic to safely handle images where both the ICC and the EXIF colour space are missing, without throwing.
  - Result: no more 500s when images legitimately have no colour space data.
- Unified the main EXIF extractor job
  - Updated the existing EXIF extraction job to use the same shared service as:
    - CLI re-extract
    - Bulk re-extract API
  - This removed silent divergence between “initial ingest” and “manual re-extract”.
- Added a read-only bulk EXIF fetch endpoint
  - Implemented a new endpoint that accepts the same payload as reextract-bulk (a list of image UUIDs), but is read-only.
  - Purpose: fetch persisted EXIF + normalized colour profile directly from the DB.
  - Useful for inspection, debugging, and client-side validation without mutation.
- Improved observability without log pollution
  - Continued using bordered-investigation.log for:
    - EXIF edge cases
    - missing colour space diagnostics
    - local path / metadata investigations
  - Kept production logs clean and meaningful.
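
A minimal sketch of the shared-service shape described in the list above, in Python purely for illustration. Every name here (ExifExtractionService, the dict standing in for the DB, the entry-point functions) is hypothetical and not taken from the actual codebase. The only point is that ingest, CLI, and bulk API all call one service, while the fetch path only reads.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical type: raw metadata is a plain dict keyed by tag name.
Metadata = Dict[str, object]


class ExifExtractionService:
    """The single place where extraction and persistence happen (hypothetical)."""

    def __init__(self, read_metadata: Callable[[str], Metadata],
                 store: Dict[str, Metadata]) -> None:
        self._read_metadata = read_metadata  # assumed wrapper around an EXIF reader
        self._store = store                  # stand-in for the database

    def extract(self, image_uuid: str) -> Metadata:
        metadata = self._read_metadata(image_uuid)
        self._store[image_uuid] = metadata   # persist the result
        return metadata


# Thin entry points: none of them contains extraction logic of its own.
def ingest_job(service: ExifExtractionService, image_uuid: str) -> Metadata:
    return service.extract(image_uuid)


def cli_reextract(service: ExifExtractionService, image_uuid: str) -> Metadata:
    return service.extract(image_uuid)


def api_reextract_bulk(service: ExifExtractionService,
                       image_uuids: Iterable[str]) -> Dict[str, Metadata]:
    return {uuid: service.extract(uuid) for uuid in image_uuids}


def api_fetch_bulk(store: Dict[str, Metadata],
                   image_uuids: Iterable[str]) -> Dict[str, Optional[Metadata]]:
    # Read-only: returns what is persisted and never triggers extraction.
    return {uuid: store.get(uuid) for uuid in image_uuids}
```

Because the entry points only delegate, a fix made in the service reaches ingest, CLI, and API at once, and the fetch function can be called freely since it never writes.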
What I learned
- Type systems don’t protect you from reality
  - Strict return types are only correct if your data model matches the real world.
  - EXIF data is messy, optional, and inconsistent; code must reflect that truth.
- Single source of truth matters more than speed
  - The biggest long-term win was forcing CLI, job, and API to share the same extractor.
  - Any duplication here would silently rot over time.
- “Read” endpoints are as important as “write” endpoints
  - Being able to fetch raw EXIF + derived state from the DB is critical for debugging and trust.
  - Mutation-only APIs make systems opaque and stressful to operate.
- Observability needs intent, not volume
  - A dedicated investigation log is far more valuable than spamming production logs.
  - Knowing where to log is just as important as knowing what to log.
- Colour management is full of traps
  - ICC headers can be more authoritative than EXIF tags.
  - Normalization rules must be explicit, documented, and tested, because unstated assumptions will fail.
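
To make the colour profile rule concrete, here is a rough Python sketch of the normalization as described in these notes. The key names ICC-header.ColorSpaceData and ExifIFD.ColorSpace and the “RGB” / “icc_header” values come from the notes; the ColourProfile type, the normalize_colour_profile function, and the "exif" source label are assumptions made for illustration. The notes only describe the “RGB” override, so any other ICC value simply falls through to EXIF here.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ColourProfile:
    space: str
    source: str  # "icc_header" (from the notes) or "exif" (assumed label)


def normalize_colour_profile(metadata: Mapping[str, Any]) -> Optional[ColourProfile]:
    """Derive a colour profile from raw metadata without modifying it.

    Returns None when neither ICC nor EXIF carries a colour space,
    so callers must handle the missing case explicitly.
    """
    icc = metadata.get("ICC-header.ColorSpaceData")
    # The ICC header stores the colour space as a padded 4-character
    # signature such as "RGB ", hence the trim before comparing.
    if isinstance(icc, str) and icc.strip() == "RGB":
        return ColourProfile(space="RGB", source="icc_header")

    exif = metadata.get("ExifIFD.ColorSpace")
    if isinstance(exif, str) and exif.strip():
        return ColourProfile(space=exif.strip(), source="exif")

    # Legitimately missing colour space: not an error.
    return None
```

The Optional return type is the important part: it encodes in the signature that some images have no colour space at all, which is the case the strict return type originally ruled out.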
Overall:
Today was about turning EXIF handling from a “best effort” feature into a reliable, inspectable, and repeatable system. Less magic, more truth.
