Docker image-bloat rebuild playbook — find the dominant dep before multi-stage

For any Docker image >3 GB, the correct FIRST move is docker exec <container> sh -c 'du -sh /path/to/site-packages/*' — bloat is almost always concentrated in one accidental dep, not spread across the layer set. Multi-stage is the vehicle; the lever is usually a single dep pin or a single extra removed.
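
A minimal triage sketch, assuming a stock Docker CLI — the container name and venv path are placeholders:

```bash
# Spot the offenders: list image sizes and eyeball anything over ~3 GB
docker images --format '{{.Repository}}:{{.Tag}}  {{.Size}}'

# Profile the suspect. sh -c makes the glob expand inside the container;
# sort/head run on the host and surface the dominant package first.
docker exec <container> sh -c 'du -sh /path/to/site-packages/*' \
  | sort -rh | head -15
```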

Three patterns from 20-Apr-2026 Wave 1 rebuild (16.21 GB reclaimed across three images)

| Root cause | Platform | Before | After | Fix lever |
|---|---|---|---|---|
| camel-ai pulled the torch-CUDA default | MiroFish | 5.80 GB | 1.41 GB (76% ↓) | Pre-install torch from https://download.pytorch.org/whl/cpu BEFORE uv pip install -r pyproject.toml — camel-ai sees torch present and skips the CUDA resolve |
| --extra providers pulled sentence-transformers → torch → CUDA | Graphiti MCP | 5.65 GB | 0.41 GB (93% ↓) | Drop --extra providers; install only the provider actually used (google-genai). sentence-transformers is unused because embeddings go through the Gemini API, not a local model |
| Build toolchain retained in runtime + dev-only node_modules survived the Vite build | ORACLE Hermes | 4.89 GB | 2.10 GB (57% ↓) | Multi-stage: builder keeps build-essential/python3-dev/libffi-dev/gcc; runtime drops them. Strip web/node_modules post-build |
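
Row 1's fix lever as a minimal Dockerfile fragment — a sketch assuming a uv-managed venv is already on PATH; nothing here is lifted from the actual MiroFish build:

```dockerfile
# Pre-install torch from the CPU wheel index BEFORE the main resolve.
# camel-ai then finds torch already satisfied and never pulls the CUDA build.
RUN uv pip install --index-url https://download.pytorch.org/whl/cpu torch

# Main resolve: torch is present, so the CUDA variant is skipped
RUN uv pip install -r pyproject.toml
```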

The general pattern

  1. Profile first: docker exec <name> sh -c 'du -sh /opt/<app>/.venv/lib/python*/site-packages/*' | sort -rh | head -15 (sh -c makes the glob expand inside the container, not on the host)
  2. Identify the outlier: typically ONE package is 50%+ of site-packages — torch+CUDA (GPU stack on CPU VPS), sentence-transformers (unused embedder), playwright-with-browsers (if only API used)
  3. Check the dep chain: importlib.metadata.requires("<bloat-package>") lists what the bloat package itself depends on; to find what pulls it IN, scan every installed distribution's requirements for its name — see the sketch after this list
  4. Pin or drop: either install a CPU/slim variant via --index-url before the main resolve, OR remove the offending extra, OR cherry-pick the one sub-package you actually use
  5. Multi-stage after: only now does splitting builder/runtime matter; it’s the 10-30% polish on top of the 70-90% win from the dep fix
  6. Backup anchor: cp Dockerfile Dockerfile.pre-slim.<YYYYMMDD> before edit — 7-day rollback window per Clause B of the dead-weight rule
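
Step 3's lookup only goes one direction — requires() lists a package's own deps. A sketch of the reverse scan, run inside the container ("torch" stands in for whatever the bloat package is):

```bash
# Print every installed distribution whose declared requirements
# name the bloat package — i.e. the thing(s) pulling it in.
docker exec <container> python -c '
import importlib.metadata as md
import re
suspect = "torch"
for dist in md.distributions():
    for req in dist.requires or []:
        m = re.match(r"[A-Za-z0-9._-]+", req)
        if m and m.group(0).lower() == suspect:
            print(dist.metadata["Name"], "->", req)
'
```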

When this does NOT apply

  • Image is already small (<1.5 GB) — gains are diminishing; multi-stage still helps if the toolchain is in the runtime image
  • All deps are legitimately used at runtime — then the bloat isn’t “dead weight”, it’s the actual cost of the capability (playwright chromium + ffmpeg + node in oracle-hermes is the right shape at 2 GB)
  • Legacy pip-editable install — .egg-link files reference source paths, so site-packages AND source must move together across stages (oracle-hermes pattern)
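
The editable-install caveat as a multi-stage fragment — paths are illustrative, not the real oracle-hermes layout:

```dockerfile
FROM python:3.12-slim AS runtime
# pip install -e leaves .egg-link / __editable__*.pth files in site-packages
# that point at the source tree, so the venv AND the source must travel
# together — copying site-packages alone breaks imports at runtime.
COPY --from=builder /opt/app/.venv /opt/app/.venv
COPY --from=builder /opt/app/src   /opt/app/src
```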

Smoke-test protocol before live swap

  1. Tag slim build with -test suffix: docker build -t <image>:slim-test .
  2. Run on alt port with real env vars: docker run -d --name <name>-slim-smoke -p 127.0.0.1:<alt>:<port> <image>:slim-test
  3. Health probe: curl http://127.0.0.1:<alt>/health — if 200, image starts
  4. If the runtime log reaches initialize_server() or equivalent startup point without ImportError, Python deps are complete. Env-var failures at runtime (wrong DB host, missing auth) are NOT image issues and can be ignored for the image-level smoke
  5. ONLY AFTER smoke passes: retag to :latest, then docker compose up -d --force-recreate
  6. Run docker image prune -f after the compose recreate — reclaims the dangling orphan of the old large image (the whole protocol is condensed into one script below)
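
The whole protocol as one hedged script — the image name, ports, and the /health path are placeholders; adapt before use:

```bash
#!/usr/bin/env bash
set -euo pipefail

IMAGE=myapp              # placeholder image name
SMOKE=myapp-slim-smoke   # placeholder smoke-container name
ALT=18080                # host-side smoke port, placeholder
PORT=8080                # container port, placeholder

# 1-2. Build under a -test tag and run on the alternate port
docker build -t "$IMAGE:slim-test" .
docker run -d --name "$SMOKE" -p "127.0.0.1:$ALT:$PORT" "$IMAGE:slim-test"

# 3. Health probe — curl -f fails the script on a non-2xx response
sleep 5
curl -fsS "http://127.0.0.1:$ALT/health"

# 4. ImportError in the log means the image is incomplete; env-var
# failures (wrong DB host, missing auth) are NOT image issues
if docker logs "$SMOKE" 2>&1 | grep -qi importerror; then
  echo "missing Python dep — do not promote" >&2
  exit 1
fi
docker rm -f "$SMOKE"

# 5-6. Only after smoke passes: promote, recreate, prune the dangling orphan
docker tag "$IMAGE:slim-test" "$IMAGE:latest"
docker compose up -d --force-recreate
docker image prune -f
```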