Great article! Sounds very familiar.
We ran into much the same issues at Natural Intelligence.
I'd like to add to the point about The Warm-Up Script.
We chose to run it from a Kubernetes container lifecycle hook (postStart), like this:
```yaml
lifecycle:
  postStart:
    exec:
      command:
        - '/bin/sh'
        - '-c'
        - './warm_up.sh || true'
```
For an init-style script such as the warm-up, we found the lifecycle mechanism to be the more robust option.
The `|| true` at the end makes the hook always exit with status 0, so the pod is always launched. Kubernetes kills the container if a postStart hook fails, so without this fallback a failed warm-up would become a production issue. The warm-up script can fail when it cannot connect to download some of its dependencies. That is a problem in its own right, but we still want the deployment to go ahead independently of it.
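To make the effect of `|| true` concrete, here is a minimal shell sketch. `failing_warm_up` is a hypothetical stand-in for ./warm_up.sh that fails the way a missing dependency download might; it is not our actual script.

```shell
#!/bin/sh
# Hypothetical stand-in for ./warm_up.sh: simulates a warm-up that fails,
# e.g. because a dependency could not be downloaded.
failing_warm_up() {
  echo "warm-up: could not download dependencies" >&2
  return 1
}

# Without the fallback, the script's non-zero status is what the hook reports,
# and a failing postStart hook gets the container killed.
failing_warm_up
echo "without || true: exit status $?"

# With the fallback, `true` runs after the failure, so the overall status is 0.
failing_warm_up || true
echo "with || true: exit status $?"
```

Running it prints an exit status of 1 for the bare call and 0 for the call with the fallback, which is why the hook never blocks the pod from starting.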