Silent installer abort right after first whiptail dialog (set -e + AND-list) #1

Closed
opened 2026-05-20 13:01:53 +02:00 by vr6syncro · 0 comments
Owner

Symptom

Reported on real PVE host 2026-05-19 by operator: after running

bash -c "$(curl -fsSL …/ct/teddytafforge.sh)"

the script presents the early dialogs correctly, but the moment default_settings() is meant to run, it silently returns to the shell. No error message visible. Behaviour identical for sidecar and all-in-one. advanced_settings() is never reached either, so it's something inside default_settings() — but no output makes it to the terminal.

Root cause

misc/build.func:220 ended with

[ -z "$STORAGE" ] && STORAGE="local-lvm"

…as the last statement of default_settings(). The script runs with set -euo pipefail (misc/build.func:12). On a real PVE host pvesm status -content rootdir always returns at least one storage, so STORAGE is already non-empty when this line runs:

  1. [ -z "$STORAGE" ] → exits 1
  2. The &&-compound short-circuits → exit 1
  3. As the function's last command, that exit code is the function's return value
  4. The function invocation in start() is a "simple command" — no &&/||/if exception applies — so set -e fires
  5. ERR-trap runs, prints msg_error, calls exit 1
  6. But by then whiptail has switched the terminal into the alt-screen buffer; on exit the terminal repaint eats the red error line before the user sees it

So: silent abort, no diagnostic, exact same UX whichever scope was picked. Sources confirming the set -e mechanics: GNU Bash manual — Set Builtin, BashFAQ/105.

A latent sibling bug in list_bridges()

Same class, different mechanism. misc/build.func:117-125 had:

for f in /etc/network/interfaces.d/*; do
    [ -f "$f" ] && awk '/^iface vmbr[0-9]+/ {print $2}' "$f"
done

If interfaces.d/ is empty (true on a fresh PVE install), the glob stays literal → [ -f ] returns 1 → for-loop exits 1 → { … } | sort -u with pipefail propagates exit 1 → $(list_bridges | paste …) in build_container's pre-flight kills the script before any whiptail dialog runs. Wasn't the cause of the reported symptom (the operator saw whiptails first), but would have surfaced as a near-identical silent abort on another machine.

Fixes

Commit 34d5028fix(build.func): silent abort after first dialog (set -e + AND-list):

  • default_settings() (misc/build.func:219-227): replaced [ -z ] && var=… with two ${VAR:-…} parameter-expansions which can never propagate a test-style exit code.
  • list_bridges() (misc/build.func:117-130): shopt -s nullglob around the loop + explicit return 0.
  • ERR-handler (_on_err, misc/build.func:65-100): resets terminal (stty sane, tput rmcup, tput sgr0) before printing, so the alt-screen repaint can no longer eat the error. Dumps failing $BASH_COMMAND, FUNCNAME/BASH_LINENO stack trace, and the xtrace-log path. catch_errors adds set -E so the trap propagates into functions.
  • DEBUG=1 env-flag: full bash -x xtrace to /tmp/teddytafforge-trace-<pid>.log on a dedicated FD (whiptail's stdio juggling can't corrupt it) + human-readable dbg "msg" checkpoints at every meaningful state transition. Forwarded into the LXC via pct exec env. Marked TEMP in the source — to be removed after real-hardware validation.

CHANGELOG-Unreleased and README updated (new "Debug mode" section + troubleshooting row).

Verification

Commit 5e4900dtest(dry): full {scope × settings} matrix dry-test under scripts/dry-test/:

The existing CI smoke test only walks one path (sidecar + defaults) and was green throughout the bug's lifetime — because that path also worked before the fix (it was the real-PVE path that broke, where pvesm returns a storage name).

New scripts/dry-test/matrix.sh walks all four combos with title-aware whiptail mocks, real FD convention, and forwards the same pvesm output a real host would return:

[1] PASS  scope=sidecar    settings=defaults
[2] PASS  scope=sidecar    settings=advanced
[3] PASS  scope=all-in-one settings=defaults
[4] PASS  scope=all-in-one settings=advanced

Each run exercises start() → default_settings → (optionally) advanced_settings → build_container (pre-flight + ensure_template + pct create) → description() end-to-end under DRY_RUN=1. Confirmed:

  • default_settings() now completes cleanly with pre-set STORAGE (the formerly-crashing path) — dbg shows STORAGE=local-lvm BRG=vmbr0 NET=dhcp after the function returns
  • advanced_settings() walks all 13 dialogs (CTID/Hostname/Storage/Disk/CPU/RAM/IPv4/MAC/VLAN/MTU/SSH-Key/IPv6/Verbose) and returns clean
  • build_container pre-flight (bridge_exists/storage_exists inside command-substitution with pipefail) doesn't crash on empty interfaces.d/
  • pct create arg-vector composed correctly (DRY-Run log nachvollziehbar)
  • description() returns clean even without an IP

What's NOT covered by the dry test

Layer Status
Real pct create / template download / LXC start not exercised (DRY_RUN)
In-LXC installer (teddytafforge-install.sh / all-in-one-install.sh / teddycloud-install.sh) not exercised — pct exec ... bash -c "curl ... | bash" chain
TC-data bind-mount (mp0) with real path validation mocked with "No" on yesno to avoid endless-loop on absent dir
Real whiptail Cancel/ESC behaviour, re-prompt loops on invalid input not testable without TTY

Operator should re-run with DEBUG=1 on the actual PVE host. If anything new crashes, the improved ERR-handler will surface the failing command, the call stack, and the xtrace-log path — enough to diagnose without round-trips.

References

  • Bug-fix commit: 34d5028
  • Verification commit: 5e4900d
  • Memories (global cross-project lessons):
    • bash-set-e-andlist-lastline
    • bash-nullglob-pipefail-trap
    • whiptail-alt-screen-error-eaten

Closing as fixed; will re-open or file follow-up if real-PVE test reveals additional issues.

## Symptom Reported on real PVE host 2026-05-19 by operator: after running ```bash bash -c "$(curl -fsSL …/ct/teddytafforge.sh)" ``` the script presents the early dialogs correctly, but **the moment `default_settings()` is meant to run, it silently returns to the shell**. No error message visible. Behaviour identical for sidecar and all-in-one. `advanced_settings()` is never reached either, so it's something *inside* `default_settings()` — but no output makes it to the terminal. ## Root cause `misc/build.func:220` ended with ```bash [ -z "$STORAGE" ] && STORAGE="local-lvm" ``` …as the **last** statement of `default_settings()`. The script runs with `set -euo pipefail` (`misc/build.func:12`). On a real PVE host `pvesm status -content rootdir` always returns at least one storage, so `STORAGE` is already non-empty when this line runs: 1. `[ -z "$STORAGE" ]` → exits 1 2. The `&&`-compound short-circuits → exit 1 3. As the function's last command, that exit code **is** the function's return value 4. The function invocation in `start()` is a "simple command" — no `&&`/`||`/`if` exception applies — so `set -e` fires 5. ERR-trap runs, prints `msg_error`, calls `exit 1` 6. But by then whiptail has switched the terminal into the alt-screen buffer; on exit the terminal repaint **eats the red error line** before the user sees it So: silent abort, no diagnostic, exact same UX whichever scope was picked. Sources confirming the `set -e` mechanics: [GNU Bash manual — Set Builtin](https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html), [BashFAQ/105](https://mywiki.wooledge.org/BashFAQ/105). ## A latent sibling bug in `list_bridges()` Same class, different mechanism. `misc/build.func:117-125` had: ```bash for f in /etc/network/interfaces.d/*; do [ -f "$f" ] && awk '/^iface vmbr[0-9]+/ {print $2}' "$f" done ``` If `interfaces.d/` is empty (true on a fresh PVE install), the glob stays literal → `[ -f ]` returns 1 → for-loop exits 1 → `{ … } | sort -u` with `pipefail` propagates exit 1 → `$(list_bridges | paste …)` in `build_container`'s pre-flight kills the script before any whiptail dialog runs. Wasn't the cause of the reported symptom (the operator saw whiptails first), but would have surfaced as a near-identical silent abort on another machine. ## Fixes Commit `34d5028` — `fix(build.func): silent abort after first dialog (set -e + AND-list)`: - **`default_settings()` (misc/build.func:219-227):** replaced `[ -z ] && var=…` with two `${VAR:-…}` parameter-expansions which can never propagate a test-style exit code. - **`list_bridges()` (misc/build.func:117-130):** `shopt -s nullglob` around the loop + explicit `return 0`. - **ERR-handler (`_on_err`, misc/build.func:65-100):** resets terminal (`stty sane`, `tput rmcup`, `tput sgr0`) **before** printing, so the alt-screen repaint can no longer eat the error. Dumps failing `$BASH_COMMAND`, `FUNCNAME`/`BASH_LINENO` stack trace, and the xtrace-log path. `catch_errors` adds `set -E` so the trap propagates into functions. - **`DEBUG=1` env-flag:** full `bash -x` xtrace to `/tmp/teddytafforge-trace-<pid>.log` on a dedicated FD (whiptail's stdio juggling can't corrupt it) + human-readable `dbg "msg"` checkpoints at every meaningful state transition. Forwarded into the LXC via `pct exec` env. Marked TEMP in the source — to be removed after real-hardware validation. CHANGELOG-Unreleased and README updated (new "Debug mode" section + troubleshooting row). ## Verification Commit `5e4900d` — `test(dry): full {scope × settings} matrix dry-test under scripts/dry-test/`: The existing CI smoke test only walks one path (sidecar + defaults) and was green throughout the bug's lifetime — because that path *also* worked before the fix (it was the *real-PVE* path that broke, where `pvesm` returns a storage name). New `scripts/dry-test/matrix.sh` walks all four combos with title-aware whiptail mocks, real FD convention, and forwards the same `pvesm` output a real host would return: ``` [1] PASS scope=sidecar settings=defaults [2] PASS scope=sidecar settings=advanced [3] PASS scope=all-in-one settings=defaults [4] PASS scope=all-in-one settings=advanced ``` Each run exercises `start() → default_settings → (optionally) advanced_settings → build_container (pre-flight + ensure_template + pct create) → description()` end-to-end under `DRY_RUN=1`. Confirmed: - `default_settings()` now completes cleanly with **pre-set** `STORAGE` (the formerly-crashing path) — `dbg` shows `STORAGE=local-lvm BRG=vmbr0 NET=dhcp` after the function returns - `advanced_settings()` walks all 13 dialogs (CTID/Hostname/Storage/Disk/CPU/RAM/IPv4/MAC/VLAN/MTU/SSH-Key/IPv6/Verbose) and returns clean - `build_container` pre-flight (`bridge_exists`/`storage_exists` inside command-substitution with pipefail) doesn't crash on empty `interfaces.d/` - `pct create` arg-vector composed correctly (DRY-Run log nachvollziehbar) - `description()` returns clean even without an IP ## What's NOT covered by the dry test | Layer | Status | |---|---| | Real `pct create` / template download / LXC start | not exercised (DRY_RUN) | | In-LXC installer (`teddytafforge-install.sh` / `all-in-one-install.sh` / `teddycloud-install.sh`) | not exercised — `pct exec ... bash -c "curl ... \| bash"` chain | | TC-data bind-mount (mp0) with real path validation | mocked with "No" on yesno to avoid endless-loop on absent dir | | Real whiptail Cancel/ESC behaviour, re-prompt loops on invalid input | not testable without TTY | Operator should re-run with `DEBUG=1` on the actual PVE host. If anything new crashes, the improved ERR-handler will surface the failing command, the call stack, and the xtrace-log path — enough to diagnose without round-trips. ## References - Bug-fix commit: `34d5028` - Verification commit: `5e4900d` - Memories (global cross-project lessons): - `bash-set-e-andlist-lastline` - `bash-nullglob-pipefail-trap` - `whiptail-alt-screen-error-eaten` Closing as fixed; will re-open or file follow-up if real-PVE test reveals additional issues.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
vr6syncro/teddytafforge-proxmox#1
No description provided.