---
type: src
tags: [istio, graceful-termination, haproxy, tls-offload, on-marked-down, homelab]
created: 2026-06-07
---
# HAProxy Walkthrough — L7 offload + on-marked-down

> [!abstract]
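> This note reads `haproxy-current.cfg` (143 lines) from the homelab graceful-termination experiment. **The goal is not a line-by-line commentary on the cfg but a one-sentence mental model**: the whole experiment is a **race over "who cuts first"** on a single backend (443→IGW). Either the pod's preStop drains the Envoy listener first, or the LB's `on-marked-down shutdown-sessions` fires an RST first. The single trick that makes this race possible is **separating the data port (`30080`) from the health port (`check port 30180`)**. Three conclusions: ① the HAProxy cfg is identical between current and improved, and the improvement variable is isolated in the preStop script of the IGW manifest; ② `retries 3` absorbs 5xx and hides disruption, so measure `5xx + connection_err`; ③ the other four ports (80/6443/8443/9000) are background props unrelated to the experiment.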

> **Target environment** Istio 1.30 + HAProxy (systemd, homelab `203.0.113.211`) · **Audience** SREs who want to reproduce and interpret the graceful-termination experiment · **Scope** the drain-race mechanism on the 443→IGW backend (the other ports are summarized for context only) · **Prerequisites** [HC FSM](gt__src-w2-hc-fsm.html), NodePort, TLS offload vs passthrough.

> **Naming map (2026-04-26)**: hc FSM states READY/DRAINING/DRAINED_WAIT_LB/TERMINATING/FAILED → OPEN/DRAINING/CLOSING/CLOSED/FAULT (gate metaphor). In this document 'CLOSING' is the new name for the old DRAINED_WAIT_LB (503 is signaled to the LB only, while Envoy stays alive and serves in-flight requests). For the full FSM see the [HC FSM walkthrough](gt__src-w2-hc-fsm.html).

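> [!warning] Canonical cfg files are missing; this document is the only record
> The `haproxy-current.cfg` (143 lines) and `haproxy-improved.cfg` that this document interprets do not exist anywhere in the repo snapshot (confirmed by a full-disk search). The per-block quotes below are **partial excerpts** of the originals, and those excerpts are the only surviving record. Stitching the inline fragments together by hand therefore cannot guarantee the complete 143 lines, and the `scp haproxy/haproxy-current.cfg ...` deploy command in §6.2 is reproducible only once that file is recovered.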

---

## 1. Background: why the LB cfg is the stage for the graceful-termination experiment

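The core question of graceful termination is "how do we avoid cutting in-flight requests when a pod dies?" The pod cannot answer it alone. Even after the pod receives SIGTERM and starts closing the Envoy listener, a brief outage still occurs **if the LB in front keeps sending new requests to that pod, or cuts existing connections with an RST too early**. Graceful termination is therefore a **timing negotiation between the pod and the LB**, and the LB's health-check and session-management settings hold half of that negotiation. That is why this experiment reads the HAProxy cfg.

There are three key actors.

- **IGW (Istio ingress gateway) pod**: on SIGTERM its preStop script runs, and the hc (health-controller) FSM inside it transitions state. This is the side that dies.
- **hc FSM**: a controller that deliberately manipulates the pod's health-check responses. In the `CLOSING` state (formerly DRAINED_WAIT_LB) it keeps data alive but answers 503 to health probes only. In other words, it **tells the LB alone a deliberate lie: "I am about to die"**.
- **HAProxy**: the side that reads that signal through its health check and removes the pod from the backend. `on-marked-down` decides whether in-flight requests survive or are cut.

Understanding how these three interact requires knowing what HAProxy sees and what it does, which is what this document covers. HAProxy runs five ports, but **only the 443 backend is directly relevant to the experiment** (the rest are summarized in §7). So we first skim the five-port table to separate the stage from the props, then go straight to the stage (443→IGW).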

| Port | mode | TLS | Purpose | Role in experiment |
|---|---|---|---|---|
| 80 | http | none | 301 redirect to HTTPS | prop |
| **443** | **http** | **offload (terminate)** | **L7 header injection + plaintext IGW backend** | **stage** |
| 6443 | tcp | passthrough | kube-apiserver (client cert preserved) | prop |
| 8443 | tcp | passthrough | Istio gRPC / mTLS (end-to-end TLS) | prop |
| 9000 | http | none | stats UI | observation |

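Data paths of the five ports: TLS is terminated only on 443 (traffic enters the backend as plaintext), while 6443/8443 pass the byte stream through untouched. **Only the 443 backend uses the L7 (`httpchk`) health check that the hc can steer, and only there is `on-marked-down` attached** (6443/8443 use plain TCP checks, see §7).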

```mermaid
flowchart LR
    Cli[client]
    Cli -->|":80 http"| P80["HAProxy :80"]
    Cli -->|":443 https"| P443["HAProxy :443 (TLS terminate)"]
    Cli -->|":6443 mTLS"| P6443["HAProxy :6443 (tcp)"]
    Cli -->|":8443 mTLS"| P8443["HAProxy :8443 (tcp)"]

    P80 -->|"301 redirect to https"| Cli
    P443 -->|"plaintext + XFF headers"| IGW["IGW NodePort :30080 (Envoy)"]
    P6443 -->|"passthrough, client-cert kept"| API["kube-apiserver :6443"]
    P8443 -->|"passthrough, end-to-end TLS"| GW["Istio gateway :31443 (mTLS terminate)"]
```

---

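## 2. Core mechanism: the drain window created by splitting two ports

### 2.1 Mental-model anchor

> **Remember one thing**: because `check port 30180` (health) is a **different port** from `30080` (data), the hc FSM can give **two contradictory answers** at once: "503 on health, 200 on data". That contradiction is the drain window. While the LB sees the health 503 and decides to take the server out, the data port stays alive and in-flight requests keep being served.

What if the health check probed the data port (30080) instead? As long as Envoy is alive, 30080 always answers 200, so the hc has no way to tell the LB "about to die". The LB then keeps sending new requests until the pod actually dies (connection refused), the drain window is 0, and requests are dropped. **Splitting the ports lets the pod announce "about to die" in advance with a false signal, and the gap between that false signal and the real death is the time available to flush in-flight requests.** This is the key insight of the experiment's design, condensed into the single server line (§3.2).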
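The two contradictory answers can be written down as a tiny truth table. The sketch below is a toy model, not the hc implementation: `respond` and `lb_learns_of_drain` are hypothetical names, and only the OPEN and CLOSING states described in this note are modelled.

```python
DATA_PORT = 30080    # NodePort carrying Envoy traffic (from the server line)
HEALTH_PORT = 30180  # NodePort probed by HAProxy's health check


def respond(state: str, port: int) -> int:
    """HTTP status the pod returns on `port` while Envoy is still alive.

    Only the two states this note describes are modelled; anything else is
    rejected rather than guessed.
    """
    if state not in ("OPEN", "CLOSING"):
        raise ValueError(f"state not modelled here: {state}")
    if port == HEALTH_PORT and state == "CLOSING":
        return 503  # the deliberate lie, told to the LB only
    return 200      # data traffic (and health while OPEN) stays healthy


def lb_learns_of_drain(check_port: int) -> bool:
    """True if a health check on `check_port` fails while the pod is CLOSING."""
    return respond("CLOSING", check_port) != 200


if __name__ == "__main__":
    print("check on health port:", lb_learns_of_drain(HEALTH_PORT))
    print("check on data port  :", lb_learns_of_drain(DATA_PORT))
```

With the check pointed at the data port the function returns `False`: the LB never hears the early warning, which is the zero-drain-window case described above.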

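### 2.2 The race: who cuts first

Even with a drain window, **two actors can cut existing connections** inside it:

- **preStop drain (pod side, cooperative)**: puts the Envoy listener into drain mode, refusing new connections while letting existing ones finish naturally. The listener is closed only after in-flight reaches 0.
- **`on-marked-down shutdown-sessions` (LB side, forcible)**: the moment the LB marks a server DOWN, it **immediately resets (RST)** the active TCP sessions to that server (HAProxy log code `D`).

Whether termination is graceful comes down to the **order** of these two: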

```
preStop drain finishes first  →  in-flight already 0     →  the LB's later RST is harmless   (graceful)
LB's RST arrives first        →  in-flight forcibly cut  →  connection_err occurs            (disruption)
```

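This yields the design constraint: **preStop drain time ≥ LB detection window**. preStop has to keep the data port serving for as long as the LB needs to confirm the server DOWN (the detection window) and have in-flight requests emptied by then, so that nothing is left to cut when the RST lands. The detection window is `inter × fall = 4 s` from §3.2, and the difference between current and improved is precisely whether the preStop script holds out properly for those 4 seconds. **HAProxy is identical on both sides** (§5).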
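The ordering argument can be made concrete with a small classifier. This is a sketch under simplifying assumptions: all times are measured from the moment the health port flips to 503, and `DrainRace`, its field names, and the outcome labels are illustrative, not part of the experiment's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrainRace:
    """All times are seconds after the health port flips to 503."""
    data_port_up_until_s: float  # how long preStop keeps Envoy serving
    last_inflight_done_s: float  # when the final in-flight request completes
    marked_down_s: float         # when HAProxy marks the server DOWN

    def outcome(self) -> str:
        # The pod vanished before the LB noticed: new requests hit a dead pod.
        if self.data_port_up_until_s < self.marked_down_s:
            return "pod-died-early"
        # shutdown-sessions fires at marked_down_s; anything still open is cut.
        if self.last_inflight_done_s > self.marked_down_s:
            return "rst-cut-inflight"
        return "graceful"


def worst_case_window(inter_s: float, fall: int) -> float:
    """Upper bound on detection time for periodic checks: inter x fall."""
    return inter_s * fall


if __name__ == "__main__":
    window = worst_case_window(2, 2)
    print(DrainRace(10.0, 1.5, window).outcome())
    print(DrainRace(10.0, 6.0, window).outcome())
    print(DrainRace(1.0, 0.5, window).outcome())
```

The three calls cover the graceful case and the two ways to lose: a request that outlives the DOWN marking, and a preStop that gives up before the LB has noticed.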

### 2.3 Timeline

The diagram below merges the port split of §2.1 and the race of §2.2 into one picture. 30080 (data) stays UP until just before the RST, and only 30180 (health) flips to 503 first.

```mermaid
sequenceDiagram
    participant Pod as IGW Pod (preStop + hc)
    participant H30180 as :30180 health
    participant H30080 as :30080 data
    participant HAP as HAProxy backend
    participant Cli as client (in-flight req)

    Note over Pod: t0 SIGTERM, preStop drain starts
    Pod->>H30180: hc FSM CLOSING -> /health_check.html = 503
    Note over H30080: data port keeps returning 200 (Envoy alive)
    loop fall 2 x inter 2s (up to 4s = detection window)
        HAP->>H30180: GET /health_check.html
        H30180-->>HAP: 503 (fail count++)
    end
    Note over HAP: around t0+4s the server is marked DOWN
    HAP-->>Cli: on-marked-down shutdown-sessions = active TCP RST (log code D)
    Note over Cli,HAP: during the detection window, in-flight requests are served normally via 30080
    HAP->>H30080: new requests go roundrobin to other UP servers
```
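The check loop in the diagram can be replayed numerically. The sketch below assumes strictly periodic checks and ignores the check timeout and HAProxy's `fastinter`/`downinter` options; `simulate_checks` and its parameters are hypothetical names for illustration.

```python
def simulate_checks(flip_at_s, first_check_s=0.0, inter_s=2.0, fall=2, horizon_s=30.0):
    """Replay periodic health checks and return (events, down_at_s).

    flip_at_s     : time the health port starts answering 503
    first_check_s : time of the first check in the replay (its phase)
    Each event is (time, status, consecutive_failures).
    """
    events = []
    failures = 0
    t = first_check_s
    while t <= horizon_s:
        status = 503 if t >= flip_at_s else 200
        failures = failures + 1 if status != 200 else 0
        events.append((t, status, failures))
        if failures >= fall:
            return events, t  # on-marked-down would fire here
        t += inter_s
    return events, None


if __name__ == "__main__":
    for flip in (0.0, 0.1):
        events, down_at = simulate_checks(flip)
        print(f"flip at {flip}s -> DOWN at {down_at}s after {len(events)} checks")
```

A flip that lands just after a check (0.1 s here) waits almost a full interval before the first failure is even observed, which is why the 4 s figure is a worst case rather than a constant.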

---

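## 3. The 443 frontend/backend: the configuration that carries the mechanism

The actual cfg behind the §2 anchor is two blocks: the 443 frontend (header injection, TLS offload) and the backend (health check + on-marked-down). We read only the key lines.

### 3.1 frontend: TLS offload + header injection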

```haproxy
frontend istio-https-l7
    mode http
    bind *:443 ssl crt /etc/haproxy/certs/homelab-lb-bundle.pem alpn h2,http/1.1   # (A)
    option forwardfor                                             # (B)
    http-request set-header X-Forwarded-Proto https              # (C)
    http-request set-header X-Forwarded-Port 443
    http-request set-header X-Forwarded-Host %[req.hdr(Host)]
    default_backend istio-http-backend
```

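Here 443 **terminates** TLS (plaintext goes to the backend). That side effect is why (B) and (C) exist: once TLS is terminated, the backend (IGW Envoy) can no longer see the original client IP, protocol, port, or host.

- **(A) bind 443 ssl ... alpn h2,http/1.1**: `homelab-lb-bundle.pem` holds a server cert issued by a self-signed CA (SAN: `example.local`, `*.example.local`, `203.0.113.211`) plus the intermediate chain. With `alpn h2,http/1.1`, HTTP/2 is negotiated if the client's ClientHello ALPN offers h2, otherwise HTTP/1.1 (curl prefers h2; list order is the priority, see §8 Q1). There is a single cert, so no SNI branching (use `ssl crt-list` for multiple domains).
- **(B) option forwardfor**: automatically adds `X-Forwarded-For: <real-client-ip>` so the backend can see the real client IP after offload.
- **(C) http-request set-header**: after offload the backend does not know the original protocol/port/host, so they are injected explicitly. `%[req.hdr(Host)]` is an HAProxy sample fetch that copies the Host header value. **Istio VirtualService host matching and the access-log authority are determined by this header**, so routing breaks if it is missing.

### 3.2 backend: the health check is the drain signal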

```haproxy
backend istio-http-backend
    mode http
    balance roundrobin
    option httpchk GET /health_check.html                # (D)
    http-check expect status 200                         # (E)
    server master1 203.0.113.212:30080 check port 30180 inter 2s rise 2 fall 2 on-marked-down shutdown-sessions  # (F)
    server worker1 203.0.113.213:30080 check port 30180 inter 2s rise 2 fall 2 on-marked-down shutdown-sessions
    server worker2 203.0.113.214:30080 check port 30180 inter 2s rise 2 fall 2 on-marked-down shutdown-sessions
```

- **(D) option httpchk GET /health_check.html**: health is judged by the L7 response (200/503) instead of a bare TCP connect. This is where the hc FSM slips in its false signal. With a TCP connect alone the check would always succeed while Envoy is alive, leaving no room for a 503.
- **(E) http-check expect status 200**: 200 = UP, anything else (503 included) = fail. When the hc FSM is `CLOSING`, `/health_check.html` returns 503, this check fails, and the server moves toward being marked DOWN.
- **(F) The server line** compresses all of §2 into one line. Broken down:

| Parameter | Value | Meaning |
|---|---|---|
| `<ip>:30080` | 30080 | **data** NodePort: IGW Envoy traffic (plaintext) |
| `check port 30180` | 30180 | **health** NodePort: hc health probe (port split = §2.1 anchor) |
| `inter 2s` | 2 s | health-check interval |
| `rise 2` | 2 | consecutive successes required for DOWN→UP |
| `fall 2` | 2 | consecutive failures required for UP→DOWN |
| `on-marked-down shutdown-sessions` | n/a | RST active TCP sessions as soon as the server goes DOWN (log code `D` = the §2.2 race) |

**detection window** = `inter × fall` = 4 s. From the moment the hc flips to 503, the server is marked DOWN at most 4 seconds later, and those 4 seconds are the deadline by which preStop must empty in-flight requests.
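The same arithmetic can be read straight off a server line. The parser below is a minimal sketch, not a full HAProxy config parser: it handles only the keywords used in this note, and the fallback values (`inter` 2s, `fall` 3, bare numbers in milliseconds) are HAProxy's documented defaults as I understand them. `parse_server_line` and `to_seconds` are hypothetical helper names.

```python
import re

_UNITS_MS = {"us": 0.001, "ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def to_seconds(value: str) -> float:
    """Convert an HAProxy time value such as '2s' or '500' (ms) to seconds."""
    m = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value)
    if not m:
        raise ValueError(f"unrecognized time value: {value}")
    return int(m.group(1)) * _UNITS_MS[m.group(2) or "ms"] / 1000


def parse_server_line(line: str) -> dict:
    """Extract the drain-relevant fields from one `server` line."""
    tokens = line.split("#", 1)[0].split()  # drop trailing comment
    if len(tokens) < 3 or tokens[0] != "server":
        raise ValueError("not a server line")
    data_port = int(tokens[2].rpartition(":")[2])
    # Map every token to its successor; good enough to look up keyword values.
    opts = dict(zip(tokens[3:], tokens[4:]))
    inter_s = to_seconds(opts.get("inter", "2s"))
    fall = int(opts.get("fall", 3))
    return {
        "name": tokens[1],
        "data_port": data_port,
        "check_port": int(opts.get("port", data_port)),
        "window_min_s": inter_s * (fall - 1),  # flip lands right before a check
        "window_max_s": inter_s * fall,        # flip lands right after a check
        "on_marked_down": opts.get("on-marked-down"),
    }


if __name__ == "__main__":
    line = ("server master1 203.0.113.212:30080 check port 30180 inter 2s "
            "rise 2 fall 2 on-marked-down shutdown-sessions  # (F)")
    print(parse_server_line(line))
```

For the (F) line this yields a check port different from the data port and a window between 2 s and 4 s, matching the "at most 4 seconds" wording above.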

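**Why the master1 backend was added** (topology pitfall): there is one IGW pod per node (`required hostname anti-affinity`) but only two workers, so during a RollingUpdate with `maxSurge` there is no free node for the new pod, the surge pod cannot be scheduled, and the rollout deadlocks. The fix was to remove the NoSchedule taint from master1 and add it to the backend as well, securing a third scheduling slot. With `externalTrafficPolicy: Local`, when master1 has no IGW pod the path is 30080→503→HAProxy DOWN, so it takes care of itself. For deployment topology details see [IGW deployment](gt__src-w3-igw-deployment.html) / [manifests walkthrough](gt__src-manifests-walkthrough.html).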

---

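## 4. `defaults`: how `retries 3` distorts the measurement

Now that the mechanism is clear, we look at one defaults line that makes this experiment **easy to measure wrongly**.

```haproxy
defaults
    log                     global
    option                  dontlognull      # skip logging connections that carried no data (keeps probe noise out of the log)
    timeout connect         5s
    timeout client          1h               # supports long-lived streaming connections
    timeout server          1h
    timeout http-request    10s              # timeout for receiving the complete request headers
    timeout queue           60s
    retries                 3                # (important) retry on connection failure
```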

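- **`timeout client/server 1h`**: supports long-lived connections such as `/stream?seconds=60` (with a timeout of about 1 minute, a 60-second stream would be cut mid-way). This does not conflict with `http-request 10s`; see §8 Q3.
- **`retries 3`**: on a backend connection failure (TCP RST, ECONNREFUSED) the connection is retried up to 3 times, and is redispatched to **another UP server** when `option redispatch` is in effect (the excerpt above does not show that line, so treat it as assumed here). This is what hides the RST (connection_err) of §2.2 from the user.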

How retries hides disruption (current mode):

```
(1) worker1 is marked DOWN
(2) shutdown-sessions -> in-flight TCP RST
(3) from curl's side: connection reset  (connection_err++)
(4) HAProxy resends the same request to worker2 (UP)
(5) worker2 -> 200
(6) curl finally exits 0 with HTTP 200  (nothing shows up in the 5xx log)
```

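→ **Looking only at 5xx=0 misses the disruption that retries hid.** The disruption metric must therefore be `5xx + connection_err` (or the LB's `termination_code=D` log entries). The canonical SLO and monitoring reference is the [graceful termination runbook](gt__src-runbook.html).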
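As a sketch of the combined metric, the snippet below counts both failure classes from client-side samples. The `Sample` shape and the sample values are invented for illustration; they are not the experiment's load-generator format or its measured data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """One client-side request result (hypothetical shape)."""
    http_status: Optional[int]  # None when no HTTP response was received
    conn_error: bool = False    # True for reset/refused/aborted connections


def summarize(samples: Iterable[Sample]) -> dict:
    """Count 5xx and connection errors, and report their sum as disruption."""
    samples = list(samples)
    five_xx = sum(1 for s in samples if s.http_status is not None and s.http_status >= 500)
    conn_err = sum(1 for s in samples if s.conn_error)
    return {"total": len(samples), "5xx": five_xx,
            "connection_err": conn_err, "disruption": five_xx + conn_err}


if __name__ == "__main__":
    # Made-up run: every HTTP response is a 200, yet three connections were reset.
    run = [Sample(200)] * 97 + [Sample(None, conn_error=True)] * 3
    print(summarize(run))
```

A dashboard keyed only on the `5xx` field would report this made-up run as clean, while `disruption` does not.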

---

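## 5. current vs improved: the variable is isolated outside the LB

The two cfg files are **functionally identical**. The diff is only two comment lines (line 5 and the backend comment); every other line is the same. This is intentional in the experiment design: **to isolate a single independent variable, everything else must be held fixed.** HAProxy behavior (detection window, on-marked-down, retries) is pinned identically on both sides and **only the preStop script in the IGW manifest** changes, so that difference is exactly the graceful-termination improvement. The `improved` label belongs to the manifest, not the LB.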

Thanks to this isolation, every difference in the S3 measurement of §6 is attributable to preStop.
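One way to check a "comments-only diff" claim mechanically, once both files are available, is to compare them with comments and whitespace stripped. This is a rough sketch: the `#` handling is naive (a `#` inside a quoted string would be misread), and the two snippets are invented stand-ins, not the missing cfg files.

```python
import difflib


def strip_comments(cfg_text: str) -> list:
    """Return non-empty config lines with comments and extra spaces removed."""
    lines = []
    for raw in cfg_text.splitlines():
        code = raw.split("#", 1)[0].strip()  # naive: ignores quoting/escapes
        if code:
            lines.append(" ".join(code.split()))
    return lines


def functional_diff(a: str, b: str) -> list:
    """Unified diff of the two configs after comment stripping."""
    return list(difflib.unified_diff(strip_comments(a), strip_comments(b),
                                     "current", "improved", lineterm=""))


def functionally_equal(a: str, b: str) -> bool:
    return strip_comments(a) == strip_comments(b)


if __name__ == "__main__":
    current = "# mode: current\nbackend b\n    server s1 10.0.0.1:80 check fall 2  # old note\n"
    improved = "# mode: improved\nbackend b\n    server s1 10.0.0.1:80 check fall 2  # new note\n"
    print(functionally_equal(current, improved))
    print(functional_diff(current, improved))
```

An empty diff here means the two files configure HAProxy identically, which is the property the experiment relies on.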

---

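## 6. Example: S3 measurements, deployment, and verification

### 6.1 S3 results (replicas=2, 90 s continuous + rollout restart)

```
current mode:  5xx=0 (absorbed by retries), connection_err=9, p50=5.7ms
improved mode: 5xx=0,                       connection_err=0, p50=5.1ms
```

Interpretation: both show `5xx=0`, so 5xx alone would lead to the wrong conclusion that both are zero-downtime. The truth is in `connection_err`: current had 9 in-flight cuts (the RST→retry path of §4), improved had 0. This is evidence that **improved's preStop finished draining before the LB RST in the §2.2 race**. Since the HAProxy cfg is identical (§5), preStop is the only source of the 9→0 difference.

### 6.2 Deploy + verify commands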

```bash
# Back up the existing cfg ($(date ...) expands on the local shell because of the outer double quotes)
ssh homelab "ssh jinsoo@203.0.113.211 \
  'sudo cp -a /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak.$(date +%Y%m%d-%H%M%S)'"

# Deploy current mode (validate syntax with -c before installing, then restart)
scp haproxy/haproxy-current.cfg homelab:/tmp/haproxy.cfg
ssh homelab "scp /tmp/haproxy.cfg jinsoo@203.0.113.211:/tmp/ && \
  ssh jinsoo@203.0.113.211 'sudo haproxy -c -f /tmp/haproxy.cfg && \
    sudo install -m 0644 /tmp/haproxy.cfg /etc/haproxy/haproxy.cfg && sudo systemctl restart haproxy'"

# Check backend status (admin socket); the aggregate BACKEND row is skipped
ssh homelab "ssh jinsoo@203.0.113.211 \
  'echo show stat | sudo socat /run/haproxy/admin.sock stdio'" \
  | awk -F, '$1=="istio-http-backend" && $2!="BACKEND"{print $2"="$18" check="$37}'
# HAProxy stat CSV (1-indexed): 2=svname, 18=status, 37=check_status (last check result).
# Field indexes can shift between versions; if in doubt, check the header:
#   echo "show stat" | socat ... | head -1 | tr ',' '\n' | cat -n
# Or index-independent: echo "show stat typed" | socat ... | grep -E '\.(svname|status|check_status)\.'
```
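The field-index caveat in the comments above can also be sidestepped by reading the CSV header instead of counting columns. The sketch below assumes the usual `show stat` layout (a first line starting with `# pxname,svname,...` and a trailing comma on each row); the sample text is a trimmed, synthetic stand-in with only four columns, and `server_states` is a hypothetical helper name.

```python
import csv


def parse_show_stat(text: str) -> list:
    """Turn `show stat` CSV output into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].lstrip("# ").rstrip(",").split(",")
    return [dict(zip(header, row)) for row in csv.reader(lines[1:])]


def server_states(text: str, backend: str) -> dict:
    """Map server name -> (status, check_status) for one backend."""
    return {
        row["svname"]: (row["status"], row.get("check_status", ""))
        for row in parse_show_stat(text)
        if row["pxname"] == backend and row["svname"] not in ("FRONTEND", "BACKEND")
    }


# Synthetic sample: real output has many more columns between these.
SAMPLE = """# pxname,svname,status,check_status,
istio-http-backend,master1,DOWN,L7STS,
istio-http-backend,worker1,UP,L7OK,
istio-http-backend,worker2,UP,L7OK,
istio-http-backend,BACKEND,UP,,
"""

if __name__ == "__main__":
    for name, (status, check) in server_states(SAMPLE, "istio-http-backend").items():
        print(f"{name}={status} check={check}")
```

In practice the text would come from piping the admin-socket output into the script; the column lookup stays valid even if a newer HAProxy inserts fields.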

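The `admin socket` (`/run/haproxy/admin.sock`, from `stats socket ... level admin` in the global block) is the entry point for zero-downtime backend operations via `show stat`, `set server`, and `disable server`.

**`restart` vs `reload`**: with `reload` (SIGUSR2) the old process keeps its current connections while the new config is applied (new bind ports and global changes still need a restart). With `restart`, **every frontend (6443 included) drops briefly**, so in-flight kubectl calls fail; do it during low-activity hours. This experiment's cfg change only adds a backend server, so in theory reload would do, but the README plays it safe with restart (see the last pitfall under "What you might be missing").

Expected output after installing Istio + the hc sidecar:
```
master1=DOWN check=L7STS   # DOWN when there is no IGW pod (expected)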
worker1=UP check=L7OK
worker2=UP check=L7OK
```

---

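## 7. The other four ports (background props + observation)

These four modes are unrelated to the experiment but needed to complete the cfg. The common principle is **"does the LB have the right to terminate TLS?"**: if client certs or end-to-end mTLS are required, use passthrough (tcp); otherwise offload/http.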

```haproxy
frontend kube-apiserver            # 6443: mTLS client cert must be preserved -> passthrough
    mode tcp
    bind *:6443
    default_backend kube-apiserver
backend kube-apiserver
    mode tcp
    option tcp-check               # a successful TCP connect is the health check
    server master1 203.0.113.212:6443 check inter 3s rise 2 fall 3   # fall 3 -> 9s, fewer false positives

frontend http-redirect             # 80: direct 301, no backend
    mode http
    bind *:80
    http-request redirect scheme https code 301

frontend istio-grpc-passthrough    # 8443: end-to-end TLS/mTLS -> passthrough
    mode tcp
    bind *:8443
    default_backend istio-grpc-backend
backend istio-grpc-backend
    mode tcp
    option tcp-check
    server worker1 203.0.113.213:31443 check inter 3s rise 2 fall 3
    server worker2 203.0.113.214:31443 check inter 3s rise 2 fall 3

frontend stats                     # 9000: UI for observing backend state
    mode http
    bind *:9000
    stats enable
    stats uri /
    stats refresh 10s
    stats hide-version             # avoid version fingerprinting
```

- **6443 (kube-apiserver)**: kubectl/kubeadm/kubespray all connect with mTLS client certificates. If the LB opened L7, TLS termination would drop the client cert, so passthrough is mandatory. **fall 3** (vs fall 2 on 443): an apiserver DOWN cuts off all kubectl traffic, so detection is slowed to 9 seconds to reduce false positives.
- **80 (redirect)**: no backend; HAProxy generates the 301 itself. It is permanent, so browsers cache it and later :80 requests never leave the client.
- **8443 (gRPC/mTLS passthrough)**: the opposite of 443, which terminates TLS and hands plaintext to the backend. Terminating mTLS/gRPC (HTTP/2 over TLS) at the LB would concentrate SNI/ALPN renegotiation and client cert chain management there, so the byte stream is forwarded transparently and the Istio gateway terminates TLS and authenticates the mTLS peer itself. No `shutdown-sessions`, fall 3.
- **9000 (stats)**: live backend status at `http://203.0.113.211:9000/` on the LAN (the visual counterpart of the §6.2 admin socket). In production, restrict access with `stats auth` or ACLs.

**TLS policy in global** (applied automatically to 443): `ssl-min-ver TLSv1.2` (disables TLS 1.0/1.1) + `no-tls-tickets` (disables session tickets to protect forward secrecy) + `ssl-default-bind-ciphers ECDHE+AESGCM:ECDHE+CHACHA20` (ECDHE-based FS; excludes RC4, 3DES, NULL).

---

## 8. Recall quiz

<details>
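<summary>Q1. Does the alpn order matter?</summary>

Yes. As the TLS server, HAProxy picks the first protocol in its own list that the client also offered in its ALPN extension, so list order is the preference order. With `h2,http/1.1`, h2 wins. Switching to `http/1.1,h2` makes http/1.1 preferred, which can degrade h2-dependent features such as gRPC.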

</details>

<details>
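<summary>Q2. What if `option dontlognull` is missing?</summary>

Connections that open and close without sending any data (port probes from monitoring or an upstream health checker hitting HAProxy's own listeners, keep-alive probes included) would each be written to the log and bury real request logs. `dontlognull` excludes such payload-free connections. HAProxy's own outgoing health checks (inter 2s × 3 servers ≈ 1.5 per second) are not written to the access log either way.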

</details>

<details>
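<summary>Q3. Don't `timeout client 1h` and `timeout http-request 10s` contradict each other?</summary>

No. `http-request 10s` limits **receiving the complete request headers** (defense against slow HTTP attacks) and stops applying once the headers are in. `client 1h` then limits **client inactivity (no data sent)**, for example the time a client spends only reading during a stream. Each controls a different stage of the connection lifecycle.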

</details>

<details>
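<summary>Q4. What if check port were the data port 30080 instead of 30180?</summary>

The drain window becomes 0. As long as Envoy is alive, 30080 always answers 200, so the hc FSM has nowhere to insert its "about to die" false signal. The LB keeps sending new requests until the pod really dies and connections are refused, and the time to flush in-flight requests disappears. The port split is the premise of the experiment (§2.1).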

</details>

---

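## Key takeaways
- **anchor**: the whole experiment is the "who cuts first" race on the single 443→IGW backend: preStop drain (cooperative) vs `on-marked-down shutdown-sessions` (forced RST, log code `D`). If preStop finishes first it is graceful; if the LB RST comes first, requests are cut.
- The trick that makes the race possible is **splitting the data port (30080) from the health port (`check port 30180`)**: the hc FSM answers 503 on health only, so the LB alone learns "about to die" while data stays alive to serve in-flight requests.
- **Design constraint**: preStop drain time ≥ detection window (`inter × fall` = 4 s). If in-flight requests are not emptied within 4 seconds, the LB RST cuts live connections.
- The HAProxy cfg is **identical** between current and improved, and the improvement variable is isolated in the preStop script of the IGW manifest. The connection_err 9→0 difference in S3 is entirely due to preStop.
- `retries 3` absorbs 5xx, so disruption must be measured with `5xx + connection_err` (or `termination_code=D`) to avoid missing cut connections.
- The other four ports (80 redirect / 6443·8443 passthrough / 9000 stats) are background props split by "does the LB have the right to terminate TLS?": client-cert or end-to-end mTLS means passthrough.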

---

## What you might be missing
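- **Mental model for `on-marked-down shutdown-sessions`**: it means "the moment a server is marked DOWN, immediately reset the active TCP sessions to that server". The name is not intuitive, but the behavior matches Citrix NetScaler's `downStateFlush ENABLED` (flush sessions on state transition). With it off (the default), existing sessions survive after DOWN until they end naturally, which is graceful, but in a scenario where hc/preStop is already closing Envoy this can instead cause hangs. That is why this experiment turns it on deliberately and wins the race with preStop.
- **The detection window is an upper bound, not a lower bound**: `inter×fall = 4 s` is the worst-case time from the hc's 503 flip until the LB confirms DOWN. 30080 (data) must stay alive for those 4 seconds to preserve in-flight requests, so if the preStop drain is shorter than this window, Envoy dies before the LB notices and requests are dropped. Hence the design constraint: drain time ≥ detection window.
- **`retries 3` is a double-edged sword**: it absorbs 5xx and raises perceived availability, but it hides disruption from the metrics and invites a false "no problem" reading. Retrying non-idempotent requests (POST and the like) also risks duplicated side effects (HAProxy controls this with `option redispatch`/`retry-on`). Before reading 5xx=0 as an SLO green, always cross-check `connection_err`/`termination_code=D`.
- **The `restart` vs `reload` pitfall**: the README uses restart even for a mere backend server addition not because reload (SIGUSR2) cannot apply that change, but because it chose "guaranteed application" even at the cost of briefly cutting the 6443 passthrough (kubectl). On a production LB, decide per change type whether reload is enough so that the control plane is not disconnected.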