Probes for Spring Boot

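For **Spring Boot running on Kubernetes**, the commonly recommended setup is:

* **Startup Probe**: gives a slow-starting application time to finish booting
* **Readiness Probe**: decides whether the Pod should receive traffic
* **Liveness Probe**: decides whether the container needs to be restarted

Do not configure all three probes identically.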

---

## 1. Enable Actuator in Spring Boot

With `spring-boot-starter-actuator` on the classpath, this usually goes in `application.yml`:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health
  endpoint:
    health:
      probes:
        enabled: true
```

When running on Kubernetes, or with `probes.enabled: true` set as above, Spring Boot 2.3+ automatically provides:

```text
/actuator/health
/actuator/health/liveness
/actuator/health/readiness
```

---

## 2. Recommended Probe Configuration

```yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3

startupProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
```
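
For orientation, the three probe blocks are fields of the container entry in the Pod template. A minimal sketch, assuming a Deployment named `demo-app` and an image `example.com/demo-app:1.0.0` (both hypothetical), with the application listening on port 8080 as in the configuration above. Only the startup probe is written out; the other two sit next to it at the same indentation.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                              # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: example.com/demo-app:1.0.0   # hypothetical image
          ports:
            - containerPort: 8080
          # livenessProbe and readinessProbe are siblings of startupProbe here.
          startupProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            periodSeconds: 10
            failureThreshold: 30
```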

---

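## 3. What Each Probe Is Responsible For

### Startup Probe

Addresses:

```text
Slow Spring Boot startup
JPA initialization
Flyway migrations
Cache warm-up
```

For example:

```text
Startup takes 3 minutes
```

Then:

```yaml
failureThreshold: 30
periodSeconds: 10
```

means:

```text
Up to 30 x 10 s = 300 s (5 minutes) allowed for startup
```

Until the startup probe succeeds:

```text
Liveness is not executed
Readiness is not executed
```

This avoids:

```text
Startup has not finished yet
and K8s already kills the container
```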

---

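### Readiness Probe

Decides:

```text
Whether the Pod can receive business traffic
```

For example:

* The database connection is healthy
* Redis is healthy
* Kafka is healthy

On success:

```text
Ready=True
```

Only then does the Service forward traffic to the Pod.

On failure:

```text
Ready=False
```

The Pod is not restarted.

It is only:

```text
Removed from the Service endpoints
```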

---

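### Liveness Probe

Decides:

```text
Whether the JVM is hung
```

For example:

* Deadlock
* Stuck thread pool
* Long GC pauses

Failure condition:

```text
3 consecutive failures
```

K8s (the kubelet) then:

```text
Restarts the container
```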

---

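## 4. Spring Boot Best Practices

Don't:

```yaml
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080

readinessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
```

Because when:

```text
The database goes down
```

Spring Boot's `/actuator/health` reports by default:

```json
{
  "status":"DOWN"
}
```

The result:

```text
Liveness fails
Readiness fails
```

The container is restarted again and again, even though a restart cannot fix the database.

This ends in:

```text
CrashLoopBackOff
```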

---

Recommended:

```text
Liveness
    only checks whether the JVM is alive

Readiness
    checks the database
    Redis
    MQ
```

That is:

```text
/actuator/health/liveness
/actuator/health/readiness
```
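
Out of the box, the `readiness` group contains only Spring Boot's internal `readinessState` indicator, so dependency checks have to be added to it explicitly. A minimal sketch for `application.yml`, assuming the application has a `DataSource` and a Redis connection configured so that the `db` and `redis` health indicators exist; the indicator list is an assumption and should match what the application actually registers.

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true
      group:
        liveness:
          # Internal application state only, no external dependencies.
          include: livenessState
        readiness:
          # Assumes the db and redis health indicators are present.
          include: readinessState,db,redis
```

With this in place, a database outage turns `/actuator/health/readiness` to `DOWN` while `/actuator/health/liveness` stays `UP`, so the Pod is taken out of rotation without being restarted.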

---

## 5. Configuration Commonly Seen at Large Companies

For example:

```yaml
startupProbe:
  failureThreshold: 60
  periodSeconds: 5

readinessProbe:
  periodSeconds: 10

livenessProbe:
  periodSeconds: 30
```

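Rationale:

```text
Readiness is checked frequently
Liveness is checked conservatively
```

This keeps:

```text
Brief GC pauses
Network jitter
```

from causing unnecessary restarts.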
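
The fragment above leaves out the probe handlers. A fuller sketch with the same intervals, assuming the endpoints and port 8080 from section 2 and the Kubernetes default `failureThreshold` of 3 where none is set; the comments show the approximate time budget each setting implies.

```yaml
startupProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 5
  failureThreshold: 60     # 60 x 5 s = 300 s allowed for startup

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3      # roughly 30 s of failures before traffic is removed

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 30
  failureThreshold: 3      # roughly 90 s of failures before a restart
```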

---

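## 6. AWS EKS Scenario

If the Pods sit behind:

* ALB
* NLB
* Ingress Nginx

the recommendation is:

```text
ALB Health Check
          ↓
Readiness Probe
          ↓
Spring Boot Readiness
```

This forms a health-check chain.

For example:

```text
ALB
 ↓
Ingress
 ↓
Service
 ↓
Pod Readiness
 ↓
DB
```

Then, when the database fails:

```text
Readiness=False
```

the Pod is taken out of rotation automatically,

instead of:

```text
The Pod being restarted over and over
```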
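
One way to wire the ALB health check to the same readiness endpoint is through Ingress annotations. A minimal sketch, assuming the AWS Load Balancer Controller is installed in the cluster, IP target mode is used, and a Service named `demo-app` exposes port 8080 (the names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-app                   # hypothetical name
  annotations:
    alb.ingress.kubernetes.io/target-type: ip
    # Point the ALB target-group health check at the readiness endpoint.
    alb.ingress.kubernetes.io/healthcheck-path: /actuator/health/readiness
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-app     # hypothetical Service
                port:
                  number: 8080
```

Using the same path for the ALB health check and the readiness probe keeps the two layers in agreement about when a Pod should receive traffic.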

---

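If a DevOps/SRE interview asks:

> How do you design Liveness and Readiness for Spring Boot?

The standard answer is:

> Liveness only checks whether the application process and the JVM are alive and does not depend on external services; Readiness checks whether the application is able to serve requests, including the state of its database, cache, and message queue connections. For slow-starting Spring Boot applications, a Startup Probe is added as well, so that Kubernetes does not misjudge the application and restart it during startup.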