Probe for spring boot: Difference between revisions
Latest revision as of 17:18, 2 June 2026 (Tuesday)
issue
If the log shows:
```text
Readiness probe failed:
Get "http://10.6.6.1:8080/actuator/health":
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```
**By Kubernetes design, a failing Readiness Probe alone does not restart the Pod.**
So confirm this first:
```bash
kubectl describe pod <pod-name>
```
Check whether the Events (or the container's Last State) also show:
```text
Liveness probe failed
```
or:
```text
Killing container
```
or:
```text
OOMKilled
```
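The same check can be scripted. The following is a minimal sketch, assuming `kubectl` already points at the right cluster and namespace and that the application runs in the Pod's first container; the function name `last_restart_reason` is made up for illustration.

```bash
# Sketch: print the restart count and the last termination reason
# of the first container in a Pod (hypothetical helper name).
last_restart_reason() {
  pod="$1"
  # restartCount and lastState.terminated.reason are fields of the Pod status.
  count=$(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].restartCount}')
  reason=$(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}')
  echo "restarts=${count:-0} lastReason=${reason:-none}"
}

# Usage: last_restart_reason <pod-name>
```

A reason of `OOMKilled` points at the memory limit. A restart triggered by a probe usually has to be confirmed from the Events, where `Liveness probe failed` appears.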
---
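# Case 1: the restart is actually caused by a Liveness failure
A configuration that is seen very often:
```yaml
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
readinessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
```
When Spring Boot is dealing with:
* a slow database
* slow Redis
* GC pauses
the endpoint
```text
/actuator/health
```
takes longer than the probe timeout to respond.
Then:
```text
Readiness Fail
Liveness Fail
```
And finally:
```text
Pod Restart
```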
---
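# Case 2: the timeout is too small
For example (1 second is also the Kubernetes default):
```yaml
readinessProbe:
  timeoutSeconds: 1
```
If the Spring Boot Actuator health check has to reach the database and that takes:
```text
2-3 seconds
```
the probe fails with:
```text
context deadline exceeded
```
Suggested:
```yaml
readinessProbe:
  timeoutSeconds: 5
```
or even:
```yaml
readinessProbe:
  timeoutSeconds: 10
```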
---
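# Case 3: the database check slows down the Health Check
By default, Spring Boot's
```text
/actuator/health
```
aggregates every health indicator registered in the application, typically those auto-configured for the dependencies on the classpath:
* DB
* Redis
* RabbitMQ
* Kafka (only if a health indicator for it is registered, for example a custom one)
* Elasticsearch
For example, when the database stalls occasionally:
```text
Health Endpoint
↓
waiting for DB
↓
timeout
↓
Probe Failed
```
---
To check (from inside the container, or through `kubectl port-forward`):
```bash
curl localhost:8080/actuator/health
```
or:
```bash
time curl localhost:8080/actuator/health
```
and see whether the response is slow.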
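To compare the measured latency with the probe's `timeoutSeconds` directly, something like the sketch below could be used. It relies on curl's `-w '%{time_total}'` output; the function names and the 15-second `--max-time` cap are illustrative assumptions, not part of the original setup.

```bash
# is_slower_than ELAPSED_SECONDS TIMEOUT_SECONDS
# Exit status 0 when elapsed > timeout (awk handles the decimal comparison).
is_slower_than() {
  awk -v e="$1" -v t="$2" 'BEGIN { exit !(e > t) }'
}

# measure_health URL
# Prints the total request time in seconds as reported by curl.
measure_health() {
  curl -s -o /dev/null --max-time 15 -w '%{time_total}' "$1"
}

# check_health URL TIMEOUT_SECONDS
# Prints a one-line verdict for a single request.
check_health() {
  elapsed=$(measure_health "$1")
  if is_slower_than "$elapsed" "$2"; then
    echo "SLOW ${elapsed}s > ${2}s"
  else
    echo "OK ${elapsed}s <= ${2}s"
  fi
}

# Usage: check_health http://localhost:8080/actuator/health 1
```

A single fast response proves little when the database only stalls occasionally, so it is worth repeating the call several times.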
---
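# Recommended: use the Spring Boot Probe Endpoints
Enable them:
```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true
```
Then:
```yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
```
With this setup:
### Liveness
Only reports whether the application itself is alive (its internal liveness state), not the state of external dependencies.
### Readiness
Reports whether the application can accept traffic. By default this group only contains the application's own readiness state; to make it check business dependencies, add their indicators to the group, for example `management.endpoint.health.group.readiness.include=readinessState,db,redis`.
This avoids:
```text
DB down
↓
Liveness fails
↓
Pod restarts endlessly
```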
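A quick way to see that the two endpoints really behave differently is to compare their HTTP status codes. The sketch below assumes the application is reachable on a base URL such as `http://localhost:8080`; the helper names are hypothetical. Spring Boot answers 200 for `UP` and 503 for `DOWN` or `OUT_OF_SERVICE`, and a Kubernetes HTTP probe treats any status from 200 to 399 as success.

```bash
# http_status URL
# Prints only the HTTP status code of a GET request.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# probe_summary BASE_URL
# Prints the status codes of the liveness and readiness endpoints side by side.
probe_summary() {
  base="$1"
  live=$(http_status "$base/actuator/health/liveness")
  ready=$(http_status "$base/actuator/health/readiness")
  echo "liveness=$live readiness=$ready"
}

# Usage: probe_summary http://localhost:8080
```

An output such as `liveness=200 readiness=503` is the desired picture during a dependency outage: the Pod stops receiving traffic but is not restarted.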
---
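# Case 4: JVM stalls
If you frequently see:
```text
context deadline exceeded
```
but the application log contains no errors,
check:
```bash
kubectl top pod <pod>
```
and look for:
```text
CPU 100%
Memory close to the Limit
```
as well as:
```bash
kubectl logs <pod>
```
to see whether it contains:
```text
Full GC
OutOfMemoryError
```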
---
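# Case 5: startup is too slow
If this appears right after the container starts:
```text
Readiness probe failed
```
possible causes are:
* Flyway migration
* Hibernate initialization
* cache warm-up
Add a Startup Probe:
```yaml
startupProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  failureThreshold: 60
  periodSeconds: 5
```
This allows `failureThreshold` × `periodSeconds` = 60 × 5 s, that is:
```text
5 minutes of startup time
```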
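The startup budget is simple arithmetic, sketched below so that other values can be tried quickly. The function names are invented for this example, and the calculation is an approximation that ignores `initialDelaySeconds` and `timeoutSeconds`.

```bash
# startup_budget FAILURE_THRESHOLD PERIOD_SECONDS
# Prints the approximate time (seconds) the Startup Probe tolerates.
startup_budget() {
  echo $(( $1 * $2 ))
}

# required_threshold STARTUP_SECONDS PERIOD_SECONDS
# Prints the smallest failureThreshold that covers the given startup time
# (integer division rounded up).
required_threshold() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

startup_budget 60 5        # prints 300
required_threshold 180 10  # prints 18
```

In practice the threshold is usually set well above the computed minimum, since startup time varies with node load and migration size.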
---
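# As a DevOps engineer, I would run these commands first
```bash
kubectl describe pod <pod>
```
Look for:
```text
Liveness probe failed?
OOMKilled?
Exit Code?
```
```bash
kubectl get pod <pod> -o yaml | grep -i -A20 probe
```
This shows the current Probe configuration (`-i` is needed because the field names are `livenessProbe`, `readinessProbe` and `startupProbe`).
```bash
kubectl logs <pod> --previous
```
This shows why the previous container exited.
```bash
kubectl top pod <pod>
```
This shows whether resources are exhausted.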
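As an alternative to grepping the YAML, the probe definitions can be pulled out with `kubectl`'s JSONPath output. This is a sketch that assumes the application is the first container in the Pod; `show_probes` is a hypothetical helper name, and the exact formatting of the printed object depends on the `kubectl` version.

```bash
# show_probes POD
# Prints the startup, readiness and liveness probe definitions
# of the first container, one per line.
show_probes() {
  pod="$1"
  for kind in startupProbe readinessProbe livenessProbe; do
    spec=$(kubectl get pod "$pod" -o jsonpath="{.spec.containers[0].$kind}")
    echo "$kind: ${spec:-<not set>}"
  done
}

# Usage: show_probes <pod>
```

A line ending in `<not set>` means the container has no probe of that kind, which is itself useful to know when only Readiness failures are reported.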
---
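If you post the output of:
```bash
kubectl describe pod <pod>
kubectl get deploy xxx -o yaml
kubectl logs <pod> --previous
```
I can usually tell which of the following caused the restart:
* Probe configuration problem
* Spring Boot Health Check problem
* OOM
* GC stalls
* slow database connections
* startup taking too long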
3
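For **Spring Boot running on Kubernetes**, the currently recommended approach is:
* **Startup Probe**: handles slow application startup
* **Readiness Probe**: decides whether the Pod receives traffic
* **Liveness Probe**: decides whether the container needs a restart
Do not configure all three Probes identically.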
---
## 1. Enable Actuator in Spring Boot
Usually in `application.yml` (with the `spring-boot-starter-actuator` dependency on the classpath):
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health
  endpoint:
    health:
      probes:
        enabled: true
```
Spring Boot 2.3+ then exposes:
```text
/actuator/health
/actuator/health/liveness
/actuator/health/readiness
```
---
## 2. Recommended Probe configuration
```yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
startupProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
```
---
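## 3. Responsibilities of the three Probes
### Startup Probe
Handles:
```text
slow Spring Boot startup
JPA initialization
Flyway migration
cache warm-up
```
For example, if:
```text
startup takes 3 minutes
```
then:
```yaml
failureThreshold: 30
periodSeconds: 10
```
means (30 × 10 s = 300 s):
```text
up to 5 minutes are allowed for startup
```
Until the Startup Probe succeeds:
```text
Liveness is not executed
Readiness is not executed
```
This prevents:
```text
startup has not finished yet
but K8s already kills the Pod
```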
---
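### Readiness Probe
Decides:
```text
whether the Pod can receive business traffic
```
For example:
* the database connection is healthy
* Redis is healthy
* Kafka is healthy
Only when it passes:
```text
Ready=True
```
does the Service forward traffic to the Pod.
On failure:
```text
Ready=False
```
the Pod is not restarted.
It is only:
```text
removed from the Service Endpoints
```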
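The Ready condition can be read straight from the Pod status. The following sketch uses a JSONPath filter on `.status.conditions`; the helper names `pod_ready` and `describe_traffic` are assumptions made for this example.

```bash
# pod_ready POD
# Prints the status of the Pod's Ready condition: True or False.
pod_ready() {
  kubectl get pod "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# describe_traffic POD
# Translates the Ready condition into what it means for Service traffic.
describe_traffic() {
  case "$(pod_ready "$1")" in
    True) echo "$1 is Ready: eligible for Service traffic" ;;
    *)    echo "$1 is not Ready: removed from Service endpoints, not restarted" ;;
  esac
}

# Usage: describe_traffic <pod>
```

If a Pod stays not Ready while its restart count does not grow, the Readiness Probe is doing its job and the cause is most likely in a dependency rather than in the JVM.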
---
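### Liveness Probe
Detects:
```text
whether the JVM is hung
```
For example:
* deadlock
* a stuck thread pool
* long GC stalls
On failure:
```text
3 consecutive failures
```
K8s will:
```text
restart the container (the Pod itself is kept)
```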
---
## 4. Spring Boot best practices
Do not do this:
```yaml
livenessProbe:
  httpGet:
    path: /actuator/health
readinessProbe:
  httpGet:
    path: /actuator/health
```
Because when:
```text
the database is down
```
Spring Boot by default returns (with HTTP 503):
```json
{
  "status": "DOWN"
}
```
Result:
```text
Liveness fails
Readiness fails
```
The Pod restarts over and over,
ending up in:
```text
CrashLoopBackOff
```
---
Recommended:
```text
Liveness
only checks whether the JVM is alive
Readiness
checks the database
Redis
MQ
```
That is:
```text
/actuator/health/liveness
/actuator/health/readiness
```
---
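## 5. Configuration commonly seen at large companies
For example:
```yaml
startupProbe:
  failureThreshold: 60
  periodSeconds: 5
readinessProbe:
  periodSeconds: 10
livenessProbe:
  periodSeconds: 30
```
Rationale:
```text
Readiness is checked frequently
Liveness is checked conservatively
```
This keeps:
```text
short GC pauses
network jitter
```
from triggering unnecessary restarts.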
---
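## 6. AWS EKS scenario
If the Pods sit behind:
* ALB
* NLB
* Ingress Nginx
it is advisable to line up:
```text
ALB Health Check
↓
Readiness Probe
↓
Spring Boot Readiness
```
into one health check chain.
For example:
```text
ALB
↓
Ingress
↓
Service
↓
Pod Readiness
↓
DB
```
Then, when the database is unhealthy:
```text
Readiness=False
```
the Pod is taken out of rotation automatically,
instead of:
```text
the Pod being restarted again and again
```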
---
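If a DevOps/SRE interviewer asks:
> How do you design Liveness and Readiness for Spring Boot?
the standard answer is:
> Liveness only checks whether the application process and the JVM are alive and does not depend on external services; Readiness checks whether the application is able to serve requests, including the state of its database, cache and message queue connections. For Spring Boot applications that start slowly, a Startup Probe is added as well, so that Kubernetes does not misjudge and restart them during startup.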