[Kubernetes Is the Trend] 3. [Intermediate] Pod - 1

2021-08-26

k8s, kubernetes

Starting the intermediate part of the "Kubernetes Is the Trend" course

1. Pod Lifecycle

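Just as people behave differently depending on their age,
a Pod behaves differently depending on the phase it is in
Many Pod features are tied closely to a particular phase, so the lifecycle is worth knowing well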

A quick summary of what you can see in a Pod's status after it is created

status
- Phase : the Pod's overall state
  - values
    - Pending, Running, Succeeded, Failed, Unknown
- conditions : the stages the Pod passes through and the status of each
  - types
    - Initialized, ContainersReady, PodScheduled, Ready
  - Reason : detail for a condition; when its Status is False, the cause is added here
    - ContainersNotReady, PodCompleted
containers
- ContainerStatuses
  - State : the container's state
    - Waiting, Running, Terminated
  - Reason
    - ContainerCreating, CrashLoopBackOff, Error, Completed
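
To make the structure above concrete, here is a minimal Python sketch that walks a Pod status shaped like the `status` field of `kubectl get pod <name> -o json`. The sample dictionary and the helper name `summarize_status` are hypothetical and exist only for illustration; the field names (`phase`, `conditions`, `containerStatuses`, `state`, `reason`) follow the Pod status.

```python
# Hypothetical sample, shaped like the "status" field of `kubectl get pod -o json`.
sample_status = {
    "phase": "Running",
    "conditions": [
        {"type": "Initialized", "status": "True"},
        {"type": "Ready", "status": "False", "reason": "ContainersNotReady"},
        {"type": "ContainersReady", "status": "False", "reason": "ContainersNotReady"},
        {"type": "PodScheduled", "status": "True"},
    ],
    "containerStatuses": [
        {"name": "container", "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
    ],
}


def summarize_status(status):
    """Return (phase, conditions by type, container state by name)."""
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    containers = {}
    for cs in status.get("containerStatuses", []):
        # "state" holds a single key: waiting, running, or terminated.
        state_name, detail = next(iter(cs["state"].items()))
        containers[cs["name"]] = (state_name, detail.get("reason"))
    return status.get("phase"), conditions, containers


phase, conditions, containers = summarize_status(sample_status)
print(phase)
print(conditions)
print(containers)
```

The sample deliberately shows a Pod whose Phase is Running while its only container is Waiting with CrashLoopBackOff, the combination discussed in the Running section.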

Phase by phase

Pending

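The Pod's initial phase
Initialization
- If preparatory work is needed, define initContainers and their setup scripts in the spec
- The initContainers run before the main containers start and carry out that work
- When the work finishes, Initialized is set to True; on failure it is set to False
Node scheduling
- The Pod is scheduled onto a node, either automatically or onto a node you specify, depending on the spec
- When scheduling completes, PodScheduled becomes True
Container image download
- The container state is Waiting and the Reason is ContainerCreating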

Running

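Once the Pod and its containers start, the Pod enters the Running phase
If a container inside the Pod fails to start,
the container's State becomes Waiting and its Reason becomes CrashLoopBackOff
Even in that case the Pod itself is still treated as running, so its Phase is Running
However, the Pod conditions become ContainersReady : False and Ready : False
Once the containers are running normally again, those Pod conditions turn True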

Key takeaway from the above

A Pod can be in the Running phase while its containers are not running

So you need to monitor the containers as well, not just the Pod
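
One way to picture container-level monitoring is the Python sketch below. It assumes a list of Pod objects shaped like `kubectl get pods -o json` items; the Pod names and the helper `running_but_not_ready` are hypothetical, while `ready` is the per-container field inside `containerStatuses`.

```python
def running_but_not_ready(pods):
    """List (pod, container) pairs where the Pod phase is Running but the container is not ready."""
    problems = []
    for pod in pods:
        if pod["status"]["phase"] != "Running":
            continue
        for cs in pod["status"].get("containerStatuses", []):
            if not cs["ready"]:
                problems.append((pod["metadata"]["name"], cs["name"]))
    return problems


# Hypothetical input: two Pods that both report phase Running.
pods = [
    {
        "metadata": {"name": "pod-a"},
        "status": {"phase": "Running",
                   "containerStatuses": [{"name": "app", "ready": True}]},
    },
    {
        "metadata": {"name": "pod-b"},
        "status": {"phase": "Running",
                   "containerStatuses": [{"name": "app", "ready": False}]},
    },
]

print(running_but_not_ready(pods))
```

Checking only the phase would report both Pods as healthy; the container-level check singles out pod-b.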

Succeeded, Failed

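A Pod created by a Job or CronJob stops doing work once its task finishes
It ends in one of the two phases above depending on how the task went
Whether it succeeded or failed, the Pod conditions are ContainersReady : False and Ready : False
On success
- Pod : Phase : Succeeded
- Container : State : Terminated, Reason : Completed
On failure
- Pod : Phase : Failed
- Container : State : Terminated, Reason : Error ← the container that hit the error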
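
The success and failure cases can be sketched in Python as follows. This is a simplified model, not Kubernetes code: it assumes every container has already terminated and will not be restarted, and it treats exit code 0 as success. The function names are hypothetical.

```python
def final_phase(exit_codes):
    """Phase of a Pod whose containers have all terminated and will not restart."""
    if not exit_codes:
        raise ValueError("expected at least one terminated container")
    # Succeeded only if every container exited cleanly; otherwise Failed.
    return "Succeeded" if all(code == 0 for code in exit_codes) else "Failed"


def container_reason(exit_code):
    """Reason for a terminated container in this simplified model."""
    return "Completed" if exit_code == 0 else "Error"


print(final_phase([0, 0]), [container_reason(c) for c in [0, 0]])
print(final_phase([0, 1]), [container_reason(c) for c in [0, 1]])
```

In the second call only the container with the non-zero exit code gets Reason Error, matching the arrow in the list above.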

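Other cases

A Pod can go straight from Pending to Failed
A Pod can also fall into Unknown from Pending or Running (communication failure)
If the communication failure is not resolved, the Pod is treated as Failed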

2. Pod - ReadinessProbe, LivenessProbe

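Why they are needed

When a Pod is started anew, for example by auto-healing,
the Pod and its container can be Running while the application has not finished loading
If traffic arriving through the Service is distributed to that Pod at this point,
the users behind that traffic see an error message
ReadinessProbe : removes traffic failures at application startup
- Blocks incoming traffic until the application is confirmed ready
- Lets traffic in once the application is confirmed ready
What if the application goes down again later?
- Tomcat keeps running, but the application hits something like a memory overflow
- Typically the case where a 502 error is returned?
- Here the Tomcat process has not died; the service running on top of Tomcat is what is broken
- From the Pod's point of view, which only watches the Tomcat process, everything stays Running
- So traffic keeps flowing in even though the service cannot respond
LivenessProbe : removes sustained traffic failures when the app breaks
- Restarts the container when the app runs into trouble
- A brief window of traffic errors still occurs, but sustained failure is prevented
Should always be set when creating a Pod - it makes the service more stable to operate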

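Contents and properties

ReadinessProbe and LivenessProbe serve different purposes but are configured the same way
The app's state is checked with httpGet, exec, or tcpSocket - one of these is required
- httpGet : checks using port, host, path, HTTP headers, scheme, and so on
- exec : runs a command
- tcpSocket : port, host
Options
- initialDelaySeconds (default: 0 s) - delay before the first probe
- periodSeconds (default: 10 s) - interval between probes
- timeoutSeconds (default: 1 s) - how long to wait for a probe result
- successThreshold (default: 1) - number of consecutive successes needed to count as success
- failureThreshold (default: 3) - number of consecutive failures needed to count as failure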
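
The two thresholds are easier to follow with a small simulation. The Python sketch below is a simplified model of consecutive-result counting, not the kubelet's implementation; the class name `ProbeTracker` and the sample result sequence are hypothetical.

```python
class ProbeTracker:
    """Tracks consecutive probe results and flips state when a threshold is reached."""

    def __init__(self, success_threshold=1, failure_threshold=3, healthy=False):
        self.success_threshold = success_threshold
        self.failure_threshold = failure_threshold
        self.healthy = healthy
        self.successes = 0
        self.failures = 0

    def record(self, ok):
        """Record one probe result and return the resulting state."""
        if ok:
            self.successes += 1
            self.failures = 0  # a success breaks any run of failures
            if self.successes >= self.success_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0  # a failure breaks any run of successes
            if self.failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


tracker = ProbeTracker(success_threshold=3, failure_threshold=3)
results = [False, True, True, True, False, False, False]
states = [tracker.record(ok) for ok in results]
print(states)
```

With both thresholds set to 3, the state turns healthy only on the third consecutive success and turns unhealthy only on the third consecutive failure; a single stray result in between changes nothing.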

3. Pod - ReadinessProbe, LivenessProbe Hands-on

1. ReadinessProbe

1-1) Service

apiVersion: v1
kind: Service
metadata:
  name: svc-readiness
spec:
  selector:
    app: readiness
  ports:
  - port: 8080
    targetPort: 8080

1-2) Pod

apiVersion: v1
kind: Pod
metadata:
  name: pod1
  labels:
    app: readiness  
spec:
  containers:
  - name: container
    image: kubetm/app
    ports:
    - containerPort: 8080  
  terminationGracePeriodSeconds: 0

[root@m-k8s master-node]# k get svc
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP    9d
svc-readiness   ClusterIP   10.106.164.27   <none>        8080/TCP   2m7s
[root@m-k8s master-node]# while true; do date && curl 10.106.164.27:8080/hostname; sleep 1; done
Wed Aug 18 14:37:22 KST 2021
Hostname : pod1
Wed Aug 18 14:37:23 KST 2021
Hostname : pod1
Wed Aug 18 14:37:24 KST 2021
Hostname : pod1
Wed Aug 18 14:37:25 KST 2021
Hostname : pod1
Wed Aug 18 14:37:26 KST 2021
Hostname : pod1
[repeats]

1-3) Pod

apiVersion: v1
kind: Pod
metadata:
  name: pod-readiness-exec1
  labels:
    app: readiness  
spec:
  containers:
  - name: readiness
    image: kubetm/app
    ports:
    - containerPort: 8080  
    readinessProbe:
      exec:
        command: ["cat", "/readiness/ready.txt"]
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 3
    volumeMounts:
    - name: host-path
      mountPath: /readiness
  volumes:
  - name : host-path
    hostPath:
      path: /tmp/readiness
      type: DirectoryOrCreate
  terminationGracePeriodSeconds: 0

The pod-readiness-exec1 Pod has a readinessProbe
Because the readinessProbe is failing, no traffic reaches it - and so no requests fail either

Checking the status

[root@m-k8s ~]# kubectl get events -w | grep pod-readiness-exec1

30s         Normal    Scheduled                 pod/pod-readiness-exec1   Successfully   assigned default/pod-readiness-exec1 to w2-k8s
29s         Normal    Pulling                   pod/pod-readiness-exec1   Pulling image   "kubetm/app"
14s         Normal    Pulled                    pod/pod-readiness-exec1   Successfully   pulled image "kubetm/app" in 15.704192038s
12s         Normal    Created                   pod/pod-readiness-exec1   Created   container readiness
12s         Normal    Started                   pod/pod-readiness-exec1   Started   container readiness
0s          Warning   Unhealthy                 pod/pod-readiness-exec1   Readiness   probe failed: cat: /readiness/ready.txt: No such file or directory
0s          Warning   Unhealthy                 pod/pod-readiness-exec1   Readiness   probe failed: cat: /readiness/ready.txt: No such file or directory
0s          Warning   Unhealthy                 pod/pod-readiness-exec1   Readiness   probe failed: cat: /readiness/ready.txt: No such file or directory
[repeats]

Confirm the Readiness failure events

Check the Pod's conditions and the endpoints

[root@m-k8s ~]# kubectl describe pod pod-readiness-exec1 | grep -A5 Conditions
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
[root@m-k8s ~]# 
[root@m-k8s ~]# kubectl describe endpoints svc-readiness
Name:         svc-readiness
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-08-18T05:39:12Z
Subsets:
  Addresses:          172.16.180.6
  NotReadyAddresses:  172.16.103.165
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  8080  TCP

Events:  <none>
[root@m-k8s ~]# touch ready.txt

Conditions: note that Ready and ContainersReady are False
Endpoints: the healthy Pod is listed under Addresses, while the Pod under test is under NotReadyAddresses

Making the readinessProbe pass

Create the file inside the container

[root@m-k8s ~]# k get pod
NAME                  READY   STATUS    RESTARTS   AGE
pod-readiness-exec1   0/1     Running   0          11m
pod1                  1/1     Running   0          14m
[root@m-k8s ~]# k exec -it pod-readiness-exec1 -- /bin/bash
root@pod-readiness-exec1:/# cd readiness/
root@pod-readiness-exec1:/readiness# touch ready.txt
root@pod-readiness-exec1:/readiness#

The probe failed events stop appearing
After three successful checks the Pod counts as ready, so traffic starts reaching it

Checking the traffic

Wed Aug 18 14:51:05 KST 2021
Hostname : pod1
Wed Aug 18 14:51:06 KST 2021
Hostname : pod1
Wed Aug 18 14:51:07 KST 2021
Hostname : pod1
Wed Aug 18 14:51:08 KST 2021
Hostname : pod1
Wed Aug 18 14:51:09 KST 2021
Hostname : pod1
Wed Aug 18 14:51:10 KST 2021
Hostname : pod1
Wed Aug 18 14:51:11 KST 2021
Hostname : pod1
Wed Aug 18 14:51:12 KST 2021
Hostname : pod1
Wed Aug 18 14:51:13 KST 2021
Hostname : pod1
Wed Aug 18 14:51:14 KST 2021
Hostname : pod1
Wed Aug 18 14:51:15 KST 2021
Hostname : pod1
Wed Aug 18 14:51:16 KST 2021
Hostname : pod1
Wed Aug 18 14:51:17 KST 2021
Hostname : pod-readiness-exec1
Wed Aug 18 14:51:18 KST 2021
Hostname : pod-readiness-exec1
Wed Aug 18 14:51:19 KST 2021
Hostname : pod-readiness-exec1
Wed Aug 18 14:51:20 KST 2021
Hostname : pod1
Wed Aug 18 14:51:21 KST 2021
Hostname : pod-readiness-exec1
Wed Aug 18 14:51:22 KST 2021
Hostname : pod-readiness-exec1
Wed Aug 18 14:51:23 KST 2021
Hostname : pod-readiness-exec1
Wed Aug 18 14:51:24 KST 2021
Hostname : pod-readiness-exec1
Wed Aug 18 14:51:26 KST 2021
Hostname : pod1
Wed Aug 18 14:51:27 KST 2021
Hostname : pod1
Wed Aug 18 14:51:28 KST 2021
Hostname : pod1
Wed Aug 18 14:51:29 KST 2021
Hostname : pod1
Wed Aug 18 14:51:30 KST 2021
Hostname : pod-readiness-exec1
Wed Aug 18 14:51:31 KST 2021

Re-check the conditions and endpoints

[root@m-k8s ~]# kubectl describe pod pod-readiness-exec1 | grep -A5 Conditions
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
[root@m-k8s ~]# kubectl describe endpoints svc-readiness
Name:         svc-readiness
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-08-18T05:51:12Z
Subsets:
  Addresses:          172.16.103.165,172.16.180.6
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  8080  TCP

Events:  <none>
[root@m-k8s ~]#

Ready and ContainersReady have changed from False to True
172.16.103.165, previously under NotReadyAddresses, has moved to the Addresses set
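
The move from NotReadyAddresses to Addresses can be pictured with a small Python sketch. The helper `split_endpoints` and the input lists are hypothetical; the IPs are the ones from the output above, and in a real cluster this bookkeeping is done by Kubernetes itself, not by code like this.

```python
def split_endpoints(pods):
    """Partition Pod IPs by readiness, like the two address lists of an Endpoints object."""
    addresses = sorted(p["ip"] for p in pods if p["ready"])
    not_ready = sorted(p["ip"] for p in pods if not p["ready"])
    return addresses, not_ready


# State while the readinessProbe was still failing.
before = [
    {"name": "pod1", "ip": "172.16.180.6", "ready": True},
    {"name": "pod-readiness-exec1", "ip": "172.16.103.165", "ready": False},
]
# State after ready.txt was created and the probe passed.
after = [dict(p, ready=True) for p in before]

print(split_endpoints(before))
print(split_endpoints(after))
```

Only IPs in the first list receive traffic from svc-readiness, which is why requests reached pod-readiness-exec1 only after the probe started passing.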

2. LivenessProbe

2-1) Service

apiVersion: v1
kind: Service
metadata:
  name: svc-liveness
spec:
  selector:
    app: liveness
  ports:
  - port: 8080
    targetPort: 8080

2-2) Pod

apiVersion: v1
kind: Pod
metadata:
  name: pod2
  labels:
    app: liveness
spec:
  containers:
  - name: container
    image: kubetm/app
    ports:
    - containerPort: 8080
  terminationGracePeriodSeconds: 0

[root@m-k8s tmp]# k get svc
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes     ClusterIP   10.96.0.1        <none>        443/TCP    9d
svc-liveness   ClusterIP   10.105.150.122   <none>        8080/TCP   3h37m
[root@m-k8s tmp]# while true; do date && curl 10.105.150.122:8080/health; sleep 1; done
Wed Aug 18 18:43:07 KST 2021
pod2 is Running
Wed Aug 18 18:43:08 KST 2021
pod2 is Running
Wed Aug 18 18:43:09 KST 2021
pod2 is Running
[repeats]

2-3) Pod

apiVersion: v1
kind: Pod
metadata:
  name: pod-liveness-httpget1
  labels:
    app: liveness
spec:
  containers:
  - name: liveness
    image: kubetm/app
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3
  terminationGracePeriodSeconds: 0

[root@m-k8s ~]# kubectl get events -w | grep pod-liveness-httpget1
# idle until the YAML is applied...
0s          Normal   Scheduled   pod/pod-liveness-httpget1   Successfully assigned default/pod-liveness-httpget1 to w2-k8s
0s          Warning   FailedMount   pod/pod-liveness-httpget1   MountVolume.SetUp failed for volume "kube-api-access-rx5c5" : failed to sync configmap cache: timed out waiting for the condition
0s          Normal    Pulling       pod/pod-liveness-httpget1   Pulling image "kubetm/app"
0s          Normal    Pulled        pod/pod-liveness-httpget1   Successfully pulled image "kubetm/app" in 2.542199305s
0s          Normal    Created       pod/pod-liveness-httpget1   Created container liveness
0s          Normal    Started       pod/pod-liveness-httpget1   Started container liveness

Wed Aug 18 18:45:07 KST 2021
pod2 is Running
Wed Aug 18 18:45:08 KST 2021
pod2 is Running
Wed Aug 18 18:45:09 KST 2021
pod2 is Running
Wed Aug 18 18:45:10 KST 2021
pod2 is Running
Wed Aug 18 18:45:11 KST 2021
pod2 is Running
Wed Aug 18 18:45:12 KST 2021
pod2 is Running
Wed Aug 18 18:45:13 KST 2021
pod2 is Running
Wed Aug 18 18:45:14 KST 2021
pod2 is Running
Wed Aug 18 18:45:15 KST 2021
pod2 is Running
Wed Aug 18 18:45:16 KST 2021
pod-liveness-httpget1 is Running # the new Pod appears
Wed Aug 18 18:45:17 KST 2021
pod2 is Running
Wed Aug 18 18:45:18 KST 2021
pod2 is Running
Wed Aug 18 18:45:19 KST 2021
pod2 is Running
Wed Aug 18 18:45:20 KST 2021
pod-liveness-httpget1 is Running
Wed Aug 18 18:45:21 KST 2021
pod-liveness-httpget1 is Running
Wed Aug 18 18:45:22 KST 2021
pod2 is Running
Wed Aug 18 18:45:23 KST 2021
pod2 is Running
Wed Aug 18 18:45:24 KST 2021
pod-liveness-httpget1 is Running
Wed Aug 18 18:45:25 KST 2021
pod-liveness-httpget1 is Running
Wed Aug 18 18:45:26 KST 2021
pod2 is Running
Wed Aug 18 18:45:28 KST 2021
pod-liveness-httpget1 is Running

Catching events

watch "kubectl describe pod pod-liveness-httpget1 | grep -A10 Events"

Health check

[root@m-k8s ~]# curl 172.16.103.166:8080/health
pod-liveness-httpget1 is Running
[root@m-k8s ~]# curl -s -o /dev/null -w "%{http_code}\n" 172.16.103.166:8080/health
200
[root@m-k8s ~]# watch 'curl -s -o /dev/null -w "%{http_code}" 172.16.103.166:8080/health'

# Another way
[root@m-k8s ~]# while true; do curl -s -o /dev/null -w "%{http_code}\n" 172.16.103.166:8080/health;sleep 1;done
200
200
200

If we switch the status of the livenessProbe Pod to 500

[root@m-k8s ~]# k get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE     NOMINATED NODE   READINESS GATES
pod-liveness-httpget1   1/1     Running   0          17m   172.16.103.166   w2-k8s   <none>           <none>
pod2                    1/1     Running   0          16s   172.16.180.8     w4-k8s   <none>           <none>
[root@m-k8s ~]# curl 172.16.103.166:8080/status500
Status Code has Changed to 500
[root@m-k8s ~]#

Looking at the state

d2 is Running
Wed Aug 18 19:03:18 KST 2021
pod2 is Running
Wed Aug 18 19:03:19 KST 2021
pod2 is Running
Wed Aug 18 19:03:20 KST 2021
pod-liveness-httpget1 : Internal Server Error # an error occurs
Wed Aug 18 19:03:21 KST 2021
pod2 is Running
Wed Aug 18 19:03:22 KST 2021
pod2 is Running
Wed Aug 18 19:03:23 KST 2021
pod2 is Running
Wed Aug 18 19:03:24 KST 2021
pod2 is Running
Wed Aug 18 19:03:25 KST 2021
pod-liveness-httpget1 : Internal Server Error
Wed Aug 18 19:03:26 KST 2021
pod-liveness-httpget1 : Internal Server Error

# and then

Wed Aug 18 19:03:35 KST 2021
pod-liveness-httpget1 : Internal Server Error
Wed Aug 18 19:03:36 KST 2021
pod2 is Running
Wed Aug 18 19:03:37 KST 2021
pod-liveness-httpget1 : Internal Server Error
Wed Aug 18 19:03:38 KST 2021
pod2 is Running
Wed Aug 18 19:03:39 KST 2021
pod2 is Running
Wed Aug 18 19:03:40 KST 2021
pod2 is Running
Wed Aug 18 19:03:41 KST 2021
pod2 is Running
Wed Aug 18 19:03:42 KST 2021
pod2 is Running
Wed Aug 18 19:03:43 KST 2021
pod2 is Running
Wed Aug 18 19:03:44 KST 2021
pod2 is Running
Wed Aug 18 19:03:45 KST 2021
pod-liveness-httpget1 is Running # recovered
Wed Aug 18 19:03:46 KST 2021
pod-liveness-httpget1 is Running
Wed Aug 18 19:03:47 KST 2021
pod-liveness-httpget1 is Running
Wed Aug 18 19:03:48 KST 2021
pod-liveness-httpget1 is Running
Wed Aug 18 19:03:49 KST 2021
pod-liveness-httpget1 is Running
Wed Aug 18 19:03:50 KST 2021
pod-liveness-httpget1 is Running

The portion captured while watching only the HTTP code with curl

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
500 # 500 error
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
500
000 # 000 briefly
000
000
200 # back to 200
200
200
200
200
200
200
200
200

After the switch to 500, you can see the livenessProbe kill the container and start it again

[root@m-k8s ~]# kubectl describe pod pod-liveness-httpget1 | grep -A10 Events
Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    20m                 default-scheduler  Successfully assigned default/pod-liveness-httpget1 to w2-k8s
  Warning  FailedMount  20m                 kubelet            MountVolume.SetUp failed for volume "kube-api-access-rx5c5" : failed to sync configmap   cache: timed out waiting for the condition
  Normal   Pulled       19m                 kubelet            Successfully pulled image "kubetm/app" in 2.542199305s
  # after switching to 500
  Warning  Unhealthy    92s (x3 over 112s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing      92s                 kubelet            Container liveness failed liveness probe, will be restarted
  Normal   Pulling      90s (x2 over 20m)   kubelet            Pulling image "kubetm/app"
  Normal   Created      88s (x2 over 19m)   kubelet            Created container liveness
  Normal   Started      88s (x2 over 19m)   kubelet            Started container liveness
[root@m-k8s ~]#
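
The timing in these events follows from the probe settings. The Unhealthy event reads `92s (x3 over 112s)`, so the three failures span 112 - 92 = 20 seconds, which is what periodSeconds: 10 and failureThreshold: 3 predict. The Python sketch below does that arithmetic; both function names are hypothetical, and the worst-case figure is a rough estimate that ignores timeoutSeconds.

```python
def failure_window_seconds(period_seconds, failure_threshold):
    """Seconds between the first and the last failed probe in a run of consecutive failures."""
    return period_seconds * (failure_threshold - 1)


def worst_case_detection_seconds(period_seconds, failure_threshold):
    """Rough upper bound from the app breaking to the restart decision.

    Assumes the app breaks right after a successful probe and ignores timeoutSeconds.
    """
    return period_seconds * failure_threshold


print(failure_window_seconds(10, 3))
print(worst_case_detection_seconds(10, 3))
```

The second figure lines up with the curl log above, where roughly thirty one-second samples returned 500 before the container was restarted.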

HTTP status codes from 200 up to, but not including, 400 count as success; any other status code is treated as failure
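
As a minimal Python sketch of that rule (the function name `http_probe_succeeded` is hypothetical):

```python
def http_probe_succeeded(status_code):
    """True when an httpGet probe response counts as success (200 <= code < 400)."""
    return 200 <= status_code < 400


for code in (200, 302, 399, 400, 500):
    print(code, http_probe_succeeded(code))
```

This is why the /status500 call above was enough to make the livenessProbe fail.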

Full sample

Pod

apiVersion: v1
kind: Pod
metadata:
  name: pod-probe
  labels:
    app: probe
spec:
  containers:
  - name: probe
    image: kubetm/app
    ports:
    - containerPort: 8080  
    readinessProbe:
      exec:                   # check by running a command
        command: ["cat", "/readiness/ready.txt"]   
      initialDelaySeconds: 10
      periodSeconds: 5
      successThreshold: 3     # connected to the Service after 3 successes
    livenessProbe:
      httpGet:                # check with an HTTP GET request
        path: /health         # path to check
        port: 8080            # port to check
      initialDelaySeconds: 5  # start LivenessProbe checks 5 seconds after startup
      periodSeconds: 10       # run the LivenessProbe check every 10 seconds
      failureThreshold: 3     # restart the container after 3 failures

kubectl watch

# Continuously stream Event information for all objects | and print only the lines matching the word pod-readiness-exec1
kubectl get events -w | grep pod-readiness-exec1
# From the details of the Pod named pod-readiness-exec1 | keep printing the line matching Events and the 20 lines after it
watch "kubectl describe pod pod-readiness-exec1 | grep -A20 Events"
