MegaThinking


  1. https://medium.com/analytics-vidhya/lenet-with-tensorflow-a35da0d503df
  2. https://medium.com/@mgazar/lenet-5-in-9-lines-of-code-using-keras-ac99294c8086

https://www.tensorflow.org/api_docs/python/tf/pad

paddings is an integer tensor with shape [n, 2], where n is the rank of tensor.

For each dimension D of the input:

  • paddings[D, 0] indicates how many values to add before the contents of tensor in dimension D
  • paddings[D, 1] indicates how many values to add after the contents of tensor in dimension D
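As an illustration of the paddings convention, here is a small pure-Python sketch (my own code, not TensorFlow) that mimics tf.pad in CONSTANT mode with zeros:

```python
def zeros(shape, value=0):
    """Nested list filled with `value`, of the given shape."""
    if not shape:
        return value
    return [zeros(shape[1:], value) for _ in range(shape[0])]

def shape_of(t):
    """Shape of a nested list, e.g. [[1, 2]] -> [1, 2]."""
    if not isinstance(t, list):
        return []
    return [len(t)] + shape_of(t[0])

def pad(tensor, paddings, value=0):
    """Mimic tf.pad (CONSTANT mode): paddings[D] = [before, after]
    counts for dimension D."""
    if not paddings:
        return tensor
    inner = [pad(t, paddings[1:], value) for t in tensor]
    before, after = paddings[0]
    row = shape_of(inner[0]) if inner else []
    return ([zeros(row, value) for _ in range(before)]
            + inner
            + [zeros(row, value) for _ in range(after)])

# pad 1 row before/after and 2 columns before/after -> a 4x6 matrix
print(pad([[1, 2], [3, 4]], [[1, 1], [2, 2]]))
```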

https://www.tensorflow.org/api_docs/python/tf/expand_dims

Docker Resource Limit

CPU

https://docs.docker.com/config/containers/resource_constraints/#cpu

  • --cpu-period: the CPU time slice (period) used by the CFS scheduler; defaults to 100ms
  • --cpu-quota: the CFS quota: the amount of CPU time the container can use within each cpu period before it is throttled
  • --cpuset-cpus: bind the docker container to specific CPU cores
  • --cpu-shares: Set this flag to a value greater or less than the default of 1024 to increase or reduce the container’s weight, and give it access to a greater or lesser proportion of the host machine’s CPU cycles. This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need. In that way, this is a soft limit.
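The period/quota relationship is just division; a tiny illustrative helper (the function name is mine, not a docker API):

```python
def effective_cpus(cpu_quota_us, cpu_period_us=100_000):
    """CPUs a container may use per period: quota / period.

    The default period mirrors docker's 100ms CFS slice;
    a non-positive quota means no limit."""
    if cpu_quota_us <= 0:
        return None  # unlimited
    return cpu_quota_us / cpu_period_us

# docker run --cpu-period=100000 --cpu-quota=50000  ->  0.5 CPUs
print(effective_cpus(50_000))
# the quota may exceed one period on multi-CPU hosts: 2 full CPUs
print(effective_cpus(200_000))
```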

Memory

https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory

  • --memory: The maximum amount of memory the container can use (cgroup limit)
  • --memory-swap: the total amount of memory plus swap the container can use
  • --oom-kill-disable: do not OOM-kill processes in the container when the limit is hit (only use together with --memory)

Additional notes on OOM:

  1. When processes in a container use more memory than the limit, the kernel triggers the (cgroup) OOM killer, which kills the process with the highest oom_score
  2. As long as the PID 1 process of the container has not exited, the container itself does not exit

The OOM killer always targets processes, never containers

Docker Container OOMKilled status

  1. https://stackoverflow.com/questions/48618431/what-does-oom-kill-disable-do-for-a-docker-container
  2. https://github.com/moby/moby/issues/14440#issuecomment-119243820
  3. https://plumbr.io/blog/java/oomkillers-in-docker-are-more-complex-than-you-thought
  4. https://zhimin-wen.medium.com/memory-limit-of-pod-and-oom-killer-891ee1f1cad8
  5. https://faun.pub/understanding-docker-container-memory-limit-behavior-41add155236c
  6. https://github.com/moby/moby/issues/15621#issuecomment-181418985
  7. https://draveness.me/docker/
  8. https://github.com/moby/moby/issues/38352#issuecomment-446329512
  9. https://github.com/containerd/cgroups/issues/74
  10. https://github.com/kubernetes/kubernetes/issues/78973
  11. https://github.com/kubernetes/kubernetes/issues/50632

If a child process inside the container is OOM-killed, the OOMKilled flag is also set when the docker container exits; see this issue:

https://github.com/moby/moby/issues/15621#issuecomment-181418985

While the docker container has not yet exited, a container event is emitted:

https://docs.docker.com/engine/reference/commandline/events/

For how the docker container's OOMKilled flag is set, see this issue:

https://github.com/moby/moby/issues/38352#issuecomment-446329512

In the implementation:

  1. containerd listens for a series of events; if it receives a cgroup oom event, it records OOMKilled = true
  2. containerd forwards the processed event to dockerd for further handling
  3. when dockerd handles the OOM event, it records a container oom event
  4. when dockerd handles the Exit event, it writes OOMKilled = true into the container's status
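The four steps can be condensed into a toy state machine; this is a hypothetical sketch of the bookkeeping, not the real containerd/dockerd code:

```python
class ContainerStatus:
    """Toy model: an oom event observed before the exit event
    ends up as OOMKilled=true in the final container status."""

    def __init__(self):
        self.oom_killed = False
        self.exited = False
        self.exit_code = None

    def handle(self, event):
        if event["type"] == "oom":      # steps 1-3: cgroup oom observed and recorded
            self.oom_killed = True
        elif event["type"] == "exit":   # step 4: status written when the container exits
            self.exited = True
            self.exit_code = event["code"]

status = ContainerStatus()
for ev in ({"type": "oom"}, {"type": "exit", "code": 137}):
    status.handle(ev)
print(status.oom_killed, status.exit_code)  # True 137
```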

K8S Resource Limit

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-limits-are-run

CPU (Docker Container config)

CPU is always requested as an absolute quantity, never as a relative quantity; 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine.

  • --cpu-shares: max({requests.cpu} * 1024, 2), with requests.cpu in cores

For example, if requests.cpu is 180 (cores), then --cpu-shares=184320.

  • --cpu-period: 100ms (passed as 100000, in microseconds)

  • --cpu-quota: {limits.cpu in millicores} * 100 (microseconds)

https://stackoverflow.com/a/63352630

The resulting value is the total amount of CPU time in microseconds that a container can use every 100ms. A container cannot use more than its share of CPU time during this interval.

The default quota period is 100ms. The minimum resolution of CPU quota is 1ms.

The CPU time slice is the period; the quota is the CPU time that can actually be used within each period. If a task constrained by the quota has not finished when the current period's quota runs out, it is throttled and resumes in the next period.

On multi-CPU machines, note that the quota can be a multiple of the period. For example, limiting a container to 0.5 CPU means a quota of 50ms per 100ms period (--cpu-quota=50000, in microseconds); on a 20-CPU host, limiting a container to 10 CPUs means a quota of 10 × 100ms = 1000ms (--cpu-quota=1000000).
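The kubelet-side arithmetic can be sketched in a few lines (function names are mine; values in millicores and microseconds):

```python
def cpu_shares(request_millicpu):
    # shares = requests.cpu (in millicores) * 1024 / 1000, with a floor of 2
    return max(request_millicpu * 1024 // 1000, 2)

def cpu_quota_us(limit_millicpu, period_us=100_000):
    # quota(us) = limits.cpu (in millicores) * period / 1000 = millicores * 100
    return limit_millicpu * period_us // 1000

print(cpu_shares(180_000))   # requests.cpu = 180 cores -> 184320
print(cpu_quota_us(500))     # 0.5 CPU  -> 50000us   (50ms per 100ms period)
print(cpu_quota_us(10_000))  # 10 CPUs  -> 1000000us (1000ms per 100ms period)
```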

Memory (Docker Container config)

  • --memory: int({limits.memory})
  • --memory-swap: int({limits.memory})

Since --memory-swap equals --memory, the container does not have access to swap
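Because --memory-swap is the total of memory plus swap, setting both flags to the same value leaves a swap budget of zero; a one-line helper (name mine) makes that explicit:

```python
def swap_allowed_bytes(memory, memory_swap):
    # --memory-swap = memory + swap, so the swap budget is the difference
    return memory_swap - memory

# k8s sets both flags to limits.memory, so the container gets no swap
print(swap_allowed_bytes(2**30, 2**30))  # 0
```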

K8s OOM Watcher

https://github.com/kubernetes/kubernetes/blob/v1.22.1/pkg/kubelet/oom/oom_watcher_linux.go

  • /dev/kmsg

Start watches for system oom’s and records an event for every system oom encountered.

When the kubelet observes a system OOM on the node (as opposed to a cgroup OOM), it generates an event, which can be queried with kubectl:

kubectl get event --field-selector type=Warning,reason=SystemOOM

The following issue and PR tried to associate process OOMs inside a pod with the pod itself, but the PR was not merged:

https://github.com/kubernetes/kubernetes/issues/100483

https://github.com/kubernetes/kubernetes/pull/100487


graph LR
InitContainer --> TrainingContainer
InitContainer --> SidecarContainer

InitContainer and SidecarContainer act like system containers and are transparent to the TrainingContainer.

The user's TrainingJob (process) runs in the TrainingContainer.

We can do environment initialization in the InitContainer, such as downloading data, and uploading can be done in the SidecarContainer.

However, there is an engineering problem: file read permissions. The best approach is to make the InitContainer / SidecarContainer / TrainingContainer users (uids) the same.

powered by mermaid

https://mermaid-js.github.io/mermaid/#/flowchart

https://theme-next.js.org/docs/tag-plugins/mermaid.html?highlight=mermaid

https://github.com/theme-next/hexo-theme-next/pull/649

pod spec of volcano job

https://github.com/volcano-sh/volcano/blob/v1.3.0/pkg/controllers/job/job_controller_util.go

import (
	v1 "k8s.io/api/core/v1"
	...
)

// MakePodName append podname,jobname,taskName and index and returns the string.
func MakePodName(jobName string, taskName string, index int) string {
	return fmt.Sprintf(jobhelpers.PodNameFmt, jobName, taskName, index)
}

func createJobPod(job *batch.Job, template *v1.PodTemplateSpec, ix int) *v1.Pod {
	templateCopy := template.DeepCopy()

	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      jobhelpers.MakePodName(job.Name, template.Name, ix),
			Namespace: job.Namespace,
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(job, helpers.JobKind),
			},
			Labels:      templateCopy.Labels,
			Annotations: templateCopy.Annotations,
		},
		Spec: templateCopy.Spec,
	}

	...
}

sysctl of pod spec

https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: net.ipv4.ip_local_port_range
      value: "30000 50000"

Find out the current ip_local_port_range:

cat /proc/sys/net/ipv4/ip_local_port_range

https://www.thegeekdiary.com/how-to-reserve-a-port-range-for-a-third-party-application-in-centos-rhel/

Note: ip_local_port_range and ip_local_reserved_ports settings are independent and both are considered by the kernel when determining which ports are available for automatic port assignments.

https://blog.golang.org/context#:~:text=A%20Context%20is%20safe%20for,to%20signal%20all%20of%20them.

A Context is safe for simultaneous use by multiple goroutines. Code can pass a single Context to any number of goroutines and cancel that Context to signal all of them.

project structure

.
├── cmd
│   └── command.go
├── go.mod
├── go.sum
├── main.go
└── pkg
    └── run
        └── long_run_cli.go

3 directories, 5 files

main.go

package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"

	"zs/toolkit-cli/cmd"
)

func main() {
	c := make(chan os.Signal, 2)
	signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)

	ctx := context.Background()
	ctx, cancel := context.WithCancel(ctx)

	go func() {
		select {
		case <-c:
			cancel()
		}
	}()

	cmd.Execute(ctx)
}

command.go

package cmd

import (
	"context"
	"fmt"
	"os"
	"os/exec"

	"github.com/spf13/cobra"

	"zs/toolkit-cli/pkg/run"
)

var rootCmd = &cobra.Command{
	Use: "long run cli",
	Run: func(cmd *cobra.Command, args []string) {
		cli := run.New()
		err := cli.LongRun(cmd.Context())

		if err != nil {
			fmt.Printf("cli run err: %v\n", err)
			if exitError, ok := err.(*exec.ExitError); ok {
				fmt.Printf("exit code: %d\n", exitError.ExitCode())
			}
		}
	},
}

func Execute(ctx context.Context) {
	if err := rootCmd.ExecuteContext(ctx); err != nil {
		fmt.Printf("err: %v\n", err)
		os.Exit(1)
	}
}

long_run_cli.go

package run

import (
	"context"
	"os/exec"
)

type CLI struct{}

func (cli CLI) LongRun(ctx context.Context) error {
	cmd := exec.CommandContext(ctx, "sleep", "30")
	return cmd.Run()
}

func New() *CLI {
	return &CLI{}
}

https://pkg.go.dev/os/exec#CommandContext

The provided context is used to kill the process (by calling os.Process.Kill) if the context becomes done before the command completes on its own.

https://github.com/golang/go/issues/21135

proposal: os/exec: allow user of CommandContext to specify the kill signal when context is done

CommandContext will send SIGKILL when the ctx is done …

get exit code of pipeline command in background

https://stackoverflow.com/questions/37257668/get-exit-code-of-a-piped-background-process

https://stackoverflow.com/questions/35842600/tee-resets-exit-status-is-always-0

someCommand="python test.py"

{
    ${someCommand} 2>&1 | tee -a training.log
    exit ${PIPESTATUS[0]}
} &

wait $!
echo $?

Output:

127

To sum up: even when wait is given a pid, the implementation still waits on the job that pid belongs to; the wait documentation is rather oblique about this.

https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for.

Note the wording: "return the exit status of the last command waited for".

So in the code above, without the exit ${PIPESTATUS[0]} workaround, what wait would obtain is actually the exit code of the tee command.

In the shell, the only simple way to obtain the status of the first command of a pipeline seems to be ${PIPESTATUS[0]}.
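Outside the shell this problem disappears, because each pipeline stage keeps its own handle. For comparison, a Python sketch of the same "command exits 127 into a consumer" pipeline (using cat as a stand-in for tee):

```python
import subprocess
import sys

# stage 1: a command that exits with 127; stage 2: a "tee"-like consumer
p1 = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(127)"],
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(["cat"], stdin=p1.stdout, stdout=subprocess.DEVNULL)
p1.stdout.close()  # let p2 see EOF once p1 exits
p2.wait()
p1.wait()
# each stage's exit status is directly available, no PIPESTATUS needed
print(p1.returncode, p2.returncode)  # 127 0
```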

get pid of pipeline command in background

Going one step further, is there a way to obtain the pid of someCommand itself? Try the following modification:

someCommand="python test.py"

{
    ${someCommand} 2>&1 &
    pid_someCommand=$!
    wait ${pid_someCommand}
    exit $?
} | tee -a training.log &

wait $!
echo ${PIPESTATUS[0]}

Output:

0

but it does not work: the echoed 0 is not someCommand's exit code

In the end the only option is the brute-force ps -ef | grep someCommand, filtering by the subshell pid as the parent pid:

someCommand="python test.py"

{
    ${someCommand} 2>&1 | tee -a training.log
    exit ${PIPESTATUS[0]}
} &
someCommand_job_pid=$!

someCommand_pid=`ps -efj | awk -v parent_pid=${someCommand_job_pid} '$3==parent_pid { print $0 }' | grep "${someCommand}" | awk '{ print $2 }'`
echo someCommand_pid ${someCommand_pid}

wait ${someCommand_job_pid}
echo $?

Output:

someCommand_pid 55863
127

test.py

import time
import sys

time.sleep(5)
sys.exit(127)

log output format sample

INFO[2021-07-04 15:26:26]main.go:28 have a nice day                               zs=log
INFO[2021-07-04 15:26:26]main.go:29 zs gogogo zs=log

code sample

Notes on the sample:

  • it shows a full timestamp ([0000] in the default format is the number of seconds elapsed since program start)
  • it adds a common prefix field
  • it adds the caller's filename and line number, which has a little overhead

package main

import (
	"path"
	"runtime"
	"strconv"

	"github.com/sirupsen/logrus"
)

func main() {
	var log = logrus.New()

	formatter := &logrus.TextFormatter{
		FullTimestamp:   true,
		TimestampFormat: "2006-01-02 15:04:05",
		CallerPrettyfier: func(f *runtime.Frame) (string, string) {
			_, filename := path.Split(f.File)
			// do not log func name
			return "", filename + ":" + strconv.Itoa(f.Line)
		},
	}
	log.SetFormatter(formatter)
	log.SetReportCaller(true)

	contextLogger := log.WithField("zs", "log")

	contextLogger.Info("have a nice day")
	contextLogger.Infof("%s gogogo", "zs")
}

third-party formatter

https://github.com/sirupsen/logrus#formatters

log output format sample

[2021-07-04 15:50:26]  INFO log: have a nice day
[2021-07-04 15:50:26] INFO log: zs gogogo

code sample

package main

import (
	"github.com/sirupsen/logrus"
	prefixed "github.com/x-cray/logrus-prefixed-formatter"
)

func main() {
	var log = logrus.New()

	formatter := &prefixed.TextFormatter{
		FullTimestamp:   true,
		TimestampFormat: "2006-01-02 15:04:05",
	}
	log.Formatter = formatter

	contextLogger := log.WithField("prefix", "log")

	contextLogger.Info("have a nice day")
	contextLogger.Infof("%s gogogo", "zs")
}

as the previous code shows

contextLogger := log.WithField("prefix", "log")

you can prefix the message output with a log key and a colon

docker security

https://docs.docker.com/engine/security/

docker security, in short: one part is kernel namespaces, which create process, network, and other namespaces for each container so that containers cannot interfere with each other much.

The other part is control groups, which limit the various resources each container can use.

ensure that each container gets its fair share of memory, CPU, disk I/O

Put simply, taking CPU as an example: cgroups prevent one container from misusing the CPU (whether maliciously or through a code bug) and leaving other containers unable to get CPU time.

container root user

https://docs.docker.com/engine/security/userns-remap/

Running processes as root inside a container is discouraged, largely because uids/gids inside the container map onto the host. For example, once a process escapes the container onto the host, it holds root privileges there as well.

That said, container escape is a very serious security issue, and the docker community fixes such vulnerabilities immediately.

https://community.mellanox.com/s/article/in-between-ethernet-vlans-and-infiniband-pkeys

https://community.mellanox.com/s/article/howto-use-infiniband-pkey-membership-types-in-virtualization-environment--connectx-3--connectx-3-pro-x

https://community.mellanox.com/s/article/howto-configure-ipoib-networks-with-gateway-and-multiple-pkeys

https://community.mellanox.com/s/article/HowTo-Configure-SR-IOV-for-ConnectX-4-ConnectX-5-ConnectX-6-with-KVM-Ethernet

https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin

https://github.com/mellanox/k8s-rdma-shared-dev-plugin

https://docs.openshift.com/container-platform/4.6/networking/hardware_networks/add-pod.html#add-pod

IOV: I/O Virtualization

Single Root I/O Virtualization (SR-IOV) network

https://docs.openshift.com/container-platform/4.6/networking/hardware_networks/about-sriov.html

https://github.com/k8snetworkplumbingwg/sriov-cni

https://docs.mellanox.com/display/MLNXOFEDv461000/Kubernetes%20Using%20SR-IOV

https://community.mellanox.com/s/article/kubernetes-ipoib-sriov-networking-with-connectx4-connectx5

Type size

https://golang.org/ref/spec#Size_and_alignment_guarantees

https://github.com/ardanlabs/gotraining-studyguide/blob/master/go/language/struct.go

type example struct {
	flag    bool
	counter int16
	pi      float32
}

Byte alignment factor (cf. #pragma pack(n) in C)

  • member alignment
  • struct alignment

Alignment rules (from the Go spec):

  1. For a variable x of any type: unsafe.Alignof(x) is at least 1.
  2. For a variable x of struct type: unsafe.Alignof(x) is the largest of all the values unsafe.Alignof(x.f) for each field f of x, but at least 1.
  3. For a variable x of array type: unsafe.Alignof(x) is the same as the alignment of a variable of the array’s element type.

layout

  • bool: offset 0 (1 byte, followed by 1 byte of padding)
  • int16: offset 2
  • float32: offset 4

8 bytes in total
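The same 8-byte layout can be verified from Python with ctypes, which applies the platform C alignment rules (Go's size and alignment guarantees coincide with these on common 64-bit platforms; this is an illustration, not Go itself):

```python
import ctypes

class Example(ctypes.Structure):
    # mirrors: type example struct { flag bool; counter int16; pi float32 }
    _fields_ = [
        ("flag", ctypes.c_bool),      # offset 0 (1 byte + 1 byte padding)
        ("counter", ctypes.c_int16),  # offset 2
        ("pi", ctypes.c_float),       # offset 4
    ]

print(ctypes.sizeof(Example))                     # 8
print(Example.counter.offset, Example.pi.offset)  # 2 4
```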

https://eddycjy.gitbook.io/golang/di-1-ke-za-tan/go-memory-align

//TODO list
