
  1. https://medium.com/analytics-vidhya/lenet-with-tensorflow-a35da0d503df
  2. https://medium.com/@mgazar/lenet-5-in-9-lines-of-code-using-keras-ac99294c8086

https://www.tensorflow.org/api_docs/python/tf/pad

paddings is an integer tensor with shape [n, 2], where n is the rank of tensor.

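For each dimension D:

  • paddings[D, 0]: how many values to add before the contents of tensor in that dimension
  • paddings[D, 1]: how many values to add after the contents of tensor in that dimension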

https://www.tensorflow.org/api_docs/python/tf/expand_dims

pod spec of volcano job

https://github.com/volcano-sh/volcano/blob/v1.3.0/pkg/controllers/job/job_controller_util.go

import (
	v1 "k8s.io/api/core/v1"
	...
)

// MakePodName append podname,jobname,taskName and index and returns the string.
func MakePodName(jobName string, taskName string, index int) string {
	return fmt.Sprintf(jobhelpers.PodNameFmt, jobName, taskName, index)
}

func createJobPod(job *batch.Job, template *v1.PodTemplateSpec, ix int) *v1.Pod {
	templateCopy := template.DeepCopy()

	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      jobhelpers.MakePodName(job.Name, template.Name, ix),
			Namespace: job.Namespace,
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(job, helpers.JobKind),
			},
			Labels:      templateCopy.Labels,
			Annotations: templateCopy.Annotations,
		},
		Spec: templateCopy.Spec,
	}

	...
}

sysctl of pod spec

https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: net.ipv4.ip_local_port_range
      value: "30000 50000"

find out current ip local port range

cat /proc/sys/net/ipv4/ip_local_port_range

https://www.thegeekdiary.com/how-to-reserve-a-port-range-for-a-third-party-application-in-centos-rhel/

Note: ip_local_port_range and ip_local_reserved_ports settings are independent and both are considered by the kernel when determining which ports are available for automatic port assignments.

graph LR
InitContainer --> TrainingContainer
InitContainer --> SidecarContainer

InitContainer and SidecarContainer act like system containers and are transparent to the TrainingContainer.

The user's TrainingJob (process) runs in the TrainingContainer.

Environment initialization, such as downloading data, can be done in the InitContainer, and uploading can be done in the SidecarContainer.

However, this raises an engineering problem: file read permissions. The best approach is to run the InitContainer, SidecarContainer, and TrainingContainer as the same user (uid).

powered by mermaid

https://mermaid-js.github.io/mermaid/#/flowchart

https://theme-next.js.org/docs/tag-plugins/mermaid.html?highlight=mermaid

https://github.com/theme-next/hexo-theme-next/pull/649

https://blog.golang.org/context#:~:text=A%20Context%20is%20safe%20for,to%20signal%20all%20of%20them.

A Context is safe for simultaneous use by multiple goroutines. Code can pass a single Context to any number of goroutines and cancel that Context to signal all of them.

project structure

.
├── cmd
│   └── command.go
├── go.mod
├── go.sum
├── main.go
└── pkg
    └── run
        └── long_run_cli.go

3 directories, 5 files

main.go

package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"

	"zs/toolkit-cli/cmd"
)

func main() {
	// Buffered so that a signal arriving before the goroutine runs is not dropped.
	c := make(chan os.Signal, 2)
	signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Cancel the context on the first SIGINT or SIGTERM.
	go func() {
		<-c
		cancel()
	}()

	cmd.Execute(ctx)
}

command.go

package cmd

import (
	"context"
	"fmt"
	"os"
	"os/exec"

	"github.com/spf13/cobra"

	"zs/toolkit-cli/pkg/run"
)

var rootCmd = &cobra.Command{
	Use: "long run cli",
	Run: func(cmd *cobra.Command, args []string) {
		cli := run.New()
		err := cli.LongRun(cmd.Context())

		if err != nil {
			fmt.Printf("cli run err: %v\n", err)
			if exitError, ok := err.(*exec.ExitError); ok {
				fmt.Printf("exit code: %d\n", exitError.ExitCode())
			}
		}
	},
}

func Execute(ctx context.Context) {
	if err := rootCmd.ExecuteContext(ctx); err != nil {
		fmt.Printf("err: %v\n", err)
		os.Exit(1)
	}
}

long_run_cli.go

package run

import (
	"context"
	"os/exec"
)

type CLI struct{}

// LongRun runs a long-lived child process that is killed when ctx is done.
func (cli CLI) LongRun(ctx context.Context) error {
	cmd := exec.CommandContext(ctx, "sleep", "30")
	return cmd.Run()
}

func New() *CLI {
	return &CLI{}
}

https://pkg.go.dev/os/exec#CommandContext

The provided context is used to kill the process (by calling os.Process.Kill) if the context becomes done before the command completes on its own.

https://github.com/golang/go/issues/21135

proposal: os/exec: allow user of CommandContext to specify the kill signal when context is done

exec.CommandContext sends SIGKILL (via os.Process.Kill) when ctx is done, so the child process gets no chance to handle the signal and clean up.

get exit code of pipeline command in background

https://stackoverflow.com/questions/37257668/get-exit-code-of-a-piped-background-process

https://stackoverflow.com/questions/35842600/tee-resets-exit-status-is-always-0

someCommand="python test.py"

{
    ${someCommand} 2>&1 | tee -a training.log
    exit ${PIPESTATUS[0]}
} &

wait $!
echo $?

Output

127

In short, even when wait is given a pid, internally it still waits on the job that the pid belongs to. The wait documentation is rather implicit about this.

https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for.

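Note: return the exit status of the last command waited for

So for a plain background pipeline (someCommand | tee ... &), wait actually returns the exit code of tee, the last command in the pipeline. The code above returns 127 only because the group exits explicitly with ${PIPESTATUS[0]}.

In bash, the only simple way to get the status of a command inside a pipeline seems to be ${PIPESTATUS[0]}.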

get pid of pipeline command in background

Going further, can we also get the pid of someCommand? Try the following change:

someCommand="python test.py"

{
    ${someCommand} 2>&1 &
    pid_someCommand=$!
    wait ${pid_someCommand}
    exit $?
} | tee -a training.log &

wait $!
echo ${PIPESTATUS[0]}

Output

0

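But this does not work: $! is the pid of tee (the last process of the background pipeline), so wait returns the exit status of tee, and PIPESTATUS only reflects the most recent foreground pipeline, which here is the wait command itself.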

In the end, the only option left is the last resort of ps -ef | grep someCommand, additionally filtering on the subshell pid as the parent pid.

someCommand="python test.py"

{
    ${someCommand} 2>&1 | tee -a training.log
    exit ${PIPESTATUS[0]}
} &
someCommand_job_pid=$!

someCommand_pid=`ps -efj | awk -v parent_pid=${someCommand_job_pid} '$3==parent_pid { print $0 }' | grep "${someCommand}" | awk '{ print $2 }'`
echo someCommand_pid ${someCommand_pid}

wait ${someCommand_job_pid}
echo $?

Output

someCommand_pid 55863
127

test.py

import time
import sys

time.sleep(5)
sys.exit(127)

log output format sample

INFO[2021-07-04 15:26:26]main.go:28 have a nice day                               zs=log
INFO[2021-07-04 15:26:26]main.go:29 zs gogogo zs=log

code sample

show timestamp

the meaning of [0000]

add common prefix

add filename and line number (this has a small overhead)

package main

import (
	"path"
	"runtime"
	"strconv"

	"github.com/sirupsen/logrus"
)

func main() {
	var log = logrus.New()

	formatter := &logrus.TextFormatter{
		FullTimestamp:   true,
		TimestampFormat: "2006-01-02 15:04:05",
		CallerPrettyfier: func(f *runtime.Frame) (string, string) {
			_, filename := path.Split(f.File)
			// do not log func name
			return "", filename + ":" + strconv.Itoa(f.Line)
		},
	}
	log.SetFormatter(formatter)
	log.SetReportCaller(true)

	contextLogger := log.WithField("zs", "log")

	contextLogger.Info("have a nice day")
	contextLogger.Infof("%s gogogo", "zs")
}

third-party formatter

https://github.com/sirupsen/logrus#formatters

log output format sample

[2021-07-04 15:50:26]  INFO log: have a nice day
[2021-07-04 15:50:26]  INFO log: zs gogogo

code sample

package main

import (
	"github.com/sirupsen/logrus"
	prefixed "github.com/x-cray/logrus-prefixed-formatter"
)

func main() {
	var log = logrus.New()

	formatter := &prefixed.TextFormatter{
		FullTimestamp:   true,
		TimestampFormat: "2006-01-02 15:04:05",
	}
	log.Formatter = formatter

	contextLogger := log.WithField("prefix", "log")

	contextLogger.Info("have a nice day")
	contextLogger.Infof("%s gogogo", "zs")
}

As the previous code shows:

contextLogger := log.WithField("prefix", "log")

The value of the "prefix" field is printed, followed by a colon, before the message.

Download

https://rsync.samba.org/

Latest version: Rsync version 3.2.3 released

How rsync works

https://rsync.samba.org/how-rsync-works.html

Guide

https://download.samba.org/pub/rsync/rsync.html

  • --recursive: recurse into directories
  • --append: append data onto shorter files
  • --filter: add a file-filtering rule

/usr/local/Cellar/rsync/3.2.3/bin/rsync --verbose --no-whole-file --recursive --append --include='*.log' --include='*/' --exclude='*' --prune-empty-dirs dir1/ dir2/

Note the special behavior of rsync when both directories are local

https://superuser.com/questions/234273/why-doest-rsync-use-delta-transfer-for-local-files

--whole-file: This is the default when both the source and destination are specified as local paths, but only if no batch-writing option is in effect.

High Availability

https://unix.stackexchange.com/questions/48298/can-rsync-resume-after-being-interrupted

https://community.mellanox.com/s/article/in-between-ethernet-vlans-and-infiniband-pkeys

https://community.mellanox.com/s/article/howto-use-infiniband-pkey-membership-types-in-virtualization-environment--connectx-3--connectx-3-pro-x

https://community.mellanox.com/s/article/howto-configure-ipoib-networks-with-gateway-and-multiple-pkeys

https://community.mellanox.com/s/article/HowTo-Configure-SR-IOV-for-ConnectX-4-ConnectX-5-ConnectX-6-with-KVM-Ethernet

https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin

https://github.com/mellanox/k8s-rdma-shared-dev-plugin

https://docs.openshift.com/container-platform/4.6/networking/hardware_networks/add-pod.html#add-pod

IOV: I/O Virtualization

Single Root I/O Virtualization (SR-IOV) network

https://docs.openshift.com/container-platform/4.6/networking/hardware_networks/about-sriov.html

https://github.com/k8snetworkplumbingwg/sriov-cni

https://docs.mellanox.com/display/MLNXOFEDv461000/Kubernetes%20Using%20SR-IOV

https://community.mellanox.com/s/article/kubernetes-ipoib-sriov-networking-with-connectx4-connectx5

Type size

https://golang.org/ref/spec#Size_and_alignment_guarantees

https://github.com/ardanlabs/gotraining-studyguide/blob/master/go/language/struct.go

type example struct {
	flag    bool
	counter int16
	pi      float32
}

Byte alignment factor: #pragma pack(n)

  • member alignment
  • struct alignment

Alignment rules

  1. For a variable x of any type: unsafe.Alignof(x) is at least 1.
  2. For a variable x of struct type: unsafe.Alignof(x) is the largest of all the values unsafe.Alignof(x.f) for each field f of x, but at least 1.
  3. For a variable x of array type: unsafe.Alignof(x) is the same as the alignment of a variable of the array’s element type.

layout

  • bool: offset 0 (1 byte, followed by 1 byte of padding)
  • int16: offset 2
  • float32: offset 4

Total size: 8 bytes

https://eddycjy.gitbook.io/golang/di-1-ke-za-tan/go-memory-align

//TODO list