openmpi
https://github.com/open-mpi/ompi/issues/6691#issuecomment-497245597
ess_hnp_module.c
orte_set_attribute(&transports, ORTE_RML_TRANSPORT_TYPE, ORTE_ATTR_LOCAL, orte_mgmt_transport, OPAL_STRING);
rml: Runtime Messaging Layer
Select the rtmod according to the attributes; the return value is the module's index in the array.
Open conduit - call each component and see if they can provide a
conduit that can satisfy all these attributes - return the conduit id
(a negative value indicates error)
rml_base_stubs.c
orte_rml_API_open_conduit
Iterate over the active rml modules and call each one's open_conduit, e.g. oob (rml_oob_component.c). When a module is returned, store it in the array and return its array index.
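The loop described above can be sketched in a few lines of C. This is a simplified illustration, not the ORTE source: every `sketch_*` name, `MAX_CONDUITS`, and `CONDUIT_ERROR` is hypothetical, the attribute list is reduced to a single transport string, and the rule that the oob stand-in also accepts a request naming no transport is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the ORTE types; not the real definitions. */
#define MAX_CONDUITS  8
#define CONDUIT_ERROR (-1)

typedef struct {
    const char *transport;              /* e.g. "oob" */
} sketch_module_t;

/* A component returns a module if it can satisfy the request, else NULL. */
typedef sketch_module_t *(*sketch_open_fn)(const char *requested_transport);

typedef struct {
    const char    *name;
    sketch_open_fn open_conduit;
} sketch_component_t;

/* Array of opened conduits; the conduit id is the index into it. */
static sketch_module_t *conduits[MAX_CONDUITS];
static int num_conduits = 0;

static sketch_module_t oob_module = { "oob" };

/* Assumption: accept when no transport is named or when "oob" is named. */
static sketch_module_t *oob_open_conduit(const char *requested)
{
    if (requested == NULL || strcmp(requested, "oob") == 0) {
        return &oob_module;
    }
    return NULL;
}

static sketch_component_t active_components[] = {
    { "oob", oob_open_conduit },
};

/* Ask each active component in turn; store the first module offered
 * and return its index, or a negative value on error. */
int sketch_open_conduit(const char *requested)
{
    size_t n = sizeof(active_components) / sizeof(active_components[0]);
    for (size_t i = 0; i < n; i++) {
        sketch_module_t *mod = active_components[i].open_conduit(requested);
        if (mod != NULL) {
            if (num_conduits >= MAX_CONDUITS) {
                return CONDUIT_ERROR;   /* no room left in the array */
            }
            conduits[num_conduits] = mod;
            return num_conduits++;
        }
    }
    return CONDUIT_ERROR;               /* no component could satisfy it */
}
```

In this sketch, successive successful calls return increasing indices, and a request for a transport that no component provides returns a negative value, matching the "negative value indicates error" note above.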
open_conduit()
Read the key value from the attributes, i.e. orte_mgmt_transport or orte_coll_transport, to determine whether oob was specified.
Then fetch the routed module:
orte_get_attribute(attributes, ORTE_RML_ROUTED_ATTRIB, (void**)&comp_attrib, OPAL_STRING);
Assign a routed module according to the requested routed module (if NULL, choose by priority):
md->routed = orte_routed.assign_module(comp_attrib);
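The "named module, else by priority" rule can be illustrated with a small C sketch. It is not the body of orte_routed.assign_module: `sketch_routed_t`, `sketch_assign_routed`, and the priority values in the table are hypothetical and only show the selection logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical module table; the priorities are made up for illustration. */
typedef struct {
    const char *name;
    int         priority;
} sketch_routed_t;

static const sketch_routed_t routed_modules[] = {
    { "radix",    70 },
    { "binomial", 30 },
    { "direct",   10 },
};

/* Return the index of the module to use, or -1 if none matches.
 * A non-NULL request selects by name; NULL selects the highest priority. */
int sketch_assign_routed(const char *requested)
{
    size_t n = sizeof(routed_modules) / sizeof(routed_modules[0]);
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (requested != NULL) {
            if (strcmp(routed_modules[i].name, requested) == 0) {
                return (int)i;          /* exact name match wins */
            }
        } else if (best < 0 ||
                   routed_modules[i].priority > routed_modules[best].priority) {
            best = (int)i;              /* track the highest priority so far */
        }
    }
    return best;                        /* stays -1 if a named module is missing */
}
```

With this table, passing NULL picks the first entry because it has the highest priority, while passing a name that is not in the table yields -1.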
orte_mca_params.c
orte_mgmt_transport = "oob" // default
plm_base_launch_support.c
param = NULL;
orte_node_pool
node_regex_threshold = 1024 (default length)
Resource Allocation Subsystem (RAS)