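Service Plane

If you want to integrate with Fluxon's KV, you need to understand a fixed set of service-plane objects in addition to your business process itself. They are responsible for control-plane metadata, the shared memory pool, and the startup orchestration of KV processes.

From the user's perspective, there are three kinds of objects you touch most directly:

  • External dependencies: etcd, greptime, TiKV
  • Fluxon's own roles: master, owner
  • Startup entry points: the raw etcd / greptime / TiKV runtimes, fluxon_py.runtime, and your own supervisor / startup scripts

If you are writing a business process, this page answers three questions: which processes must start first, how they relate to each other, and which objects the runtime can launch and which it cannot. For the concrete API, see User - 3 - KV and RPC Interfaces.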

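Role Relationships

The service plane can be reduced to 5 stable objects:

  • External dependency: etcd
  • External dependency: greptime
  • External dependency: TiKV
  • Fluxon's own role: master
  • Fluxon's own role: owner

They are deployed as follows:

Where:

  • etcd stores control-plane metadata
  • greptime carries the standard monitoring pipeline
  • TiKV backs extension capabilities that need persistent task state; the most direct user-visible case today is the transfer_state_store used by FS directory transfer and pre-scan
  • master handles membership, routing, leases, monitoring broadcast, and master-side logs
  • owner provides the shared memory pool and shared.json on the local machine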

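Minimal Startup Order

KV reuses the same foundation. The minimal startup chain is:

  • KV: greptime -> etcd -> fluxonkv master -> owner -> business process new_store(...)

To enable capabilities that need persistent task state, such as directory transfer and pre-scan, you also need to prepare the TiKV extension chain:

  • Transfer / Pre-Scan: TiKV PD -> TiKV -> fs master transfer_state_store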
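The two chains above can be written down as a small dependency graph so that a supervisor script derives the startup order instead of hard-coding it. This is a minimal sketch: the component labels (for example "business" and "tikv_pd") are names chosen for this illustration and are not Fluxon identifiers, and the graph assumes the only hard constraints are the ones this page states.

```python
from graphlib import TopologicalSorter

# Each key lists the components that must be ready before it starts.
# Labels are hypothetical names for this sketch, not Fluxon identifiers.
DEPENDS_ON = {
    "etcd": set(),
    "greptime": set(),
    "master": {"etcd", "greptime"},
    "owner": {"master"},
    "business": {"owner"},
    # Optional extension chain for directory transfer / pre-scan.
    "tikv_pd": set(),
    "tikv": {"tikv_pd"},
    "transfer_state_store": {"tikv"},
}

# Components that are only needed when transfer / pre-scan is enabled.
TRANSFER_ONLY = {"tikv_pd", "tikv", "transfer_state_store"}


def startup_order(with_transfer: bool = False) -> list[str]:
    """Return one valid startup order for the selected components."""
    graph = {
        name: deps
        for name, deps in DEPENDS_ON.items()
        if with_transfer or name not in TRANSFER_ONLY
    }
    return list(TopologicalSorter(graph).static_order())
```

Calling startup_order() yields an order in which etcd and greptime precede master, master precedes owner, and owner precedes the business process. With with_transfer=True, the TiKV chain is added to the same order.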

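There is one hard boundary here:

  • etcd, greptime, and TiKV are external dependencies
  • master and owner are Fluxon's own roles

If the control plane is not up, master is unavailable; if owner is not up, the business-side FluxonKvClientConfig({...}) -> new_store(...) cannot attach to the shared memory pool either. TiKV does not affect the minimal KV read/write chain, but it does affect features that depend on transfer_state_store. This document still treats etcd as the authoritative object of the base control plane; if the control plane is later migrated to TiKV as a whole, this page will be consolidated accordingly.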

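Startup Entry Points

Starting etcd, greptime, and TiKV

First, follow User - 0 - Installation to prepare the etcd / greptime / TiKV runtime packages, and confirm that the following files have been extracted: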

  • ext_images/etcd/etcd
  • ext_images/etcd/etcdctl
  • ext_images/etcd/start.sh
  • ext_images/greptime/greptime
  • ext_images/greptime/start.sh
  • ext_images/tikv/pd-server
  • ext_images/tikv/tikv-server
  • ext_images/tikv/start_pd.sh
  • ext_images/tikv/start_tikv.sh

All of these startup scripts follow the same contract:

  • --config/-c: shell configuration file
  • --workdir/-w: local working directory
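Because every start script takes the same two flags, a supervisor written in Python can build all of the command lines with one helper. The sketch below is illustrative only: build_start_command and launch are hypothetical helper names, and it assumes that creating the working directory ahead of time is harmless. Whether a given start script stays in the foreground or daemonizes is not stated here, so the caller should not assume the returned process outlives or equals the service.

```python
import subprocess
from pathlib import Path


def build_start_command(script: Path, config: Path, workdir: Path) -> list[str]:
    """Build the argv for one start script using the shared --config/--workdir contract."""
    return ["bash", str(script), "--config", str(config), "--workdir", str(workdir)]


def launch(script: Path, config: Path, workdir: Path) -> subprocess.Popen:
    """Start the script in the background; the caller owns the returned process."""
    # Assumption: pre-creating the working directory does not conflict with the script.
    workdir.mkdir(parents=True, exist_ok=True)
    return subprocess.Popen(build_start_command(script, config, workdir))
```

For example, build_start_command(Path("ext_images/etcd/start.sh"), Path("/tmp/etcd.config.sh"), Path("/tmp/fluxon_service_plane_demo/etcd")) produces the same invocation as the etcd command shown on this page.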

The etcd configuration file must define a bash array named ETCD_ARGS, for example:

cat > /tmp/etcd.config.sh <<'EOF'
ETCD_ARGS=(
  --data-dir "$WORKDIR/etcd-data"
  --name etcd0
  --advertise-client-urls "http://127.0.0.1:2379"
  --listen-client-urls "http://0.0.0.0:2379"
  --listen-peer-urls "http://0.0.0.0:2380"
  --initial-advertise-peer-urls "http://127.0.0.1:2380"
  --initial-cluster "etcd0=http://127.0.0.1:2380"
  --initial-cluster-token "etcd-cluster"
  --initial-cluster-state "new"
  --auto-compaction-retention=1
)
EOF
 
bash ./ext_images/etcd/start.sh \
  --config /tmp/etcd.config.sh \
  --workdir /tmp/fluxon_service_plane_demo/etcd

The greptime configuration file must define a bash array named GREPTIME_ARGS, for example:

cat > /tmp/greptime.config.sh <<'EOF'
GREPTIME_ARGS=(
  standalone start
  --data-home "$WORKDIR/greptimedb"
  --http-addr 0.0.0.0:34030
)
EOF
 
bash ./ext_images/greptime/start.sh \
  --config /tmp/greptime.config.sh \
  --workdir /tmp/fluxon_service_plane_demo/greptime

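The TiKV runtime consists of two processes: PD and TiKV. Their startup scripts are ext_images/tikv/start_pd.sh and ext_images/tikv/start_tikv.sh. Both also take --config/-c and --workdir/-w.

The PD configuration file must define a bash array named PD_ARGS, for example: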

cat > /tmp/pd.config.sh <<'EOF'
PD_ARGS=(
  --name pd0
  --data-dir "$WORKDIR/pd-data"
  --client-urls "http://127.0.0.1:12379"
  --advertise-client-urls "http://127.0.0.1:12379"
  --peer-urls "http://127.0.0.1:12380"
  --advertise-peer-urls "http://127.0.0.1:12380"
  --initial-cluster "pd0=http://127.0.0.1:12380"
  --log-file "$WORKDIR/pd.log"
)
EOF
 
bash ./ext_images/tikv/start_pd.sh \
  --config /tmp/pd.config.sh \
  --workdir /tmp/fluxon_service_plane_demo/tikv_pd

The TiKV configuration file must define a bash array named TIKV_ARGS, for example:

cat > /tmp/tikv.config.sh <<'EOF'
TIKV_ARGS=(
  --pd-endpoints "127.0.0.1:12379"
  --addr "127.0.0.1:20160"
  --advertise-addr "127.0.0.1:20160"
  --status-addr "127.0.0.1:20180"
  --data-dir "$WORKDIR/tikv-data"
  --log-file "$WORKDIR/tikv.log"
)
EOF
 
bash ./ext_images/tikv/start_tikv.sh \
  --config /tmp/tikv.config.sh \
  --workdir /tmp/fluxon_service_plane_demo/tikv

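etcd, greptime, and TiKV are all external dependencies. They are not launched by fluxon_py.runtime. The minimal KV base chain requires etcd / greptime to be ready before master starts; if you enable directory transfer or pre-scan, the corresponding TiKV PD / TiKV must also be ready first.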
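One way to approximate "ready" in a startup script is to wait until the dependency's port accepts TCP connections before launching master. The sketch below is a rough proxy rather than a real health check: wait_for_tcp is a hypothetical helper, and an open port does not guarantee that the service can already serve requests. The ports in the usage note come from the example configs on this page (2379 for etcd, 34030 for greptime).

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A script could call wait_for_tcp("127.0.0.1", 2379) and wait_for_tcp("127.0.0.1", 34030) and start master only if both return True.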

fluxon_py.runtime

fluxon_py.runtime is responsible only for Fluxon's own roles; it does not replace etcd, greptime, or TiKV.

The most commonly used runtime entry points are:

  • start_kv_master_process(config=...)
  • start_owner_kvclient_process(config=...)

These entry points also accept an optional parameter:

  • log_path=...

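It controls where the stdout/stderr of the Python wrapper subprocess is written, not the business log directory in the service's own configuration file.

If you installed from a wheel, prefer calling these Python entry points directly and passing a Python dict; do not depend on script paths under the examples/ directory.

Corresponding example script: examples/start_master_owner.py

This script supports two startup modes:

  • Default: start master + owner
  • --without-master: start only owner and attach to the master of an existing KV cluster

#!/usr/bin/env python3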
 
import argparse
 
from pathlib import Path
 
from fluxon_py.runtime import (
    start_kv_master_process,
    start_owner_kvclient_process,
    wait_subproc_or_ctrlc,
)
from fluxon_py.runtime.process_runner import ManagedSubprocess
 
ETCD_ENDPOINT = "127.0.0.1:2379"
GREPTIME_HTTP_PORT = 34030
GREPTIME_BASE_URL = f"http://127.0.0.1:{GREPTIME_HTTP_PORT}"
CLUSTER_NAME = "demo-kv-cluster"
SHARED_MEMORY_PATH = Path("/dev/shm/fluxon_kv_demo").resolve()
SHARED_FILE_PATH = Path("/tmp/fluxon_kv_demo/shared").resolve()
WORKDIR = Path("/tmp/fluxon_kv_demo/runtime").resolve()
MASTER_PORT = 31000
MASTER_INSTANCE_KEY = "demo_kv_master"
OWNER_INSTANCE_KEY = "demo_kv_owner"
OWNER_DRAM_BYTES = 1073741824
 
 
def main() -> None:
    args = parse_args()
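    SHARED_FILE_PATH.mkdir(parents=True, exist_ok=True)
    log_dir = (WORKDIR / "log").resolve()
    # Make sure the directory for the wrapper stdout/stderr logs exists.
    log_dir.mkdir(parents=True, exist_ok=True)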
 
    if args.with_master:
        master_log_dir = (WORKDIR / "master_logs").resolve()
        master_log_dir.mkdir(parents=True, exist_ok=True)
        master_stdout_log = log_dir / "master.log"
        master_proc = start_kv_master_process(
            config=build_master_config(log_dir=master_log_dir),
            log_path=master_stdout_log,
        )
    else:
        master_stdout_log = None
        master_proc = None
 
    owner_stdout_log = log_dir / "owner.log"
    owner_proc = start_owner_kvclient_process(
        config=build_owner_config(),
        log_path=owner_stdout_log,
    )
    children = []
    if master_proc is not None:
        children.append(
            ManagedSubprocess(
                label="master",
                proc=master_proc,
            )
        )
    children.append(
        ManagedSubprocess(
            label="owner",
            proc=owner_proc,
        )
    )
 
    print(f"[fluxon_kv] shared memory path: {SHARED_MEMORY_PATH}")
    print(f"[fluxon_kv] shared file path: {SHARED_FILE_PATH}")
    print(f"[fluxon_kv] etcd endpoint: {ETCD_ENDPOINT}")
    print(f"[fluxon_kv] greptime base url: {GREPTIME_BASE_URL}")
    print(f"[fluxon_kv] start master in this script: {args.with_master}")
    if master_stdout_log is not None:
        print(f"[fluxon_kv] master stdout log: {master_stdout_log}")
    else:
        print("[fluxon_kv] master stdout log: disabled by --without-master")
    print(f"[fluxon_kv] owner stdout log: {owner_stdout_log}")
    stack_label = "master and owner" if args.with_master else "owner"
    print(f"[fluxon_kv] waiting for Ctrl-C to stop {stack_label}")
    wait_subproc_or_ctrlc(
        children,
        on_ctrlc=lambda: print(f"[fluxon_kv] caught Ctrl-C, stopping {stack_label}"),
    )
 
 
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Start KV demo owner, optionally with a local master")
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--with-master",
        dest="with_master",
        action="store_true",
        help="Start a local kv master in this script (default)",
    )
    group.add_argument(
        "--without-master",
        dest="with_master",
        action="store_false",
        help="Do not start a local kv master; only start owner and attach to an existing cluster master",
    )
    parser.set_defaults(with_master=True)
    return parser.parse_args()
 
 
def build_master_config(*, log_dir: Path) -> dict:
    return {
        "instance_key": MASTER_INSTANCE_KEY,
        "cluster_name": CLUSTER_NAME,
        "port": MASTER_PORT,
        "etcd_endpoints": [ETCD_ENDPOINT],
        "log_dir": str(log_dir),
        "monitoring": {
            "prometheus_base_url": f"{GREPTIME_BASE_URL}/v1/prometheus",
            "prom_remote_write_url": [f"{GREPTIME_BASE_URL}/v1/prometheus/write"],
            "otlp_log_api": {
                "otlp_endpoint": f"{GREPTIME_BASE_URL}/v1/otlp/v1/logs",
            },
        },
    }
 
 
def build_owner_config() -> dict:
    return {
        "instance_key": OWNER_INSTANCE_KEY,
        "contribute_to_cluster_pool_size": {
            "dram": OWNER_DRAM_BYTES,
            "vram": {},
        },
        "fluxonkv_spec": {
            "etcd_addresses": [ETCD_ENDPOINT],
            "cluster_name": CLUSTER_NAME,
            "shared_memory_path": str(SHARED_MEMORY_PATH),
            "shared_file_path": str(SHARED_FILE_PATH),
            "sub_cluster": "default",
        },
    }
 
 
if __name__ == "__main__":
    main()

Startup commands:

python3 examples/start_master_owner.py
python3 examples/start_master_owner.py --without-master

The default command starts a local master + owner. --without-master starts only the local owner and requires that the master for the same cluster_name is already running elsewhere.
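Since an owner can only join a master that uses the same cluster_name, a small pre-flight check on the two config dicts can catch mismatches before any process is started. This is a sketch under stated assumptions: check_same_cluster is a hypothetical helper, the key names are taken from the example script on this page, and the rule that both sides should share at least one etcd endpoint is an assumption inferred from that example rather than a documented requirement.

```python
def check_same_cluster(master_cfg: dict, owner_cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means no mismatch was found."""
    problems = []
    spec = owner_cfg["fluxonkv_spec"]

    if master_cfg["cluster_name"] != spec["cluster_name"]:
        problems.append(
            f"cluster_name differs: master={master_cfg['cluster_name']!r}, "
            f"owner={spec['cluster_name']!r}"
        )

    # Assumption: master and owner should point at the same etcd deployment.
    if not set(master_cfg["etcd_endpoints"]) & set(spec["etcd_addresses"]):
        problems.append("master and owner share no etcd endpoint")

    return problems
```

With the example script's build_master_config(...) and build_owner_config() outputs, the check would be expected to return an empty list, because both are built from the same constants.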

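If the same supervisor / example script launches multiple roles, give each role its own log_path and use wait_subproc_or_ctrlc(...) in the main process to wait for and shut down all of them together. That way, Ctrl-C stops the launched child process group as a whole, and no service-plane child process is left running outside the terminal.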
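The "one log_path per role" rule can be enforced mechanically by deriving each path from the role label. The helper below is a minimal sketch: allocate_log_paths is a hypothetical name, and the "<role>.log" naming simply mirrors the master.log / owner.log files used by the example script.

```python
from pathlib import Path


def allocate_log_paths(log_dir: Path, roles: list[str]) -> dict[str, Path]:
    """Map each role label to its own log file under log_dir."""
    if len(set(roles)) != len(roles):
        # Two roles with the same label would end up sharing one log file.
        raise ValueError(f"duplicate role labels: {roles}")
    log_dir.mkdir(parents=True, exist_ok=True)
    return {role: log_dir / f"{role}.log" for role in roles}
```

The resulting paths can then be passed as log_path=... to the corresponding start_*_process(...) calls.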

For the full configuration objects and field semantics, see User - 3 - KV and RPC Interfaces.

When to Read Which Page