Bootstrap

Why the “devlink port show” output contains PCI info on the old kernel

Output of devlink port show

  • in kernel 5.15
pci/0000:08:00.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
  function:
    hw_addr 00:00:00:00:00:00
pci/0000:08:00.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
  function:
    hw_addr 00:00:00:00:00:00
  • in kernel 6.13
auxiliary/mlx5_core.eth.0/65535: type eth netdev ens1f0np0 flavour physical port 0 splittable false

The two kernels print different handle formats. On the old kernel the handle contains the PCI information (bus pci, device 0000:08:00.0). On the new kernel it contains the auxiliary device name instead, and the PCI information is gone.

devlink callback function

For the devlink port show command, the kernel devlink framework functions build the netlink reply.
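To compare the two formats programmatically, a handle can be split into its bus, device, and port index parts. The following is a minimal user-space sketch; struct port_handle and parse_port_handle() are hypothetical names used only for illustration and are not part of the devlink tool or the kernel.

```c
#include <stdio.h>

/* Hypothetical container for the three parts of a devlink port handle. */
struct port_handle {
    char bus[32];        /* e.g. "pci" or "auxiliary" */
    char dev[64];        /* e.g. "0000:08:00.0" or "mlx5_core.eth.0" */
    unsigned int index;  /* devlink port index, e.g. 65535 */
};

/*
 * Split "<bus>/<dev>/<index>" into its parts.
 * Returns 0 on success, -1 if the string does not have three parts.
 */
static int parse_port_handle(const char *s, struct port_handle *h)
{
    /* %31[^/] reads up to 31 characters that are not '/' */
    if (sscanf(s, "%31[^/]/%63[^/]/%u", h->bus, h->dev, &h->index) != 3)
        return -1;
    return 0;
}
```

Parsing stops at the first character after the index, so a full output line such as the 5.15 one above can be passed in unchanged.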

  • 6.13 kernel
    devlink port show ==> devlink_nl_port_get_dumpit()
    we use mlx5e_devlink_port_register() to register the port information

  • 5.15 kernel
    devlink port show ==> devlink_nl_cmd_port_get_dumpit()

    “devlink port show” is handled entirely by the devlink framework. The command does not call into the driver-specific (mlx5e) struct devlink_ops (mlx5_devlink_ops) interfaces.

Both of them call devlink_nl_put_handle() to fill in the bus name and device name of the devlink handle:

static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink)
{
	if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, devlink->dev->bus->name))
		return -EMSGSIZE;
	if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, dev_name(devlink->dev)))
		return -EMSGSIZE;
	return 0;
}

This means the “pci/0000:08:00.0” and “auxiliary/mlx5_core.eth.0” parts of the handle come from devlink->dev->bus->name and dev_name(devlink->dev); the trailing 65535 is the port index. So the key structure is devlink->dev.
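The way the handle follows from devlink->dev can be mimicked in user space. This is only a sketch: the mock_* types below are simplified stand-ins for struct bus_type, struct device, and struct devlink, and format_port_handle() is a hypothetical helper, not a kernel or iproute2 function.

```c
#include <stdio.h>

/* Simplified stand-ins for the kernel structures (not the real ones). */
struct mock_bus     { const char *name; };
struct mock_device  { const struct mock_bus *bus; const char *name; };
struct mock_devlink { const struct mock_device *dev; };

/*
 * Build "<bus>/<dev>/<port index>" from the same two fields that
 * devlink_nl_put_handle() puts into the netlink message.
 */
static int format_port_handle(char *buf, size_t len,
                              const struct mock_devlink *dl,
                              unsigned int port_index)
{
    return snprintf(buf, len, "%s/%s/%u",
                    dl->dev->bus->name, dl->dev->name, port_index);
}
```

Pointing the mock devlink at a device on the "pci" bus or at one on the "auxiliary" bus is enough to reproduce both handle prefixes shown earlier.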

devlink->dev

pci level devlink

The devlink instance of the mlx5_core device is at PCI device level. The PCI device name (its BDF) is set in pci_setup_device(), which runs before probe_one(); the devlink instance itself is allocated in probe_one():

=> pci_setup_device  set pci device name as BDF#
=> probe_one
=> dev_name(dev->device) = 0000:08:00.0 (mlx5_core_dev *dev)
=> devlink = mlx5_devlink_alloc(&pdev->dev);
=> devlink_alloc

devlink_alloc() sets devlink->dev to &pdev->dev.

auxiliary dev level devlink

In _mlx5e_probe(), a devlink port instance is created under a devlink instance.

HAVE_DEVLINK_PER_AUXDEV enabled

The new kernel enables the HAVE_DEVLINK_PER_AUXDEV flag, which means each auxiliary device has its own devlink instance.

=> _mlx5e_probe
=> mlx5e_create_devlink(&adev->dev, mdev);
=> devlink_alloc(&mlx5e_devlink_ops, sizeof(*mlx5e_dev), dev);
=> devlink->dev = dev // set devlink->dev as adev->dev, dev_name is "mlx5_core.eth.0"

So when we run “devlink port show”, devlink_nl_put_handle() gets the per-adev devlink instance, reads dev_name() as mlx5_core.eth.0 and the bus name as auxiliary. That is why there is no PCI information on the new kernel.

devlink_alloc() sets the private data of the devlink instance to mlx5e_dev.
_mlx5e_probe() then associates the adev with mlx5e_dev:

auxiliary_set_drvdata(adev, mlx5e_dev);

Register the devlink port on the per-adev devlink instance

=> _mlx5e_probe
=> err = mlx5e_devlink_port_register(mlx5e_dev, mdev);

In mlx5e_devlink_port_register(), the devlink port is registered on the per-adev devlink instance:

	struct devlink *devlink = priv_to_devlink(mlx5e_dev);

	return devlink_port_register(devlink, &mlx5e_dev->dl_port,
				     dl_port_index);
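priv_to_devlink() works because devlink_alloc() places the driver-private area directly behind the devlink structure, so the devlink pointer can be recovered from the private pointer by subtracting an offset. The sketch below mimics that layout in user space. All fake_* names are hypothetical, and the real struct devlink has many more fields and an explicit alignment on its private area.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mock devlink: a header followed by a driver-private area. */
struct fake_devlink {
    const char *dev_name;  /* stands in for devlink->dev */
    char priv[];           /* driver-private data, e.g. mlx5e_dev */
};

/* Mock of the driver-private structure stored in priv[]. */
struct fake_mlx5e_dev {
    unsigned int dl_port_index;
};

/* Allocate header plus priv_size bytes of zeroed private data. */
static struct fake_devlink *fake_devlink_alloc(size_t priv_size,
                                               const char *dev_name)
{
    struct fake_devlink *dl = calloc(1, sizeof(*dl) + priv_size);

    if (!dl)
        return NULL;
    dl->dev_name = dev_name;
    return dl;
}

/* devlink -> private data */
static void *fake_devlink_priv(struct fake_devlink *dl)
{
    return dl->priv;
}

/* private data -> devlink: step back by the offset of priv[] */
static struct fake_devlink *fake_priv_to_devlink(void *priv)
{
    return (struct fake_devlink *)
        ((char *)priv - offsetof(struct fake_devlink, priv));
}
```

With this layout, the pointer returned by fake_devlink_priv() round-trips back to the owning instance, which is how the driver gets from mlx5e_dev to the per-adev devlink and its device name.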

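BTW, adev->dev and adev->dev.bus are initialized in auxiliary_device_init() (simplified below):

int auxiliary_device_init(struct auxiliary_device *auxdev)
{
    struct device *dev = &auxdev->dev;

    dev->bus = &auxiliary_bus_type; // set the bus type to the auxiliary bus
    // set the device name (simplified: upstream does this in
    // __auxiliary_device_add(), which also prepends the module name,
    // giving "mlx5_core.eth.0")
    dev_set_name(dev, "%s.%d", auxdev->name, auxdev->id);

    return 0;
}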
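To see where the full name "mlx5_core.eth.0" comes from, the naming step can be reproduced in user space. This sketch assumes the upstream "<modname>.<name>.<id>" format used by __auxiliary_device_add(); aux_dev_name() is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/*
 * Compose an auxiliary device name as "<modname>.<name>.<id>".
 * Returns the number of characters written (as snprintf does).
 */
static int aux_dev_name(char *buf, size_t len, const char *modname,
                        const char *name, unsigned int id)
{
    return snprintf(buf, len, "%s.%s.%u", modname, name, id);
}
```

Called with the module name "mlx5_core", the device name "eth", and id 0, it yields the device part of the 6.13 handle.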

HAVE_DEVLINK_PER_AUXDEV disabled

The old kernel does not enable the HAVE_DEVLINK_PER_AUXDEV flag, which means only the PCI-level device (mlx5_core) has a devlink instance.
In this case mlx5e_devlink_port_register() gets the devlink instance from the mlx5_core device (PCI level)

struct devlink *devlink = priv_to_devlink(priv->mdev);

and registers the devlink port with the PCI-level devlink

return devlink_port_register(devlink, dl_port, dl_port_index);

So when we run “devlink port show”, devlink_nl_put_handle() gets the PCI-level devlink instance, reads dev_name() as 0000:08:00.0 and the bus name as pci. That is why the PCI information is present on the old kernel.
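The two cases can be put side by side in a small user-space sketch. The per_auxdev parameter is a runtime stand-in for the compile-time HAVE_DEVLINK_PER_AUXDEV flag, and struct sketch_dev, port_devlink_dev(), and handle_prefix() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Mock device: just the two strings devlink_nl_put_handle() reads. */
struct sketch_dev {
    const char *bus_name;  /* stands in for dev->bus->name */
    const char *name;      /* stands in for dev_name(dev) */
};

/*
 * Pick the device that backs the devlink instance the port is
 * registered on: the auxiliary device when per_auxdev is set,
 * otherwise the PCI device.
 */
static const struct sketch_dev *
port_devlink_dev(bool per_auxdev, const struct sketch_dev *pdev,
                 const struct sketch_dev *adev)
{
    return per_auxdev ? adev : pdev;
}

/* Format the "<bus>/<dev>" prefix of the port handle. */
static int handle_prefix(char *buf, size_t len, const struct sketch_dev *dev)
{
    return snprintf(buf, len, "%s/%s", dev->bus_name, dev->name);
}
```

With the flag off the prefix is built from the PCI device, and with the flag on it is built from the auxiliary device, matching the 5.15 and 6.13 outputs respectively.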
