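26 Linux Network Framework
The Linux kernel network stack is a complex, layered system that carries network packets between the physical medium and applications.
The diagram below lays the stack out along the seven-layer OSI reference model: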
+--------------------------------------------------------------------------+
| Application layer                                                        |
| (e.g. HTTP, FTP, DNS, SMTP, SSH: network services and interfaces)        |
+--------------------------------------------------------------------------+
                                     ↑
                                     |
+--------------------------------------------------------------------------+
| Presentation layer                                                       |
| (e.g. SSL/TLS, JPEG, MPEG, ASCII: encryption, compression, formatting)   |
+--------------------------------------------------------------------------+
                                     ↑
                                     |
+--------------------------------------------------------------------------+
| Session layer                                                            |
| (e.g. NetBIOS, PPTP: session setup, management, and teardown)            |
+--------------------------------------------------------------------------+
                                     ↑
                                     |
+--------------------------------------------------------------------------+
| Transport layer                                                          |
| (e.g. TCP, UDP: end-to-end delivery; TCP adds reliability)               |
+--------------------------------------------------------------------------+
                                     ↑
                                     |
+--------------------------------------------------------------------------+
| Network layer                                                            |
| (e.g. IPv4, IPv6, ICMP, ARP: routing and forwarding across networks)     |
+--------------------------------------------------------------------------+
                                     ↑
                                     |
+--------------------------------------------------------------------------+
| Data link layer                                                          |
| (e.g. Ethernet, Frame Relay, PPP: frame transfer between adjacent nodes) |
+--------------------------------------------------------------------------+
                                     ↑
                                     |
+--------------------------------------------------------------------------+
| Physical layer                                                           |
| (e.g. Ethernet PHY, optical fiber, coaxial cable: bit transmission)      |
+--------------------------------------------------------------------------+
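26.1 Physical Layer
The physical layer handles the hardware signaling of the network interface. In Linux this involves the PHY (Physical Layer Device) chip and its driver.
The PHY layer mainly involves the following:
Key PHY-layer data structure:
struct phy_device: the main kernel data structure representing a PHY device. It stores the device's state, features, and capabilities.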
/**
* struct phy_device - An instance of a PHY
*
* @mdio: MDIO bus this PHY is on
* @drv: Pointer to the driver for this PHY instance
* @phy_id: UID for this device found during discovery
* @c45_ids: 802.3-c45 Device Identifiers if is_c45.
* @is_c45: Set to true if this PHY uses clause 45 addressing.
* @is_internal: Set to true if this PHY is internal to a MAC.
* @is_pseudo_fixed_link: Set to true if this PHY is an Ethernet switch, etc.
* @is_gigabit_capable: Set to true if PHY supports 1000Mbps
* @has_fixups: Set to true if this PHY has fixups/quirks.
* @suspended: Set to true if this PHY has been suspended successfully.
* @suspended_by_mdio_bus: Set to true if this PHY was suspended by MDIO bus.
* @sysfs_links: Internal boolean tracking sysfs symbolic links setup/removal.
* @loopback_enabled: Set true if this PHY has been loopbacked successfully.
* @downshifted_rate: Set true if link speed has been downshifted.
* @state: State of the PHY for management purposes
* @dev_flags: Device-specific flags used by the PHY driver.
* @irq: IRQ number of the PHY's interrupt (-1 if none)
* @phy_timer: The timer for handling the state machine
* @phylink: Pointer to phylink instance for this PHY
* @sfp_bus_attached: Flag indicating whether the SFP bus has been attached
* @sfp_bus: SFP bus attached to this PHY's fiber port
* @attached_dev: The attached enet driver's device instance ptr
* @adjust_link: Callback for the enet controller to respond to changes: in the
* link state.
* @phy_link_change: Callback for phylink for notification of link change
* @macsec_ops: MACsec offloading ops.
*
* @speed: Current link speed
* @duplex: Current duplex
* @port: Current port
* @pause: Current pause
* @asym_pause: Current asymmetric pause
* @supported: Combined MAC/PHY supported linkmodes
* @advertising: Currently advertised linkmodes
* @adv_old: Saved advertised while power saving for WoL
* @lp_advertising: Current link partner advertised linkmodes
* @eee_broken_modes: Energy efficient ethernet modes which should be prohibited
* @autoneg: Flag autoneg being used
* @link: Current link state
* @autoneg_complete: Flag auto negotiation of the link has completed
* @mdix: Current crossover
* @mdix_ctrl: User setting of crossover
* @interrupts: Flag interrupts have been enabled
* @interface: enum phy_interface_t value
* @skb: Netlink message for cable diagnostics
* @nest: Netlink nest used for cable diagnostics
* @ehdr: Netlink header for cable diagnostics
* @phy_led_triggers: Array of LED triggers
* @phy_num_led_triggers: Number of triggers in @phy_led_triggers
* @led_link_trigger: LED trigger for link up/down
* @last_triggered: last LED trigger for link speed
* @master_slave_set: User requested master/slave configuration
* @master_slave_get: Current master/slave advertisement
* @master_slave_state: Current master/slave configuration
* @mii_ts: Pointer to time stamper callbacks
* @lock: Mutex for serialization access to PHY
* @state_queue: Work queue for state machine
* @shared: Pointer to private data shared by phys in one package
* @priv: Pointer to driver private data
*
* interrupts currently only supports enabled or disabled,
* but could be changed in the future to support enabling
* and disabling specific interrupts
*
* Contains some infrastructure for polling and interrupt
* handling, as well as handling shifts in PHY hardware state
*/
struct phy_device {
    struct mdio_device mdio;

    /* Information about the PHY type */
    /* And management functions */
    struct phy_driver *drv;

    u32 phy_id;

    struct phy_c45_device_ids c45_ids;
    unsigned is_c45:1;
    unsigned is_internal:1;
    unsigned is_pseudo_fixed_link:1;
    unsigned is_gigabit_capable:1;
    unsigned has_fixups:1;
    unsigned suspended:1;
    unsigned suspended_by_mdio_bus:1;
    unsigned sysfs_links:1;
    unsigned loopback_enabled:1;
    unsigned downshifted_rate:1;

    unsigned autoneg:1;
    /* The most recently read link state */
    unsigned link:1;
    unsigned autoneg_complete:1;

    /* Interrupts are enabled */
    unsigned interrupts:1;

    enum phy_state state;

    u32 dev_flags;

    phy_interface_t interface;

    /*
     * forced speed & duplex (no autoneg)
     * partner speed & duplex & pause (autoneg)
     */
    int speed;
    int duplex;
    int port;
    int pause;
    int asym_pause;
    u8 master_slave_get;
    u8 master_slave_set;
    u8 master_slave_state;

    /* Union of PHY and Attached devices' supported link modes */
    /* See ethtool.h for more info */
    __ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
    __ETHTOOL_DECLARE_LINK_MODE_MASK(advertising);
    __ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising);
    /* used with phy_speed_down */
    __ETHTOOL_DECLARE_LINK_MODE_MASK(adv_old);

    /* Energy efficient ethernet modes which should be prohibited */
    u32 eee_broken_modes;

#ifdef CONFIG_LED_TRIGGER_PHY
    struct phy_led_trigger *phy_led_triggers;
    unsigned int phy_num_led_triggers;
    struct phy_led_trigger *last_triggered;

    struct phy_led_trigger *led_link_trigger;
#endif

    /*
     * Interrupt number for this PHY
     * -1 means no interrupt
     */
    int irq;

    /* private data pointer */
    /* For use by PHYs to maintain extra state */
    void *priv;

    /* shared data pointer */
    /* For use by PHYs inside the same package that need a shared state. */
    struct phy_package_shared *shared;

    /* Reporting cable test results */
    struct sk_buff *skb;
    void *ehdr;
    struct nlattr *nest;

    /* Interrupt and Polling infrastructure */
    struct delayed_work state_queue;

    struct mutex lock;

    /* This may be modified under the rtnl lock */
    bool sfp_bus_attached;
    struct sfp_bus *sfp_bus;
    struct phylink *phylink;
    struct net_device *attached_dev;
    struct mii_timestamper *mii_ts;

    u8 mdix;
    u8 mdix_ctrl;

    void (*phy_link_change)(struct phy_device *phydev, bool up);
    void (*adjust_link)(struct net_device *dev);

#if IS_ENABLED(CONFIG_MACSEC)
    /* MACsec management functions */
    const struct macsec_ops *macsec_ops;
#endif

    ANDROID_KABI_RESERVE(1);
    ANDROID_KABI_RESERVE(2);
    ANDROID_KABI_RESERVE(3);
    ANDROID_KABI_RESERVE(4);
};
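state: the current state of the PHY state machine (an enum phy_state value such as PHY_UP, PHY_RUNNING, or PHY_NOLINK).
link: whether the physical link is currently up.
speed: the current link speed, for example 10 Mbps, 100 Mbps, or 1000 Mbps.
duplex: half-duplex or full-duplex mode.
phy_attach(): binds a PHY device to a network interface (MAC) so that the MAC driver can follow the PHY's link state. The PHY device itself is normally registered earlier, when its MDIO bus is scanned.
phy_start() / phy_stop(): start or stop the PHY state machine, which tracks and reports link state changes.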
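The lifecycle described above (attach, start, link events, stop) can be sketched as a small user-space model. This is only an illustration under stated assumptions: every toy_* name is hypothetical, the state names loosely follow enum phy_state, and the real phylib state machine has more states and runs from a delayed work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel types. */
enum toy_phy_state {
    TOY_PHY_DOWN, TOY_PHY_READY, TOY_PHY_HALTED,
    TOY_PHY_UP, TOY_PHY_RUNNING, TOY_PHY_NOLINK
};

struct toy_phy;

struct toy_netdev {
    const char *name;
    struct toy_phy *phydev;
};

struct toy_phy {
    enum toy_phy_state state;
    bool link;      /* true when the link is up */
    int speed;      /* Mbps */
    int duplex;     /* 0 = half, 1 = full */
    struct toy_netdev *attached_dev;
};

/* Bind a PHY to a MAC, loosely like phy_attach(). Returns 0, or -1 if busy. */
int toy_phy_attach(struct toy_netdev *dev, struct toy_phy *phy)
{
    assert(dev != NULL && phy != NULL);
    if (phy->attached_dev != NULL)
        return -1;              /* already bound to a MAC */
    phy->attached_dev = dev;
    dev->phydev = phy;
    phy->state = TOY_PHY_READY;
    return 0;
}

/* Start the state machine, loosely like phy_start(). */
void toy_phy_start(struct toy_phy *phy)
{
    if (phy->state == TOY_PHY_READY || phy->state == TOY_PHY_HALTED)
        phy->state = TOY_PHY_UP;
}

/* Record a link change reported by the hardware while the PHY is started. */
void toy_phy_link_event(struct toy_phy *phy, bool link, int speed, int duplex)
{
    if (phy->state != TOY_PHY_UP && phy->state != TOY_PHY_RUNNING &&
        phy->state != TOY_PHY_NOLINK)
        return;                 /* not started: ignore the event */
    phy->link = link;
    if (link) {
        phy->speed = speed;
        phy->duplex = duplex;
        phy->state = TOY_PHY_RUNNING;
    } else {
        phy->state = TOY_PHY_NOLINK;
    }
}

/* Stop the state machine, loosely like phy_stop(). */
void toy_phy_stop(struct toy_phy *phy)
{
    phy->link = false;
    phy->state = TOY_PHY_HALTED;
}
```

A caller would attach once, call toy_phy_start() when the interface is brought up, feed link events while it runs, and call toy_phy_stop() when the interface goes down.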
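Passing data from the PHY layer to the MAC layer:
After the PHY has decoded the physical signal, the bit stream is passed to the MAC over an interface such as MII, GMII, or RGMII. The PHY driver also reports physical-layer information to the MAC driver, namely the current link state (whether the link is up, the negotiated speed, and so on), so that the MAC can configure itself correctly and handle the frames it receives.
Physical medium (RJ45) ──> PHY device (electrical signal to bit stream) ──> RGMII ──> MAC (bit stream assembled into frames) ──> Ethernet frame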
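As a rough illustration of how a MAC driver might react to the link information reported by the PHY, the sketch below models an adjust_link-style handler in user space. The toy_* types and functions are hypothetical and not taken from any driver; the only hardware facts assumed are the standard RGMII transmit clock rates (125 MHz, 25 MHz, and 2.5 MHz for 1000, 100, and 10 Mbps).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the phy_device fields a MAC driver reads. */
struct toy_link {
    int link;    /* 1 = up, 0 = down */
    int speed;   /* Mbps */
    int duplex;  /* 1 = full, 0 = half */
};

/* Hypothetical MAC-side configuration derived from the link state. */
struct toy_mac {
    long tx_clk_hz;
    int full_duplex;
    int carrier;
};

/* RGMII transmit clock in Hz for a link speed in Mbps; 0 if unsupported. */
long toy_rgmii_tx_clk_hz(int speed_mbps)
{
    switch (speed_mbps) {
    case 1000: return 125000000L;
    case 100:  return 25000000L;
    case 10:   return 2500000L;
    default:   return 0;
    }
}

/* adjust_link-style handler: reconfigure the MAC after a link change. */
void toy_adjust_link(struct toy_mac *mac, const struct toy_link *l)
{
    long hz;

    assert(mac != NULL && l != NULL);
    if (!l->link) {
        mac->carrier = 0;       /* link lost: keep the old clock settings */
        return;
    }
    hz = toy_rgmii_tx_clk_hz(l->speed);
    if (hz == 0)
        return;                 /* unknown speed: leave the MAC untouched */
    mac->tx_clk_hz = hz;
    mac->full_duplex = (l->duplex == 1);
    mac->carrier = 1;
}
```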
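26.2 Data Link Layer
The MAC-layer driver manages the network interface controller (NIC); it sends and receives data and interacts with the PHY layer. On Rockchip platforms the MAC driver code lives under kernel/drivers/net/ethernet/stmicro/*.
26.2.1 Main parts of the MAC-layer driver
Initialization and registration: the driver registers the device with the kernel so that the kernel can manage it.
Receive handling: takes data arriving from the PHY layer and passes it up to the network protocol stack.
Transmit handling: sends data from the upper layers through the MAC to the PHY, which drives it onto the physical medium.
Interrupt handling: services the NIC's hardware interrupts, which are usually tied to packet reception and transmission.
26.2.2 Registration
The MAC driver registers its network device with the kernel through a net_device structure. The registration call is: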
int register_netdev(struct net_device *dev);
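This function registers the net_device structure with the kernel's networking subsystem and initializes the corresponding network interface. It returns 0 on success and a negative error code on failure.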
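One detail of registration is that a name template containing %d (such as "eth%d") is expanded to the first free interface name. The user-space sketch below models only that naming step; the registry, the toy_* names, and the fixed table size are assumptions made for illustration and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_IFNAMSIZ 16
#define TOY_MAX_DEVS 8

struct toy_netdev {
    char name[TOY_IFNAMSIZ];
};

/* Hypothetical registry standing in for the kernel's device list. */
struct toy_registry {
    struct toy_netdev *devs[TOY_MAX_DEVS];
    int count;
};

/* Return 1 if some registered device already uses this name. */
int toy_name_in_use(const struct toy_registry *reg, const char *name)
{
    int i;

    for (i = 0; i < reg->count; i++)
        if (strcmp(reg->devs[i]->name, name) == 0)
            return 1;
    return 0;
}

/* Register a device; expands a "%d" template to the first free index. */
int toy_register_netdev(struct toy_registry *reg, struct toy_netdev *dev)
{
    char candidate[TOY_IFNAMSIZ];
    const char *pct = strstr(dev->name, "%d");
    int i;

    assert(reg != NULL && dev != NULL);
    if (reg->count >= TOY_MAX_DEVS)
        return -1;                       /* registry full */
    if (pct != NULL) {
        int prefix_len = (int)(pct - dev->name);

        for (i = 0; i < TOY_MAX_DEVS; i++) {
            snprintf(candidate, sizeof(candidate), "%.*s%d",
                     prefix_len, dev->name, i);
            if (!toy_name_in_use(reg, candidate))
                break;
        }
        if (i == TOY_MAX_DEVS)
            return -1;                   /* no free index found */
        strcpy(dev->name, candidate);
    } else if (toy_name_in_use(reg, dev->name)) {
        return -1;                       /* fixed name already taken */
    }
    reg->devs[reg->count++] = dev;
    return 0;
}
```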
26.2.3 Key data structures
struct net_device: the core kernel data structure representing a network device. It holds the device's basic information and, through its netdev_ops pointer, the driver's operations such as open, stop, transmit, and ioctl. A simplified view:
struct net_device {
    char name[IFNAMSIZ];                     /* interface name, e.g. "eth0" */
    unsigned char *dev_addr;                 /* MAC address */
    const struct net_device_ops *netdev_ops; /* ndo_open, ndo_stop, ndo_start_xmit, ndo_do_ioctl, ... */
    /* other fields ... */
};
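struct sk_buff: the structure Linux uses to store and process network packets; each instance wraps one packet. All data received from or sent to the MAC layer is managed through sk_buff. A simplified view:
struct sk_buff {
    struct net_device *dev; /* associated network device */
    unsigned char *data;    /* start of the packet data */
    unsigned int len;       /* length of the data in bytes */
    /* other fields ... */
};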
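To make the role of the data pointer and len more concrete, here is a minimal user-space model of the buffer layout behind an sk_buff: headroom in front of the data so headers can be prepended, and tailroom behind it so payload can be appended. The toy_skb_* helpers are hypothetical; they only imitate the pointer arithmetic of the kernel's skb_put(), skb_push(), and skb_pull(), and they use assertions where the kernel would report an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size packet buffer modelled on sk_buff's pointers. */
struct toy_skb {
    unsigned char buf[256];
    unsigned char *head;   /* start of the buffer */
    unsigned char *data;   /* start of valid packet data */
    unsigned char *tail;   /* end of valid packet data */
    unsigned char *end;    /* end of the buffer */
    unsigned int len;      /* tail - data */
};

/* Start with an empty packet and `headroom` bytes reserved for headers. */
void toy_skb_init(struct toy_skb *skb, size_t headroom)
{
    assert(headroom <= sizeof(skb->buf));
    skb->head = skb->buf;
    skb->end = skb->buf + sizeof(skb->buf);
    skb->data = skb->tail = skb->head + headroom;
    skb->len = 0;
}

/* Append n bytes at the tail (like skb_put); returns where to write them. */
unsigned char *toy_skb_put(struct toy_skb *skb, size_t n)
{
    unsigned char *old_tail = skb->tail;

    assert((size_t)(skb->end - skb->tail) >= n);
    skb->tail += n;
    skb->len += (unsigned int)n;
    return old_tail;
}

/* Prepend n bytes of header space (like skb_push). */
unsigned char *toy_skb_push(struct toy_skb *skb, size_t n)
{
    assert((size_t)(skb->data - skb->head) >= n);
    skb->data -= n;
    skb->len += (unsigned int)n;
    return skb->data;
}

/* Strip n bytes from the front (like skb_pull), e.g. a parsed header. */
unsigned char *toy_skb_pull(struct toy_skb *skb, size_t n)
{
    assert(skb->len >= n);
    skb->data += n;
    skb->len -= (unsigned int)n;
    return skb->data;
}
```

On transmit, each layer pushes its header in front of the payload; on receive, each layer pulls its header off before handing the packet upward, so the payload bytes never need to be copied.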
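struct phy_device: the structure representing the physical-layer device (PHY), shown in full in section 26.1. The MAC driver interacts with the PHY through this structure. The fields it relies on most:
struct phy_device {
    struct mdio_device mdio; /* MDIO bus device this PHY sits on */
    u32 phy_id;              /* PHY identifier found during discovery */
    unsigned link:1;         /* most recently read link state */
    /* other fields ... */
};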
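26.2.4 Receiving data
1.) Receiving data from the PHY:
The MAC driver uses a receive interrupt or a polling mechanism to detect newly arrived data.
Received data is placed in an sk_buff structure, usually as an Ethernet frame (MAC addresses, frame type, payload, CRC, and so on).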
Ethernet data frame
+-----------------+------------+-----------+--------------+------------+----------------------------+
| Destination MAC | Source MAC | EtherType | Payload      | Padding    | Frame Check Sequence (FCS) |
+-----------------+------------+-----------+--------------+------------+----------------------------+
| 6 bytes         | 6 bytes    | 2 bytes   | 0-1500 bytes | 0-46 bytes | 4 bytes                    |
+-----------------+------------+-----------+--------------+------------+----------------------------+
Payload plus padding is at least 46 bytes, so a complete Ethernet frame is between 64 and 1518 bytes long, FCS included.
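The frame layout can be checked with a short user-space sketch that parses the 14-byte header and computes the frame length for a given payload size. The toy_* names and constants are hypothetical; the sketch ignores the preamble and VLAN tags, and it assumes short payloads are padded up to 46 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN        6     /* bytes in a MAC address */
#define TOY_ETH_HLEN        14    /* destination + source + EtherType */
#define TOY_ETH_FCS_LEN     4
#define TOY_ETH_MIN_PAYLOAD 46
#define TOY_ETH_MAX_PAYLOAD 1500

struct toy_eth_hdr {
    uint8_t dst[TOY_ETH_ALEN];
    uint8_t src[TOY_ETH_ALEN];
    uint16_t ethertype;           /* converted to host byte order */
};

/* Parse the Ethernet header of a raw frame. Returns 0, or -1 if too short. */
int toy_eth_parse(const uint8_t *frame, size_t len, struct toy_eth_hdr *out)
{
    if (len < TOY_ETH_HLEN)
        return -1;
    memcpy(out->dst, frame, TOY_ETH_ALEN);
    memcpy(out->src, frame + TOY_ETH_ALEN, TOY_ETH_ALEN);
    /* EtherType is big-endian on the wire. */
    out->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}

/* Frame length including FCS for `payload` bytes of data. */
size_t toy_eth_frame_len(size_t payload)
{
    assert(payload <= TOY_ETH_MAX_PAYLOAD);
    if (payload < TOY_ETH_MIN_PAYLOAD)
        payload = TOY_ETH_MIN_PAYLOAD;   /* short payloads are padded */
    return TOY_ETH_HLEN + payload + TOY_ETH_FCS_LEN;
}
```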
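2.) Passing data to the upper layers
Once the MAC driver has received a frame, it hands the packet to the kernel network stack. The classic interface is netif_rx(skb); NAPI-based drivers such as stmmac instead deliver packets from their poll function, for example through napi_gro_receive().
netif_rx(skb); // hand the packet to the protocol stack
netif_rx(skb) is asynchronous: it appends the skb to a per-CPU backlog queue, raises the NET_RX_SOFTIRQ soft interrupt, and returns.
When the softirq runs, its handler net_rx_action processes the queued packets, limited by a budget per run.
After the upper layers (IP, then TCP or UDP, and finally the copy to user space) have finished with the packet, the memory of the skb is freed.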
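The split between the interrupt side (queue the packet, mark the softirq pending) and the softirq side (drain the queue within a budget) can be modelled in user space as follows. Everything named toy_* is hypothetical, the queue depth of 4 is arbitrary, and the real kernel keeps one such backlog per CPU with a tunable limit.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BACKLOG_MAX 4    /* arbitrary small depth for the example */
#define TOY_RX_SUCCESS  0
#define TOY_RX_DROP     1

struct toy_pkt {
    int id;
};

/* Hypothetical backlog queue: a ring buffer plus a "softirq pending" flag. */
struct toy_backlog {
    struct toy_pkt *q[TOY_BACKLOG_MAX];
    size_t head;
    size_t count;
    int softirq_pending;
    int delivered;
    int dropped;
};

/* Interrupt side: enqueue and request softirq processing, then return. */
int toy_netif_rx(struct toy_backlog *b, struct toy_pkt *p)
{
    assert(b != NULL && p != NULL);
    if (b->count == TOY_BACKLOG_MAX) {
        b->dropped++;                    /* queue full: packet is dropped */
        return TOY_RX_DROP;
    }
    b->q[(b->head + b->count) % TOY_BACKLOG_MAX] = p;
    b->count++;
    b->softirq_pending = 1;
    return TOY_RX_SUCCESS;
}

/* Softirq side: deliver at most `budget` packets; returns how many. */
int toy_net_rx_action(struct toy_backlog *b, int budget)
{
    int done = 0;

    while (b->count > 0 && done < budget) {
        struct toy_pkt *p = b->q[b->head];

        b->head = (b->head + 1) % TOY_BACKLOG_MAX;
        b->count--;
        (void)p;                         /* protocol handlers would run here */
        b->delivered++;
        done++;
    }
    if (b->count == 0)
        b->softirq_pending = 0;          /* nothing left: softirq is done */
    return done;
}
```

If the budget runs out before the queue is empty, the pending flag stays set, which stands in for the kernel rescheduling the softirq instead of monopolizing the CPU.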
3.) Receive-path code
static void stmmac_napi_add(struct net_device *dev)
{
    struct stmmac_priv *priv = netdev_priv(dev);
    u32 queue, maxq;

    maxq = max(priv->plat->rx_queues_to_use, priv->plat->tx_queues_to_use);

    for (queue = 0; queue < maxq; queue++) {
        struct stmmac_channel *ch = &priv->channel[queue];
        int rx_budget = ((priv->plat->dma_rx_size < NAPI_POLL_WEIGHT) &&
                         (priv->plat->dma_rx_size > 0)) ?
                         priv->plat->dma_rx_size : NAPI_POLL_WEIGHT;
        int tx_budget = ((priv->plat->dma_tx_size < NAPI_POLL_WEIGHT) &&
                         (priv->plat->dma_tx_size > 0)) ?
                         priv->plat->dma_tx_size : NAPI_POLL_WEIGHT;

        ch->priv_data = priv;
        ch->index = queue;
        spin_lock_init(&ch->lock);

        if (queue < priv->plat->rx_queues_to_use) {
            netif_napi_add(dev, &ch->rx_napi, stmmac_napi_poll_rx,
                           rx_budget);
        }
        if (queue < priv->plat->tx_queues_to_use) {
            netif_tx_napi_add(dev, &ch->tx_napi,
                              stmmac_napi_poll_tx, tx_budget);
        }
    }
}
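This function adds NAPI (New API) handlers for the device's receive and transmit queues. NAPI is the mechanism in the Linux networking subsystem that processes packets by polling, which reduces interrupt load, especially under heavy traffic. netif_napi_add and netif_tx_napi_add register the poll functions for the receive and transmit queues respectively. Each budget is the DMA ring size when that size is non-zero and smaller than NAPI_POLL_WEIGHT, and NAPI_POLL_WEIGHT otherwise.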
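The budget selection and the polling contract can be sketched in user space. The toy_* names are hypothetical; the value 64 matches the kernel's NAPI_POLL_WEIGHT, and the rule "processing fewer packets than the budget means the ring is drained, so interrupts are re-enabled" is the general NAPI convention rather than stmmac's exact code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NAPI_POLL_WEIGHT 64   /* same value as NAPI_POLL_WEIGHT */

/* Budget rule used above: ring size if 0 < size < weight, else the weight. */
int toy_napi_budget(int dma_ring_size)
{
    if (dma_ring_size > 0 && dma_ring_size < TOY_NAPI_POLL_WEIGHT)
        return dma_ring_size;
    return TOY_NAPI_POLL_WEIGHT;
}

/* Hypothetical per-queue state: packets waiting and the IRQ mask. */
struct toy_napi {
    int pending;       /* packets waiting in the DMA ring */
    int irq_enabled;   /* 0 while polling, 1 once the ring is drained */
};

/* One poll pass: handle up to `budget` packets, return the work done. */
int toy_napi_poll(struct toy_napi *n, int budget)
{
    int work;

    assert(n != NULL && budget > 0);
    work = n->pending < budget ? n->pending : budget;
    n->pending -= work;
    if (work < budget)
        n->irq_enabled = 1;   /* drained: leave polling mode */
    return work;
}
```

With 100 packets waiting and a budget of 64, the first pass handles 64 and stays in polling mode; the second handles the remaining 36 and re-enables the interrupt.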
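26.2.5 Transmitting data
1.) Transmit flow
2.) Key transmit functions
The transmit function pointer ndo_start_xmit() is defined in struct net_device_ops, which a net_device references through its netdev_ops field. When the network layer needs to send a packet, the kernel calls this function.
The main MAC driver source is kernel/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c. It fills in the operations table as follows: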
static const struct net_device_ops stmmac_netdev_ops = {
    .ndo_open = stmmac_open,
    .ndo_start_xmit = stmmac_xmit,
    .ndo_stop = stmmac_release,
    .ndo_change_mtu = stmmac_change_mtu,
    .ndo_fix_features = stmmac_fix_features,
    .ndo_set_features = stmmac_set_features,
    .ndo_set_rx_mode = stmmac_set_rx_mode,
    .ndo_tx_timeout = stmmac_tx_timeout,
    .ndo_do_ioctl = stmmac_ioctl,
    .ndo_setup_tc = stmmac_setup_tc,
    .ndo_select_queue = stmmac_select_queue,
#ifdef CONFIG_NET_POLL_CONTROLLER
    .ndo_poll_controller = stmmac_poll_controller,
#endif
    .ndo_set_mac_address = stmmac_set_mac_address,
    .ndo_vlan_rx_add_vid = stmmac_vlan_rx_add_vid,
    .ndo_vlan_rx_kill_vid = stmmac_vlan_rx_kill_vid,
};
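The operations table is a plain function-pointer dispatch pattern: the core never calls the driver by name, only through the pointers the driver filled in. A minimal user-space sketch of that pattern follows; all toy_* types and functions are hypothetical, and the real transmit path adds queueing, locking, and flow control in between.

```c
#include <assert.h>
#include <stddef.h>

struct toy_netdev;

struct toy_pkt {
    size_t len;
};

/* Hypothetical operations table, modelled on struct net_device_ops. */
struct toy_netdev_ops {
    int (*ndo_open)(struct toy_netdev *dev);
    int (*ndo_stop)(struct toy_netdev *dev);
    int (*ndo_start_xmit)(struct toy_pkt *pkt, struct toy_netdev *dev);
};

struct toy_netdev {
    const struct toy_netdev_ops *ops;
    int up;
    unsigned long tx_packets;
    unsigned long tx_bytes;
};

/* Driver side: the callbacks a MAC driver would provide. */
int toy_mac_open(struct toy_netdev *dev)
{
    dev->up = 1;
    return 0;
}

int toy_mac_stop(struct toy_netdev *dev)
{
    dev->up = 0;
    return 0;
}

int toy_mac_xmit(struct toy_pkt *pkt, struct toy_netdev *dev)
{
    dev->tx_packets++;                 /* a real driver would queue DMA here */
    dev->tx_bytes += pkt->len;
    return 0;
}

const struct toy_netdev_ops toy_mac_ops = {
    .ndo_open = toy_mac_open,
    .ndo_stop = toy_mac_stop,
    .ndo_start_xmit = toy_mac_xmit,
};

/* Core side: transmit through whatever driver is behind dev->ops. */
int toy_dev_xmit(struct toy_netdev *dev, struct toy_pkt *pkt)
{
    assert(dev != NULL && dev->ops != NULL && pkt != NULL);
    if (!dev->up || dev->ops->ndo_start_xmit == NULL)
        return -1;                     /* interface down or no xmit hook */
    return dev->ops->ndo_start_xmit(pkt, dev);
}
```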
26.3 Summary
The layers above these, from the TCP/IP network layer up to the application layer, are not covered here.