前言:
现时咱们对“linux系统网卡驱动配置”大概比较关心,你们都需要学习一些“linux系统网卡驱动配置”的相关知识。那么小编在网络上汇集了一些有关“linux系统网卡驱动配置””的相关资讯,希望小伙伴们能喜欢,朋友们一起来了解一下吧!为什么要在自制操作系统上写网卡驱动?请看这里:
如何在自制操作系统写网卡驱动程序(1)
那么今天,我们就开始第一步:看看其他操作系统上的网卡驱动是如何写的?
先看下linux操作系统中,是如何和网卡通信的。
在硬件加电初始化时,BIOS统一检查所有的PCI设备,并为每个设备分配一个物理地址,该地址通过BIOS获得并写到设备的配置空间内,驱动程序就可以将网卡的普通控制寄存器映射到一段内存空间内,CPU通过访问映射后的虚拟地址来操控网卡的寄存器。
当操作系统初始化时,其为每个PCI设备分配一个pci_dev结构,并将前面分配的物理地址写到pci_dev的resource字段中。
在网卡驱动程序中则可以通过读取pci_dev中的resource字段获得网卡的寄存器配置空间地址,其由函数pci_resource_start()和pci_resource_end()获得该空间的起始位置,通过ioremap()将该段位置映射到主存中,以便CPU访问控制网卡的I/O和内存空间。
调用pci_resource_start
ioremap( pci_resource_start(pdev, BAR_1), pci_resource_len(pdev, BAR_1) );
pci_resource_start只是一个宏定义:
/* * These helpers provide future and backwards compatibility * for accessing popular PCI BAR info */// 开始地址#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start)// 结束地址#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end)// 设置标志#define pci_resource_flags(dev, bar) ((dev)->resource[(bar)].flags)// 获取长度:(start==0 && start==end)?0:end-start+1;// 这句话是简写,展开后:// if(start==0 && start==end){ return 0;}else{ return end-start+1;}#define pci_resource_len(dev,bar) \ ((pci_resource_start((dev), (bar)) == 0 && \ pci_resource_end((dev), (bar)) == \ pci_resource_start((dev), (bar))) ? 0 : \ \ (pci_resource_end((dev), (bar)) - \ pci_resource_start((dev), (bar)) + 1))
它定义了对resouce结构体列表中的resource的start,end字段赋值的动作
struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
/* * Resources are tree-like, allowing * nesting etc.. */struct resource { resource_size_t start; resource_size_t end; const char *name; unsigned long flags; unsigned long desc; struct resource *parent, *sibling, *child;};
在linux使用e1000网卡时,初始化的函数为e1000_probe.
这个函数执行完,网卡以及相关协议的结构体都完成配置,中断函数也完成了绑定,操作系统就可以收到网卡的中断信息了,所以这个函数里的代码时可以参考的。
/** * e1000_probe - Device Initialization Routine * @pdev: PCI device information struct * @ent: entry in e1000_pci_tbl * * Returns 0 on success, negative on failure * * e1000_probe initializes an adapter identified by a pci_dev structure. * The OS initialization, configuring of the adapter private structure, * and a hardware reset occur. **/static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent){ struct net_device *netdev; struct e1000_adapter *adapter = NULL; struct e1000_hw *hw; static int cards_found; static int global_quad_port_a; /* global ksp3 port a indication */ int i, err, pci_using_dac; u16 eeprom_data = 0; u16 tmp = 0; u16 eeprom_apme_mask = E1000_EEPROM_APME; int bars, need_ioport; bool disable_dev = false; /* do not allocate ioport bars when not needed */ need_ioport = e1000_is_need_ioport(pdev); if (need_ioport) { bars = pci_select_bars(pdev, IORESOURCE_MEM | IORESOURCE_IO); err = pci_enable_device(pdev); } else { bars = pci_select_bars(pdev, IORESOURCE_MEM); err = pci_enable_device_mem(pdev); } if (err) return err; err = pci_request_selected_regions(pdev, bars, e1000_driver_name); if (err) goto err_pci_reg; pci_set_master(pdev); err = pci_save_state(pdev); if (err) goto err_alloc_etherdev; err = -ENOMEM; netdev = alloc_etherdev(sizeof(struct e1000_adapter)); if (!netdev) goto err_alloc_etherdev; SET_NETDEV_DEV(netdev, &pdev->dev); pci_set_drvdata(pdev, netdev); adapter = netdev_priv(netdev); adapter->netdev = netdev; adapter->pdev = pdev; adapter->msg_enable = netif_msg_init(debug, DEFAULT_MSG_ENABLE); adapter->bars = bars; adapter->need_ioport = need_ioport; hw = &adapter->hw; hw->back = adapter; err = -EIO; hw->hw_addr = pci_ioremap_bar(pdev, BAR_0); if (!hw->hw_addr) goto err_ioremap; if (adapter->need_ioport) { for (i = BAR_1; i < PCI_STD_NUM_BARS; i++) { if (pci_resource_len(pdev, i) == 0) continue; if (pci_resource_flags(pdev, i) & IORESOURCE_IO) { hw->io_base = pci_resource_start(pdev, i); break; } } } /* make ready for any if (hw->...) below */ err = e1000_init_hw_struct(adapter, hw); if (err) goto err_sw_init; /* there is a workaround being applied below that limits * 64-bit DMA addresses to 64-bit hardware. There are some * 32-bit adapters that Tx hang when given 64-bit DMA addresses */ pci_using_dac = 0; if ((hw->bus_type == e1000_bus_type_pcix) && !dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) { pci_using_dac = 1; } else { err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)); if (err) { pr_err("No usable DMA config, aborting\n"); goto err_dma; } } netdev->netdev_ops = &e1000_netdev_ops; e1000_set_ethtool_ops(netdev); netdev->watchdog_timeo = 5 * HZ; netif_napi_add(netdev, &adapter->napi, e1000_clean, 64); strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1); adapter->bd_number = cards_found; /* setup the private structure */ err = e1000_sw_init(adapter); if (err) goto err_sw_init; err = -EIO; if (hw->mac_type == e1000_ce4100) { hw->ce4100_gbe_mdio_base_virt = ioremap(pci_resource_start(pdev, BAR_1), pci_resource_len(pdev, BAR_1)); if (!hw->ce4100_gbe_mdio_base_virt) goto err_mdio_ioremap; } if (hw->mac_type >= e1000_82543) { netdev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HW_VLAN_CTAG_RX; netdev->features = NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_FILTER; } if ((hw->mac_type >= e1000_82544) && (hw->mac_type != e1000_82547)) netdev->hw_features |= NETIF_F_TSO; netdev->priv_flags |= IFF_SUPP_NOFCS; netdev->features |= netdev->hw_features; netdev->hw_features |= (NETIF_F_RXCSUM | NETIF_F_RXALL | NETIF_F_RXFCS); if (pci_using_dac) { netdev->features |= NETIF_F_HIGHDMA; netdev->vlan_features |= NETIF_F_HIGHDMA; } netdev->vlan_features |= (NETIF_F_TSO | NETIF_F_HW_CSUM | NETIF_F_SG); /* Do not set IFF_UNICAST_FLT for VMWare's 82545EM */ if (hw->device_id != E1000_DEV_ID_82545EM_COPPER || hw->subsystem_vendor_id != PCI_VENDOR_ID_VMWARE) netdev->priv_flags |= IFF_UNICAST_FLT; /* MTU range: 46 - 16110 */ netdev->min_mtu = ETH_ZLEN - ETH_HLEN; netdev->max_mtu = MAX_JUMBO_FRAME_SIZE - (ETH_HLEN + ETH_FCS_LEN); adapter->en_mng_pt = e1000_enable_mng_pass_thru(hw); /* initialize eeprom parameters */ if (e1000_init_eeprom_params(hw)) { e_err(probe, "EEPROM initialization failed\n"); goto err_eeprom; } /* before reading the EEPROM, reset the controller to * put the device in a known good starting state */ e1000_reset_hw(hw); /* make sure the EEPROM is good */ if (e1000_validate_eeprom_checksum(hw) < 0) { e_err(probe, "The EEPROM Checksum Is Not Valid\n"); e1000_dump_eeprom(adapter); /* set MAC address to all zeroes to invalidate and temporary * disable this device for the user. This blocks regular * traffic while still permitting ethtool ioctls from reaching * the hardware as well as allowing the user to run the * interface after manually setting a hw addr using * `ip set address` */ memset(hw->mac_addr, 0, netdev->addr_len); } else { /* copy the MAC address out of the EEPROM */ if (e1000_read_mac_addr(hw)) e_err(probe, "EEPROM Read Error\n"); } /* don't block initialization here due to bad MAC address */ memcpy(netdev->dev_addr, hw->mac_addr, netdev->addr_len); if (!is_valid_ether_addr(netdev->dev_addr)) e_err(probe, "Invalid MAC Address\n"); INIT_DELAYED_WORK(&adapter->watchdog_task, e1000_watchdog); INIT_DELAYED_WORK(&adapter->fifo_stall_task, e1000_82547_tx_fifo_stall_task); INIT_DELAYED_WORK(&adapter->phy_info_task, e1000_update_phy_info_task); INIT_WORK(&adapter->reset_task, e1000_reset_task); e1000_check_options(adapter); /* Initial Wake on LAN setting * If APM wake is enabled in the EEPROM, * enable the ACPI Magic Packet filter */ switch (hw->mac_type) { case e1000_82542_rev2_0: case e1000_82542_rev2_1: case e1000_82543: break; case e1000_82544: e1000_read_eeprom(hw, EEPROM_INIT_CONTROL2_REG, 1, &eeprom_data); eeprom_apme_mask = E1000_EEPROM_82544_APM; break; case e1000_82546: case e1000_82546_rev_3: if (er32(STATUS) & E1000_STATUS_FUNC_1) { e1000_read_eeprom(hw, EEPROM_INIT_CONTROL3_PORT_B, 1, &eeprom_data); break; } /* Fall Through */ default: e1000_read_eeprom(hw, EEPROM_INIT_CONTROL3_PORT_A, 1, &eeprom_data); break; } if (eeprom_data & eeprom_apme_mask) adapter->eeprom_wol |= E1000_WUFC_MAG; /* now that we have the eeprom settings, apply the special cases * where the eeprom may be wrong or the board simply won't support * wake on lan on a particular port */ switch (pdev->device) { case E1000_DEV_ID_82546GB_PCIE: adapter->eeprom_wol = 0; break; case E1000_DEV_ID_82546EB_FIBER: case E1000_DEV_ID_82546GB_FIBER: /* Wake events only supported on port A for dual fiber * regardless of eeprom setting */ if (er32(STATUS) & E1000_STATUS_FUNC_1) adapter->eeprom_wol = 0; break; case E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3: /* if quad port adapter, disable WoL on all but port A */ if (global_quad_port_a != 0) adapter->eeprom_wol = 0; else adapter->quad_port_a = true; /* Reset for multiple quad port adapters */ if (++global_quad_port_a == 4) global_quad_port_a = 0; break; } /* initialize the wol settings based on the eeprom settings */ adapter->wol = adapter->eeprom_wol; device_set_wakeup_enable(&adapter->pdev->dev, adapter->wol); /* Auto detect PHY address */ if (hw->mac_type == e1000_ce4100) { for (i = 0; i < 32; i++) { hw->phy_addr = i; e1000_read_phy_reg(hw, PHY_ID2, &tmp); if (tmp != 0 && tmp != 0xFF) break; } if (i >= 32) goto err_eeprom; } /* reset the hardware with the new settings */ e1000_reset(adapter); strcpy(netdev->name, "eth%d"); err = register_netdev(netdev); if (err) goto err_register; e1000_vlan_filter_on_off(adapter, false); /* print bus type/speed/width info */ e_info(probe, "(PCI%s:%dMHz:%d-bit) %pM\n", ((hw->bus_type == e1000_bus_type_pcix) ? "-X" : ""), ((hw->bus_speed == e1000_bus_speed_133) ? 133 : (hw->bus_speed == e1000_bus_speed_120) ? 120 : (hw->bus_speed == e1000_bus_speed_100) ? 100 : (hw->bus_speed == e1000_bus_speed_66) ? 66 : 33), ((hw->bus_width == e1000_bus_width_64) ? 64 : 32), netdev->dev_addr); /* carrier off reporting is important to ethtool even BEFORE open */ netif_carrier_off(netdev); e_info(probe, "Intel(R) PRO/1000 Network Connection\n"); cards_found++; return 0;err_register:err_eeprom: e1000_phy_hw_reset(hw); if (hw->flash_address) iounmap(hw->flash_address); kfree(adapter->tx_ring); kfree(adapter->rx_ring);err_dma:err_sw_init:err_mdio_ioremap: iounmap(hw->ce4100_gbe_mdio_base_virt); iounmap(hw->hw_addr);err_ioremap: disable_dev = !test_and_set_bit(__E1000_DISABLED, &adapter->flags); free_netdev(netdev);err_alloc_etherdev: pci_release_selected_regions(pdev, bars);err_pci_reg: if (!adapter || disable_dev) pci_disable_device(pdev); return err;}
其中,函数e1000_is_need_ioport,表示当前网卡是否要通过ip口配置:
/** * e1000_is_need_ioport - determine if an adapter needs ioport resources or not * @pdev: PCI device information struct * * Return true if an adapter needs ioport resources **/static int e1000_is_need_ioport(struct pci_dev *pdev){ switch (pdev->device) { case E1000_DEV_ID_82540EM: case E1000_DEV_ID_82540EM_LOM: case E1000_DEV_ID_82540EP: case E1000_DEV_ID_82540EP_LOM: case E1000_DEV_ID_82540EP_LP: case E1000_DEV_ID_82541EI: case E1000_DEV_ID_82541EI_MOBILE: case E1000_DEV_ID_82541ER: case E1000_DEV_ID_82541ER_LOM: case E1000_DEV_ID_82541GI: case E1000_DEV_ID_82541GI_LF: case E1000_DEV_ID_82541GI_MOBILE: case E1000_DEV_ID_82544EI_COPPER: case E1000_DEV_ID_82544EI_FIBER: case E1000_DEV_ID_82544GC_COPPER: case E1000_DEV_ID_82544GC_LOM: case E1000_DEV_ID_82545EM_COPPER: case E1000_DEV_ID_82545EM_FIBER: case E1000_DEV_ID_82546EB_COPPER: case E1000_DEV_ID_82546EB_FIBER: case E1000_DEV_ID_82546EB_QUAD_COPPER: return true; default: return false; }}
/** * 内存映射相关:想尝试追踪到底层汇编的代码上 * pci_select_bars - Make BAR mask from the type of resource * @dev: the PCI device for which BAR mask is made * @flags: resource type mask to be selected * * This helper routine makes bar mask from the type of resource. */int pci_select_bars(struct pci_dev *dev, unsigned long flags){ int i, bars = 0; for (i = 0; i < PCI_NUM_RESOURCES; i++) if (pci_resource_flags(dev, i) & flags) bars |= (1 << i); return bars;}EXPORT_SYMBOL(pci_select_bars);
具体这里的EXPORT_SYMBOL是啥?
#define EXPORT_SYMBOL(sym) _EXPORT_SYMBOL(sym, "")#define __EXPORT_SYMBOL(sym, sec, ns) ___EXPORT_SYMBOL(sym, sec, ns)#endif /* CONFIG_MODULES */#ifdef DEFAULT_SYMBOL_NAMESPACE#include <linux/stringify.h>#define _EXPORT_SYMBOL(sym, sec) __EXPORT_SYMBOL(sym, sec, __stringify(DEFAULT_SYMBOL_NAMESPACE))#else#define _EXPORT_SYMBOL(sym, sec) __EXPORT_SYMBOL(sym, sec, "")#endif#define EXPORT_SYMBOL(sym) _EXPORT_SYMBOL(sym, "")#define EXPORT_SYMBOL_GPL(sym) _EXPORT_SYMBOL(sym, "_gpl")#define EXPORT_SYMBOL_GPL_FUTURE(sym) _EXPORT_SYMBOL(sym, "_gpl_future")#define EXPORT_SYMBOL_NS(sym, ns) __EXPORT_SYMBOL(sym, "", #ns)#define EXPORT_SYMBOL_NS_GPL(sym, ns) __EXPORT_SYMBOL(sym, "_gpl", #ns)#ifdef CONFIG_UNUSED_SYMBOLS#define EXPORT_UNUSED_SYMBOL(sym) _EXPORT_SYMBOL(sym, "_unused")#define EXPORT_UNUSED_SYMBOL_GPL(sym) _EXPORT_SYMBOL(sym, "_unused_gpl")/* * For every exported symbol, do the following: * * - If applicable, place a CRC entry in the __kcrctab section. * - Put the name of the symbol and namespace (empty string "" for none) in * __ksymtab_strings. * - Place a struct kernel_symbol entry in the __ksymtab section. * * note on .section use: we specify progbits since usage of the "M" (SHF_MERGE) * section flag requires it. Use '%progbits' instead of '@progbits' since the * former apparently works on all arches according to the binutils source. */#define ___EXPORT_SYMBOL(sym, sec, ns) \ extern typeof(sym) sym; \ extern const char __kstrtab_##sym[]; \ extern const char __kstrtabns_##sym[]; \ __CRC_SYMBOL(sym, sec); \ asm(" .section \"__ksymtab_strings\",\"aMS\",%progbits,1 \n" \ "__kstrtab_" #sym ": \n" \ " .asciz \"" #sym "\" \n" \ "__kstrtabns_" #sym ": \n" \ " .asciz \"" ns "\" \n" \ " .previous \n"); \ __KSYMTAB_ENTRY(sym, sec)
#define __CRC_SYMBOL(sym, sec) \ asm(" .section \"___kcrctab" sec "+" #sym "\", \"a\" \n" \ " .weak __crc_" #sym " \n" \ " .long __crc_" #sym " - . \n" \ " .previous \n")#else#define __CRC_SYMBOL(sym, sec) \ asm(" .section \"___kcrctab" sec "+" #sym "\", \"a\" \n" \ " .weak __crc_" #sym " \n" \ " .long __crc_" #sym " \n" \ " .previous \n")#endif
/* * Emit the ksymtab entry as a pair of relative references: this reduces * the size by half on 64-bit architectures, and eliminates the need for * absolute relocations that require runtime processing on relocatable * kernels. */#define __KSYMTAB_ENTRY(sym, sec) \ __ADDRESSABLE(sym) \ asm(" .section \"___ksymtab" sec "+" #sym "\", \"a\" \n" \ " .balign 4 \n" \ "__ksymtab_" #sym ": \n" \ " .long " #sym "- . \n" \ " .long __kstrtab_" #sym "- . \n" \ " .long __kstrtabns_" #sym "- . \n" \ " .previous \n")
原来是汇编。
其中,对pci设备初始化的函数也很重要:
函数 pci_enable_device(pdev)
/** * pci_enable_device - Initialize device before it's used by a driver. * @dev: PCI device to be initialized * * Initialize device before it's used by a driver. Ask low-level code * to enable I/O and memory. Wake up the device if it was suspended. * Beware, this function can fail. * * Note we don't actually enable the device many times if we call * this function repeatedly (we just increment the count). */#define IORESOURCE_IO 0x00000100 /* PCI/ISA I/O ports */#define IORESOURCE_MEM 0x00000200int pci_enable_device(struct pci_dev *dev){ return pci_enable_device_flags(dev, IORESOURCE_MEM | IORESOURCE_IO);}EXPORT_SYMBOL(pci_enable_device);
static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags){ struct pci_dev *bridge; int err; int i, bars = 0; /* * Power state could be unknown at this point, either due to a fresh * boot or a device removal call. So get the current power state * so that things like MSI message writing will behave as expected * (e.g. if the device really is in D0 at enable time). */ if (dev->pm_cap) { u16 pmcsr; pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); } if (atomic_inc_return(&dev->enable_cnt) > 1) return 0; /* already enabled */ bridge = pci_upstream_bridge(dev); if (bridge) pci_enable_bridge(bridge); /* only skip sriov related */ for (i = 0; i <= PCI_ROM_RESOURCE; i++) if (dev->resource[i].flags & flags) bars |= (1 << i); for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) if (dev->resource[i].flags & flags) bars |= (1 << i); err = do_pci_enable_device(dev, bars); if (err < 0) atomic_dec(&dev->enable_cnt); return err;}
最终的执行函数为do_pci_enable_device
#define PCI_D0 ((pci_power_t __force) 0)#define PCI_INTERRUPT_PIN 0x3d /* 8 bits */#define PCI_COMMAND 0x04 /* 16 bits */#define PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */static int do_pci_enable_device(struct pci_dev *dev, int bars){ int err; struct pci_dev *bridge; u16 cmd; u8 pin; err = pci_set_power_state(dev, PCI_D0); if (err < 0 && err != -EIO) return err; bridge = pci_upstream_bridge(dev); if (bridge) pcie_aspm_powersave_config_link(bridge); err = pcibios_enable_device(dev, bars); if (err < 0) return err; pci_fixup_device(pci_fixup_enable, dev); if (dev->msi_enabled || dev->msix_enabled) return 0; pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &pin); if (pin) { pci_read_config_word(dev, PCI_COMMAND, &cmd); if (cmd & PCI_COMMAND_INTX_DISABLE) pci_write_config_word(dev, PCI_COMMAND, cmd & ~PCI_COMMAND_INTX_DISABLE); } return 0;}
int __weak pcibios_enable_device(struct pci_dev *dev, int bars){return pci_enable_resources(dev, bars);}
int pci_enable_resources(struct pci_dev *dev, int mask){ u16 cmd, old_cmd; int i; struct resource *r; // 重点解读函数 pci_read_config_word(dev, PCI_COMMAND, &cmd); old_cmd = cmd; for (i = 0; i < PCI_NUM_RESOURCES; i++) { if (!(mask & (1 << i))) continue; r = &dev->resource[i]; if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM))) continue; if ((i == PCI_ROM_RESOURCE) && (!(r->flags & IORESOURCE_ROM_ENABLE))) continue; if (r->flags & IORESOURCE_UNSET) { pci_err(dev, "can't enable device: BAR %d %pR not assigned\n", i, r); return -EINVAL; } if (!r->parent) { pci_err(dev, "can't enable device: BAR %d %pR not claimed\n", i, r); return -EINVAL; } if (r->flags & IORESOURCE_IO) cmd |= PCI_COMMAND_IO; if (r->flags & IORESOURCE_MEM) cmd |= PCI_COMMAND_MEMORY; } if (cmd != old_cmd) { pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd); //重点解读函数 pci_write_config_word(dev, PCI_COMMAND, cmd); } return 0;}
int pci_read_config_word(const struct pci_dev *dev, int where, u16 *val){ if (pci_dev_is_disconnected(dev)) { *val = ~0; return PCIBIOS_DEVICE_NOT_FOUND; } return pci_bus_read_config_word(dev->bus, dev->devfn, where, val);}
顺着代码往深了找,希望能找到读写配置文件的汇编代码
#define PCI_byte_BAD 0#define PCI_word_BAD (pos & 1)#define PCI_dword_BAD (pos & 3)#ifdef CONFIG_PCI_LOCKLESS_CONFIG# define pci_lock_config(f) do { (void)(f); } while (0)# define pci_unlock_config(f) do { (void)(f); } while (0)#else# define pci_lock_config(f) raw_spin_lock_irqsave(&pci_lock, f)# define pci_unlock_config(f) raw_spin_unlock_irqrestore(&pci_lock, f)#endif#define PCI_OP_READ(size, type, len) \int noinline pci_bus_read_config_##size \ (struct pci_bus *bus, unsigned int devfn, int pos, type *value) \{ \ int res; \ unsigned long flags; \ u32 data = 0; \ if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \ pci_lock_config(flags); \ res = bus->ops->read(bus, devfn, pos, len, &data); \ *value = (type)data; \ pci_unlock_config(flags); \ return res; \}#define PCI_OP_WRITE(size, type, len) \int noinline pci_bus_write_config_##size \ (struct pci_bus *bus, unsigned int devfn, int pos, type value) \{ \ int res; \ unsigned long flags; \ if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \ pci_lock_config(flags); \ res = bus->ops->write(bus, devfn, pos, len, value); \ pci_unlock_config(flags); \ return res; \}PCI_OP_READ(byte, u8, 1)PCI_OP_READ(word, u16, 2)PCI_OP_READ(dword, u32, 4)PCI_OP_WRITE(byte, u8, 1)PCI_OP_WRITE(word, u16, 2)PCI_OP_WRITE(dword, u32, 4)EXPORT_SYMBOL(pci_bus_read_config_byte);EXPORT_SYMBOL(pci_bus_read_config_word);EXPORT_SYMBOL(pci_bus_read_config_dword);EXPORT_SYMBOL(pci_bus_write_config_byte);EXPORT_SYMBOL(pci_bus_write_config_word);EXPORT_SYMBOL(pci_bus_write_config_dword);
es = bus->ops->write(bus, devfn, pos, len, value);
bus->ops->write
这个write是不是就是我要找的往IO端口写操作?
/* Low-level architecture-dependent routines */struct pci_ops { int (*add_bus)(struct pci_bus *bus); void (*remove_bus)(struct pci_bus *bus); void __iomem *(*map_bus)(struct pci_bus *bus, unsigned int devfn, int where); int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val); int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);};
pci_ops里都是函数指针了
这种函数指针不太好找源头了,只要把函数的地址给它,它就可以调用那个函数。
所以,要想知道它具体执行的是哪段代码,就不太好找了。我还没找到。
看来,直接读linux操作系统的源码,找到一些非常底层,非常细节的信息还是比较困难的。因为这里有大量的宏定义的使用,导致我们使用字符串查找去跟踪函数调用的方法失败了。
不过,从代码的上下文来猜测的话,应该是读PCI的配置信息的。
其实读PCI的配置信息可以直接从IO端口读。
我们可以从底层出发,直接自己写IO端口的读写程序,写完以后再尝试来参考linxu操作系统的相关代码。
直接从IO端口读
在内核的:arch/x86/pci/early.c
文件内,有从IO端口读取PCI配置信息的代码
// SPDX-License-Identifier: GPL-2.0#include <linux/kernel.h>#include <linux/pci.h>#include <asm/pci-direct.h>#include <asm/io.h>#include <asm/pci_x86.h>/* Direct PCI access. This is used for PCI accesses in early boot before the PCI subsystem works. */u32 read_pci_config(u8 bus, u8 slot, u8 func, u8 offset){ u32 v; outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8); v = inl(0xcfc); return v;}u8 read_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset){ u8 v; outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8); v = inb(0xcfc + (offset&3)); return v;}u16 read_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset){ u16 v; outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8); v = inw(0xcfc + (offset&2)); return v;}void write_pci_config(u8 bus, u8 slot, u8 func, u8 offset, u32 val){ outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8); outl(val, 0xcfc);}void write_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset, u8 val){ outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8); outb(val, 0xcfc + (offset&3));}void write_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset, u16 val){ outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8); outw(val, 0xcfc + (offset&2));}int early_pci_allowed(void){ return (pci_probe & (PCI_PROBE_CONF1|PCI_PROBE_NOEARLY)) == PCI_PROBE_CONF1;}
这个代码我们是可以移植到30天操作系统harios上的,把这个代码中的outl和inl换成harios内的io_out,io_in函数就可以了。
也就是说,我们可以按照以上代码读取,或者写入到配置信息了。
这里的配置信息到底是什么呢?,就是如下图中的4x16=64Bytes的信息
参考:
通过读这个配置信息的class_code字段,我们就知道PCI连接的设备是网卡?还是显卡?还是硬盘?还是声卡?
这个配置信息的Device ID,Vendor ID,表明PCI连接的设备的型号和制造厂商.
这个配置信息的Base Address 0 里存储了PCI连接的设备的地址映射内存中的地址。
总之,只要读到这个配置信息,我们就可以找到网卡,找到网卡,才能控制网卡向外发送信息,接收信息等。
那么具体如何读取这个配置信息呢?通过I/O口0xCF8和0xCFC,通过这两个端口,就可以读取到配置信息了。如下两行代码:
io_out32(0xCF8, addr);// 把配置信息的地址addr输出到I/O端口0xCF8indata = io_in32(0xCFC); //从I/O端口0xCFC获取到配置信息。
这里面的addr是什么?是配置信息所在的地址,可以这样生成:
unsigned int bus_max=0xff; unsigned int dev_max=0x1f; unsigned int func_max=0x07; // 遍历配置信息 for(bus=0;bus<=bus_max;++bus) { for(dev=0;dev<=dev_max;++dev) { for(func=0;func<=func_max;++func) { // 生成配置信息的地址 unsigned int addr = 0x80000000L | (bus<<16) | (dev<<11) | (func<<8) | (0<<2); io_out32(0xCF8, addr); indata = io_in32(0xCFC); } } }
这里生成配置信息的地址:0x80000000 | (bus<16) | (dev<<11) | (func<<8) | (0<<2);
这是什么意思呢?
首先这是一个8*4=32bits的数,4个bytes,这4个bytes的意义:
最高位31位要设置成1,才能表示对PCI的配置信息操作。如果此位为0,表示要PCI所连接的设备的操作。
30--24是保留位,我们可以写入我们特定信息。
23--16位是总线号,15--11位是设备号,10--8位是功能号,7--2位是配置信息中某条信息的偏移地址。
所以,
0x80000000 | (bus<16) | (dev<<11) | (func<<8) | (offset<<2);
的意思是对配置信息中的总线号为bus,设备号为dev,功能号为func所指定的配置信息中的第offset个4字节的信息进行操作。
第0个4字节的信息,其实就是配置信息的第一行,Device ID 和Vendor ID.
那么第三行就是class code,第4行就是header type 了。
所以,我们要访问所有的配置信息,只用改变offset的值,使其为分别为0--15,就可以分别读取到所有的配置信息了。
那么地址中的bus,dev,func是什么意思呢?
跟CPU连接的PCI有很多,用bus, dev,func就把不同的PCI区分开了。那么与当前CPU连接的PCI的bus, dev,func分别是多少呢?
也就是说,bus, dev,func到底填多少合适?到底网卡连接的pci对应的bus,dev,func应该是多少呢?
可以写for循环去搜索,当bus,dev,func取到合适的值时,我们读取到的配置信息的class_code应该是02。
因为class_code是02时,表示PCI连接的设备是网卡。
所以,我们在以上代码中写了for循环,去遍历所有的bus,dev,func的值,看看哪些位置有pci,并且这个pci连接的是网卡。
那么如何知道哪些bus,dev,func的值对应的有pci? 只用看其对应的配置信息的第1行,如果第1行的DeviceID 和 Vendor ID不是0xFFFF,就说明有配置信息。
有配置信息,说明PCI连接的有设备,但不一定是网卡,还可能是显卡,硬盘等。
所以,我们还要进一步看其class_code的值,如果是02,就表示是网卡。
另外,遍历总线号bus时,由于bus的数字是存储于adder的23--16位的,一共8个bit位,所以,其取值范围为0--255,
遍历设备号dev时,由于其存储于addr的15--11位,共5位,所以,其取值范围为0--31.
遍历功能号func时,由于其存储于addr的10-8位,共3位,所以,其取值范围为0-7.
那么最终,我们的代码为:
void check_pci(unsigned char *buf_back,struct BOOTINFO *binfo){ char s[200]; unsigned int indata; int bus,dev,func; unsigned int bus_max=0xff; unsigned int dev_max=0x1f; unsigned int func_max=0x07; // 设置打印信息的显示位置 int start_row=250; int start_col=50; int row_inc=0; int i; sprintf(s, "buf ,dev ,func ,vender_id ,device_id ,header_type,class_code "); putfonts8_asc(buf_back, binfo->scrnx, start_col, start_row-1*16, COL8_FFFFFF, s); for(bus=0;bus<=bus_max;++bus) { for(dev=0;dev<=dev_max;++dev) { for(func=0;func<=func_max;++func) { unsigned int addr = 0x80000000L | (bus<<16) | (dev<<11) | (func<<8) | (0<<2); io_out32(0xCF8, addr); indata = io_in32(0xCFC); // 查看当前bus,dev,func处有无pci if( ((indata & 0xffff) != 0xffff) && (indata !=0)) { //如果有,获取当前pci的第4行的header type unsigned int addr1 = addr | (3<<2); io_out32(0xCF8,addr1); unsigned int header_type = (io_in32(0xCFC)&0x00ff0000)>>16; //获取当前pci的第3行的class code ,这个可以看到设备是否是网卡 unsigned int addr2 = addr | (2<<2); io_out32(0xCF8,addr2); unsigned int class_code = (io_in32(0xCFC)&0xff000000)>>24; // 显示pci的device id,vendor id,header type, class code sprintf(s, "%02d ,%02d ,%02d ,0x%04x ,0x%04x ,0x%02x ,0x%02x ",bus,dev,func,indata&0xffff,(indata&0xffff0000)>>16,header_type,class_code); putfonts8_asc(buf_back, binfo->scrnx, start_col, start_row+row_inc*16, COL8_FFFFFF, s); row_inc++; } } } } return;}
把函数check_pci添加到bootpack.c的for循环之前:
check_pci(buf_back,binfo);//打印pci信息 // 图层刷新 sheet_refresh(sht_back, 0, 0, sht_back->bxsize, sht_back->bysize);
显示效果如下
可以看到,在bus=0,dev=0,func=0时,对应的pci所连接的设备的vender_id是0x8086,表示是Inter的,device_id是1237,header_type是0,class_code是0x06,表明不是网卡,0x06具体表示什么?看下表:
表明它是桥设备。
根据这个表,咱们打印出的第3行是0x01, 表示海量存储器
根据这个表,咱们打印出的第4行是0x03, 表示网络控制器,即显卡
根据这个表,咱们打印出的第5行是0x02, 表示显示控制器,即网卡。
那么到这里,我们在bus号为0,dev号为3,func号为0的位置,找到了一张网卡。它的厂商是0x10ech,设备号是8029h
好了,到这里,今天我们通过I/O端口,读取到了网卡对应的PCI.
下一步就可以通过网卡对应的PCI来控制网卡收发信息。
附录:
在30天操作系统上写网卡驱动需要从头开始,是比较琐碎的事,为此,我搜索了一定的源码和资料。
源码就是linux系统的各版本内核的源码,它带有很多网卡的驱动。
资料就是对pci,总线,以及内存映射,网络通信协议等资料。
在后续的驱动编写中,可能要反复的来复习这些资料。
比如:
这里面说了两种方法访问配置信息,第一种就是通过端口的0xCF8,0xCFC;第二种是通过内存映射。
比如直接访问如下的内存区域即可得到:
我还没有试。留着以后参考中。
还有一个细节,汇编对端口的读写,要特别注意。
比如我这里一开始端口的读写程序有bug,所以读取信息一直是错误的。
最后把 端口的读写程序从
_io_out32: ; void io_out32(int port, int data); MOV EDX,[ESP+4] ; port MOV EAX,[ESP+8] ; data OUT EDX,EAX RET
改为
_io_out32: ; void io_out32(int port, int data); MOV DX,[ESP+4] ; port MOV EAX,[ESP+8] ; data OUT DX,EAX RET
后,就正常了,
因为端口的地址0xCF8,不需要EDX,用EDX反而不能表示端口了,就无法给实现往端口上输出信息。
关于端口读写,还有用嵌入汇编的方式,比如:
另外,使用端口对PCI配置信息的读取有个例子不错,可以参考:
这里写了一个window上的代码和一个linux上的代码。
比如我用linux上的代码后,输入的结果如下:
运行的时候,需要用sudo。因为这里设计到IO读取,所以需要root权限。
另外运行这个例子的时候,我就有个疑问:问什么这个例子明明是个应用程序,不是操作系统本身的代码,它却可以访问IO端口?
一般应用程序的代码,由于GDT的定义,应该是不能操作I/O端口的。
后来的代码里发现了
if ( iopl(3) < 0 ) { printf("iopl set error\n"); return -1; }
iopl函数用于获取io端口的访问权限,如果这个函数获取返回值大于0,就可以通过操作系统所提供的中断函数来对io端口进行访问了。
总之:应用程序要访问操作系统所占用的哪些资源,都得通过中断函数来访问。通过中断函数,我们就可以设计权限,加以控制,保证操作系统代码本身的安全性。
以下是一些样例:
在30天操作系统的代码上,添加读pci配置的程序check_pci的过程并不顺利,一开始写到屏幕上的信息无法刷新出来,就通过鼠标移动过去,把这些信息“擦”出来。当然后来发现使用刷新函数的时候,刷新错对象向,应该刷新sht_back,就过刷新sht_win了,sht_win表示的是tast_a窗口,如下图。
上图中显示了很多0xffff,这是因为程序什么也没有输出,所以我们调试的时候,就把所有的信息都输出的屏幕上了;我把没有cpi时的信息显示到左边,有cpi时的信息显示到了右边。
在MAC上
可以看到,这里一共6个设备,比在windows上的qemu多了一个设备,多了一些PCI桥设备。网卡的型号变成100e了。
既然找到网卡了,下一步就是启动网卡,读取网卡的mac地址,命令网卡发送数据等了。
具体怎么实现呢?
1比如重启网卡设备,就是给网卡的某寄存器写入命令,用I/O方法,或者内存映射的方法。具体详细信息可以查看网卡信息的datasheet。
重启网卡设备,则是通过向映射后的网卡的相应寄存器写入命令,其通过映射后的首地址及相应的寄存器偏移量找到该寄存器的位置,然后通过函数writeb()写该寄存器。有关相关寄存器对应的偏移量,一般是通过网卡的相关的datasheet获得。
2.如果要获取网卡的MAC地址,则一般通过函数readb()读取首地址开始的前六位。
后面我们就获取到网卡的MAC地址,然后给网卡的寄存器里写值,来命令网卡收发信息。或者把网卡动作与操作系统的中断绑定起来。