SR-IOV introduction, VF pass-through configuration, and packet forwarding rate performance test
1. Introduction to SR-IOV

▷ Bottleneck of the traditional approach: qemu's traditional virtual NIC is a tap device bridged to a Linux bridge on the host. Its performance is poor, and the packet forwarding rate in particular is far too low for scenarios with high performance requirements. The direct cause is that the data path is too long and crosses too many kernel devices; the fundamental cause is that the Linux/Unix kernel itself was never designed for high-speed forwarding. Linux/Unix is better suited to the control plane than to the forwarding plane.

The performance conclusion first: compared with the traditional tap+bridge solution, SR-IOV VF pass-through improves performance as follows:

▷ Packet forwarding rate increased by 677%

2. Environment

Model: Dell PowerEdge R620

3. Enable SR-IOV

Enable SR-IOV in the BIOS (figure omitted).

Note: even with global SR-IOV enabled in the BIOS, the NIC can still be used as an ordinary NIC.

VT-d must also be enabled in the BIOS.

Add the IOMMU parameters to the grub configuration:

iommu=pt intel_iommu=on

4. Generate VFs

# Bring up the NIC
ip link set p1p1 up
# Find the PCI address of the PF
lshw -c network -businfo
# Check how many VFs the NIC supports
cat /sys/bus/pci/devices/0000:41:00.0/sriov_totalvfs
# Create the VFs (recommended to also do this at boot)
echo 63 > /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs

Note: if the host's VF driver is not blacklisted, you must wait a while after creating the VFs before all the renamed NICs appear on the host (until then you will see a pile of ethX NICs). The more VFs, the longer the wait; for 63 VFs it takes about 10 seconds.
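The VF-creation step above can be sketched as a small boot-time helper that creates the VFs and then waits for udev to finish renaming them. SYSFS_NET, PF_PCI, and NUM_VFS are my own illustrative parameters (made overridable so the wait logic can be exercised outside a real SR-IOV host); this is a sketch, not a drop-in script.

```shell
#!/bin/sh
# Sketch of a boot-time VF setup helper. PF_PCI/NUM_VFS/SYSFS_NET are
# assumptions for illustration; override them for a real system.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"
PF_PCI="${PF_PCI:-0000:41:00.0}"
NUM_VFS="${NUM_VFS:-63}"

create_vfs() {
    # Writing to sriov_numvfs asks the PF driver to create the VFs.
    echo "$NUM_VFS" > "/sys/bus/pci/devices/$PF_PCI/sriov_numvfs"
}

unnamed_count() {
    # Count NICs still carrying a default ethX name (not yet renamed).
    ls "$SYSFS_NET" | grep -c '^eth[0-9][0-9]*$'
}

wait_renamed() {
    # Poll until no ethX NICs remain, for at most ~30 seconds.
    i=0
    while [ "$(unnamed_count)" -ne 0 ] && [ "$i" -lt 30 ]; do
        sleep 1
        i=$((i + 1))
    done
}

# Intended use on the host (not run here):
#   create_vfs && wait_renamed
```

The 30-second cap is a guess scaled from the article's "about 10 seconds for 63 VFs"; tune it to your VF count.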
5. VF pass-through

If qemu is managed by libvirt, there are three configuration methods:

▷ Method 1 (interface): add the following in the devices section:

<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:ad:ef:8d'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x41' slot='0x10' function='0x0'/>
  </source>
  <vlan>
    <tag id='4010'/>
  </vlan>
</interface>

The address in <source> above comes from "lshw -c network -businfo", for example pci@0000:41:10.0 p1p1_0.

▷ Method 2 (hostdev): add the following in the devices section:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x41' slot='0x10' function='0x0'/>
  </source>
</hostdev>

The address in <source> above is likewise taken from "lshw -c network -businfo".

▷ Method 3 (net-pool): define a net-pool for each PF NIC, i.e. edit one xml file per PF. Only one PF is shown here; edit sriov-int.xml:

<network>
  <name>sriov-int</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='p1p1'/>
  </forward>
</network>

Define the libvirt net-pool, activate it, and set it to start on boot:

virsh net-define sriov-int.xml
virsh net-start sriov-int
virsh net-autostart sriov-int

Although net-autostart is configured, it does not take effect in practice: when the physical machine boots, libvirt usually starts before the VFs are created (assuming the VFs are created in rc.local), and the net-pool (sriov-int) can only be started after the VFs exist. It is therefore recommended to add the following to rc.local to guarantee startup:

ip link set p1p2 up
echo 63 > /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs
virsh net-start sriov-int

Then add the following in the vm's xml:

<interface type='network'>
  <mac address='52:54:00:ad:ef:8d'/>
  <source network='sriov-int'/>
  <vlan>
    <tag id='4010'/>
  </vlan>
</interface>

How to choose among the three methods:

▷ Method 1 (interface): the most functionality; mac and vlan can be configured.
▷ There is a bug:
When the total number of VFs used by all the VMs on a host exceeds the VF limit, no error is reported and the VM can still be started, but exceptions may occur; and if such a VM is shut down with destroy, its VF ends up broken. For example, resetting it with "ip link set p1p1 vf 0 mac 00:00:00:00:00:00" fails with "RTNETLINK answers: Cannot allocate memory". This is hard to repair, and even after a repair you cannot tell whether invisible problems remain.

▷ There is no way to know which VF a given VM is using. To rate-limit a VF or turn spoofchk on or off, you first have to find the VF number on the host with "ip link show dev p1p1 | grep <the VM's MAC>", and only then apply the rate limit or other settings.

To sum up: Method 3 is the most convenient, but it has the bug above, so you need solid logic of your own to keep the total number of VFs used by the VMs from exceeding the limit.

6. Enable irqbalance

The x520 has 2 queues and the x710 has 4, and the interrupt-balancing service (irqbalance) must be started inside the VM, otherwise a single CPU handles all the packets. This is unrelated to query_rss of the VF on the host.

7. VM Migration

A passed-through NIC is a PCI device, and libvirt/qemu do not support migrating a VM with a non-USB PCI device attached, neither cold nor hot migration. Hot migration is therefore impossible. For cold migration there are two solutions:

▷ Detach the VF NIC, migrate the VM with libvirt, then attach the VF NIC again on the new host.

Note: do not use libvirt's migration function while the VM is shut down; it may cause the VM to disappear from both the original host and the new host.
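As noted above, finding which VF a VM occupies means grepping the PF's "ip link show" output for the VM's MAC. Below is a hedged sketch of that lookup; the "vf N MAC xx:xx:xx:xx:xx:xx, ..." line format is an assumption about iproute2's output, so verify it against your iproute2 version.

```shell
# Print the index of the VF on a PF that carries a given MAC.
# parse_vf_index reads "ip link show dev <pf>" output on stdin so the
# parsing can be exercised without real hardware.
parse_vf_index() {
    mac=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    awk -v mac="$mac" '
        tolower($1) == "vf" && tolower($3) == "mac" && index(tolower($4), mac) > 0 {
            print $2; exit
        }'
}

vf_index_for_mac() {
    ip link show dev "$1" | parse_vf_index "$2"
}

# Hypothetical use: rate-limit whichever VF the VM with this MAC holds.
#   vf=$(vf_index_for_mac p1p1 52:54:00:ad:ef:8d)
#   ip link set p1p1 vf "$vf" max_tx_rate 100
```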
8. Bandwidth Limitation

Only outbound bandwidth can be limited, not inbound bandwidth:

ip link set p1p1 vf 0 max_tx_rate 100

This limits the VF's outbound bandwidth to 100 Mbps. The valid range varies per NIC:

▷ On the x520 the minimum rate limit is 11 Mbps and the maximum is 10000 Mbps; setting 0 means no limit, and values below 11 or above 10000 are rejected with an error.

Note: a VF's bandwidth limit is not reset when the VM shuts down.

9. Security

Only source-MAC filtering and NIC MAC anti-tampering are supported; other security protections are not (anti-ARP-spoofing cannot be achieved).

Source MAC filtering:

ip link set p1p1 vf 0 spoofchk on

With this enabled, a packet sent from the VM whose source MAC is not the assigned MAC is dropped.

Note: a VF's spoofchk setting is not reset when the VM shuts down.

NIC MAC anti-tampering:

▷ Modifying the MAC on the host does not change the MAC inside the VM; modifying the MAC inside the VM is visible on the host.

To modify the MAC manually on the host (possible whether the VM is off or on):

ip link set p1p1 vf 0 mac aa:bb:cc:dd:ee:ff

Suggestion:

▷ Reset the VF before starting the VM.

10. Other Usage Restrictions

▷ The VF NIC inside the VM cannot be attached to a Linux bridge inside the VM, which also makes ebtables unusable there; iptables still works.

11. Performance Testing

Test method:

▷ Multiple VMs send packets simultaneously while one VM receives; observe send and receive performance separately.

Configuration:

▷ Each VM has 4 cores and 8 GB of RAM.

Test results: (table omitted)

Test conclusion: SR-IOV with VF pass-through significantly improves the packet forwarding rate. In the 1:1 test, kernel-space sending reached 3.5 Mpps and receiving reached 1.9 Mpps.
▷ The packet sending rate is 1196% higher than with vxlan and 677% higher than with vlan. These figures are for the 1:1 case (1 sending VM, 1 receiving VM).

Explanation:

▷ Kernel-space single-core packet (64B) processing capacity is about 2 Mpps.

More test results (all tests below use 64B packets):

▷ Kernel space, Layer-3 forwarding performance (the sender varies the source IP):
▪ BCM57800: 2 Mpps
▷ Kernel space, Layer-2 forwarding performance (the sender varies the source MAC):
▪ BCM57800: 2 Mpps
▷ Kernel-space VXLAN encapsulation capacity:
▪ the inner vxlan packets use different source IPs
▷ DPDK user space, Layer-2 forwarding performance (the sender varies the source IP):
▪ BCM57800: does not support SR-IOV mode
▪ X520: 11.2 Mpps in total; each VM gets 11.2 Mpps divided by the number of VMs (i.e. the number of VFs)

Summary:

▷ Kernel-space interrupt balancing hashes on the source MAC at Layer 2 and on the source IP at Layer 3.

Notice:

▷ In kernel space, using multi-queue RSS interrupt balancing to raise throughput drives CPU usage very high.

12. Using a VF in a Windows VM

Download the matching driver from the NIC vendor's website and install it. In testing, win2012 ships with an 82599 (x520) driver by default, but it is an old version.
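Section 9 above suggests resetting a VF before starting a VM, since the MAC, max_tx_rate, and spoofchk settings all survive a VM shutdown. Below is a minimal sketch of such a reset; the DRY_RUN switch is my own addition so the commands can be previewed without touching a real PF.

```shell
# Reset a VF to a clean state before handing it to a new VM.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

reset_vf() {
    pf="$1"; vf="$2"
    run ip link set "$pf" vf "$vf" mac 00:00:00:00:00:00   # clear the MAC
    run ip link set "$pf" vf "$vf" max_tx_rate 0           # remove the rate limit
    run ip link set "$pf" vf "$vf" spoofchk off            # disable source-MAC filtering
}

# Hypothetical use before booting a VM that will get VF 0 of p1p1:
#   reset_vf p1p1 0
```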
13. Operation and maintenance commands

# Check how many VFs the NIC supports
cat /sys/bus/pci/devices/0000:41:00.0/sriov_totalvfs

# With the VF driver blacklisted on the host, check which VF belongs to which PF
# (download the script below and run ./listvfs_by_pf.sh)
https://github.com/intel/SDN-NFV-Hands-on-Samples/blob/master/SR-IOV_Network_Virtual_Functions_in_KVM/listvfs_by_pf.sh

# With the VF driver blacklisted on the host, check which VFs are in use
yum install dpdk-tools
dpdk-devbind --status

# Check which socket the NIC belongs to
lstopo-no-graphics

# View NIC information with lspci
lspci -Dvmm | grep -B 1 -A 4 Ethernet

# View per-VF traffic on the host (x520 only; not available on the x710)
ethtool -S p1p1 | grep VF

14. Blacklist the VF driver on the host

echo "blacklist ixgbevf" >> /etc/modprobe.d/blacklist.conf

This prevents the ixgbevf driver from loading automatically at boot; a manual "modprobe ixgbevf" will still load it. If ixgbevf is currently loaded and you want to unload it, do the following:

echo 0 > /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs
rmmod ixgbevf
echo 63 > /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs

Appendix. Packet forwarding rate test method

Packets (UDP) are sent with pktgen and the receive side is checked with "sar -n DEV".

#!/bin/bash
NIC="eth1"
DST_IP="192.168.1.2"
DST_MAC="52:54:00:43:99:65"

modprobe pktgen

pg() {
    echo inject > $PGDEV
    cat $PGDEV
}

pgset() {
    local result
    echo $1 > $PGDEV
    result=`cat $PGDEV | fgrep "Result: OK:"`
    if [ "$result" = "" ]; then
        cat $PGDEV | fgrep Result:
    fi
}

# Config Start Here -----------------------------------------------------------

# thread config
# Each CPU has its own thread. Two CPUs are tested. We add ens7, eth2 respectively.
PGDEV=/proc/net/pktgen/kpktgend_0
echo "Removing all devices"
pgset "rem_device_all"
echo "Adding ${NIC}"
pgset "add_device ${NIC}"

# device config
# delay 0 means maximum speed.
CLONE_SKB="clone_skb 1000000"
# NIC adds 4 bytes CRC
PKT_SIZE="pkt_size 64"
# COUNT 0 means forever
COUNT="count 0"
DELAY="delay 0"

PGDEV=/proc/net/pktgen/${NIC}
echo "Configuring $PGDEV"
pgset "$COUNT"
pgset "$CLONE_SKB"
pgset "$PKT_SIZE"
pgset "$DELAY"
pgset "dst ${DST_IP}"
pgset "dst_mac ${DST_MAC}"

# Time to run
PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"

# Result can be viewed in /proc/net/pktgen/eth[3,4]

▷ Change eth1 at the top of the script to the NIC that should send the packets.

pktgen-dpdk:

# Fixed ip, fixed mac
set 0 dst ip 192.168.10.240
set 0 src ip 192.168.10.245/24
set 0 dst mac c8:1f:66:d7:58:ba
set 0 src mac a0:36:9f:ec:4a:28

# Variable source ip, variable source mac
stop 0
range 0 src ip 192.168.0.1 192.168.0.1 192.168.200.200 0.0.0.1
range 0 dst ip 10.1.1.241 10.1.1.241 10.1.1.241 0.0.0.0
range 0 dst mac c8:1f:66:d7:58:ba c8:1f:66:d7:58:ba c8:1f:66:d7:58:ba 00:00:00:00:00:00
range 0 src mac a0:36:9f:ec:4a:28 a0:36:9f:ec:4a:28 a0:36:9f:ec:ff:ff 00:00:00:00:01:01
range 0 src port 100 100 65530 1
range 0 dst port 100 100 65530 1
range 0 size 64 64 64 0
enable 0 range
enable 0 latency
start 0

# Send packets at a 50% rate
set 0 rate 50
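After a pktgen run, the per-NIC result file (/proc/net/pktgen/<nic>) contains a "Result:" line that includes the achieved rate as "<N>pps". Below is a small hedged helper to extract that number; the exact Result line format should be verified on your kernel.

```shell
# Extract the pps figure from a pktgen result file read on stdin.
pktgen_pps() {
    # Find the first "<digits>pps" token, then strip the "pps" suffix.
    grep -o '[0-9][0-9]*pps' | head -n 1 | tr -d 'ps'
}

# Hypothetical use after a run:
#   pktgen_pps < /proc/net/pktgen/eth1
```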