
DPDK Max RX throughput for 1 RX Queue / 1 Port / 1 NIC with 0% loss


      Hi All,


I was reading the test report documents (450257_450257_DPDK_Test_Report_Rev0.7.pdf and 450257_450257_DPDK_Test_Report_Rev0.8.pdf) and could not find any benchmarks for maximum RX throughput on 1 RX queue / 1 port with 0% loss. The closest is section "11 Benchmark Results for the Intel® DPDK Layer 3 Forwarding Tests", but that shows performance for 1 queue on 4 NICs forwarding packets using LPM or hashing. I want to see performance for 1 queue / 1 port / 1 NIC.


I have attached some quick example/prototype code that I put together using test-pmd and the sample applications as a reference. All the code does is receive packets and then free the mbufs with rte_pktmbuf_free(), so it basically just counts packets. I have played around with the max burst parameter of rte_eth_rx_burst() and with various values of rx_conf.rx_free_thresh, but no matter what combination I tried, it could not handle 1 Gbit/s (200,000 pps) of traffic with 0% packet loss. It should be possible to achieve this, shouldn't it?
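For reference, the core of the receive loop looks roughly like the sketch below (a simplified, illustrative version of the attached code, not the file itself; names like rx_loop are mine):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define MAX_PKT_BURST 64

    static uint64_t total_rx; /* running packet count */

    /* Pull a burst from queue 0 of the given port, count the packets,
     * and free the mbufs straight away - no other processing. */
    static void rx_loop(uint8_t port_id)
    {
        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
        uint16_t nb_rx, i;

        for (;;) {
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, MAX_PKT_BURST);
            total_rx += nb_rx;
            for (i = 0; i < nb_rx; i++)
                rte_pktmbuf_free(pkts_burst[i]);
        }
    }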

I'm running on an ATCA blade with dual Xeon X5670 @ 2.93 GHz, 48 GB DDR3-1066 RAM (6 x 8 GB, one 8 GB quad-rank module per channel) and the 5520 Tylersburg chipset. The 82599 sits on a mezzanine board connected via PCI Express: 5.0 Gb/s, width x8.


I set the BIOS options mentioned in the test report.
I made the following change to the config to increase the maximum mempool cache size, so that all buffers can sit in the per-core cache:
-CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=512
+CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=65536


Some other relevant parameters set in the code are as follows (see the sketch after this list for how they are used):

      #define NUM_RX_DESC (4096) - passed to rte_eth_rx_queue_setup()
      #define MAX_PKT_BURST (64) - passed to rte_eth_rx_burst()

      #define MBUFS_PER_POOL (8192 * 16) - passed to rte_mempool_create()
      #define MBUF_CACHE_SIZE (8192 * 8) - passed to rte_mempool_create()
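Putting those together, the pool and queue setup is roughly as follows (again a simplified, illustrative sketch; the rx_thresh values are the ones from the sample apps and error handling is omitted). Note that MBUF_CACHE_SIZE evaluates to 65536, which is why CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE had to be raised to match:

    #include <rte_ethdev.h>
    #include <rte_mempool.h>
    #include <rte_mbuf.h>
    #include <rte_lcore.h>

    #define NUM_RX_DESC     4096
    #define MBUFS_PER_POOL  (8192 * 16)
    #define MBUF_CACHE_SIZE (8192 * 8)  /* 65536 - needs the config change above */
    #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

    /* RX thresholds as used in the sample applications */
    static const struct rte_eth_rxconf rx_conf = {
        .rx_thresh = { .pthresh = 8, .hthresh = 8, .wthresh = 4 },
        .rx_free_thresh = 32, /* one of the values I tried */
    };

    static struct rte_mempool *pool;

    static void setup_rx(uint8_t port_id)
    {
        pool = rte_mempool_create("rx_pool", MBUFS_PER_POOL, MBUF_SIZE,
                                  MBUF_CACHE_SIZE,
                                  sizeof(struct rte_pktmbuf_pool_private),
                                  rte_pktmbuf_pool_init, NULL,
                                  rte_pktmbuf_init, NULL,
                                  rte_socket_id(), 0);

        rte_eth_rx_queue_setup(port_id, 0, NUM_RX_DESC,
                               rte_socket_id(), &rx_conf, pool);
    }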


I ran the app with ./drv -c 0x5, i.e. a coremask enabling cores 0 and 2, so that the processing thread runs on core 2 as it is set up to do (launch sketched below). I also used the setup.sh script to allocate the hugepages used by the app.
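For completeness, the launch path is roughly this (illustrative; device configure/start and PMD init are omitted, and rx_loop is the receive loop sketched above):

    #include <rte_eal.h>
    #include <rte_launch.h>

    static int lcore_main(void *arg)
    {
        rx_loop((uint8_t)(uintptr_t)arg); /* spins forever on core 2 */
        return 0;
    }

    int main(int argc, char **argv)
    {
        int ret = rte_eal_init(argc, argv); /* parses -c 0x5 */
        if (ret < 0)
            return -1;
        /* core 0 is the master lcore; run the RX loop on core 2, port 0 */
        rte_eal_remote_launch(lcore_main, (void *)(uintptr_t)0, 2);
        rte_eal_mp_wait_lcore();
        return 0;
    }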


I would very much appreciate it if one of the developers could look over the code or run it quickly and point out parameters/config options I could use to improve performance. Also, what receive rate should I expect on 1 RX queue / 1 port / 1 NIC when all I'm doing is counting packets?


      Jonathan