0 Replies Latest reply on Feb 6, 2012 11:16 AM by jonathanlynch

    DPDK Max RX throughput for 1 RX Queue/1 Port/ 1 NIC with 0% loss


      Hi All,

      I was reading the test report documents (450257_450257_DPDK_Test_Report_Rev0.7.pdf and 450257_450257_DPDK_Test_Report_Rev0.8.pdf) and could not find any benchmarks for max RX throughput on 1 RX queue / 1 port with 0% loss. The closest is section "11 Benchmark Results for the Intel® DPDK Layer 3 Forwarding Tests", but that shows performance for 1 queue on 4 NICs forwarding packets using LPM or hashing. I want to see performance for 1 queue / 1 port on 1 NIC.

      I have attached some quick example/prototype code that I put together using the test-pmd and sample applications as reference. All the code does is receive packets and then free the mbufs with rte_pktmbuf_free(), so it basically just counts packets. I have experimented with the max_burst parameter of rte_eth_rx_burst() and with various values of rx_conf.rx_free_thresh, but no matter what combination I tried it could not handle 1 Gbit/s (200,000 pps) of traffic with 0% packet loss. Shouldn't it be possible to achieve this?

      I'm running on an ATCA blade with dual Xeon X5670 @ 2.93 GHz, 48 GB DDR3-1066 RAM (6 x 8 GB, one 8 GB quad-rank module per channel) and the 5520 Tylersburg chipset. The 82599 sits on a mezzanine board connected via PCI Express: 5.0 Gb/s, width x8.


      I set the BIOS options that are mentioned in the test report.
      I made the following changes to the config to increase the size of the mempool cache so that all buffers would fit in the cache.


      Some other relevant parameters set in the code are as follows:

      #define NUM_RX_DESC (4096) - passed to rte_eth_rx_queue_setup()
      #define MAX_PKT_BURST (64) - passed to rte_eth_rx_burst()

      #define MBUFS_PER_POOL (8192 * 16) - passed to rte_mempool_create()
      #define MBUF_CACHE_SIZE (8192 * 8) - passed to rte_mempool_create()


      I ran the app with ./drv -c 0x5 so that it runs the processing thread on core 2, as it is set up to do (I also used the setup.sh script to allocate the hugepages used by the app).


      I would very much appreciate it if one of the developers could quickly look over or run the code and point out any parameters or config options I could use to improve performance, and tell me what rate I should expect to be able to receive packets at on 1 RX queue / 1 port / 1 NIC when all I'm doing is counting packets.