My board has a DH8920 and an E5-2658 and is used for IPsec encryption and decryption. I want to reach the 20 Gbps IPsec throughput stated in the DH8920 product document.
DPDK and the QAT driver are installed, and my application is very similar to the "dpdk_qat" example provided with QAT 1.5, with some changes to handle IPsec encapsulation.
QAT is used from user space through the DP (data plane) API.
In my host application I enable 8 logical cores; each core uses 1 QAT instance and has 1 DPDK RX queue. Since there are only four crypto engines, every 2 instances share 1 engine, with the ring size set to 4096 (the maximum value I can configure).
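As a rough sanity check on this setup, the following sketch works out an even-split budget per engine and per instance, and how much traffic a full ring represents. It assumes the load is spread uniformly over engines and instances (the product document does not say this) and a 1400-byte packet size, which is only a guess at typical iperf3 traffic. The function names are made up for illustration.

```c
#include <assert.h>

/* Even-split throughput budget. Assumption: traffic is distributed
 * uniformly across the shares (engines or instances). */
double gbps_per_share(double total_gbps, unsigned shares)
{
    assert(shares > 0);
    return total_gbps / shares;
}

/* Time in microseconds that a completely full ring represents when
 * packets of pkt_bytes arrive at the given rate in Gbps. */
double ring_drain_us(unsigned ring_entries, double gbps, unsigned pkt_bytes)
{
    assert(gbps > 0.0 && pkt_bytes > 0);
    double pkts_per_sec = gbps * 1e9 / (8.0 * pkt_bytes);
    return ring_entries / pkts_per_sec * 1e6;
}
```

Under these assumptions, gbps_per_share(20.0, 4) gives 5 Gbps per engine and gbps_per_share(20.0, 8) gives 2.5 Gbps per instance, and ring_drain_us(4096, 2.5, 1400) comes out at roughly 18 ms of traffic per full ring.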
But when I test the performance of the application I run into a problem. Hopefully someone can help me.
If I send more than 10 Gbps (about 11 Gbps) of IPsec-encrypted packets to my board with the iperf3 tool, I find that sometimes, for most of the instances, cpaCySymDpEnqueueOp returns CPA_STATUS_RETRY, which leads to a lot of packet loss; after a while it recovers. I understand this means the QAT request queue is full and nothing more can be put into it. Since the document says the DH8920 supports 20 Gbps, I am really puzzled about the following:
1) Why would the QAT request queue be full for a while when the traffic is only about 11 Gbps? If that is the maximum capacity, why does it happen only some of the time?
2) When that happens, I increase the frequency of polling for responses with icp_sal_CyPollDpInstance, but it does not help: nearly 98% of the icp_sal_CyPollDpInstance calls return CPA_STATUS_RETRY.
3) I have already adjusted all the parameters in dpdk_qat, such as the polling frequency and how often PerformNow is set to CPA_TRUE, as the Intel QAT performance document recommends.
It seems that sometimes a QAT instance only accumulates requests without producing any responses, so the tx queue fills up; after a while a lot of responses come out in a short time and it recovers.
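To make the symptom concrete, here is a minimal sketch of the enqueue/poll pattern with a mock ring standing in for one QAT instance. It is not my actual code and does not call the QAT API: mock_ring_t, mock_enqueue, mock_poll, mock_engine_complete and enqueue_with_backoff are hypothetical names, and ST_OK / ST_RETRY only mimic CPA_STATUS_SUCCESS / CPA_STATUS_RETRY. The assumption is that a slot is released only once its response has been polled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for CPA_STATUS_SUCCESS / CPA_STATUS_RETRY. */
typedef enum { ST_OK = 0, ST_RETRY = 1 } status_t;

/* Mock of one instance; not the real QAT data structures. */
typedef struct {
    unsigned capacity;   /* ring size, e.g. 4096 */
    unsigned in_flight;  /* submitted, not yet completed by the engine */
    unsigned completed;  /* responses ready but not yet polled */
} mock_ring_t;

/* Submit one request; ST_RETRY means the ring is full. */
status_t mock_enqueue(mock_ring_t *r)
{
    assert(r->capacity > 0);
    if (r->in_flight + r->completed >= r->capacity)
        return ST_RETRY;
    r->in_flight++;
    return ST_OK;
}

/* Simulate the engine finishing up to n outstanding requests. */
void mock_engine_complete(mock_ring_t *r, unsigned n)
{
    if (n > r->in_flight)
        n = r->in_flight;
    r->in_flight -= n;
    r->completed += n;
}

/* Collect up to quota responses (0 = all); ST_RETRY means none ready. */
status_t mock_poll(mock_ring_t *r, unsigned quota)
{
    if (r->completed == 0)
        return ST_RETRY;
    unsigned n = (quota == 0 || quota > r->completed) ? r->completed : quota;
    r->completed -= n;
    return ST_OK;
}

/* Try to enqueue; on a full ring, poll to free slots and retry.
 * Returns false when the caller would have to drop the packet. */
bool enqueue_with_backoff(mock_ring_t *r, unsigned max_retries)
{
    for (unsigned attempt = 0; attempt <= max_retries; attempt++) {
        if (mock_enqueue(r) == ST_OK)
            return true;
        (void)mock_poll(r, 0);
    }
    return false;
}
```

In this model, while the engine produces no completions, every poll returns ST_RETRY and polling more often frees nothing, so enqueue_with_backoff keeps failing. As soon as mock_engine_complete is called, a single poll empties the ring and enqueues succeed again, which matches the burst-then-recover behaviour I see.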
How can I find the cause and raise the performance to 20 Gbps without any QAT request queue filling up? By the way, if the total is 20 Gbps, how much of that is available per instance or per crypto engine? The document does not say.
Thanks a lot!