RSS Tuning Whitepaper notes and clarifications

This morning, a link on SQLServerCentral.com caught my eye: Maximizing SQL Server Throughput with RSS Tuning. (It came with a catchy SQLCAT logo!)

So I clicked, I read, I downloaded and read some more. To summarize (BIG summary!), this paper says that in pre-2008 versions of Windows Server, the networking protocol stack didn’t scale well to multiple CPUs because incoming traffic (“receive protocol processing”) from a single network adapter was handled by at most one CPU. One!

Windows Server 2003 SP1 and earlier versions do not allow multiple processors to concurrently process receive indications from a single-network adapter.

Restriction is caused by the architecture of the Network Driver Interface Specification (NDIS) in Windows Server 2003…

Receive-side scaling (RSS) is enabled by default in Windows Server 2008 and later, but I’m looking to see if we can and should fine-tune it from the 4-CPU default on our servers. What’s the gold standard here? What constitutes “optimized”? This seems to answer that question:

The optimal RSS implementation would be for the network adapter to support PCI v3.0 MSI-X and to have as many hardware receive queues as there are CPUs in the system.

I wrote to the paper’s contact with a couple of questions, and Joe Nievelt answered very promptly: “CPUs” in this case does refer to cores, of course. “Also note that it does not include hyper-threaded logical processors, as RSS will only use one logical processor per physical core.”
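
To make the arithmetic concrete for myself, here is a rough sketch in Python of how I read the "one queue per CPU" guideline together with Joe's note about hyper-threading. The function names, the `threads_per_core` parameter, and the example numbers are all my own assumptions for illustration; none of this comes from the paper or from any Windows API.

```python
def rss_usable_cores(logical_processors: int, threads_per_core: int = 2) -> int:
    """Estimate how many processors RSS could use.

    Assumption: RSS uses only one logical processor per physical core,
    so hyper-threaded siblings are not counted.
    """
    if logical_processors <= 0 or threads_per_core <= 0:
        raise ValueError("processor counts must be positive")
    return logical_processors // threads_per_core


def meets_optimal_queue_count(hardware_receive_queues: int, physical_cores: int) -> bool:
    """True if the adapter has at least as many receive queues as physical cores.

    Assumption: "as many queues as CPUs" is read here as "at least as many".
    """
    return hardware_receive_queues >= physical_cores


# Hypothetical server: 32 logical processors with hyper-threading on.
cores = rss_usable_cores(32, threads_per_core=2)
print(cores)                                 # 16
print(meets_optimal_queue_count(4, cores))   # False: 4 queues for 16 cores
print(meets_optimal_queue_count(16, cores))  # True
```

By this reading, a 4-queue setup on a 16-core box falls short of the paper's "optimal" description, though that alone doesn't say whether it is actually a bottleneck.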

I also asked if there was a way to tell whether receive-side queuing is currently an issue for a particular server. His answer (in part), for your and my edification:

The biggest indicator would be a situation where you cannot approach maximum link speed and are not constrained by some other bottleneck such as CPU or disk. In such a case you will probably see a few cores very near 100% which would correspond to the receive queue processing cores which cannot keep up with the traffic.

He went on to say that it’s unlikely to run into this kind of thing in my particular situation, “as SQL should be able to scale out to the [full complement of] cores even if there are only 4 processors handling the receive queues”.
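
The indicator Joe describes could be turned into a quick check along these lines. This is only a sketch of the heuristic, not a real monitoring tool: the function name, the thresholds (95% for a "hot" core, 90% for "approaching link speed"), and the cap of 4 hot cores are my own guesses, and you would have to collect the utilization numbers yourself (from Performance Monitor, for example).

```python
def looks_receive_bound(core_utilization, link_utilization,
                        other_bottleneck=False,
                        hot_threshold=0.95,
                        link_threshold=0.90,
                        max_hot_cores=4):
    """Rough test for a receive-side processing bottleneck.

    core_utilization: per-core CPU usage, each value between 0.0 and 1.0
    link_utilization: fraction of maximum link speed being achieved
    other_bottleneck: True if disk, overall CPU, etc. is already the known limit
    All thresholds are illustrative assumptions, not values from the paper.
    """
    if other_bottleneck:
        return False  # something else explains the slowdown
    if link_utilization >= link_threshold:
        return False  # already close to maximum link speed
    hot = [u for u in core_utilization if u >= hot_threshold]
    # Suspicious pattern: a few cores pegged while the rest have headroom.
    return 0 < len(hot) <= max_hot_cores and len(hot) < len(core_utilization)


# Hypothetical 16-core server: 4 cores pegged, 12 mostly idle, link half used.
sample = [0.99, 0.98, 0.97, 0.99] + [0.20] * 12
print(looks_receive_bound(sample, link_utilization=0.5))   # True
print(looks_receive_bound(sample, link_utilization=0.95))  # False
```

A True result here would only be a hint to go look closer, not proof that the receive queues are the problem.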

Thanks again, Joe!

-Jen McCown
www.MidnightDBA.com/Jen