NFS Read Performance

Response Time/Latency

Below is a graph of the response time ("latency"), in milliseconds per call, measured from the client (10.0.0.17) to the server (10.0.0.18) over a 100BaseT network.

(raw data)
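For reference, here is a minimal sketch of one way such a per-call latency could be measured from the client; the mount point and file name are hypothetical, and this is not the tooling actually used for the measurements above. Client-side caching would also have to be defeated (e.g. by remounting between runs) for repeated reads to reflect server round trips.

    import time

    def read_latency_ms(path, ncalls=1000):
        """Average wall-clock milliseconds per whole-file read."""
        total = 0.0
        for _ in range(ncalls):
            start = time.perf_counter()
            with open(path, "rb", buffering=0) as f:
                f.read()
            total += time.perf_counter() - start
        return 1000.0 * total / ncalls

    # Hypothetical NFS-mounted test file; remount or otherwise flush the client
    # cache between runs so each read actually goes to the server.
    print(read_latency_ms("/mnt/nfs/testfile"))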

Note the knee at 1500 bytes. At this point, the NFS read response requires more than one UDP packet. (The 1500 comes from the ethernet Maximum Transmission Unit (MTU)). Note the absence of bumps at 3000, 4500, etc. The number of packets per response is shown below.
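A rough back-of-the-envelope model of the packet count: a reply larger than one frame's worth of payload is split across several MTU-sized packets, but each extra packet costs only a few tens of bytes of header, not another round trip. The 20-byte per-packet IP header, and the reuse of the 106-byte constant from the byte-count formula below as fixed reply overhead, are assumptions made only for illustration.

    MTU = 1500           # ethernet MTU, bytes
    IP_HDR = 20          # assumed IP header per packet
    REPLY_HDR = 106      # fixed reply overhead, borrowed from the formula below

    def reply_packets(read_size):
        """Rough count of ethernet frames carrying one read reply."""
        payload = read_size + REPLY_HDR
        per_frame = MTU - IP_HDR
        return max(1, -(-payload // per_frame))   # ceiling division

    for size in (1000, 1500, 3000, 8192):
        print(size, reply_packets(size))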

Network Traffic

Below is a graph of the network traffic, measured on the server, in KBytes per second, for various operations. (raw data)

The graph below is similar, except that it shows the number of bytes per operation. It is derived by multiplying the latency by the bitrate above.
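A small worked example of that derivation, with made-up placeholder numbers (and assuming 1 KByte = 1024 bytes):

    def bytes_per_call(kbytes_per_sec, ms_per_call):
        # (KBytes/sec) * (1024 bytes/KByte) * (seconds/call) = bytes/call
        return kbytes_per_sec * 1024.0 * (ms_per_call / 1000.0)

    # e.g. 200 KBytes/sec of traffic at 0.5 ms/call works out to ~102 bytes/call
    print(bytes_per_call(200.0, 0.5))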

The response to the read request scales as the size of the file, and is given exactly as

response bytes = (requested file size) + 106 + 36 * (number of response packets)
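Written out as a worked example (the packet count per reply is read off the packets-per-response graph, so it is a parameter here rather than something this sketch derives):

    def read_response_bytes(file_size, n_response_packets):
        # Direct transcription of the formula above.
        return file_size + 106 + 36 * n_response_packets

    # e.g. a 4096-byte read carried in 3 response packets:
    print(read_response_bytes(4096, 3))   # 4096 + 106 + 108 = 4310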

The other traffic is constant, and independent of the file size. Each request/response requires exactly one packet, and the number of bytes in each packet is:

Note that the getattr and read requests contain the filepath, so their size will vary with the length of the filename.

CPU Usage

Below is a graph of the CPU usage on the server, as a function of file size. The graphs show a lot of noise; why is not clear. The CPU usage data was collected with the 'vmstat' command, set to report every 10 seconds. The 'nullcall' and 'getattr' data represent approximately 30 samples taken over 300 seconds. The 'read' data represent 1x to 8x more points over a correspondingly longer time period. We conclude that either there is something noisy about how the kernel keeps CPU usage data, or the kernel scheduling algorithms are inherently noisy.
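A minimal sketch of how such vmstat samples could be collected and reduced is shown below. vmstat's column layout differs between systems, so the 'us' and 'sy' column names assumed here (Linux-style output) are an assumption, not part of the original setup.

    import statistics
    import subprocess

    def cpu_busy_samples(intervals=30, period=10):
        """Run 'vmstat <period> <count>' and return %busy (us+sy) per interval."""
        out = subprocess.run(["vmstat", str(period), str(intervals + 1)],
                             capture_output=True, text=True, check=True).stdout
        lines = out.splitlines()
        cols = lines[1].split()                    # column-name header line
        us, sy = cols.index("us"), cols.index("sy")
        # Skip the first data row: it reports averages since boot, not this interval.
        return [int(l.split()[us]) + int(l.split()[sy]) for l in lines[3:]]

    samples = cpu_busy_samples()
    print("mean busy %:", statistics.mean(samples),
          "stdev:", statistics.pstdev(samples))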


The graph below shows the CPU usage, in microseconds per call. We get this graph by multiplying the percent-busy data by the elapsed-time data. It shows the actual cycles burned to satisfy one request, inclusive of context switches, interrupts, network processing, etc. Note that the nullcall and getattr curves sit nearly on top of one another: this is consistent with earlier data, where the getattr call takes only 2.75 microseconds more than the nullcall.
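The multiplication itself is just (fraction of the CPU busy) times (elapsed time per call); a tiny example with placeholder numbers, not the measured values:

    def cpu_us_per_call(percent_busy, ms_per_call):
        # (fraction of a CPU busy) * (microseconds of elapsed time per call)
        return (percent_busy / 100.0) * (ms_per_call * 1000.0)

    # e.g. 12% busy while each call takes 0.3 ms -> 36 microseconds of CPU per call
    print(cpu_us_per_call(12.0, 0.3))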


The next graph shows the number of interrupts handled per read operation. It seems to stair-step upward by an extra one-sixth of an interrupt per UDP packet. Note that getattr and nullcall take two interrupts per operation.
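Read as arithmetic, the stair-step suggests something like the following rule of thumb; the constant of two and the one-sixth slope are eyeballed from the graphs, not a fitted or measured relationship.

    def interrupts_per_read(n_packets):
        # ~2 interrupts per operation plus ~1/6 interrupt per UDP packet (eyeballed)
        return 2.0 + n_packets / 6.0

    for n in (1, 3, 6, 12):
        print(n, interrupts_per_read(n))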


In all cases (nullcall, getattr, and read), context switches remain constant at three per operation.

Environment

Unless otherwise noted, the experimental setup is as follows: