A few weeks back I was called in to help a customer who was having problems completing Jetstress testing for an Exchange 2010 deployment. The issue wasn’t Jetstress reporting failed tests. Rather, they were unable to get through most of their tests without the Jetstress application itself crashing (JetstressWin.exe has stopped working). The crash occurred after the Jetstress testing completed but before it could write any log files to disk.
The only Jetstress related error in the Application log was an ESE error with Event ID 482:
JetstressWin (3584) Instance3584.6: An attempt to write to the file “F:\DB\Jetstress006001.edb” at offset 63087017984 (0x0000000eb0478000) for 32768 (0x00008000) bytes failed over 0 seconds with system error 1117 (0x0000045d): “The request could not be performed because of an I/O device error.”. The write operation will fail with error –1022 (0xfffffc02). If this error persists then the file may be damaged and may need to be restored from a previous backup.
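If you need to pull the useful fields out of an event like this (for example, to compare offsets across several failures), the message is regular enough to parse. Below is a minimal Python sketch, not a definitive tool. It assumes the event text has been copied with plain ASCII quotes and a plain minus sign, and the function name parse_ese_write_failure is my own invention. It also checks that the hex values in parentheses agree with the decimal ones: system error 1117 is 0x45d, and ESE error -1022 is 0xfffffc02 when read as an unsigned 32-bit value.

```python
import re

# Event text from the Application log, shortened to the relevant sentences.
# Assumption: quotes and the minus sign are plain ASCII characters.
MESSAGE = (
    'JetstressWin (3584) Instance3584.6: An attempt to write to the file '
    '"F:\\DB\\Jetstress006001.edb" at offset 63087017984 (0x0000000eb0478000) '
    'for 32768 (0x00008000) bytes failed over 0 seconds with system error '
    '1117 (0x0000045d): "The request could not be performed because of an '
    'I/O device error.". The write operation will fail with error -1022 '
    '(0xfffffc02).'
)

PATTERN = re.compile(
    r'file "(?P<path>[^"]+)" at offset (?P<offset>\d+) '
    r'\((?P<offset_hex>0x[0-9a-fA-F]+)\) '
    r'for (?P<length>\d+) \(0x[0-9a-fA-F]+\) bytes .*?'
    r'system error (?P<win32>\d+) \(0x[0-9a-fA-F]+\).*?'
    r'fail with error (?P<ese>-\d+) \((?P<ese_hex>0x[0-9a-fA-F]+)\)'
)


def parse_ese_write_failure(message):
    """Extract path, offset, length and error codes from an ESE 482 message."""
    m = PATTERN.search(message)
    if m is None:
        return None
    offset = int(m["offset"])
    ese_error = int(m["ese"])
    return {
        "path": m["path"],
        "offset": offset,
        "length": int(m["length"]),
        "win32_error": int(m["win32"]),
        "ese_error": ese_error,
        # The hex in parentheses should be the same number as the decimal.
        "offset_hex_matches": int(m["offset_hex"], 16) == offset,
        # Negative ESE errors are printed as unsigned 32-bit hex.
        "ese_hex_matches": (ese_error & 0xFFFFFFFF) == int(m["ese_hex"], 16),
    }


if __name__ == "__main__":
    print(parse_ese_write_failure(MESSAGE))
```

The key field for this story is the system error: 1117 is an I/O device error reported by Windows, which points below the application at the storage stack rather than at Jetstress itself.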
When Jetstress completes a test run, it generates a large amount of I/O as it flushes anything in cache to disk. It was at this point that the Jetstress application was crashing. The flush itself is normal behavior, but its timing is an important clue: the crash coincided with a burst of heavy disk I/O.
The customer was using vSphere 4.1, and the Exchange 2010 Mailbox servers were each configured with PVSCSI virtual SCSI controllers using VMDK files. As it turns out, they had hit the PVSCSI bug described in this VMware KB:
Windows 2008 R2 virtual machine using a paravirtual SCSI adapter reports the error: Operating system error 1117 encountered http://kb.vmware.com/kb/2004578
The interesting thing to note is that although the KB specifically calls out Exchange, it doesn’t mention that the bug may cause the application (in this case Jetstress) to crash. The crashing led the team to troubleshoot Jetstress first, thinking something was wrong with Jetstress and the various DLLs it requires to run.
At the end of the day the issue was resolved by following the instructions in the KB and changing the virtual SCSI driver to LSI Logic SAS. After making that change there were no subsequent issues with Jetstress.
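If you want to check which virtual machines still have a PVSCSI controller, one option is to look at the scsiN.virtualDev entries in each VM's .vmx file. The Python sketch below works under stated assumptions only: the sample .vmx text is made up for illustration, the function name find_pvscsi_controllers is hypothetical, and I am assuming the values "pvscsi" for the paravirtual adapter and "lsisas1068" for LSI Logic SAS. Verify those against your own environment before relying on it.

```python
import re

# Hypothetical .vmx excerpt for illustration; not taken from the customer's VMs.
# Assumption: "pvscsi" marks a paravirtual adapter, "lsisas1068" LSI Logic SAS.
SAMPLE_VMX = '''
scsi0.present = "TRUE"
scsi0.virtualDev = "lsisas1068"
scsi1.present = "TRUE"
scsi1.virtualDev = "pvscsi"
'''

VIRTUAL_DEV = re.compile(r'\s*(scsi\d+)\.virtualDev\s*=\s*"([^"]*)"')


def find_pvscsi_controllers(vmx_text):
    """Return the controller names (e.g. 'scsi1') whose virtualDev is pvscsi."""
    hits = []
    for line in vmx_text.splitlines():
        m = VIRTUAL_DEV.match(line)
        if m and m.group(2).lower() == "pvscsi":
            hits.append(m.group(1))
    return hits


if __name__ == "__main__":
    print(find_pvscsi_controllers(SAMPLE_VMX))
```

This only reports what is configured. The actual controller change should still be made by following the steps in the KB.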
In case you haven’t read the KB linked above, I want to note that a fix is available for every affected version of vSphere from 4.1 to 5.0. You’ll need to install the updates described in the KB if you want to use the PVSCSI driver with vSphere 4.1 through 5.0 (the issue is already resolved in vSphere 5.1).
Hopefully this helps anyone who might be experiencing this issue. I also hope it doesn’t dissuade anyone from using the PVSCSI driver for their business critical applications, as it can deliver better performance with lower CPU utilization when high I/O workloads are virtualized.