Work has been pretty fun the last two days.
I’ve been fiddling with a piece of software since mid-December that collects data from various input devices at 100 millisecond intervals, charts it live, records it, etc. While it’s doing that it’s also driving output devices. So the client hooks up a motor (as an example) to the system and instructs the software to run it from 0 to 500 rpm over 5 minutes, hold it there for half an hour, then let it drop back down, all the while taking the temperature every 100 msec. It’s kinda neat.
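To make the motor example a bit more concrete, here’s a minimal sketch of that kind of ramp/hold/ramp-down profile in C. The post doesn’t say how long the ramp-down takes, so the ramp-down duration is an assumption, and the function and parameter names are hypothetical, not the real application’s.

```c
#include <assert.h>

/* Hypothetical setpoint profile: ramp from 0 to peak_rpm over ramp_up_s,
   hold for hold_s, then ramp back to 0 over ramp_down_s.
   Returns the target speed in rpm at t_s seconds into the run. */
double target_rpm(double t_s, double peak_rpm,
                  double ramp_up_s, double hold_s, double ramp_down_s) {
    assert(ramp_up_s > 0.0 && ramp_down_s > 0.0);
    double hold_end_s = ramp_up_s + hold_s;
    double down_end_s = hold_end_s + ramp_down_s;

    if (t_s <= 0.0) return 0.0;
    if (t_s < ramp_up_s) return peak_rpm * (t_s / ramp_up_s);   /* ramping up */
    if (t_s < hold_end_s) return peak_rpm;                       /* holding */
    if (t_s < down_end_s)                                        /* ramping down */
        return peak_rpm * (1.0 - (t_s - hold_end_s) / ramp_down_s);
    return 0.0;                                                  /* finished */
}
```

With a 100 ms tick, tick k would ask for something like target_rpm(k * 0.1, 500.0, 300.0, 1800.0, 300.0), where the final 300 seconds of ramp-down is just a guess. Computing the setpoint from the tick’s scheduled time rather than nudging it a fixed step per tick means a tick that runs late still gets the right value.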
Problem: it tends to “lose” time when run for long periods. I did some optimization on it last month and sent it back, and a 67 hour test took 9 minutes longer to complete than it should have. Believe it or not, that was a significant improvement. I did a little more digging, made a couple of other changes, and ran a 16 hour test last night that came in 33 seconds too long. If you extrapolate that out, I should be a little over 2 minutes too slow on a 67 hour test. However, they’re looking to be accurate within 3 minutes on a 1,000 hour test, and at the same rate that would be off by more like 34 minutes.
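That extrapolation is just proportional arithmetic. Here’s a small C sketch of it, assuming the drift grows linearly with run length (which, as the short tests show, is only roughly true); the function names are made up for illustration.

```c
#include <assert.h>

/* Scale an observed drift linearly up to a longer run. */
double projected_drift_s(double observed_drift_s, double observed_hours,
                         double target_hours) {
    assert(observed_hours > 0.0);
    return observed_drift_s / observed_hours * target_hours;
}

/* Express a drift as parts per million of the total run length. */
double drift_ppm(double drift_s, double run_hours) {
    assert(run_hours > 0.0);
    return drift_s / (run_hours * 3600.0) * 1e6;
}
```

projected_drift_s(33, 16, 67) comes out to about 138 seconds, and projected_drift_s(33, 16, 1000) to 2,062.5 seconds, roughly 34 minutes. In relative terms the 16 hour run was off by about 573 ppm, while 3 minutes in 1,000 hours is a budget of 50 ppm, so the timing needs to get roughly 11 times better.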
So, more work is needed.
Now, here’s the funny thing: if I do a 10-20 minute test I come in right on the dot, plus or minus a tenth of a second. At the 16 hour test’s rate, a 20 minute test should already be about 0.7 seconds long, so obviously the pattern isn’t totally linear.
Alright, so I’m gonna need a better idea of what’s going on inside the software while it’s running. I figure the best place to start is finding out when the software starts missing its 100 ms performance goal. If I know that, then maybe I can figure out whether the problem is external to the software, perhaps caused by other software installed on the PC. Maybe it’s a virus scanner kicking on in the middle of the night, I don’t know.
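One way to find out when the 100 ms goal starts slipping is to compare each tick’s actual start time against when it was due and keep a few running statistics that can be logged or exposed later. This is a hypothetical sketch, not what the application actually does; times are milliseconds since the start of the run, and the 5 ms tolerance used in the example is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Running statistics about late ticks (hypothetical structure). */
typedef struct {
    int64_t late_ticks;       /* ticks that started later than the tolerance */
    int64_t worst_late_ms;    /* largest lateness seen so far */
    int64_t first_late_tick;  /* index of the first late tick, or -1 if none */
} MissStats;

void miss_stats_init(MissStats *s) {
    s->late_ticks = 0;
    s->worst_late_ms = 0;
    s->first_late_tick = -1;
}

/* Record that tick number `tick` actually started at `start_ms`.
   Tick k (0-based) is due at k * period_ms after the run started. */
void miss_stats_record(MissStats *s, int64_t tick, int64_t start_ms,
                       int64_t period_ms, int64_t tolerance_ms) {
    assert(period_ms > 0 && tolerance_ms >= 0);
    int64_t late_ms = start_ms - tick * period_ms;
    if (late_ms > s->worst_late_ms) s->worst_late_ms = late_ms;
    if (late_ms > tolerance_ms) {
        s->late_ticks++;
        if (s->first_late_tick < 0) s->first_late_tick = tick;
    }
}
```

Logging first_late_tick alongside the wall-clock time would show whether the misses cluster at a particular time of night, which is the kind of pattern that would point at something like a scheduled virus scan.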
So far what I’ve done with the application itself is pretty simple: it keeps track of how many tasks it’s completed and how many msec that has taken, and if it falls behind an average of 100 msec per task, the Sleep() steps are skipped (along with a few other tricks) and we run balls-out until we’re back to a 100 msec average. Crude, but it’s working better than before. Previously, if it wasn’t able to start on one of the 100 msec intervals, it just skipped that round and waited even longer. Not cool.
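The catch-up rule boils down to scheduling each task against the start of the run instead of against the end of the previous task. Here’s a minimal sketch of that decision in C, assuming elapsed time comes from a monotonic millisecond clock (on Windows, GetTickCount64 or QueryPerformanceCounter are the usual candidates); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds to sleep before starting the next task.
   Task k (0-based) is due at k * period_ms after the run started, so after
   tasks_done tasks the next one is due at tasks_done * period_ms.
   Returns 0 when the loop is behind: skip the Sleep() and run flat out. */
int64_t ms_until_next_task(int64_t tasks_done, int64_t elapsed_ms,
                           int64_t period_ms) {
    assert(tasks_done >= 0 && period_ms > 0);
    int64_t next_due_ms = tasks_done * period_ms;
    int64_t wait_ms = next_due_ms - elapsed_ms;
    return wait_ms > 0 ? wait_ms : 0;
}
```

After 10 tasks and 950 ms, this says sleep 50 ms; after 10 tasks and 1,030 ms, it says don’t sleep at all. Because every deadline is computed from the start time, a late tick gets made up by running the following ones back to back instead of being dropped. Keep in mind that Sleep() itself is coarse (the default Windows timer resolution is commonly around 15.6 ms), so the actual wake-up can land a bit after the requested time.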
So the client sees this, sees that it’s working substantially better, and asks about a “hook” into the main application for the I/O code he’s responsible for. Basically he’s just looking for a signal that the I/O layer needs to kick it up a notch because we’re behind schedule. Sure, I can do that.
The first thing that comes to mind is a shared memory segment, like the Unix SysV IPC stuff but whatever the Win32 equivalent is. So I start coding that up Wednesday afternoon. I put my learning cap on and figured it all out… but I just know I’m going to have to provide some sample code for the internal developer so he knows how to hook into that shared memory segment. He might already know how to do it, and I’m 100% certain he could learn it on his own, but I really should verify that I can read the shared data myself in case he has any trouble with it.
Then the light bulb goes off in my head Wednesday night as I’m driving to the gym: why not just make a second application that monitors that shared memory data? I can use it to get instant feedback on the internal status of the main application without having to make the main application do something as ugly as logging its internals to a console or file. So that’s what I did Thursday. The shared memory segment was expanded to include a few more data points that I’d like to monitor continually. I then created a new dialog-based application that polls that segment once a second, displays the values on screen, and logs them to a text file.
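Each second, the monitor’s job is basically: copy the shared fields into a local snapshot, derive anything interesting from them, and append a line to the log. Here’s a hypothetical sketch of the snapshot and formatting steps in C (the polling loop and the dialog are left out). The CSV columns, including a derived schedule_error_ms for how far the main loop is from tasks_completed × 100 ms, are illustrative assumptions rather than what the real tool logs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shared layout; must match what the main application writes. */
typedef struct {
    volatile int32_t behind_schedule;
    volatile int64_t tasks_completed;
    volatile int64_t elapsed_ms;
} SharedStatus;

/* Plain copy taken once per poll, so the derived values are computed
   from a single read of each shared field. */
typedef struct {
    int32_t behind_schedule;
    int64_t tasks_completed;
    int64_t elapsed_ms;
} StatusSnapshot;

StatusSnapshot take_snapshot(const SharedStatus *s) {
    StatusSnapshot snap;
    snap.behind_schedule = s->behind_schedule;
    snap.tasks_completed = s->tasks_completed;
    snap.elapsed_ms = s->elapsed_ms;
    return snap;
}

/* Format one CSV log line:
   poll_s,tasks_completed,elapsed_ms,schedule_error_ms,behind_schedule
   Returns what snprintf returns (the length of the full line). */
int format_status_line(char *buf, size_t buf_len, int64_t poll_s,
                       const StatusSnapshot *snap, int64_t period_ms) {
    assert(buf != NULL && snap != NULL && period_ms > 0);
    int64_t schedule_error_ms = snap->elapsed_ms - snap->tasks_completed * period_ms;
    return snprintf(buf, buf_len, "%lld,%lld,%lld,%lld,%d\n",
                    (long long)poll_s, (long long)snap->tasks_completed,
                    (long long)snap->elapsed_ms, (long long)schedule_error_ms,
                    (int)snap->behind_schedule);
}
```

A schedule_error_ms that keeps climbing in that log would show the main loop falling behind, and the poll time where it starts climbing is exactly the kind of clue the long-test timing problem needs.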
So, now I’ve got the sample code I wanted to provide, I’ve got an application that’ll help me diagnose my specific timing issue, and the client will have an application that they can use to do the same thing in the future!
Yay!
I had so much fun doing this (seriously) that I figured I’d share it with y’all. This is the most fun I’ve had coding in a LOOONG time.