I had problems with this as well. Aside from cleaning up Windows and the look ahead setting, you can try running the SS at the lowest frequency that will get the job done.
For example, a typical setup could be 5mm ballscrews and a G540. At 64kHz your max feedrate would be 377ipm... IF you can spin your steppers at 1920rpm with enough torque. If you can't, one possible solution is to increase the microstep resolution on the stepper drive, if the drive allows it.
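Here's roughly how those numbers fall out, as a quick sketch. I'm assuming 200 full steps per rev motors and 10x microstepping on the drive (neither is stated above, but they are what make 64kHz come out to 1920rpm); the function names are just made up for illustration.

```python
MM_PER_INCH = 25.4

def motor_rpm(step_hz, full_steps_per_rev=200, microsteps=10):
    """Motor speed in rpm for a given step pulse frequency.

    200 steps/rev and 10x microstepping are assumptions, not measured values.
    """
    pulses_per_rev = full_steps_per_rev * microsteps
    return step_hz / pulses_per_rev * 60

def max_feedrate_ipm(step_hz, screw_lead_mm=5.0,
                     full_steps_per_rev=200, microsteps=10):
    """Top feedrate in inches per minute for a direct-driven screw."""
    rpm = motor_rpm(step_hz, full_steps_per_rev, microsteps)
    return rpm * screw_lead_mm / MM_PER_INCH

print(motor_rpm(64_000))         # motor speed at 64kHz
print(max_feedrate_ipm(64_000))  # feedrate at 64kHz with 5mm screws
```

With those assumptions, 64kHz works out to 1920rpm and just under 378ipm, which is where the 377ipm figure comes from. Doubling the microstep setting halves both numbers at the same frequency.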
Figure out the frequency you need for max rapids with the setup you have and set the SS to the closest frequency that covers it. If it runs fine, you can raise the frequency until you get a buffer underrun, then dial it back.
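Going the other way (target rapid speed to required frequency) could look something like this. Again a sketch only: the 200 steps/rev and 10x microstep defaults are my assumptions, and the list of selectable frequencies is a placeholder, so check what your SS plugin actually offers.

```python
MM_PER_INCH = 25.4

# Placeholder list of selectable frequencies in Hz (an assumption;
# replace it with the options your plugin actually shows).
SS_FREQUENCIES_HZ = (32_000, 64_000, 128_000, 256_000)

def required_step_hz(rapid_ipm, screw_lead_mm=5.0,
                     full_steps_per_rev=200, microsteps=10):
    """Step frequency needed to reach a given rapid speed."""
    rev_per_sec = rapid_ipm * MM_PER_INCH / screw_lead_mm / 60
    return rev_per_sec * full_steps_per_rev * microsteps

def lowest_setting_for(rapid_ipm, options=SS_FREQUENCIES_HZ, **setup):
    """Lowest selectable frequency that still covers the rapid speed.

    Returns None if no option is high enough.
    """
    needed = required_step_hz(rapid_ipm, **setup)
    for hz in sorted(options):
        if hz >= needed:
            return hz
    return None

print(required_step_hz(300))    # Hz needed for 300ipm rapids
print(lowest_setting_for(300))  # setting to start testing with
```

For 300ipm rapids on 5mm screws this gives about 50.8kHz needed, so 64kHz would be the starting point under these assumptions.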
Really, if your calculated frequency for rapids is 40-50kHz, you probably don't need an SS - most processors from a few years ago pump this out easily with Mach3 DriverTest. Plus the lowly parallel port is capable of 2-2.5Mb/s... so that's not the bottleneck, it's Windows. But that's the price we pay for being able to use Windows with a near-infinite number of possible configurations...
The LookAhead feature in Mach3 really helps with smoothing out movement, especially with 3D moves. The default in Mach3 is an abysmal 10 lines, but with today's computers it can be set to 200. The max is 1000, but I wouldn't go too far with it.