To determine the answer for certain, you would need to use an oscilloscope to examine the relative timing of the step and direction pulses (at least at the step and direction input pins to your drivers).
According to the instructions that you linked to (page 13), the signal applied to the direction pin of the driver must be present (and stable) at least 10 microseconds ahead of each step pulse. It looks like both the step and direction pulses are handled through opto-isolated inputs on your driver.
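If you do get scope captures, the setup-time check could be automated along these lines. This is only a rough sketch: the function name, the idea of exporting edge timestamps in microseconds, and the example numbers are my own assumptions. Only the 10 microsecond figure comes from the manual. Since your driver apparently acts on both edges of the step pulse in microstepping modes, you would probably want to pass in both rising and falling step edges.

```python
from bisect import bisect_right

# Minimum direction setup time quoted in the driver manual (microseconds).
DIR_SETUP_US = 10.0

def setup_violations(dir_change_us, step_edge_us, min_setup_us=DIR_SETUP_US):
    """Find step edges that arrive too soon after a direction change.

    dir_change_us: times (us) at which the direction signal changed level.
    step_edge_us:  times (us) of the step edges the driver acts on.
    Returns a list of (dir_change_time, step_edge_time, margin_us) tuples.
    """
    dir_sorted = sorted(dir_change_us)
    violations = []
    for edge in sorted(step_edge_us):
        # Index of the first direction change strictly after this edge.
        i = bisect_right(dir_sorted, edge)
        if i == 0:
            continue  # no direction change has happened before this edge
        last_change = dir_sorted[i - 1]
        margin = edge - last_change
        if margin < min_setup_us:
            violations.append((last_change, edge, margin))
    return violations

# Made-up example: direction flips at 100 us, step edges at 50, 104 and 150 us.
print(setup_violations([100.0], [50.0, 104.0, 150.0]))
```

With those made-up numbers, only the edge at 104 us is flagged, because it follows the direction change by 4 us.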
In microstepping modes, your driver apparently responds to both the leading and trailing edges of the step pulse. The documentation also states that a 50% duty cycle on the step pulse gives the best operation of the driver's "smoothing" feature. My understanding of Sherline mode (which, admittedly, could be flawed) is that it both lengthens the step pulse and gives it a 50% duty cycle.
So that leads to some possible hypotheses about what is going on, but they are really only guesses that would need to be confirmed by testing with an oscilloscope.
Maybe there is inadequate "setup time" for the direction signal relative to the step signal, possibly combined with an asymmetry in the optoisolator response to a positive-going signal vs. a negative-going signal. That might result in the driver seeing a different value of the direction signal on the leading edge of the step pulse than on the trailing edge (at the point where the direction is being reversed). A different driver that only gets clocked on one edge of the step pulse might not react in the same way.
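To make that hypothesis a bit more concrete, here is a toy model of it. Everything in it is assumed for illustration: the function names, the idea that the optoisolator can be treated as a pure delay that differs for turn-on and turn-off, and all of the timing numbers. It also ignores any delay on the step input itself, so it is not a description of how your driver actually works.

```python
def direction_seen(t_us, dir_switch_us, new_level, delay_on_us, delay_off_us):
    """Direction level on the driver side of the optoisolator at time t_us.

    The controller switches the direction line to new_level at dir_switch_us.
    The driver side follows after delay_on_us (switching to True) or
    delay_off_us (switching to False). All values are hypothetical.
    """
    delay = delay_on_us if new_level else delay_off_us
    if t_us >= dir_switch_us + delay:
        return new_level
    return not new_level  # still showing the old level

def sampled_on_edges(step_start_us, pulse_width_us, **opto):
    """Direction levels seen on the leading and trailing step edges."""
    leading = direction_seen(step_start_us, **opto)
    trailing = direction_seen(step_start_us + pulse_width_us, **opto)
    return leading, trailing

# Direction reverses at t = 0; a 5 us step pulse starts 5 us later.
slow_off = dict(dir_switch_us=0.0, delay_on_us=2.0, delay_off_us=8.0)
print(sampled_on_edges(5.0, 5.0, new_level=False, **slow_off))  # (True, False)
print(sampled_on_edges(5.0, 5.0, new_level=True, **slow_off))   # (True, True)
```

In this made-up case, a reversal in one direction gives a leading edge that still sees the old direction while the trailing edge sees the new one, and a reversal in the other direction does not. A driver clocked on only one edge would see a single value either way.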
Maybe your driver's smoothing circuitry is not working well with your short step pulse (have you tried increasing the duration of the step pulse to perhaps 10 or 15 microseconds within Mach 3?).
Perhaps you could perform some experiments in a non-microstepping mode (full steps or half steps) to see if the problem goes away when the driver is not doing the step smoothing.