Do a search on phase accumulators for the background but basically...
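If you have a finite-sized accumulator (i.e. an adder register) and add a smaller or same-sized value to it at regular intervals, it will overflow with a period (counted in add intervals) equal to the ratio of 2 raised to the power of the register size in bits, to the addend. Put the other way round, the overflow frequency is the add rate times addend / 2^bits. The overflow (carry out) can be used as a step signal for your axis.
For example, using an 8-bit accumulator, add 3 to the accumulating sum once every millisecond and the carry out from the accumulator will occur on average every 256/3 (about 85.3) milliseconds.
Similarly, an addend of 256 would give a step every 256/256 = 1 millisecond, i.e. a carry on every add. 256 itself does not fit in 8 bits, so in practice the largest addend is 255, which gives a step on 255 of every 256 adds.
Note that the frequency resolution is 1 bit of the addend, i.e. 1/256 of the add rate in this 8-bit example :-)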
The computational overhead involves a load register, an add, a branch if carry set, and an output bit.... very fast ;-)
Also note that the vector velocity (below) is only calculated once, before the move starts, and the move is long in time compared to one interrupt period.
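The 8-bit accumulator described above could be sketched in C roughly like this. The names phase_acc8, phase_acc8_tick and phase_acc8_count_steps are made up for illustration, and since C has no carry flag the carry out is detected by checking whether the sum wrapped around. On a real micro you would probably just use the add and branch-if-carry instructions directly.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical 8-bit phase accumulator (names are illustrative only). */
typedef struct {
    uint8_t acc;     /* running sum, wraps around at 256 */
    uint8_t addend;  /* value added on every timer tick (0..255) */
} phase_acc8;

/* Call once per timer tick. Returns true when the add produced a carry
 * out, which is the step pulse for the axis. */
static bool phase_acc8_tick(phase_acc8 *p)
{
    uint8_t before = p->acc;
    p->acc = (uint8_t)(p->acc + p->addend);
    /* The sum wrapped past 255 exactly when the result got smaller. */
    return p->acc < before;
}

/* Helper for experimenting: run a number of ticks and count the steps. */
static unsigned phase_acc8_count_steps(phase_acc8 *p, unsigned ticks)
{
    unsigned steps = 0;
    for (unsigned i = 0; i < ticks; i++) {
        if (phase_acc8_tick(p)) {
            steps++;
        }
    }
    return steps;
}
```

Starting from zero with an addend of 3, running 256 ticks gives 3 steps, which matches the 256/3 millisecond average period in the example above.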
So to successfully interpolate your two axes:
1. Select a maximum velocity (step rate) and use it as the frequency of your interrupt timer.
2. Calculate the vector velocity = feedrate / sqrt((dist X * dist X) + (dist Y * dist Y)),
so that feed X = dist X * vector velocity
and feed Y = dist Y * vector velocity.
Express feed X and feed Y as accumulator addends, i.e. as a fraction of the maximum velocity scaled by 2^bits.
3. Use an X and a Y accumulator in your interrupt routine to sum feed X and feed Y; each carry out is a step on that axis.
4. Track your distance travelled on each axis and end the motion with a zero velocity input.
The motion will complete approximately simultaneously within the tolerances of your arithmetic.
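Here is a rough C sketch of steps 1 to 4, under a few assumptions that are mine and not part of the recipe above: 32-bit accumulators, distances given as unsigned step counts (direction bits handled elsewhere), dist X and dist Y below 2^23, and feedrate given in steps per second along the path and no larger than the timer frequency. All names (move_t, move_plan, move_tick, move_done, isqrt64) are hypothetical. The square root is done in integers so the planning works without floating point.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t acc_x, acc_y;    /* phase accumulators */
    uint32_t add_x, add_y;    /* feed X and feed Y as per-tick addends */
    uint32_t left_x, left_y;  /* steps still to go on each axis */
} move_t;

/* Integer square root (floor), classic bit-by-bit method. */
static uint64_t isqrt64(uint64_t v)
{
    uint64_t r = 0, bit = (uint64_t)1 << 62;
    while (bit > v) bit >>= 2;
    while (bit != 0) {
        if (v >= r + bit) { v -= r + bit; r = (r >> 1) + bit; }
        else              { r >>= 1; }
        bit >>= 2;
    }
    return r;
}

/* addend = 2^32 * (dist / length) * (feed / tick rate), clamped to 32 bits */
static uint32_t scale_addend(uint32_t dist, uint64_t len_q8, uint64_t ratio_q32)
{
    uint64_t a = ratio_q32 * ((uint64_t)dist << 8) / len_q8;
    return a > UINT32_MAX ? UINT32_MAX : (uint32_t)a;
}

/* Step 2: done once per move, outside the interrupt. */
static void move_plan(move_t *m, uint32_t dx, uint32_t dy,
                      uint32_t feed_hz, uint32_t tick_hz)
{
    /* Path length with 8 fractional bits: 256 * sqrt(dx*dx + dy*dy). */
    uint64_t len_q8 = isqrt64(((uint64_t)dx * dx + (uint64_t)dy * dy) << 16);
    /* feed / tick rate as a 32-bit fraction. */
    uint64_t ratio_q32 = ((uint64_t)feed_hz << 32) / tick_hz;

    m->acc_x = m->acc_y = 0;
    m->left_x = dx;
    m->left_y = dy;
    /* Addends are truncated; a nonzero distance with a feed so low that
     * the addend truncates to 0 would never finish in this sketch. */
    m->add_x = len_q8 ? scale_addend(dx, len_q8, ratio_q32) : 0;
    m->add_y = len_q8 ? scale_addend(dy, len_q8, ratio_q32) : 0;
}

/* One axis: add, test carry, count down the remaining distance. */
static bool axis_tick(uint32_t *acc, uint32_t *add, uint32_t *left)
{
    uint32_t before = *acc;
    *acc += *add;
    if (*acc >= before) return false;  /* no carry out, no step */
    if (--*left == 0) *add = 0;        /* step 4: zero velocity input */
    return true;
}

/* Step 3: body of the timer interrupt. Bit 0 = step X, bit 1 = step Y. */
static unsigned move_tick(move_t *m)
{
    unsigned steps = 0;
    if (axis_tick(&m->acc_x, &m->add_x, &m->left_x)) steps |= 1u;
    if (axis_tick(&m->acc_y, &m->add_y, &m->left_y)) steps |= 2u;
    return steps;
}

static bool move_done(const move_t *m)
{
    return m->left_x == 0 && m->left_y == 0;
}
```

To try it, call move_plan once (for instance 300 steps in X and 400 in Y at a feed of 1000 steps/s with a 10 kHz timer), then call move_tick on every timer interrupt until move_done returns true. Because the addends are truncated to integers, the two axes finish within a tick or so of each other and of the ideal move time rather than exactly on it.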