I've opened another pull request that incorporates only the timer/callback change and leaves out the ring buffer, which had some issues. The code change is also much smaller:
https://github.com/micropython/micropython/pull/1713
It seems that using SOF (start-of-frame) IRQs together with a txComplete callback gives a drastic speed improvement of up to 30x in VCP mode (see the pull request for some numbers). rshell benefits in particular, and possibly other applications such as SLIP as well.
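
To give a rough feel for why chaining packets from a txComplete callback can help, here is a toy timing model in Python. It is only a sketch under assumptions: it is not the code from the pull request, the "one packet per SOF" baseline is my simplification, and `packet_time_us` is a made-up placeholder rather than a measured value. The only facts it relies on are that full-speed USB issues an SOF every 1 ms and that full-speed bulk packets carry at most 64 bytes.

```python
import math

SOF_PERIOD_US = 1000   # full-speed USB: one start-of-frame per millisecond
PACKET_SIZE = 64       # maximum bulk packet size at full speed


def time_sof_only_us(n_bytes):
    """Assumed baseline: exactly one packet is queued per SOF interrupt."""
    packets = math.ceil(n_bytes / PACKET_SIZE)
    return packets * SOF_PERIOD_US


def time_chained_us(n_bytes, packet_time_us=50):
    """Assumed improved scheme: the first packet waits for an SOF, then
    each txComplete callback immediately queues the next packet.
    packet_time_us is a hypothetical per-packet cost, not a measurement."""
    packets = math.ceil(n_bytes / PACKET_SIZE)
    return SOF_PERIOD_US + packets * packet_time_us


def speedup(n_bytes, packet_time_us=50):
    """Ratio of the two modelled transfer times for the same payload."""
    return time_sof_only_us(n_bytes) / time_chained_us(n_bytes, packet_time_us)


if __name__ == "__main__":
    for size in (64, 640, 6400, 64000):
        print(size, "bytes: modelled speedup", round(speedup(size), 1))
```

In this model the gain grows with transfer size, because the one-off wait for the first SOF is amortised over more packets. The real figures depend on the host and the firmware, so the numbers in the pull request are the ones to trust.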