Trying out a plugin capture (using the BlackHole virtual audio cable, which seems to work a treat). I switched GPU acceleration off at first (I'm on a GPU-weak MacBook Air M3), but switched it back on when there was no progress after 5 minutes. With optimized training, the estimated time is around 40 minutes. I wouldn't mind doing this more often if only it didn't clog up the entire machine so much. Everything I do while it runs comes with brutal latency, be it tabbing through things, typing, or anything else.
If this capture comes out OK, I'll possibly queue some up and batch process them on a day when I'm not at home.