Well, to be honest, I expected the difference to be more noticeable too. There are definitely a couple of bottlenecks that need to be addressed to make a “wow” kind of difference, especially for the more expensive brushes (like those using texture patterns), but it’s not so insignificant under the right conditions.
As far as I’ve discovered, two main aspects of painting are vectorized:
- The creation of the masks for “auto” brush tips (default, soft and gaussian…). These are of course cached for static brush sizes, but dynamic brush sizes need to recalculate them a lot.
- The most heavily used blending modes: the simple “Over” blender, which is called Normal in Krita, and the special blending used to combine dabs in Wash painting mode, called “AlphaDarken”.
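
To give an idea of what the mask creation from the first point involves, here is a minimal scalar sketch of a soft round brush tip. This is not Krita's actual code: the function name `makeSoftCircleMask`, the `hardness` parameter and the linear falloff are my own assumptions for illustration. The point is that every pixel of the dab needs a distance calculation, and with dynamic brush sizes this whole loop runs again for each new size.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Krita API): builds a diameter x diameter
// coverage mask for a round brush tip with a linear soft edge.
// hardness = 0 fades from the center outwards, values close to 1 give
// an almost hard edge.
std::vector<float> makeSoftCircleMask(int diameter, float hardness)
{
    assert(diameter > 0);
    assert(hardness >= 0.0f && hardness < 1.0f);

    std::vector<float> mask(static_cast<std::size_t>(diameter) * diameter);
    const float radius = 0.5f * diameter;
    const float invFade = 1.0f / (1.0f - hardness);

    for (int y = 0; y < diameter; ++y) {
        // Normalized distance of the pixel center from the dab center.
        const float dy = (y + 0.5f - radius) / radius;
        for (int x = 0; x < diameter; ++x) {
            const float dx = (x + 0.5f - radius) / radius;
            const float dist = std::sqrt(dx * dx + dy * dy);
            // 1 inside the hard core, linear ramp to 0 at the rim.
            const float v = (1.0f - dist) * invFade;
            mask[static_cast<std::size_t>(y) * diameter + x] =
                std::min(1.0f, std::max(0.0f, v));
        }
    }
    return mask;
}
```

The inner loop has no dependencies between pixels, which is what makes this kind of code a good candidate for processing several pixels per instruction.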
I can say that the blending functions themselves are ~6-8 times faster with AVX2+FMA on my aging Haswell CPU, but there’s just a bunch more to be done until the brush stroke is on your screen.
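
For reference, this is roughly what an “Over” blend does per pixel, written as a plain scalar sketch on float RGBA with non-premultiplied alpha. The type `PixelF` and the functions `blendOver` and `compositeOverRow` are hypothetical names for illustration, not Krita's implementation, and AlphaDarken is deliberately left out because its rules are more involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pixel type: float RGBA, straight (non-premultiplied) alpha.
struct PixelF { float r, g, b, a; };

// Standard "source over destination" compositing for one pixel.
// mask is the brush tip coverage, opacity the stroke opacity, both in [0, 1].
inline PixelF blendOver(PixelF src, PixelF dst, float mask, float opacity)
{
    assert(opacity >= 0.0f && opacity <= 1.0f);
    const float sa = src.a * mask * opacity;        // effective source alpha
    const float outA = sa + dst.a * (1.0f - sa);    // resulting alpha
    if (outA <= 0.0f) {
        return PixelF{0.0f, 0.0f, 0.0f, 0.0f};      // fully transparent
    }
    const float ws = sa / outA;                     // weight of the source color
    const float wd = 1.0f - ws;                     // weight of the destination color
    return PixelF{src.r * ws + dst.r * wd,
                  src.g * ws + dst.g * wd,
                  src.b * ws + dst.b * wd,
                  outA};
}

// Blends a row of n source pixels onto dst in place, one pixel at a time.
void compositeOverRow(const PixelF* src, PixelF* dst, const float* mask,
                      std::size_t n, float opacity)
{
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = blendOver(src[i], dst[i], mask[i], opacity);
    }
}
```

A vectorized version does the same arithmetic on several pixels at once, so the speedup only applies to this loop and not to everything else that happens before the stroke reaches the screen.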
Blending currently has some further limitations: it is only vectorized for 8-bit channels and 32-bit float channels; for the latter, AlphaDarken is disabled due to a bug (that I just found, btw.); and alpha locking, as well as disabling some channels, makes it fall back to scalar code.
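
Put as code, the limitations listed above boil down to a check like the following. This is only a sketch of the conditions as I described them, not the real dispatch logic; `ChannelDepth`, `BlendOp`, `CompositeParams` and `canUseVectorPath` are all made-up names.

```cpp
#include <cassert>

// Hypothetical types for illustration only (not Krita's API).
enum class ChannelDepth { U8, U16, F32 };
enum class BlendOp { Over, AlphaDarken };

struct CompositeParams {
    ChannelDepth depth;
    BlendOp op;
    bool alphaLocked;         // alpha lock enabled on the layer
    bool allChannelsEnabled;  // false if some channels are disabled
};

// Returns true if the vectorized blender can be used under the
// limitations described above, false if scalar code is needed.
bool canUseVectorPath(const CompositeParams& p)
{
    // Alpha locking or disabled channels fall back to scalar code.
    if (p.alphaLocked || !p.allChannelsEnabled) {
        return false;
    }
    switch (p.depth) {
    case ChannelDepth::U8:
        return true;                          // both blend modes vectorized
    case ChannelDepth::F32:
        return p.op != BlendOp::AlphaDarken;  // disabled because of the bug
    case ChannelDepth::U16:
        return false;                         // no 16-bit integer support yet
    }
    return false;
}
```

So an 8-bit layer with default settings gets the fast path, while something as common as locking alpha sends the same stroke through the scalar loop.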
Unfortunately, fixing the float bug in AlphaDarken didn’t help much. It turns out that converting the float image to your display color space with LCMS is REALLY expensive, and that’s eating all the performance.
I’m currently working on 16-bit integer support. In the “FreehandStrokesBenchmark”, which simulates the whole painting process (with 4 layers IIRC), I’m seeing ~45-90% better performance (corrected from my earlier 30-60% figure), yet that’s rather hard to actually feel in Krita.
Though as you keep stacking layers, the effect becomes more noticeable.
I’ve also managed to include alpha locking (also used for inherit alpha).
I’m also trying to figure out why Krita rarely actually reaches 100% CPU usage when painting… memory bandwidth may be a limiting factor too.