Fighting with new improved "resource system"

raghukamath · January 13, 2022, 5:29pm

It was built from ground up, hence it took time. As a first new release of a system created from scratch I would expect it to have errors and bugs. Unless they release how else they would know the problem. It will be improved upon.

Grum999 · January 13, 2022, 8:58pm

Query rewritten

Tested on 2 different databases.

Note: As I made my test on a SQL browser, I’ve just replaced bind values with hard coded string to be able to execute it easily
:resource_type = “paintoppresets”
:language = “en_GB”

Query	Database 1	Database 2
Original	4879ms	3706ms
Rewritten	1481ms	485ms
Returned rows	2645	2645

All queries from all databases return the same number of rows, as expected
Between database 1 & 2, the only difference is I’ve added some indexes on tables
– Adding indexes allows faster execution in all cases
– Rewritten query (with/without index) is faster to execute too

Original query

This is the original query, used for comparison.

SELECT tags.id                  as tag_id
                              ,      tags.url                 as tag_url
                              ,      tags.active              as tag_active
                              ,      tags.name                as tag_name
                              ,      tags.comment             as tag_comment
                              ,      resources.id             as resource_id
                              ,      resources.status         as resource_active
                              ,      storages.active          as resource_storage_active
                              ,      resources.name           as resource_name
                              ,      resources.filename       as resource_filename
                              ,      resources.tooltip        as resource_tooltip
                              ,      resources.thumbnail      as resource_thumbnail
                              ,      resources.status         as resource_status
                              ,      resources.md5sum         as resource_md5sum
                              ,      resources.storage_id     as storage_id
                              ,      storages.active          as storage_active
                              ,      storages.location        as location
                              ,      resource_types.name      as resource_type
                              ,      tag_translations.name    as translated_name
                              ,      tag_translations.comment as translated_comment
                              FROM   resources
                              ,      resource_types
                              ,      storages
                              ,      tags
                              ,      resource_tags
                              LEFT JOIN tag_translations ON tag_translations.tag_id = tags.id AND tag_translations.language = "en_GB"
                              WHERE  tags.id                    = resource_tags.tag_id
                              AND    tags.resource_type_id      = resource_types.id
                              AND    resources.id               = resource_tags.resource_id
                              AND    resources.resource_type_id = resource_types.id
                              AND    resources.storage_id       = storages.id
                              AND    resource_types.id          = resources.resource_type_id
                              AND    resource_types.name        = "paintoppresets"
                              AND    resource_tags.active       = 1
                              GROUP BY tags.id
                              ,        resources.name
                              ,        resources.filename
                              ,        resources.md5sum

Rewritten query

This is the rewritten query.

-- Work on a first query (WITH statement), on which we apply the GROUP BY to be sure to retrieve unique rows only
-- We only work with the 3 expected tables to reduces number of join and table/access
-- Also as GROUP BY id an aggregate function, we retrieve the greatest **resource_id**
-- /!\ This is an arbitrary choice, a min() could also be applied
--     By using a min() or max() method, we ensure that query always use the same method to return data from database
--    But I consider this as something not normal... if more than resource can be returned, why use one more than another one? may be selection criteria must be reviewed?
WITH initial_selection AS (
    -- initial "GROUP BY" columns
	SELECT tags.id
	       , resources.name
	       , resources.filename
	       , resources.md5sum
		   -- we need resource type later, as already retrieved here, just return it to avoid an 
		   -- additional join in secondary select statement
	       , resource_types.id as resource_type_id
           -- select on of resource, and return id as it will be mandatory in second SELECT statement to return resource 
	       , max(resources.id) as resource_id
	FROM  resource_types
		JOIN resources 
			 ON resources.resource_type_id = resource_types.id
		JOIN resource_tags 
			 ON resource_tags.resource_id  = resources.id
			AND resource_tags.active       = 1
		JOIN tags
			 ON tags.id                    = resource_tags.tag_id
			AND tags.resource_type_id      = resource_types.id
	-- ==> replace "paintoppresets"	with bind value :resource_type
	WHERE resource_types.name = "paintoppresets"	
GROUP BY tags.id
,        resources.name
,        resources.filename
,        resources.md5sum
-- need to add resource_types.id in group by
,        resource_types.id
)
SELECT 	-- use returned values from first SELECT statement
		initial_selection.id as tag_id
	   ,initial_selection.name as resource_name
	   ,initial_selection.filename as resource_filename
	   ,initial_selection.md5sum as resource_md5sum
	   ,initial_selection.resource_id as resource_id
	   -- And then other columns (from initial query)
	   -- 
       ,      tags.url                 as tag_url
       ,      tags.active              as tag_active
       ,      tags.name                as tag_name
       ,      tags.comment             as tag_comment
	   --
       ,      resources.status         as resource_active
       ,      resources.tooltip        as resource_tooltip
       ,      resources.thumbnail      as resource_thumbnail
	   -- resource_status == resource_active; the infirmation is returned twice
	   -- with different name, is it normal.
       ,      resources.status         as resource_status
       ,      resources.storage_id     as storage_id
	   --
       ,      storages.active          as storage_active
       ,      storages.location        as location
	   --
       ,      tag_translations.name    as translated_name
       ,      tag_translations.comment as translated_comment
	   -- I keep it because it was in original query, but for me it's not needed...
	   -- ==> replace with bind values
	   --, :resource_type as resource_type
	   ,"paintoppresets"	as resource_type
-- Second statement just complete data with other tables
FROM initial_selection 
	JOIN tags 
		ON tags.id = initial_selection.id
	   AND tags.resource_type_id = initial_selection.resource_type_id
	JOIN resources 
		ON resources.id = resource_id
    JOIN storages 
		ON storages.id = resources.storage_id
	LEFT JOIN tag_translations 
		ON tag_translations.tag_id =  initial_selection.id
		-- need to replace "en_GB" with bind value :language
	   AND tag_translations.language = "en_GB"

Database 2 indexes

Here statement for additional indexes that allows to optimize query

CREATE INDEX `tmp_resource_tags_actid` ON `resource_tags` ( `active` DESC, `resource_id` ASC );
CREATE INDEX `tmp_resources_typeid` ON `resources` ( `resource_type_id` ASC );
CREATE INDEX `tmp_tags_translation_langid` ON `tag_translations` ( `language` ASC, `tag_id` ASC );
CREATE INDEX `tmp_tags_typeidid` ON `tags` ( `resource_type_id` ASC, `id` ASC );

Databases statistics

Some statistics about records in tables in which tests has been made.

Table	Records
resource_tags	3665
resources	1122
tags_translations	230
tags	46
tags_storages	46

These optimization on database query doesn’t fix the main problems:
– need to understand why there’s more than one call per brush to update one tag in this loops (and probably this one and this one)
– need to call resetQuery() only one time when all tags are updated (with a signal or maybe a flag “updateInProgress” to avoid resetQuery() being executed in this case)
– possibly start a transaction
– probably, there’s other places where queries need to be optimized (like this one?)

I suppose something like that should work, but I’m not good enough in C++ to implement everything and also doesn’t have enough knowledge about current resource management implementation to tell if it’s a good solution or not

    // start database transaction 
    // within a started transaction, each call to resetQuery() exit method immediately
    m_tagResourceModel->beginTransaction()

    Q_FOREACH(int resourceId, m_resourceIds) {
        m_tagResourceModel->tagResource(tagsp, resourceId);
    }

    // commit database transaction
    m_tagResourceModel->endTransaction()

    // udpate model
    m_tagResourceModel->resetQuery()

Concerning measurement time I’ve made:

What I did was:

    auto begin = std::chrono::high_resolution_clock::now();
    Q_FOREACH(int resourceId, m_resourceIds) {
        m_tagResourceModel->tagResource(tagsp, resourceId);
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    
    std::cout << "Global elapsed time: " << elapsed.count() * 1e-9 <<"\n";

This allows to easily see how much time this action need to be executed

If needed I can provide my resourcecache.sqlite database file (156MB)

Good luck

Grum999

Grum999 · January 13, 2022, 9:29pm

There’s bugs about performances yes.

From a user point of view I agree it could be like something worse than before.
Maybe Krita 5 was released to early, maybe not: even with a beta 6 or beta 7, not sure someone will gave a feedback on this specific problem

From a developer point of view, even if new system still need some fix, the new system is stronger and might be easier to extend for future improvements.
And probably, bugs might be easier to fix: on my side I never took a look on resource system before this morning.
It was not so hard to locate from where the problem came and made a proposition of bug fix.

After I’m not able to fix everything here, not enough knowledge for that and not experience to determinate all possible place on which there could be impacts; but my analysis will help I hope to fix the problem: at least, time I spent on subject to determinate where are the problem of performance is time developers won’t have to spent on their side.

Grum999

Grum999 · January 13, 2022, 10:15pm

Another case:

– First select of one preset in listview: elapsed time: 1.99095s
– Next select of one preset in listview: elapsed time: 0.1s
– Select the 143 preset in listview: elapsed time: 7.28s

Seems to be the call to KisWdgTagSelectionControllerOneResource::setResourceIds() (from wdgtagselection.cpp)

And then seems to be the call to KisWdgTagSelectionControllerOneResource::updateView() (from wdgtagselection.cpp)

And then seems to be the following loop:

    Q_FOREACH(int resourceId, m_resourceIds) {
        m_tagResourceModel->setResourcesFilter(QVector<int>() << resourceId);
        for (int i = 0; i < m_tagResourceModel->rowCount(); i++) {
            QModelIndex idx = m_tagResourceModel->index(i, 0);
            KisTagSP tag = m_tagResourceModel->data(idx, Qt::UserRole + KisAllTagResourceModel::Tag).value<KisTagSP>();
            tagsCounts[tag->url()] += 1;
        }
    }

From here what I can see:
– The same Q_FOREACH(int resourceId, m_resourceIds) than previously seen, where there’s 15 iteration per brush
15 * 143 = 2145 !!
– In this loop there’s another loop which iterate on 16 - 17 items
2145 * 16 = 34320 total

– Didn’t check yet what we have behind:
m_tagResourceModel->setResourcesFilter()
m_tagResourceModel->rowCount()
m_tagResourceModel->data()

And I need to go to sleep
And I normally won’t be at home this week-end; so I’ll try to continue to analyze this part next week…

Grum999

tiar · January 13, 2022, 11:31pm

Tbh, the problem with tagging is not really the database itself. I did try to work on it today. And I crashed head-first with the same problem as always… beginInsertRows.

if we don’t want to lose selections on views (and we don’t), we need to use beginInsertRows when adding new rows to the model.
beginInsertRows must be called before changing data.
beginInsertRows only works for one block of data (can’t be rows in different places; if it’s multiple rows, they must be one after the other).
KisTagResourcesModel uses QSqlQuery internally as a data source
when you untag, now the tag-resource pair is just marked “inactive”, not removed
but the model only shows the active pairs

Altogether, it means that to tag multiple resources, in case some of them were tagged with that tag before, there is a need to update the database multiple times, and after all of them you gotta make resetQuery to get the current query state…

(I really dislike Qt’s model-view system, I find it really badly designed at points).

I talked wih @Lyn3d today, and I, again, think whether just using resetModel() and then restoring the selection wouldn’t be a better idea… I mean I’m not sure if it would be faster, I doubt it, but at least code would be cleaner

Grum999 · January 14, 2022, 9:19am

For sure, but database access can really be improved.

Dividing database access by 10 is something that can’t be ignored even if it’s not the main origin of problem
Especially here as you can see, when user’s resources start to be “heavy” with hundreds of presets and a lot of tags.

I’m not comfortable with Qt model-view too, but I have treeview with more than 12000 items for which respond time is fast.

I’ve to admit I was a little bit lost in model implement for resource manager.
C++ is not my favorite language, the SQL part was easier to analyze for me

But for what I saw and understood of implementation, it seems that database access are made within model.
On my side if I had to implement it, I think I’ll decorelate database from model.

Do everything in a dedicated class for insert/update/delete/select
Provide result to model once everything is done

But once again, I’m not comfortable with Qt model-view and also don’t have the knowledge about entire implementation of resource manager in Krita.
Here I only see one managing window, but I suppose this is also used everywhere a resource (tag, pattern, preset, SeExpr, …) is used

Grum999

Mernandre · January 14, 2022, 11:30am

I’m not sure if this has been talked about or if it even has relevance, because I really can’t get through the programmer-talk; but might as well mention it. I also don’t think it’s worth creating a new thread for it, so I’ll just post here.

Sometimes, when you create a new/personal tag and add brushes to it, if you then remove them it will add brushes from some other tag that already exists(I think it might be random which tag it selects?). If one changes the view to another tag and back to the newly created one, they will then vanish; so it seems to me it is merely a visual bug, but I have had krita crash when this occurred.

The second thing is, when I first installed krita 5.0 it loaded really fast; after adding a couple of bundles to it, the load times are increased quite significantly. It doesn’t bother me that much, eventually I’ll create one big bundle that suits my needs and delete the others(but it would be nice to have an option to remove and not simply deactivate the existing bundles from within krita), from my testing, simply deactivating them doesn’t do anything for load times.

Grum999 · January 14, 2022, 11:56am

Fort what I can see, the KisAllTagResourceModel::resetQuery() is executed 8 times at startup on my side.
This could explain why startup time increase when bundle resources are added.

Grum999

tiar · January 14, 2022, 11:57am

Are you going to make a MR with the SQL query and the indeces? The indeces code would be in files create_index_[name].sql in libs/resources/sql, and the query can just be replaced. Indeces are no brainer but the sql query probably will probably need to be checked by someone closely.

Dmitry gave me today a pointer about tagging multiple resources at once issue so I’m just continuing wokring on that, and then will explore other things you wrote about. (Which is awesome, btw, you are so helpful here).

tiar · January 14, 2022, 12:09pm

It’s probably this bug: 448359 – The error of krita's own tag, you can subscribe to it if you want to know the progress. I think I know the cause.

Grum999 · January 14, 2022, 1:41pm

My local copy and my gitlab repository are in a state I’m still unable to fix it.
For my tests made here, I just downloaded last source code version and made modifications to it.
So I think it’s better to let you or another developper to check validity of my sql query modification and decide to use it or not.

that’s better yes
because I’m still wondering about real use and impact of the famous Group By clause

I hope you’ll get the solution
I won’t be able to look at it more before monday night but if something in my analysis need some additional details I can try to check

Grum999

tiar · January 14, 2022, 2:01pm

You were so helpful with the investigation that I would be surprised if we couldn’t improve the perofrmance at least a little bit it would be awesome howevery if you could send me (for example in private message) how we should credit you, I could just make a MR with your commits (for which I’d need a name, surname and email, at least the email cannot be fake because of KDE’s rules, there must be a way to contact you after some time if there are any issues with licensing or whatnot; I mean in theory name and surname shouldn’t be fake either… but note that then they become kinda public). Since you practically gave me two full patches (for indeces and sql query), that is the most appropriate approach.

Drayldan · January 15, 2022, 10:15pm

Unfortunately Rakurri’s brushes are somewhat heavy, maybe labeling them in small groups helps.

Well, but that might be a special case as I just can’t manually add any tag for one (!) or more brushes for that “Rakurri Brush Set v2.0”. :o That’s impossible in this case. :o I don’t know why.

Is there a way to create any log for devs (about this activity) ? I am not any programmer, tbh.

tiar · January 17, 2022, 2:09pm

What would be helpful is if you could copy your resource folder (make super sure you did copy it) into a safe place (like your Documents folder), then go to the old place and rename it to krita_backup (you’ll have two old copies). Restart Krita, import Rakurri bundle, and then try to tag. If it’s fast, then it’s about the amount of resources and that’s the reason developers didn’t notice it while developing (I never had many resources, especially since I had to clear the resource folder multiple times).

Grumm above did an excellent job pinpointing various stress points. I am already working on that. I hope it will be ready for 5.0.3. (though I don’t know when 5.0.3 will be). Maybe when I finish it, it will be a good reason to make a new release and 5.0.3 will be made only because of those changes

There are more people who have trouble with performance with many resources so much that they go back to Krita 5.0-beta2 or even 4.x to avoid them, so for me it’s a priority to fix this issue. (There are other important things too but other people work on them).

Oh and it would be useful if you could put the database file in a zip archive and share it somehow (there are those “30 days share” websites which you could use, or you can send me an email on tamtamy tymona @ gmail com, no dots or spaces or any special characters before @), if it’s not private. The information that is stored there is: names of all resources; locations of all resources (related to resource folder, so it doesn’t contain system username); dates and times of creating and modifying them; icons for resources (like presets icons - this is what you see in dockers). That would give me an idea how many resources you have.

Drayldan · January 23, 2022, 7:02pm

Well, by chance I made it!
By some miracle I set the tags one time
for Rakurri’s brushes (one after another brush).
So, I don’t really want to change anything yet, but maybe I’ll have time to do some more tests.

tiar · January 28, 2022, 5:07pm

Task for performance improvements: ⚓ T15245 Performance improvements plans for Resource Rewrite

Two MRs in progress (one nearly finished, one still needs work):
Draft: Improve performance of query in KisTagResourceModel (!1302) · Merge requests · Graphics / Krita · GitLab (nearly finished)
Draft: Tag and untag multiple resources at once (!1312) · Merge requests · Graphics / Krita · GitLab (still needs work, but the main coding is done, at least for the tagResource() and untagResource()).

From what we (me and @Grum999) achieved so far is:

~4x speed improvements in the first MR
even up to ~140x!!! speed improvements when tagging all resources at once (such big number was of course only when I was tagging all Rakurri’s brushes at once and it was I think already an 8th tag or something - but it went from 6 minutes of waiting (!) to just a little while).
(those speed improvements will combine, since the changes are independent)

And in case someone wonders why it wasn’t done like that from the get go… well, one of the lessons programmers learn is to never optimize too early (it’s called premature optimization if you want to look it up). The CPU is really fast; quite often the program just waits for the user to do something, not the other way around. Unless you know for a fact something will be used huge amounts of times, it’s best to write it in the simplest possible form (especially in GUI code - CPU is much faster than human’s reflexes or time sense). Otherwise it will probably turn out that you spent effort, time or other resources on making something just 1ms faster, even when it was already instant for the user. Often, the optimized code is longer and more complex (like in this case), and when it’s more complex, it’s more likely to contain bugs.

For comparison, I was working on those optimization and I think at least two people told me that it would be a good idea to move query->prepare() into a different place so it’s run only once instead of every time. As an idea, it’s not bad, but when I checked how much time it actually takes… using the numbers I have on hand, it took only 1ms, when the whole operation took 7527ms. So… I don’t think it’s worth spending time to gain just 1ms there

Rakurri · January 28, 2022, 5:30pm

Wow!! That’s amazing, great work you two!

system · February 19, 2022, 7:32pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

tiar · February 22, 2022, 2:37pm

I finished and merged both of those MRs, now tagging multiple brushes at once should be much, much faster. It’s already available in Krita Next, and hopefully tomorrow or the next day it will be available in Krita Plus as well. It will also be available in the next beta version that will be released next week or so.

CC @Drayldan