[depot_autoremove] Removing outdated PKGs from the depot

Alice Domage alice.domage at gapfruit.com
Thu Mar 16 18:36:12 CET 2023


Dear Norman,

> Maybe it is beneficial to break down the problem even further. In 
> fact, depot archive types do not arbitrarily depend on one another. 
> Specifically, binary archives cannot depend on each other. Also raw 
> archives have no dependencies. Src archives can only depend on api 
> archives but not on other src archives. Also api archives cannot have 
> dependencies. For this current discussion, I'd leave out src and api 
> archives anyway.
>
> The only case where a dependency tree of multiple levels is formed are 
> pkg archives depending on other pkg archives. With this observation, I 
> would only look at pkg archives at first. Scanning the depot for the 
> list of pkg archives should be quick enough. For each pkg, I would 
> ask: "should this pkg be removed?". The answer is given by the 
> <config>. To implement this step, there is no need to build an 
> internal data structure.
>
> Then, after having removed pkg archives, I'd read the content of all 
> remaining 'archives' files present in the depot, putting each line 
> into a dictionary (removing duplicates that way). Now we know all 
> archives that are still required.
>

Sorry, I was not very clear. I agree, at first we only traverse archives 
of type PKG to collect 'archives' dependency files.
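
To illustrate that collection step, here is a minimal sketch in Python.
It is not the code from my branch. It assumes the content of each
remaining 'archives' file is already available in memory, and the
function name and the example archive paths are made up for
illustration only.

```python
def collect_required(archives_files):
    """Return the set of archive paths listed in any 'archives' file.

    archives_files maps a pkg path to the text of its 'archives' file
    (a hypothetical in-memory stand-in for reading the depot).
    """
    required = set()
    for content in archives_files.values():
        for line in content.splitlines():
            line = line.strip()
            if line:                # skip empty lines
                required.add(line)  # the set removes duplicates
    return required


# Hypothetical depot content after the first pkg-removal step
remaining = {
    "alice/pkg/blue_backdrop_with_logo/2023-03-16":
        "alice/pkg/blue_backdrop/2023-03-16\nalice/raw/logo/2023-03-16\n",
    "alice/pkg/blue_backdrop/2023-03-16":
        "alice/raw/backdrop/2023-03-16\nalice/raw/logo/2023-03-16\n",
}
required = collect_required(remaining)
```

Here 'alice/raw/logo/2023-03-16' is listed twice but ends up only once
in the result, which is the de-duplication you describe.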

> With this list (dictionary) gathered, we can again go through the 
> depot. For each bin or raw archive, we'd look whether it is featured 
> in our list or not. If not, we can remove the sub directory. For each 
> pkg archive, we look if it is either featured in our list or if it is 
> tagged as manually installed by the user. If neither is the case, we 
> can remove it as well, and remember that we should do another 
> iteration of garbage collection (now with the pkg removed, further 
> removals may become possible).
>

There is no need to create a complete implementation of a Graph data 
structure. I have something similar to the Dictionary you describe in 
mind to collect archive dependencies. I have named the top-level class 
that holds the Dictionary "graph". I can rename it if this is confusing.

The Dictionary would associate each archive path with the list of PKG 
archives that reference it. Thus, archives left with no references 
after PKG deletion are identified, and archives referenced by a deleted 
PKG but still referenced by any other PKG(s) can be kept.
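
A minimal Python sketch of this reverse mapping, only to make the idea
concrete. It is not the C++ code from my branch. The function names and
archive paths are hypothetical, and the pkg-to-archives input is
assumed to have been read from the 'archives' files beforehand.

```python
from collections import defaultdict


def build_references(pkg_archives):
    """Map each archive path to the set of pkgs that reference it."""
    refs = defaultdict(set)
    for pkg, archives in pkg_archives.items():
        for archive in archives:
            refs[archive].add(pkg)
    return refs


def remove_pkg(refs, pkg):
    """Drop 'pkg' from all reference sets.

    Returns the sorted list of archives that are no longer referenced
    by any pkg. Archives still referenced by another pkg stay in 'refs'.
    """
    orphans = []
    for archive in list(refs):
        if pkg in refs[archive]:
            refs[archive].remove(pkg)
            if not refs[archive]:
                del refs[archive]
                orphans.append(archive)
    return sorted(orphans)


# Hypothetical example: two pkgs sharing one raw archive
pkg_archives = {
    "alice/pkg/a/1": ["alice/raw/shared/1", "alice/raw/only_a/1"],
    "alice/pkg/b/1": ["alice/raw/shared/1"],
}
refs = build_references(pkg_archives)
orphans = remove_pkg(refs, "alice/pkg/a/1")
```

After removing 'alice/pkg/a/1', only 'alice/raw/only_a/1' is reported
as orphan, while 'alice/raw/shared/1' is kept because 'alice/pkg/b/1'
still references it.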


> But what if a pkg was manually installed by the user (let's say 
> "blue_backdrop") and also happens to be a dependency of another 
> dependent pkg (like "blue_backdrop_with_logo") installed separately?
>
> In this case, I would expect to keep the "blue_backdrop" when 
> uninstalling only the dependent pkg "blue_backdrop_with_logo". If the 
> "blue_backdrop" had been installed as a mere side effect of installing 
> "blue_backdrop_with_logo", I would expect to have it automatically 
> removed along with "blue_backdrop_with_logo".
>
> To take this decision, I think we have to preserve the information of 
> how each pkg entered the depot. Hence, my suggestion to explicitly 
> mark the pkg archives that entered the depot by user intent.
>

You are correct. I missed that. Thank you for explaining in detail!
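
To check my understanding, here is a small Python sketch of the
repeated garbage collection with pkgs marked by user intent. It is an
assumption-laden model, not the implementation: the 'manual' set stands
for whatever marking mechanism we end up with, uninstalling is modelled
as dropping that mark, and the function name is hypothetical.

```python
def garbage_collect(pkg_deps, manual):
    """Return the set of pkgs that survive garbage collection.

    pkg_deps: pkg name -> set of pkg names it depends on
    manual:   set of pkgs that entered the depot by user intent
    """
    present = dict(pkg_deps)  # work on a copy, keep the input intact
    changed = True
    while changed:            # repeat until no further removal happens
        changed = False
        required = set()
        for deps in present.values():
            required |= deps
        for pkg in list(present):
            if pkg not in manual and pkg not in required:
                del present[pkg]
                changed = True
    return set(present)


deps = {
    "blue_backdrop_with_logo": {"blue_backdrop"},
    "blue_backdrop": set(),
}

# "blue_backdrop" was installed by the user, the logo variant got uninstalled
kept_manual = garbage_collect(deps, manual={"blue_backdrop"})

# "blue_backdrop" came in only as a side effect of the logo variant
kept_side_effect = garbage_collect(deps, manual=set())
```

In the first case only "blue_backdrop" is kept. In the second case
"blue_backdrop" is still required during the first pass and is removed
in the second pass, which is why another iteration is needed.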




I have a first implementation on the 'depot_remove' [1] branch. It can 
be improved or changed. Please note that this is a partial 
implementation that still contains some TODO comments. I also commented 
the code as much as possible for clarity.


[1] https://github.com/a-dmg/genode/tree/depot_remove


Points that remain to be addressed:

  - Identify BIN archives, and provide an 'arch' attribute in the 
configuration for this purpose.

  - Make the PKG deletion in place to remove PKG references in the 
Dictionary.

  - Collect orphan archives referenced by no PKG. Should this last step 
be optional? It requires traversing the depot for all other archive 
types, and I am wondering whether this is necessary.

  - The configuration does not yet implement all the config nodes we 
discussed. So far, it implements only '<remove />'.


You might be interested in the 'depot.h' file. I would suggest reading 
it from bottom to top. You can use the 'depot_remove' runscript, which 
has debug logs describing what's happening.


Let me know what you think about it, whether you want it simplified, 
and whether you have further suggestions.


I hope this is digestible enough for a pleasant review. Thank you very 
much for your time.


Cheers,

Alice





