[depot_autoremove] Removing outdated PKGs from the depot
Alice Domage
alice.domage at gapfruit.com
Thu Mar 16 18:36:12 CET 2023
Dear Norman,
> Maybe it is beneficial to break down the problem even further. In
> fact, depot archive types do not arbitrarily depend on one another.
> Specifically, binary archives cannot depend on each other. Also raw
> archives have no dependencies. Src archives can only depend on api
> archives but not on other src archives. Also api archives cannot have
> dependencies. For this current discussion, I'd leave out src and api
> archives anyway.
>
> The only case where a dependency tree of multiple levels is formed are
> pkg archives depending on other pkg archives. With this observation, I
> would only look at pkg archives at first. Scanning the depot for the
> list of pkg archives should be quick enough. For each pkg, I would
> ask: "should this pkg be removed?". The answer is given by the
> <config>. To implement this step, there is no need to build an
> internal data structure.
>
> Then, after having removed pkg archives, I'd read the content of all
> remaining 'archives' files present in the depot, putting each line
> into a dictionary (removing duplicates that way). Now we know all
> archives that are still required.
>
Sorry, I was not very clear. I agree, at first we only traverse archives
of type PKG to collect 'archives' dependency files.
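A minimal sketch of that collection step, in plain standard C++ rather than the Genode API. The helper name 'collect_archives' and the example paths used to try it out are hypothetical; the sketch only assumes that an 'archives' file holds one archive path per line:

```cpp
#include <cassert>   // only needed when trying the sketch out
#include <istream>
#include <set>
#include <sstream>   // only needed when trying the sketch out
#include <string>

// Hypothetical helper: add every non-empty line of one 'archives' file
// to the set of required archive paths. Inserting into a std::set
// removes duplicates automatically.
inline void collect_archives(std::istream &archives_file,
                             std::set<std::string> &required)
{
    std::string line;
    while (std::getline(archives_file, line))
        if (!line.empty())
            required.insert(line);
}
```

Calling it once per remaining PKG leaves 'required' holding every archive that is still needed, each path exactly once.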
> With this list (dictionary) gathered, we can again go through the
> depot. For each bin or raw archive, we'd look whether it is featured
> in our list or not. If not, we can remove the sub directory. For each
> pkg archive, we look if it is either featured in our list or if it is
> tagged as manually installed by the user. If neither is the case, we
> can remove it as well, and remember that we should do another
> iteration of garbage collection (now with the pkg removed, further
> removals may become possible).
>
There is no need to create a complete implementation of a graph data
structure. As you describe with the dictionary, I have something similar
in mind to collect archive dependencies. I have named the top-level
class that holds the Dictionary "graph"; perhaps I should not have, if
this is confusing. The Dictionary associates an archive path with the
list of PKG archives that reference it. Thus, archives with no
references left after PKG deletion are identified for removal, while
archives referenced by a deleted PKG but still referenced by other
PKG(s) are kept.
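The Dictionary idea can be sketched as follows. This is an illustration in standard C++ and not the code on the branch: 'Reference_map', 'add_reference', and 'remove_pkg' are hypothetical names, and std::map stands in for Genode's Dictionary.

```cpp
#include <cassert>   // only needed when trying the sketch out
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the Dictionary:
// archive path -> set of PKGs that reference it.
using Reference_map = std::map<std::string, std::set<std::string>>;

// Record that 'pkg' lists 'archive' in its 'archives' file.
inline void add_reference(Reference_map &refs, std::string const &archive,
                          std::string const &pkg)
{
    refs[archive].insert(pkg);
}

// Drop all references held by 'pkg' and return the archives that are
// no longer referenced by any PKG. Archives that are still referenced
// by another PKG stay in the map.
inline std::vector<std::string> remove_pkg(Reference_map &refs,
                                           std::string const &pkg)
{
    std::vector<std::string> orphans;
    for (auto it = refs.begin(); it != refs.end(); ) {
        bool const was_referenced = it->second.erase(pkg) > 0;
        if (was_referenced && it->second.empty()) {
            orphans.push_back(it->first);
            it = refs.erase(it);   // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
    return orphans;
}
```

The returned list corresponds to the archives whose depot sub directories could be removed after deleting the PKG.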
> But what if a pkg was manually installed by the user (let's say
> "blue_backdrop") and also happens to be a dependency of another
> dependent pkg (like "blue_backdrop_with_logo") installed separately?
>
> In this case, I would expect to keep the "blue_backdrop" when
> uninstalling only the dependent pkg "blue_backdrop_with_logo". If the
> "blue_backdrop" had been installed as a mere side effect of installing
> "blue_backdrop_with_logo", I would expect to have it automatically
> removed along with "blue_backdrop_with_logo".
>
> To take this decision, I think we have to preserve the information of
> how each pkg entered the depot. Hence, my suggestion to explicitly
> mark the pkg archives that entered the depot by user intent.
>
You are correct. I missed that. Thank you for explaining in detail!
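To make the rule concrete, here is a small sketch of the keep-or-remove decision with a user-intent marker. It is an in-memory model in standard C++ under stated assumptions, not the depot code: the 'Pkg' struct, its 'manual' flag, and 'collect_garbage' are hypothetical names.

```cpp
#include <cassert>   // only needed when trying the sketch out
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory model of one PKG archive.
struct Pkg
{
    bool                  manual;  // entered the depot by user intent
    std::set<std::string> deps;    // PKGs listed in its 'archives' file
};

using Depot = std::map<std::string, Pkg>;

// Repeatedly remove PKGs that are neither marked as manually installed
// nor required by a remaining PKG, until a pass removes nothing.
inline void collect_garbage(Depot &depot)
{
    bool removed = true;
    while (removed) {
        removed = false;

        // Gather everything still required by the PKGs present now.
        std::set<std::string> required;
        for (auto const &entry : depot)
            required.insert(entry.second.deps.begin(),
                            entry.second.deps.end());

        for (auto it = depot.begin(); it != depot.end(); ) {
            if (!it->second.manual && required.count(it->first) == 0) {
                it = depot.erase(it);
                removed = true;    // a removal may enable further ones
            } else {
                ++it;
            }
        }
    }
}
```

With this model, a "blue_backdrop" marked as manual survives the removal of "blue_backdrop_with_logo", whereas an unmarked one is collected once nothing depends on it anymore.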
I have a first implementation on the 'depot_remove' [1] branch. It can
be improved or changed. Please note that this is a partial
implementation. There are some TODO comments. I also commented the code
as much as possible for clarity.
[1] https://github.com/a-dmg/genode/tree/depot_remove
Points that remain to be addressed:
- Identify BIN archives, and add an 'arch' attribute to the
  configuration for this purpose.
- Perform the PKG deletion in place so that the references of a deleted
  PKG are removed from the Dictionary.
- Collect orphan archives referenced by no PKG. Should that last step be
  optional? It requires traversing the depot for all other archive
  types, and I wonder whether it is necessary.
- The configuration does not yet implement all config nodes as
  discussed, so far only '<remove />'.
You might be interested in the 'depot.h' file. I would suggest reading
it from bottom to top. You can use the 'depot_remove' runscript, which
has debug logs describing what's happening.
Let me know what you think about it, whether you want it simplified, and
whether you have further suggestions.
I hope this is digestible enough for a pleasant review. Thank you very
much for your time.
Cheers,
Alice