[depot_autoremove] Removing outdated PKGs from the depot
Alice Domage
alice.domage at gapfruit.com
Fri Mar 10 17:43:42 CET 2023
Dear Norman,
Thank you for sharing your plan. It clarifies the big picture for me. I
agree, its usage does not have to be tied to the
'depot_download_manager'.
My motivation was to ensure that a depot clean-up task would not
interfere with other tasks. For us, this would likely be part of an
automated process.
We can handle that independently of the 'depot_download_manager'.
> Would that configuration interface satisfy your needs?
Yes! The proposed configuration scheme fits our needs and yours, I
assume, for using it interactively in Sculpt.
> For the directory traversal and file operations, it may be useful to
> take the implementation of the depot_query and fs_tool components as
> inspiration.
Thanks for pointing those out. It will be helpful!
> There is one open question, though: A pkg archive file can refer to
> other pkgs, which are implicitly installed.
The way I envision the implementation is as follows:
1. It creates a graph representing the depot state by traversing the
depot. The graph is implemented with a dictionary.
Each node uses a 'Depot::Archive::Path' as its key and a list of
'Depot::Archive::Path' as its value, holding the node's neighbouring
dependencies.
Graph nodes can be of any archive type.
2. First, it goes through the packages. As you said, it registers
their dependencies. It also creates a node for each dependency
archive, pointing back to the 'pkgs' that reference it. Thus, this
creates cycles in the graph between packages and their dependencies.
3. It iterates over its config and performs the required actions.
4. When a package is deleted, it traverses the package's list of
neighbouring dependencies, colours them for deletion, and removes the
package reference from each.
If a node is left with an empty list of neighbours, it can be deleted
safely, as it isn't in use any more.
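
As a rough illustration of steps 1, 2, and 4, here is a minimal sketch
in plain C++. It is not the actual component: 'Path' is a std::string
standing in for 'Depot::Archive::Path', std::map stands in for the
Genode dictionary, and 'add_pkg', 'remove_pkg', and the archive paths
used with them are hypothetical names. Dependencies that are themselves
'pkgs' are deliberately not handled here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Assumption: a plain string stands in for 'Depot::Archive::Path'.
using Path = std::string;

// Each node maps an archive path to the set of its neighbours.
// std::map is an ordered tree standing in for the Genode dictionary.
using Graph = std::map<Path, std::set<Path>>;

// Step 2: register a pkg with its dependencies and add back references
// from each dependency to the pkg (hypothetical helper).
void add_pkg(Graph &graph, Path const &pkg, std::vector<Path> const &archives)
{
    assert(!pkg.empty());
    graph[pkg];                     // create the node even without dependencies
    for (Path const &dep : archives) {
        graph[pkg].insert(dep);     // pkg -> dependency
        graph[dep].insert(pkg);     // dependency -> referencing pkg
    }
}

// Step 4: delete a pkg and return every archive that may be removed,
// i.e. the pkg itself plus each dependency left without neighbours.
std::vector<Path> remove_pkg(Graph &graph, Path const &pkg)
{
    std::vector<Path> deletable;
    auto it = graph.find(pkg);
    if (it == graph.end())
        return deletable;

    std::set<Path> const deps = it->second;  // copy before erasing the node
    graph.erase(it);
    deletable.push_back(pkg);

    for (Path const &dep : deps) {
        auto dep_it = graph.find(dep);
        if (dep_it == graph.end())
            continue;
        dep_it->second.erase(pkg);           // drop the back reference
        if (dep_it->second.empty()) {        // no longer in use
            graph.erase(dep_it);
            deletable.push_back(dep);
        }
    }
    return deletable;
}
```

With two packages sharing one archive, removing the first package
releases only the archives that nothing else refers to; the shared
archive stays in the graph.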
> It would be nice to include such implicitly installed pkgs in the
> garbage collection.
When a package depends on another package, the latter will be coloured
for deletion like any other dependency.
However, there is a pitfall. If a package node has another 'pkg' among
its neighbours, it is unclear whether it is there because it is listed
in the package's 'archives' file or because the package is itself a
dependency of it.
This can be solved by comparing the node's list of neighbours with the
content of 'pkg/<name>/archives'. If they match, the current 'pkg' node
can be coloured for deletion. Otherwise, this 'pkg' is also a
dependency of another 'pkg' and is therefore not coloured for deletion.
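
The comparison described above could look roughly like the following
sketch. It assumes that both the node's neighbours and the parsed
content of 'pkg/<name>/archives' are available as sets of paths;
'Path' (a std::string standing in for 'Depot::Archive::Path') and
'pkg_unreferenced' are hypothetical names, not part of any existing
component.

```cpp
#include <set>
#include <string>

// Assumption: a plain string stands in for 'Depot::Archive::Path'.
using Path = std::string;

// Returns true if the neighbours of a pkg node are exactly the entries of
// its own 'archives' file, meaning no other pkg still refers to it.
// Any extra neighbour is a back reference from a pkg that depends on it.
bool pkg_unreferenced(std::set<Path> const &neighbours,
                      std::set<Path> const &archives)
{
    return neighbours == archives;
}
```

A 'pkg' node for which this returns false would be left uncoloured,
since some other 'pkg' still depends on it.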
This way, I believe there is no need for persistent annotation of 'pkg'
dependencies by the 'depot_download_manager'. I am concerned about the
performance of such an algorithm and will have to finish a first
implementation to be certain. As the dictionary is implemented as an
AVL tree, it should perform in reasonable time.
> Do you think that the rough plan above is sensible?
It looks good to me. I will proceed in this direction.
Cheers,
Alice