[depot_autoremove] Removing outdated PKGs from the depot

Alice Domage alice.domage at gapfruit.com
Fri Mar 10 17:43:42 CET 2023


Dear Norman,

Thank you for sharing your plan. It clarifies the big picture for me. I
agree that it does not have to be tied to the 'depot_download_manager'.
My motivation was to ensure that a depot clean-up task would not
interfere with other tasks. For us, it would likely be part of an
automated process.
We can manage that outside of the 'depot_download_manager'.

> Would that configuration interface satisfy your needs?

Yes! The proposed configuration scheme fits our needs and, I assume,
yours for using it interactively in Sculpt.

> For the directory traversal and file operations, it may be useful to 
> take the implementation of the depot_query and fs_tool components as 
> inspiration.

Thanks for pointing those out. It will be helpful!

> There is one open question, though: A pkg archive file can refer to 
> other pkgs, which are implicitly installed. 

The way I envision the implementation is as follows:

1. It creates a graph representing the depot state by traversing the
depot. The graph is implemented with a dictionary.
Each node uses a 'Depot::Archive::Path' as its key and, as its value, a
list of the 'Depot::Archive::Path's of its neighbouring dependencies.
Graph nodes can be of any archive type.

2. First, it goes through the packages. As you said, it registers their
dependencies. It also creates a node for each dependency archive,
pointing back to the 'pkgs' that reference it. Thus, this creates cycles
in the graph between packages and their dependencies.

3. It iterates over its config and performs the required actions.

4. When a package is deleted, it traverses the package's list of
neighbouring dependencies, colours them for deletion, and removes the
reference to the package from each of them.
If a node is left with an empty list of neighbours, it can be deleted
safely, as it isn't in use any more.
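
As a rough illustration of steps 1 to 4, here is a minimal sketch in
Python (the real component would be written in C++ on Genode). The
depot content, the archive paths, and the function names are
hypothetical, and the depot is modelled as an in-memory mapping instead
of a directory traversal. The sketch only covers dependencies that are
not 'pkgs' themselves: a 'pkg' dependency keeps its own archives in its
neighbour list, so it is never reported as unused here.

```python
from collections import defaultdict

# Hypothetical depot snapshot: pkg path -> content of its 'archives' file.
DEPOT = {
    "user/pkg/app_a/1": ["user/src/init/1", "user/src/libc/1"],
    "user/pkg/app_b/1": ["user/src/init/1"],
}


def build_graph(depot):
    """Steps 1 and 2: map every archive path to the set of its neighbours."""
    graph = defaultdict(set)
    for pkg, archives in depot.items():
        graph[pkg].update(archives)      # pkg -> its dependencies
        for dep in archives:
            graph[dep].add(pkg)          # dependency -> referencing pkg
    return graph


def remove_pkg(graph, pkg):
    """Step 4: drop 'pkg' and return all archives that are now unused."""
    deletable = [pkg]
    for dep in graph.pop(pkg, set()):
        users = graph.get(dep)
        if users is None:
            continue
        users.discard(pkg)               # remove the package reference
        if not users:                    # empty neighbour list: unused
            del graph[dep]
            deletable.append(dep)
    return sorted(deletable)


def apply_config(graph, pkgs_to_remove):
    """Step 3: process a (hypothetical) list of removal requests."""
    deletable = []
    for pkg in pkgs_to_remove:
        deletable.extend(remove_pkg(graph, pkg))
    return deletable
```

With this snapshot, removing 'user/pkg/app_a/1' alone reports the
package and 'user/src/libc/1', while 'user/src/init/1' stays because
'user/pkg/app_b/1' still references it.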

> It would be nice to include such implicitly installed pkgs in the 
> garbage collection. 

When a package depends on another package, the latter will be coloured
for deletion like any other dependency.


However, there is a pitfall. If a 'pkg' node has another 'pkg' among its
neighbours, it is unclear whether that neighbour is there because it is
listed in the node's 'archives' file or because the node is itself one
of its dependencies.


This can be solved by comparing the node's list of neighbours with the
content of 'pkg/<name>/archives'. If they match, the current 'pkg' node
can be coloured for deletion. Otherwise, this 'pkg' is also a dependency
of another 'pkg', and thus it is not coloured for deletion.
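
The comparison could look roughly like the following Python sketch. The
function name and the archive paths are hypothetical. It assumes that
the reference of the deleted package has already been removed from the
node's neighbours.

```python
def is_unreferenced_pkg(neighbours, archives):
    """True if the pkg node's neighbours are exactly its own archives."""
    return set(neighbours) == set(archives)


# Hypothetical 'pkg' node that is listed in the 'archives' of two pkgs.
base_archives = ["user/src/init/1"]
base_neighbours = {"user/src/init/1", "user/pkg/app_a/1", "user/pkg/app_b/1"}

# After deleting app_a, app_b still refers to the node: keep it.
base_neighbours.discard("user/pkg/app_a/1")
still_used = not is_unreferenced_pkg(base_neighbours, base_archives)

# After deleting app_b as well, only the node's own archives remain.
base_neighbours.discard("user/pkg/app_b/1")
now_unused = is_unreferenced_pkg(base_neighbours, base_archives)
```

Comparing sets rather than lists keeps the check independent of the
order in which archives appear in the file.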


This way, I believe there is no need for persistent annotation of 'pkg'
dependencies by the 'depot_download_manager'. I am concerned about the
performance of such an algorithm and would have to finish a first
implementation to be certain. As the dictionary is implemented with an
AVL tree, lookups take logarithmic time, so it should perform in a
reasonable time.

> Do you think that the rough plan above is sensible?

It looks good to me. I will proceed in this direction.

Cheers,
Alice


