Dear Norman,
Thank you for sharing your plan. It enlightens me on the big picture. I agree, it does not have to be bound in usage to the 'depot_download_manager'. My motivation was to ensure that a depot clean-up task would not interfere with other tasks. This will likely be part of an automated process for us, and we can manage it outside of the 'depot_download_manager' picture.
> Would that configuration interface satisfy your needs?
Yes! The proposed configuration scheme fits our needs and yours, I assume, for using it interactively in Sculpt.
> For the directory traversal and file operations, it may be useful to take the implementation of the depot_query and fs_tool components as inspiration.
Thanks for pointing those out. It will be helpful!
> There is one open question, though: A pkg archive file can refer to other pkgs, which are implicitly installed.
The way I envision the implementation is as follows:
1. It creates a graph representing the depot state by traversing the depot. The graph is implemented with a dictionary. Each node uses a 'Depot::Archive::Path' as its key and a list of 'Depot::Archive::Path' as its value, which holds its neighbouring dependencies. Graph nodes can be of any archive type.
2. First, it goes through the packages. As you said, it registers their dependencies. It also creates a node for each dependency archive, pointing back to the 'pkgs' that reference it. Thus, this creates loops in the graph between a package and its dependencies.
3. It iterates over its config and performs the required actions.
4. When a package is deleted, it traverses the package's list of neighbouring dependencies, colours them for deletion, and removes the package reference from each of them. If a node ends up with an empty list of neighbours, it can be deleted safely, as it isn't in use any more.
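To make steps 1, 2, and 4 more concrete, here is a minimal sketch in plain C++. It is only an illustration under assumptions: 'Path' is a std::string stand-in for 'Depot::Archive::Path', std::map stands in for the dictionary, and the names 'Graph', 'add_pkg', and 'delete_pkg' are made up for this mail. It does not yet treat nested 'pkg' dependencies specially.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for 'Depot::Archive::Path'
using Path = std::string;

// Archive path -> paths of neighbouring archives (std::map is an ordered tree)
using Graph = std::map<Path, std::set<Path>>;

// Step 2: register a pkg with its dependencies and add the back references
void add_pkg(Graph &graph, Path const &pkg, std::vector<Path> const &deps)
{
    graph[pkg]; // create the pkg node even if it has no dependencies
    for (Path const &dep : deps) {
        graph[pkg].insert(dep); // pkg -> dependency
        graph[dep].insert(pkg); // dependency -> referencing pkg
    }
}

// Step 4: delete a pkg and return every archive that became safe to delete
std::set<Path> delete_pkg(Graph &graph, Path const &pkg)
{
    auto const pkg_it = graph.find(pkg);
    if (pkg_it == graph.end())
        return {};

    std::set<Path> const neighbours = pkg_it->second; // copy before erasing
    graph.erase(pkg_it);

    std::set<Path> deletable { pkg };
    for (Path const &dep : neighbours) {
        auto const dep_it = graph.find(dep);
        assert(dep_it != graph.end()); // add_pkg keeps references symmetric

        dep_it->second.erase(pkg); // drop the reference to the deleted pkg
        if (dep_it->second.empty()) { // no remaining user
            deletable.insert(dep);
            graph.erase(dep_it);
        }
    }
    return deletable;
}
```

With two pkgs sharing one src archive, deleting the first pkg returns the pkg itself plus the archives only it used, while the shared archive stays in the graph with a single remaining reference.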
> It would be nice to include such implicitly installed pkgs in the garbage collection.
When a package depends on another package, that other package will be coloured for deletion like any other dependency.
However, there is a pitfall. If a package node has another 'pkg' among its neighbours, it is unclear whether that 'pkg' is there because it is listed in the 'archives' file or because the package is itself a dependency of it.
This can be solved by comparing the node's neighbours list with 'pkg/<name>/archives'. If they match, the current 'pkg' node can be coloured for deletion. Otherwise, this 'pkg' is also a dependency of another 'pkg' and is therefore not coloured for deletion.
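The comparison could look roughly like the sketch below. Again, this is only an assumption of how it might be written: 'Path' is a std::string stand-in for 'Depot::Archive::Path', and 'pkg_unreferenced' is a made-up name. It expects the node's current neighbours and the entries read from the pkg's 'archives' file.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for 'Depot::Archive::Path'
using Path = std::string;

// True if the pkg node's neighbours are exactly the entries of its own
// 'archives' file, i.e., no other pkg still refers to it.
bool pkg_unreferenced(std::set<Path> const &neighbours,
                      std::set<Path> const &archives)
{
    return neighbours == archives;
}
```

Any neighbour that is not listed in 'archives' can only be a back reference from another 'pkg', so a mismatch means the node must be kept.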
This way, I believe there is no need for persistent annotation of 'pkg' dependencies by the 'depot_download_manager'. I am concerned about the performance of such an algorithm and will have to finish a first implementation to be certain. As the dictionary is implemented with an AVL tree, it should perform in reasonable time.
> Do you think that the rough plan above is sensible?
It looks good to me. I will proceed in this direction.
Cheers, Alice