Hello Alice,
> I have a first implementation on the 'depot_remove' [1] branch. It can be improved or changed. Please note that this is a partial implementation. There are some TODO comments. I also commented the code as much as possible for clarity.
thanks a lot for sharing!
As I'm short of time until the end of this month, I only had a cursory glance. I may be missing parts of the picture, but the solution looks more complex than I expected. For example, I'm unable to quickly assess whether cyclic dependencies (two bad pkgs that refer to each other in their archives files) may pose a risk.
In my perception, the complexity comes from the approach of building up an internal representation (introducing the notions of graph, vertex, edge, and neighbor along the way) instead of working with the plain file system directly.
> - Collect orphaned archives that are referenced by no PKG. Make that last step optional? It requires traversing the depot for all other archive types. I am asking myself whether this is necessary.
That's what I had in mind in the first place - operating like a garbage collector. If we find that we ultimately need to traverse the depot anyway to implement this, I wonder what is gained by building up a cached internal representation of the depot structure beforehand. I expect that we'd end up with a much more straightforward solution by simply traversing the depot and repeating this process until no further work can be done (no orphaned content remains in the depot), as I described in my previous posting.
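To make the idea a bit more tangible, here is a rough sketch in Python. It is only an illustration under simplifying assumptions, not a proposal for the actual tool: it assumes every archive is a directory named <user>/<type>/<name>/<version> below the depot root, that an archive may contain an 'archives' file listing one referenced archive path per line, and that archives of type 'pkg' are only ever removed on explicit request. The function names are made up for this example.

```python
import shutil
from pathlib import Path


def present_archives(depot):
    """Return all archives found on disk as '<user>/<type>/<name>/<version>'."""
    return {p.relative_to(depot).as_posix()
            for p in Path(depot).glob("*/*/*/*") if p.is_dir()}


def referenced_archives(depot, present):
    """Union of all entries in the 'archives' files of the present archives."""
    refs = set()
    for archive in present:
        listing = Path(depot) / archive / "archives"
        if listing.is_file():
            for line in listing.read_text().splitlines():
                if line.strip():
                    refs.add(line.strip())
    return refs


def remove_pkgs(depot, doomed_pkgs):
    """Remove the given pkgs, then sweep the depot until no orphan remains.

    References are never followed recursively, so two pkgs that name each
    other in their 'archives' files cannot cause an endless loop.
    """
    removed = []
    for pkg in doomed_pkgs:
        shutil.rmtree(Path(depot) / pkg, ignore_errors=True)
        removed.append(pkg)

    while True:
        present = present_archives(depot)
        refs = referenced_archives(depot, present)
        # Orphan: a non-pkg archive that no remaining archive refers to.
        orphans = sorted(a for a in present
                         if a.split("/")[1] != "pkg" and a not in refs)
        if not orphans:
            return removed
        for archive in orphans:
            shutil.rmtree(Path(depot) / archive)
        removed.extend(orphans)
```

Each pass looks only at what is on disk at that moment, so no state needs to be kept between passes. One limitation of this sketch: non-pkg archives that refer to each other in a cycle would keep each other alive and never be swept.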
If you don't find the idea worth pursuing, can you share why? Or could you give implementing it a try to see which version you are more comfortable with in terms of simplicity?
Cheers Norman