Hi Wolfgang,
* Wolfgang Schmidt <wolfgang.schmidt@...243...> [2014-12-24 09:28:03 +0100]:
> I would propose something quite boring:
> Automated tests of each component and script, publishing the results on
> a web page so one can easily see whether everything is up and running,
> together with an update of the documentation.
> Why? […]
> This means that, at the moment, if I use something and anything goes
> wrong, there are many possible suspects:
> - The documentation is outdated.
> - The documentation is missing something that is only written in some
>   release note.
> - The information from the last release notes is itself already outdated.
> - I have done something wrong.
> - Some component does not work as expected (e.g., from time to time you
>   read "haven't used it for some time, script no longer works").
> - There is a problem with the setup or the way the components work
>   together.
> Every issue with a component can be a possible security bug. […]
> How? […]
Actually, it is not boring at all, so let me elaborate on this topic in more detail for a moment :-)
As a matter of fact, we are already employing a form of CI and run automated tests. Every night, the current staging branch is built, and most of the run scripts are executed in Qemu as well as on a variety of real machines (¹). Thereby we are promptly notified if something fails to build for a specific platform or does not run well on it. Most of the time we can isolate and fix the problem quickly.
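To make this concrete, here is a minimal sketch of what such a nightly cycle boils down to. The build directories and the list of run scripts are made-up examples, and the real setup uses buildbot and our autopilot tool (see (¹)) rather than a hand-rolled Python script:

  #!/usr/bin/env python
  # Minimal sketch of a nightly test cycle: execute each scenario on each
  # platform and record pass/fail. 'make run/<script>' builds the needed
  # components and executes the corresponding run script.
  import subprocess, datetime

  BUILD_DIRS  = ['build/x86_32', 'build/panda']   # one build dir per platform
  RUN_SCRIPTS = ['log', 'signal', 'rump_ext2']    # hypothetical scenario list

  def execute(build_dir, script):
      # a zero exit code counts as a passed scenario
      return subprocess.call(['make', '-C', build_dir, 'run/' + script]) == 0

  stamp = datetime.datetime.now().isoformat()
  for build_dir in BUILD_DIRS:
      for script in RUN_SCRIPTS:
          verdict = 'OK' if execute(build_dir, script) else 'FAILED'
          print('%s %s %s %s' % (stamp, build_dir, script, verdict))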
In our experience, and obviously in yours, problems arise when you try to use components in a way that is not very well tested. This includes more sophisticated scenarios as well as using drivers on untested hardware. To mitigate these problems, one needs to do more extensive testing. On the scenario side, that is exactly the problem we want to address.
As Norman remarked in his e-mail, so far we have usually focused our time on creating components for a particular scenario and arranging them to perform a certain task. Since time is what we lack most, not all components are as well tested as we would like them to be. The components that form the foundation of Genode (base and os) are getting preferential treatment in that regard, though. That said, the further you move away from these foundational components, the more difficult it becomes for us to test a component, and the more likely it is that the component in question “does not get much love”. For instance, drivers are a problem because we can only test them on hardware we actually have at our disposal. That is the part where it is instrumental, and more than welcome, to have the community test them on hardware unknown to us. Therefore, if someone reports a problem, we try to fix it as time permits. To simplify this process, it is helpful to provide as much information as possible (the used version of Genode, the kernel, …) (²).
Now, at this point, let us talk about the documentation, since it should help a user to provide this information and to use the system, and, on the other hand, help a developer to work with Genode. To sum up your remarks about it, I would call its current state merely inconvenient with regard to its accessibility. The API description mostly deals with the foundation of Genode, whereas most features are described in the release notes. This makes accessing the information one seeks arguably laborious. It is also, to some extent, only valid for the corresponding master branch (³). (To be honest, I personally look more at the source code than at the actual documentation itself and therefore do not know where it is currently outdated.) If the real documentation (⁴) is outdated or inaccurate, please point out where, so we can fix it. Likewise, if there is something that is not as clear as it should be, I encourage you to bring it to our attention.
All in all, reading your e-mail and the others regarding this topic, I think we are on the right track in planning the next year. We have already discussed internally providing the Genode community with access to our buildbot and autopilot log files, which apparently is desired. Making the system as a whole more accessible goes hand in hand with addressing the currently voiced issues.
Cheers
Josef
(¹) To get a bit more technical, we use the well-known buildbot tool for building all components and afterwards use our own autopilot tool to execute the scenarios. The testing facility consists of several ARM boards (among others Arndale and Panda boards, and by now even a Raspberry Pi) as well as a few common x86 machines, ranging from older Core 2 Duo hardware to 3rd-generation Intel Core machines.
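As an illustration of the results-on-a-web-page idea from your proposal, the following sketch renders such pass/fail records as a static HTML table. The one-line-per-test input format is the made-up one from the sketch above, not the actual buildbot or autopilot log format:

  # Minimal sketch: turn 'timestamp platform scenario OK|FAILED' lines
  # into a static status page one can glance at.
  def status_page(result_lines):
      rows = []
      for line in result_lines:
          stamp, platform, scenario, verdict = line.split()
          color = 'green' if verdict == 'OK' else 'red'
          rows.append('<tr><td>%s</td><td>%s</td>'
                      '<td style="color:%s">%s</td></tr>'
                      % (platform, scenario, color, verdict))
      return ('<html><body><table>\n'
              '<tr><th>Platform</th><th>Scenario</th><th>Result</th></tr>\n'
              + '\n'.join(rows) + '\n</table></body></html>')

  with open('results.txt') as f, open('status.html', 'w') as out:
      out.write(status_page(f.read().splitlines()))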
(²) In that regard, I treat a log file of the executed scenario as mandatory and the run script that was used as sufficient information. Providing the run script enables us to examine the issue more directly, and in certain cases it can be turned into a regression test.
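To sketch the regression-test idea under the same assumptions as above (the names are illustrative, not the actual autopilot interface): re-run the reported scenario and check its log for the expected output:

  # Minimal sketch: execute a run script, capture its log, and check it
  # for an expected pattern, so a reported scenario keeps being tested.
  import re, subprocess

  def regression_test(build_dir, script, expected, log_name='run.log'):
      with open(log_name, 'w') as log:
          subprocess.call(['make', '-C', build_dir, 'run/' + script],
                          stdout=log, stderr=subprocess.STDOUT)
      with open(log_name) as log:
          return re.search(expected, log.read()) is not None

  # fails loudly once the scenario stops printing its success message
  assert regression_test('build/x86_32', 'log', r'Test succeeded')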
(³) Therefore, it makes the life of a developer easier if he/she works with the master branch only.
(⁴) Although we document the current state of Genode comprehensively in the release notes, these are merely snapshots in time that tend to be treated as the documentation, IMHO.
PS:
Regarding your problems with the blkcache and rump_fs, we are aware of certain issues with this combination, and the reason it is not already fixed is, alas, simply a lack of time and priority. We do not trigger this behaviour in our current use cases or, for better or worse, are able to mitigate its effects, so it has been given a low priority, and someone will eventually work on fixing it when time permits. Sure, that is suboptimal, but that is sadly just the way it (currently) is. In any case, rest assured that it has a higher priority by now.