josef.soentgen at ...1...
Wed Dec 24 16:37:09 CET 2014
* Wolfgang Schmidt <wolfgang.schmidt at ...243...> [2014-12-24 09:28:03 +0100]:
> i would propose something quite boring:
> Automated tests of each component and script and giving the results on a web
> page so one can easily see if everything is up and running together with an
> update of the documentation.
> This means at the moment if I use something and anything goes wrong there
> are many possible suspects:
> - Documentation is outdated.
> - Documentation has something missing which is written in some release
> - Information from last release note is also already outdated
> - I have done something wrong
> - some component does not work as expected (e.g. from time to time you read
> "haven't used it for some time, script no longer works")
> - problem with the correct setup or the way components work together.
> Every issue with a component can be a possible security bug.
Actually, it is not boring at all, so let me elaborate on this topic
for a moment :-)
As a matter of fact, we already employ a kind of CI and run automated
tests. Every night the current staging branch is built, and most of
the run scripts are executed in Qemu as well as on a variety of real
machines (¹). Thereby we are promptly notified if something does not
build for, or does not run well on, a specific platform. Most of the
time we can isolate and fix the problem quickly.
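As a rough sketch of what such a nightly run boils down to (the
'run_scenario' helper and the scenario names below are purely
illustrative; the real setup uses buildbot and our autopilot tool to
execute the actual run scripts):

```shell
#!/bin/sh
# Hypothetical sketch of a nightly test driver. 'run_scenario' is a
# stand-in for the real invocation (a run script executed via
# autopilot); the scenario names are made up for illustration.
run_scenario() {
    case "$1" in
        log|demo) return 0 ;;   # pretend these pass
        *)        return 1 ;;   # pretend everything else fails
    esac
}

report=report.txt
: > "$report"
for scenario in log demo blk_cache; do
    if run_scenario "$scenario"; then
        echo "$scenario: ok"     >> "$report"
    else
        echo "$scenario: FAILED" >> "$report"
    fi
done
cat "$report"   # a report like this could be published on a web page
```

The per-scenario status file is exactly the kind of result that could
be put on a web page, as proposed above.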
In our experience, and evidently in yours, problems arise when you
use components in a way that is not well tested. This includes more
sophisticated scenarios as well as using drivers on untested hardware.
To mitigate these problems, more extensive testing is needed. On the
scenario side, that is exactly the problem we want to address.
As Norman remarked in his e-mail, so far we have mostly focused our
time on creating components for a particular scenario and arranging
them to perform a certain task. Since time is what we lack most,
not all components are as well tested as we would like them to be.
The components which basically form the foundation of Genode (base
and os) are getting preferential treatment in that regard, though.
That said, the further you move away from the components that form the
foundation, the more difficult it gets for us to test them, and the
more likely it is that the component in question “does not get much love”.
For instance, drivers are a problem because we can only test them on
hardware we actually have at our disposal. That is where it is
instrumental, and more than welcome, to have the community test them
on hardware unknown to us. Therefore, if someone reports a problem, we
try to fix it as time permits. To simplify this process, it is helpful
to provide as much information as possible (used version of Genode,
kernel, …) (²).
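A hypothetical sketch of gathering that information (the file names
and the 'nova' kernel value are assumptions for illustration, not a
prescribed format):

```shell
#!/bin/sh
# Collect the basics a problem report should contain: the Genode
# version, the kernel used, and the log of the failing scenario.
{
    echo "Genode version: $(git describe 2>/dev/null || echo unknown)"
    echo "Kernel: nova"    # illustrative value; use your kernel
} > issue-info.txt
# Capture the full log of the failing scenario alongside it, e.g.:
#   make run/<script> 2>&1 | tee run.log
cat issue-info.txt
```

Attaching such a file together with the run log and the run script
itself makes an issue much easier to reproduce on our side.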
Now, at this point let us talk about the documentation, since it
should help a user to provide this information and to use the system,
and, on the other hand, help a developer to work with Genode. To sum
up your remarks about it, I would call the current state merely
inconvenient regarding its accessibility. The API description mostly
covers the foundation of Genode, whereas most features are described
in the release notes. This makes accessing the information one seeks
arguably laborious. It is also, to some extent, only valid for the
corresponding master branch (³).
(To be honest, I personally look more at the source code than at the
actual documentation and therefore do not know where it is currently
outdated.) If the real documentation (⁴) is outdated or inaccurate,
please point out where so we can fix it. Likewise, if there is
something that is not as clear as it should be, I encourage you to
bring it to our attention.
All in all, reading your and the other e-mails regarding this topic,
I think we are on the right track for planning the next year. We have
already discussed internally giving the Genode community access to our
buildbot and autopilot log files, which apparently is desired. Making
the system as a whole more accessible goes hand in hand with
addressing the currently voiced issues.
(¹) To get a bit more technical: we use the well-known buildbot tool
    for building all components and use our own autopilot tool
    afterwards to execute the scenarios. The testing facility consists
    of several ARM boards (among others, Arndale and Panda boards and
    by now even a Raspberry Pi) as well as a few common x86 machines,
    ranging from older Core 2 Duo to 3rd-generation Intel Core hardware.
(²) In that regard, I consider a log file of the executed scenario
    mandatory and the used run script sufficient information.
    Providing the run script enables us to examine the issue more
    directly, and in certain cases it can be turned into a regression
    test.
(³) Therefore it makes a developer's life easier to work with the
    master branch only.
(⁴) Although we document the current state of Genode comprehensively
    in the release notes, these are merely timed snapshots that tend
    to be treated as documentation, IMHO.
Regarding your problems with the blkcache and rump_fs, we are aware
of certain issues with this combination, and the reason it is not
fixed yet is, alas, simply a lack of time and priority. We do not
trigger this behaviour in our current use cases or, for better or
worse, are able to mitigate its effects, so it got a low priority,
meaning someone will eventually work on fixing it when time permits.
Sure, that is suboptimal, but that is sadly just the way it
(currently) is. In any case, rest assured that it has a higher
priority by now.