Hi Ben,
IMO you're raising a very relevant question.
The security impact of file type identification could be severe for users, as file-parsing and media-rendering libraries do not have a particularly good track record when it comes to security [1]. I guess this would be less dramatic in your envisioned architecture, provided another component enforces read-only file access and the results are exported through a report ROM.
Still, in the shared scenario we must assume that the identification component runs arbitrary attacker-controlled code as soon as it accesses a single malicious file. It could then (a) attack its client through malformed ROM content and (b) provide valid but misleading file information, as you mention.
Case (a) could be mitigated by a trusted filter component between the indexer and its client. Case (b), however, may trick the user into treating one file type as another, e.g., clicking on an executable masquerading as a harmless text file. I don't have a really good scenario for that yet, but it definitely feels bad.
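To illustrate what such a filter could look like, here is a minimal Python sketch. It assumes a hypothetical line-based report format (`path<TAB>type`) and a hypothetical whitelist of coarse types; a real report ROM would be XML. The filter reduces what the client has to parse to a trivial grammar and only accepts entries for paths the client actually asked about, which addresses (a). It cannot address (b): a well-formed lie passes straight through.

```python
"""Sketch of a trusted filter between an untrusted file-type identifier and
its client. Report format and type whitelist are hypothetical."""

MAX_REPORT_BYTES = 64 * 1024
# Hypothetical whitelist of coarse file types the client understands.
ALLOWED_TYPES = {"text", "image", "audio", "video", "archive", "executable", "unknown"}


def filter_report(raw: bytes, requested_paths: set[str]) -> dict[str, str]:
    """Return a sanitized {path: type} mapping for exactly the requested paths.

    Anything that does not fit the strict grammar is discarded; paths
    without a valid entry default to "unknown".
    """
    result = {path: "unknown" for path in requested_paths}
    if len(raw) > MAX_REPORT_BYTES:
        return result  # oversized report: ignore it entirely
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return result  # undecodable report: ignore it entirely
    seen = set()
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # malformed line
        path, ftype = fields
        if path not in requested_paths or path in seen:
            continue  # path we never asked about, or a duplicate entry
        if ftype not in ALLOWED_TYPES:
            continue  # type outside the whitelist
        seen.add(path)
        result[path] = ftype
    return result
```

For instance, `filter_report(b"evil.exe\ttext\n", {"evil.exe"})` returns `{"evil.exe": "text"}`, which is exactly the valid-but-misleading case (b).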
It feels much worse when considering general-purpose image rendering or font rendering. In that case, none of the output produced by that central component could be trusted anymore. An attacker could trick you into digitally signing a contract you had no intention to sign, spending money, etc.
I'd encourage you to actually *measure* the impact of file type identification with one fresh process per file versus one central component for all files, if that's feasible. If the result is as poor as we expect, it would be interesting to find ways to improve performance while keeping the separation.
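A rough way to get first numbers on a host system could look like the following Python sketch. The identifier is a toy magic-byte check standing in for libmagic (an assumption, not libmagic's behavior), and spawning a Python interpreter per file measures host process creation rather than Genode component creation, so the result only hints at the order of magnitude.

```python
"""Rough benchmark sketch: one fresh process per file vs. one shared worker."""
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Toy identifier executed inside the worker processes (hypothetical, not libmagic).
IDENTIFIER = r'''
import sys

MAGIC = [(b"\x89PNG", "image"), (b"%PDF", "document"), (b"\x7fELF", "executable")]

def identify(path):
    with open(path, "rb") as f:
        head = f.read(8)
    for prefix, ftype in MAGIC:
        if head.startswith(prefix):
            return ftype
    return "unknown"

if len(sys.argv) > 1:      # per-file mode: identify one file and exit
    print(identify(sys.argv[1]))
else:                      # shared mode: serve paths read from stdin
    for line in sys.stdin:
        print(identify(line.rstrip("\n")), flush=True)
'''


def per_file(paths):
    """Spawn a fresh interpreter for every file."""
    return [
        subprocess.run([sys.executable, "-c", IDENTIFIER, str(p)],
                       capture_output=True, text=True, check=True).stdout.strip()
        for p in paths
    ]


def shared(paths):
    """Send all files to one long-lived worker over a pipe."""
    worker = subprocess.Popen([sys.executable, "-c", IDENTIFIER],
                              stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    results = []
    for p in paths:
        worker.stdin.write(f"{p}\n")
        worker.stdin.flush()
        results.append(worker.stdout.readline().strip())
    worker.stdin.close()
    worker.wait()
    return results


def measure(n_files=50):
    """Time both strategies on the same synthetic directory."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(n_files):
            p = Path(tmp) / f"file{i}"
            # Alternate between PNG-like and plain-text content.
            p.write_bytes(b"\x89PNG\r\n\x1a\n" if i % 2 else b"plain text")
            paths.append(p)
        timings, results = {}, {}
        for name, strategy in (("per-file", per_file), ("shared", shared)):
            start = time.perf_counter()
            results[name] = strategy(paths)
            timings[name] = time.perf_counter() - start
    return timings, results
```

Calling `measure()` returns the wall-clock time per strategy and the results of both, which should be identical; the interesting part is the ratio of the two timings on the target system.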
To reflect a question back to the list: What would be a good concept to support a scenario where a complex component (like Ben's file identification or a heavyweight JVM) is pre-initialized and used as a template to quickly spawn independent child processes?
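For concreteness, on a POSIX host this pattern resembles a fork server (Android's Zygote works along these lines): a template process performs the expensive initialization once, never touches untrusted input, and forks one short-lived child per request. The following Python sketch assumes POSIX `os.fork()`; `expensive_init()` and its magic table are hypothetical stand-ins. How this would map onto Genode components is exactly the open question.

```python
"""Sketch of a pre-initialized template process forking one child per file.
POSIX-only (os.fork); the initialization is a hypothetical stand-in."""
import os


def expensive_init():
    """Stand-in for costly setup done once, before any untrusted input is seen."""
    return {b"\x89PNG": "image", b"%PDF": "document", b"\x7fELF": "executable"}


def identify(magic_table, path):
    """Toy magic-byte check; runs only inside a forked child."""
    with open(path, "rb") as f:
        head = f.read(8)
    for prefix, ftype in magic_table.items():
        if head.startswith(prefix):
            return ftype
    return "unknown"


def identify_isolated(magic_table, path):
    """Fork a child that inherits the initialized state, handles one file, and exits.

    Whatever the file does to the child stays in the child: the template
    (this process) never opens the file, so the next child again starts
    from a clean copy of the initialized state.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: handle exactly one file, then exit without cleanup
        os.close(read_fd)
        status = 1
        try:
            os.write(write_fd, identify(magic_table, path).encode())
            status = 0
        finally:
            os._exit(status)  # never returns into the template's code path
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as r:
        result = r.read().decode() or "unknown"  # empty if the child failed
    os.waitpid(pid, 0)
    return result
```

One caveat of this design is that every child inherits the template's address-space layout, which weakens ASLR across children. Python's `multiprocessing` module offers a similar mechanism through its `forkserver` start method and `set_forkserver_preload()`.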
Cheers, Alex
[1] https://nvd.nist.gov/vuln/detail/CVE-2017-11421
On 20.12.2017 03:16, Nobody III wrote:
For example, a file manager would benefit from content-based file type identification. However, identifying files within the file manager itself (e.g. using libmagic) could pose a security risk. On the other hand, using a separate component for this could have a noticeable effect on performance. If we use a single component instance for all of the files, the only significant added overhead would be from IPC, AFAIK. This seems acceptable in terms of performance, but a malicious file may be able to cause the component to misidentify other files in the directory, which could be a security risk. A more secure method would be to run an instance of the component for each individual file in the directory. However, this may substantially reduce performance for large directories, depending on the overhead of component creation. Would performance be an issue here? (And am I overestimating the risk of file misidentification?)
Similar cases include icon/thumbnail rendering, general-purpose image loading, and text rendering from many different sources. Would there be any notable differences for these?