Show HN: I built a Rust crate for running unsafe code safely

(github.com)

112 points | by braxxox a day ago

63 comments

woodruffw a day ago
I don't think this meets the definition of "safe" in "safe" Rust: "safe" doesn't just mean "won't crash due to spatial memory errors," it means that the code is in fact spatially and temporally memory safe.
In other words: this won't detect memory unsafety that doesn't result in an abnormal exit or other detectable fault. If I'm writing an exploit, my entire goal is to perform memory corruption without causing a fault; that's why Rust's safety property is much stronger than crash-freeness.
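A small sketch of what "corruption without a fault" can look like. To stay well-defined Rust, it keeps the stray write inside a single struct and only models the effect of a C-style overflow; `Session` and `overlong_copy` are invented for illustration:
```rust
    #[repr(C)]
    struct Session {
        name: [u8; 4],
        is_admin: u8,
    }

    /// Models a C-style `memcpy` with a wrong length: copies `src` over `name`
    /// and whatever follows it in the struct.
    fn overlong_copy(session: &mut Session, src: &[u8]) {
        assert!(src.len() <= std::mem::size_of::<Session>());
        let base = session as *mut Session as *mut u8;
        // SAFETY: `base` is valid for writes of `size_of::<Session>()` bytes,
        // `src` cannot overlap the exclusively borrowed `session`, and every
        // bit pattern is a valid `u8`.
        unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), base, src.len()) };
    }

    fn main() {
        let mut s = Session { name: *b"anon", is_admin: 0 };
        overlong_copy(&mut s, b"evil\x01"); // one byte too many for `name`
        println!("name = {:?}, is_admin = {}", s.name, s.is_admin);
    }
```
The process exits normally with `is_admin` overwritten, so a wrapper that only watches for abnormal exits has nothing to report.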
- mirashii a day ago
  Even better, this library, with its use of unsafe and fork underneath, introduces a whole new class of undefined behavior to a program by providing a safe interface over an unsafe API without actually enforcing the invariants necessary for safety.
  In order for the fork() it calls to be safe, it needs to guarantee a bunch of properties of the program that it simply cannot. If this gets used in a multithreaded program that calls malloc, you've got UB. There's a long list of caveats with fork mentioned in some other comments here.
  In my view, this is not serious code and should be regarded as a joke. There's no actual value in this type of isolation.
  - woodruffw a day ago
    Yep. I wanted to start from the high-level point of "safe doesn't mean doesn't crash," but you're right that the technique itself is unsound.
    - pclmulqdq a day ago
      In rust terminology, "safe" actually implies more frequent crashes on untrusted inputs.
      - mirashii 12 hours ago
        No, "safe" implies that there's no undefined behavior across all inputs. Whether that's a crash or not is still up to the implementer of the code in question, same as any other language. It is your choice whether to use interfaces that crash or do not crash, that is not forced upon you by the language.
      - woodruffw 18 hours ago
        Why do you think this? The closest thing in “common” Rust would be unwraps/panics, but these are (1) not crashes per se, and (2) probably not more common than they would be in an equivalent C codebase.
        pclmulqdq 18 hours ago
        "Panics are not crashes" is a new one. I'm referring to the fact that the rust code panics at the slightest sign of discomfort.
        And they are very much more common than in most C codebases. C codebases are often overly permissive in what they accept (hence the security bugs). Rust made a different trade.
        woodruffw 4 hours ago
        > "Panics are not crashes" is a new one. I'm referring to the fact that the rust code panics at the slightest sign of discomfort.
        In this context, I'm using "crash" to mean something like a program fault, i.e. an uncontrolled termination orchestrated by the kernel rather than the program itself. Rust programs generally terminate in a controlled manner, even if that manner is analogous to an unchecked exception.
        It's also not my experience that Rust code, on average, panics on abnormal inputs. I've seen it happen, but the presence of e.g. safe iterators and access APIs means that you see a lot less of the "crash from invalid offset or index" behavior you see in C codebases.
        (However, as pointed out in the adjacent thread, none of this really has anything to do with what "safe" means in Rust; controlled termination is one way to preserve safety, but idiomatic Rust codebases tend to lean much more heavily in the "error and result types for everything" direction. This in and of itself is arguably non-ideal in some cases.)
        mubou 11 hours ago
        > I'm referring to the fact that the rust code panics at the slightest sign of discomfort.
        That's kind of up to you as the developer though. I generally avoid writing functions that can panic -- I'd even argue any non-test code that panics is simply poorly written, because you can't "catch" a panic like you can in a high-level language. Better to return an error result and let the calling code decide how to handle it. Which often means showing an error to the user, but that's better than an unexpected crash.
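        A minimal sketch of that distinction, using only the standard library (`parse_port` and `parse_port_or_panic` are hypothetical names for illustration): the first helper panics on bad input, the second hands the caller an error to deal with.
```rust
    use std::num::ParseIntError;

    // Hypothetical helper: panics if the input is not a valid u16.
    fn parse_port_or_panic(s: &str) -> u16 {
        s.trim().parse().unwrap()
    }

    // Same parsing logic, but the caller decides what a bad input means.
    fn parse_port(s: &str) -> Result<u16, ParseIntError> {
        s.trim().parse()
    }

    fn main() {
        assert_eq!(parse_port_or_panic("8080"), 8080);

        let bytes = [10u8, 20, 30];
        // `bytes[7]` would panic; `get` reports the miss as a value instead.
        assert_eq!(bytes.get(7), None);

        match parse_port("not a port") {
            Ok(port) => println!("listening on {port}"),
            Err(e) => eprintln!("invalid port: {e}"),
        }
    }
```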
  - wizzwizz4 a day ago
    Well, you can close all file descriptors (except the pipe used for sending the return value back to the parent), re-mmap all files with MAP_PRIVATE, and then use SECCOMP_SET_MODE_STRICT to isolate the child process. But at that point, what are you even doing? Probably nothing useful.
    If there were a Quick Fix for safety, we'd probably have discovered it by now.
    - jmillikin a day ago
      > use SECCOMP_SET_MODE_STRICT to isolate the child process. But at that point, what are you even doing? Probably nothing useful.
      The classic example of a fully-seccomp'd subprocess is decoding / decompression. If you want to execute ffmpeg on untrusted user input then seccomp is a sandbox that allows full-power SIMD, and the code has no reason to perform syscalls other than read/write to its input/output stream.
      On the client side there's font shaping, PDF rendering, image decoding -- historically rich hunting grounds for browser CVEs.
      - Animats a day ago
        > The classic example of a fully-seccomp'd subprocess is decoding / decompression.
        Yes. I've run JPEG 2000 decoders in a subprocess for that reason.
      - WesolyKubeczek a day ago
        Well, it seems that lately this kind of task wants to write/mmap to a GPU, and poke at font files and interpret them.
- NoahKAndrews a day ago
  It's not just that it won't crash, it means that an exploit in the unsafe code won't allow corrupting memory used by the rest of the program
  - woodruffw a day ago
    This is pretty immaterial from an exploit development perspective:
    1. The forked process has a copy of the program state. If I'm trying to steal in-process secrets, I can do it from the forked process.
    2. The forked process is just as privileged as the original process. If I'm trying to obtain code execution, I don't care which process I'm in.
    This is why Chrome et al. have full-fledged sandboxes that communicate over restricted IPC; they don't fork the same process and call it a day.
nextaccountic a day ago
There is a way to sandbox native code without forking to a new process, and it looks like this
https://hacks.mozilla.org/2020/02/securing-firefox-with-weba...
Firefox employs processes for sandboxing but for small components they are not worth the overhead. For those they employed this curious idea: first compile the potentially unsafe code to wasm (any other VM would work), then compile the wasm code to C (using the wasm2c tool). Then use this new C source normally in your program.
All UB in the original code becomes logical bugs in the wasm, that can output incorrect values but not corrupt memory or do things that UB can do. Firefox does this to encapsulate C code, but it can be done with Rust too
- panstromek a day ago
  That's actually a pretty clever idea, I never realized you can do that. Thanks for sharing.
  - int_19h 13 hours ago
    Note that the reason why this works for sandboxing is that wasm code gets its own linear memory that is bounds-checked. Meaning that the generated C code will contain those checks as well, with the corresponding performance implications.
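    To make the cost concrete, here is a rough model of a bounds-checked linear memory. It is not what wasm2c actually emits, just an illustration; `LinearMemory` and `Trap` are hypothetical names:
```rust
    /// Stand-in for a wasm linear memory: a flat byte array owned by the sandbox.
    struct LinearMemory {
        bytes: Vec<u8>,
    }

    /// Out-of-bounds access: the sandboxed call fails, host memory is untouched.
    #[derive(Debug, PartialEq)]
    struct Trap;

    impl LinearMemory {
        fn new(size: usize) -> Self {
            Self { bytes: vec![0; size] }
        }

        // Every load is checked against the memory size before any byte is read.
        fn load_u32(&self, addr: usize) -> Result<u32, Trap> {
            let end = addr.checked_add(4).ok_or(Trap)?;
            let s = self.bytes.get(addr..end).ok_or(Trap)?;
            Ok(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
        }

        // Stores pay for the same check.
        fn store_u32(&mut self, addr: usize, value: u32) -> Result<(), Trap> {
            let end = addr.checked_add(4).ok_or(Trap)?;
            let s = self.bytes.get_mut(addr..end).ok_or(Trap)?;
            s.copy_from_slice(&value.to_le_bytes());
            Ok(())
        }
    }

    fn main() {
        let mut mem = LinearMemory::new(16);
        mem.store_u32(4, 0xDEAD_BEEF).unwrap();
        println!("{:#x}", mem.load_u32(4).unwrap());
        println!("{:?}", mem.load_u32(14)); // last two bytes would fall outside
    }
```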
- dmitrygr a day ago
  You can skip all this nonsense with
```
    -fsanitize=undefined
```
  - Georgelemental a day ago
    Not foolproof, doesn’t catch everything.
  - rcxdude 12 hours ago
    The sanitize tools are not intended to be hardening tools, just debugging/testing tools. For instance, they may introduce their own vulnerabilities.
  - cyberax a day ago
    It won't do anything for data races, for example.
destroycom a day ago
This isn't mentioned anywhere on the page, but fork is generally not a great API for these kinds of things. In a multi-threaded application, any code between the fork and exec syscalls should be async-signal-safe. Since the memory is replicated in full at the time of the call, the current state of mutexes is also replicated, and if some thread was holding one at the time, there is a risk of deadlock. A simple print! or anything that allocates memory can lead to a freeze. There's also the issue of user-space buffers: printing something may write to a user-space buffer that, if not flushed, will be lost after the callback completes.
pjmlp a day ago
Rather design the application from the start to use multiple processes, OS IPC and actual OS sandboxing APIs.
Pseudo-sandboxing on the fly is an old idea with its own issues, as shown by the classical UNIX approach to launching daemons.
- vlovich123 a day ago
  What are the sandboxing APIs you’d recommend on Linux, Mac, & Windows? I haven’t been able to find any comprehensive references online.
  - MaulingMonkey a day ago
    My starting point would be Chromium's documentation, as - presumably - chrome is one of the most widely used and battle tested, user-facing, third party sandboxes running on end user machines.
    Windows: https://chromium.googlesource.com/chromium/src/+/main/docs/d...
    Linux: https://chromium.googlesource.com/chromium/src/+/main/sandbo...
    OS X: https://chromium.googlesource.com/chromium/src/+/main/sandbo...
    With the caveat that I wouldn't necessarily assume this is the cutting edge at this point, and there might be other resources to invest in for server-side sandboxing involving containers or hypervisors, and that I've only actually engaged with the Windows APIs based on that reading.
    I wrote `firehazard` ( https://docs.rs/firehazard/ , https://github.com/MaulingMonkey/firehazard/tree/master/exam... ) to experiment with wrapping the Windows APIs, document edge cases, etc. - although if the long list of warnings in the readme doesn't scare you away, it'll hopefully at least confirm I hesitate to recommend my own code ;)
  - woodruffw a day ago
    macOS provides native sandboxing; you can use capabilities at the app level[1] or the sandbox-exec CLI to wrap an existing tool.
    For Windows, you probably want WSB[2] or AppContainer isolation[3].
    For Linux, the low-level primitives for sandboxing are seccomp and namespaces. You can use tools like Firejail and bubblewrap to wrap individual tool invocations, similar to sandbox-exec on macOS.
    [1]: https://developer.apple.com/documentation/xcode/configuring-...
    [2]: https://learn.microsoft.com/en-us/windows/security/applicati...
    [3]: https://learn.microsoft.com/en-us/windows/win32/secauthz/app...
    - amarshall a day ago
      Linux also has Landlock now.
      macOS sandboxing is notoriously under-documented, has sharp edges, and is nowhere near as expressive as Linux sandboxing.
      - anonzzzies a day ago
        To save a search; https://docs.kernel.org/userspace-api/landlock.html
      - woodruffw a day ago
        Thanks! Landlock is the one I couldn't remember.
        Agreed about macOS's sandboxing being under-documented.
wavemode a day ago
If you can afford to sacrifice that much performance just to run some potentially unsafe code, then you can probably afford to not be writing Rust in the first place and instead use a garbage-collected language.
- colinrozzi a day ago
  I think it is basically a garbage collector, just one that operates on a per-function level instead of at the general level of the program
  - Rohansi 20 hours ago
    It's not really though because the UB still exists and can be exploited, just in a forked process.
- woodruffw a day ago
  This is presumably needed at the integration point, i.e. you already have some C/C++ code being integrated into Rust. So "write it in a different language" is not helpful advice, since all of the code in question is already written.
  (However, the technique here is itself not sound, per the other threads.)
djha-skin a day ago
This is cool from a theoretical perspective, but `fork()` can be prohibitively expensive, at least on the hot path. This is a cool tool that should be used with care.
- resonious a day ago
  The author seems aware of this given their "Run your code 1ms slower" remark in the use cases section.
- VWWHFSfQ a day ago
  Which pretty much makes this whole thing pointless since a lot of unsafe code exists purely for performance reasons.
  - tombert a day ago
    That's what I was thinking; isn't 1 millisecond kind of an eternity in terms of computer performance?
    I'm sure there's still value in this project, but I'm not sure I'm versed enough in Rust to know what that is.
  - deciduously a day ago
    Far from all of it, seems like a big leap to write it off as pointless because of one subset.
Svetlitski a day ago
This is likely to violate async-signal-safety [1] in any non-trivial program, unless used with extreme care. Running code in between a fork() and an exec() is fraught with peril; it's not hard to end up in a situation where you deadlock because you forked a multi-threaded process where one of the existing threads held a lock at the time of forking, among other hazards.
[1] https://man7.org/linux/man-pages/man7/signal-safety.7.html
slashdev a day ago
I'd love to know what horrible library / code the author was using where sandboxing it like this seemed like the best alternative.
null_investor a day ago
Forking and this package can be useful if you know that the unsafe code is really unsafe and have no hope of making it better.
But I wouldn't use this often. I'd be willing to bet that you'd lose all performance benefits of using Rust versus something like Python or Ruby that uses forking extensively for parallelism.
- braxxox a day ago
  > have no hope of making it better
  Yeah, this is really the main use case. It's a relatively simple solution when you can't do any better.
  I think that's particularly helpful when you're invoking code you don't control, like calling into some arbitrary C library.
TheDong 18 hours ago
This also means the function might not do what you want, i.e. if it takes a `&mut T` argument, that argument can't actually be mutated, and anything that relies on interior mutability, even if it's not a mut argument, also won't work.
Rust allows memory-impure things, like interior mutability of arguments, so you can get different (i.e. incorrect) results when using this to run otherwise fine rust code.
For example:
```
    fn some_fn(x: &mut i32) {
      *x = 2;
    }

    fn main() {
      let mut x = 1;
      mem_isolate::execute_in_isolated_process(|| {
        some_fn(&mut x);
      }).unwrap();
      println!("{x}"); // prints '1' even though without 'mem_isolate' this would be 2
    }
```
dijit a day ago
this seems like a good place to ask, I don’t write very much unsafe Rust code… but when I do, it’s because I’m calling the Win32 API.
Tools like valgrind do not work on windows, and I am nowhere near smart enough to know the entire layout of memory that should exist.
When using Windows and calling system functions, there’s a lot of casting involved, for example to convert wide characters and DWORDs to Rust primitives. And given that I don’t have a good debugging situation, I’m terrified that I’m corrupting or leaking memory.
does anyone know any good tools that work on windows to help me out here?
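For the wide-character part specifically, much of the conversion can be done in safe code once the data is in a slice. A sketch under that assumption, with `wide_to_string` and `to_wide` as made-up helper names (it does not cover obtaining the buffer from a Win32 call):
```rust
    /// Copies a NUL-terminated UTF-16 buffer into a `String`.
    /// Stops at the first NUL, or at the end of the slice if there is none,
    /// so it cannot read past the buffer the way a raw-pointer loop could.
    fn wide_to_string(buf: &[u16]) -> String {
        let len = buf.iter().position(|&c| c == 0).unwrap_or(buf.len());
        String::from_utf16_lossy(&buf[..len])
    }

    /// Encodes a Rust string as UTF-16 with the trailing NUL that
    /// wide-character APIs expect.
    fn to_wide(s: &str) -> Vec<u16> {
        s.encode_utf16().chain(std::iter::once(0)).collect()
    }

    fn main() {
        let wide = to_wide("Administrator");
        println!("{} code units, round-trips to {:?}", wide.len(), wide_to_string(&wide));
    }
```
Invalid UTF-16 is replaced rather than rejected here; `String::from_utf16` is the stricter alternative.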
- bdhcuidbebe 15 hours ago
  > I don’t write very much unsafe Rust code… but when I do, it’s because I’m calling the Win32 API.
  Check out windows-rs instead.
  https://github.com/microsoft/windows-rs
  - dijit 10 hours ago
    I did look at that first.
    In my case I am looking for NetUserAdd and associated functionality, which doesn’t exist in any wrapper crate I could find- since that would have been significantly easier than what I ended up needing to do.
    But, how do they test their unsafe bits?
- pjmlp a day ago
  There are plenty of tools, but they are C and C++ specific.
  Starts with Visual C++ analysers, SAL annotations, hardened runtime.
  Then commercial tooling like PVS Studio, Parasoft for example.
- wizzwizz4 a day ago
  The easy solution is, don't call system functions. Instead:
  • Work out what you want to do, conceptually.
  • Design a safe abstraction that would allow you to do that. (Consult the Win32 API documentation for concepts to use.)
  • Implement that abstraction using the Win32 API.
  That last step is way easier than trying to use the Win32 API throughout your program, you'll end up with significantly less unsafe code, and if anything does go wrong, it's much easier to fix.
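  A sketch of what such an abstraction can look like for a handle-style resource. `raw_open` and `raw_close` are hypothetical stand-ins for the real Win32 calls so that the example runs on any platform:
```rust
    use std::sync::atomic::{AtomicUsize, Ordering};

    // Counts handles that are currently open, so the effect of `Drop` is visible.
    static OPEN_HANDLES: AtomicUsize = AtomicUsize::new(0);

    // Hypothetical raw API, standing in for an open/close pair from the OS.
    unsafe fn raw_open() -> usize {
        OPEN_HANDLES.fetch_add(1, Ordering::SeqCst) + 1
    }

    unsafe fn raw_close(_handle: usize) {
        OPEN_HANDLES.fetch_sub(1, Ordering::SeqCst);
    }

    /// Safe owner of one raw handle. All `unsafe` lives inside this type.
    struct Handle(usize);

    impl Handle {
        fn open() -> Handle {
            // SAFETY: `raw_open` has no preconditions in this sketch.
            Handle(unsafe { raw_open() })
        }
    }

    impl Drop for Handle {
        fn drop(&mut self) {
            // SAFETY: we own the handle and close it exactly once, here.
            unsafe { raw_close(self.0) }
        }
    }

    fn main() {
        {
            let _h = Handle::open();
            println!("open handles: {}", OPEN_HANDLES.load(Ordering::SeqCst));
        } // `_h` is dropped here, which closes the handle
        println!("open handles: {}", OPEN_HANDLES.load(Ordering::SeqCst));
    }
```
  The rest of the program only ever sees `Handle`, so a leak or double close has to be a bug in these few lines.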
  - dijit a day ago
    that’s what I’m doing already, the issue is that unsafe code exists at all.
    In order to call the win32 API one must create structs and pass pointers to them into the function call.
    sending data is actually quite easy. But reading back data is quite difficult, in some cases you may have a list of something.
    Regardless, Rust is not helping me anymore in those bits, and since all of the tools that find memory issues primarily target C++, and Rust mangles certain things for C++ toolchains, I find myself a little bit stuck. I’m not a genius and I’ll take all the help I can get.
    - wizzwizz4 a day ago
      The Win32 API's object model is (mostly) compatible with Rust's. Handles play well with OBRM. Does the winsafe crate provide the interfaces you need? https://docs.rs/winsafe/
      - dijit 10 hours ago
        if it does, I can’t find it.
        I was looking for NetUserAdd and associated commands.
corank a day ago
> It forces functions to be memory pure (pure with respect to memory), even if they aren't.
What if the unsafe code is not supposed to be pure but mutates some memory? For example, does this allow implementing a doubly-linked list?
kelnos 21 hours ago
Please please please add a big huge warning to your crate that it should never be used in multi-threaded programs. fork() is not safe when there is more than one thread present, as the child process can easily deadlock (or worse) if the fork() happens at just the wrong time with respect to what other threads are doing.
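One way a caller could approximate such a warning at runtime, sketched for Linux only. This is an illustration, not something the crate is known to do: `thread_count` and `check_single_threaded` are made-up helpers, and the check is racy because another thread can start right after it passes.
```rust
    use std::fs;

    /// Number of threads in this process, read from `/proc/self/task`.
    /// Linux-specific; returns `None` where that directory is unavailable.
    fn thread_count() -> Option<usize> {
        Some(fs::read_dir("/proc/self/task").ok()?.count())
    }

    /// Best-effort guard: only succeeds if the process looks single-threaded.
    fn check_single_threaded() -> Result<(), String> {
        match thread_count() {
            Some(1) => Ok(()),
            Some(n) => Err(format!("refusing to fork: {n} threads are running")),
            None => Err("cannot determine thread count on this platform".to_string()),
        }
    }

    fn main() {
        match check_single_threaded() {
            Ok(()) => println!("only one thread is running"),
            Err(why) => eprintln!("{why}"),
        }
    }
```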
jesprenj a day ago
Why use a pipe to communicate instead of shared memory?
- im3w1l a day ago
  It's much easier to reason about a child process sending you possibly corrupt objects over a pipe, compared to a child process possibly corrupting shared memory as you are reading it. I've read enough about processor level memory barriers to understand I don't really understand that at all.
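  With a pipe, the parent can treat everything it receives as untrusted bytes and validate before using them. A sketch of that idea, assuming a made-up length-prefixed framing (not the crate's actual wire format); `read_frame` and `MAX_PAYLOAD` are hypothetical:
```rust
    use std::io::{self, Read};

    const MAX_PAYLOAD: usize = 1 << 20; // arbitrary cap chosen for this sketch

    /// Reads one frame: a 4-byte little-endian length, then that many bytes.
    fn read_frame<R: Read>(mut pipe: R) -> io::Result<Vec<u8>> {
        let mut len_bytes = [0u8; 4];
        pipe.read_exact(&mut len_bytes)?;
        let len = u32::from_le_bytes(len_bytes) as usize;
        // A corrupted child could claim any length, so reject absurd ones
        // before allocating.
        if len > MAX_PAYLOAD {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
        }
        let mut payload = vec![0u8; len];
        pipe.read_exact(&mut payload)?; // a truncated frame surfaces as UnexpectedEof
        Ok(payload)
    }

    fn main() {
        // A byte slice implements `Read`, which stands in for the pipe here.
        let bytes = [2u8, 0, 0, 0, b'o', b'k'];
        println!("{:?}", read_frame(&bytes[..]));
    }
```
  The payload still needs to be deserialized defensively, but nothing the child does can change bytes the parent has already read.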
cryptonector 16 hours ago
If you want this to be fast when used in processes with large resident set sizes, create a thread and use `vfork()` there rather than `fork()`.
syrusakbary a day ago
This is super interesting! I would be very curious to see how we can get into even more safety when running WebAssembly in Wasmer with this crate (similar to V8 isolates).
Awesome work!
teknopaul a day ago
Hammer, nut.
Clever trick tho if you are in a bind.
loeg a day ago
As a joke, it's funny. Obviously you would not want to actually deploy this. I feel like most comments are too quick to criticize using this in prod (don't!) and missing the point.
- krick a day ago
  It's much more problematic how many comments praise it not as a joke. And, honestly, it doesn't seem like it was intended as a joke. It's a legitimately bad idea, that is treated as a good idea by some scary number of people.
m00dy a day ago
>>We call this trick the "fork and free" pattern. It's pretty nifty.
It should be called "fork and see" pattern instead :D
- chuckadams 20 hours ago
  “fork around and find out”