Interesting article! One thing that made me literally LOL was the fact that several exploits were enabled via a Google "style recommendation" that caused on-heap length fields to be signed and thus subject to sign-extension attacks.
The conversation leading up to that played out a bit like this in my head:
Google Engineer #1: Hey, shouldn't that length field be unsigned? Not like a negative value ever makes sense there?
GE#2: Style guide says no
GE#1: Yeah, but that could easily be exploited, right?
GE#2: Maybe, but at least I won't get dinged on code review: my metrics are already really lagging this quarter
GE#1: Good point! In fact, I'll pre-prepare an emergency patch for that whole thing, as my team lead indicated I've been a bit slow on the turnaround lately...
Quote from their style guide:
> The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler.
Fair enough, but signed arithmetic doesn't model the behavior of a "simple integer" (supposedly the mathematical concept) either. Instead, overflow in signed arithmetic is undefined behavior. Does that actually lead to the compiler being able to diagnose bugs? What's the claimed benefit exactly?
Tools like UBSan [1] can detect signed integer overflow in debug builds, and are used internally at Google to run automated tests.
So if you use a signed integer, there is a chance that overflows are caught in tests.
1. https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
I believe part of the logic may be that you can't tell an overflow has happened with unsigned, but with signed you can detect over- and underflow in certain cases simply by checking whether the result is negative.
At least I believe Java decided on signed integers for similar reasons. But if it's indeed UB in C++, it doesn't make sense.
For C23 at least: https://gustedt.wordpress.com/2022/12/18/checked-integer-ari...
Gosling on the matter,
> One of the little experiments I tried was asking people about the rules for unsigned arithmetic in C. It turns out nobody understands how unsigned arithmetic in C works. There are a few obvious things that people understand, but many people don't understand it.
-- https://www.artima.com/articles/james-gosling-on-java-may-20...
It’s the opposite in cpp: unsigned integer overflow is undefined but signed overflow is defined as wrapping
No, it's the opposite. UNSIGNED overflow wraps around. SIGNED overflow is undefined behavior.
This leads to fun behavior. Consider these functions which differ only in the type of the loop variable:
int foo() {
    for (int i = 1; i > 0; ++i) {}
    return 42;
}

int bar() {
    for (unsigned i = 1; i > 0; ++i) {}
    return 42;
}

If you compile these with GCC with optimization enabled, the result is:

foo():
.L2:
    jmp .L2
bar():
    mov eax, 42
    ret

That is, foo() gets compiled into an infinite loop, while the loop in bar() is eliminated instead. This is because only in the first case may the compiler assume that i will never overflow.

Did you mix up unsigned and signed by mistake? Because in C and C++, the wrapping one is unsigned and the here-be-dragons-on-overflow one is signed.
A sanitizer, static analysis, or any other tool can unconditionally give you a warning/error on signed integer overflow. That's invalid for unsigned integers, as they have well-defined behavior and real code depends on said wrapping (hashing, bitwise magic, temporary wrapping that unwraps later, etc.).
Ideally there'd be a third type for a non-wrapping unsigned integer (LLVM even supports a UB-on-unsigned-wrap flag for arithmetic ops in its IR, though it largely goes unused for C/C++), but alas no such type exists. Half-relatedly, this previously came up as a discussion point for the Linux kernel (though Linus really did not like the concept of multiple unsigned types, so IIRC it didn't go anywhere).
The signed length fields pre-date the sandbox, and at that point being able to corrupt the string length meant you already had an OOB write primitive and didn't need to get one via strings. The sandbox is the new weird thing, where now these in-sandbox corruptions can sometimes be promoted into out-of-sandbox corruptions if code on the boundary doesn't handle these sorts of edge cases.
I've recently become the maintainer of https://github.com/godotjs/GodotJS (TypeScript bindings + JS runtime for Godot). GodotJS supports numerous runtimes, but V8 is the best supported. Unfortunately, I've been noticing V8's GC a bit more than I would like recently.
Don't get me wrong, I'm aware V8 wasn't designed with games in mind. QuickJS (which is also supported by GodotJS) is probably the safer bet. Or you know, not JavaScript at all. However, I'm building tooling specifically for kids to make games, and TypeScript is leagues ahead in terms of usability:
https://breaka.club/blog/why-were-building-clubs-for-kids
Before I make the swap to QuickJS out of necessity, I was hoping to try my hand at tuning V8's GC for my use case. I wasn't expecting this to be easy, but the article doesn't exactly instill confidence:
> Simply tuning the system appears to involve a dose of science, a dose of flailing around and trying things, and a whole cauldron of witchcraft. There appears to be one person whose full-time job it is to implement and monitor metrics on V8 memory performance and implement appropriate tweaks. Good grief!
If anyone reading this has experience with tuning V8's GC to minimize stop-the-world GC duration (at the cost of overall memory use, or runtime performance etc.) I'd greatly appreciate any advice that can be offered.
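Not advice from the thread, but for what it's worth: V8 exposes command-line flags that affect GC behavior, and embedders can pass them via v8::V8::SetFlagsFromString. Flag names and defaults change between versions, so treat these as starting points to verify against your build's --help output:

```shell
# Log every GC with its pause time, to see what you're actually paying for:
d8 --trace-gc script.js

# Cap the old generation (in MB); smaller heaps generally mean shorter
# full-GC pauses at the cost of collecting more often:
d8 --max-old-space-size=256 script.js

# Shrink the young generation (in MB) to shorten scavenge pauses:
d8 --max-semi-space-size=4 script.js
```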
Have you explored using Apple's JavaScriptCore engine at all? I know Bun was built on it, but I don't know much else about it.
Not really. I've written a bunch of code to try to maintain the limited support for it that already exists in GodotJS, but I've never really tried it. The main reason I haven't is that I'm dependent on Web Worker(-like) APIs in GodotJS, and they're currently missing for JavaScriptCore. But since I actually wrote some of those APIs, that's not really an excuse; I can port them easily enough.
So, yeah, I should really give it a shot. Thanks for the reminder.
The fact that they are working so hard to retrofit a GC into the V8 C++ code IMO calls into question their premise for not using Rust, namely that the majority of V8 security flaws are in the JIT generated code, not V8 itself. That always seemed to be a flimsy excuse to keep using an insecure language for what is supposed to be a very secure language runtime for untrusted code, but now it’s laughable on its face.
I don't envy these engineers having to trace through corruptions and other issues related to moving GCs; just keeping a simple toy GC from blowing up can be hard enough sometimes (maybe they have better tools, but memory corruptions are inherently prickly to debug).
rr (https://rr-project.org/) and memory watchpoints are a godsend when it comes to analysing heap corruptions.
Absolutely. Things that took hours or days to debug before take mere minutes once I have an rr recording.
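For anyone who hasn't used it, the rr workflow for a heap corruption looks roughly like this, combining gdb's location watchpoints with rr's reverse execution (the address below is a placeholder):

```shell
rr record ./myapp          # deterministically record one failing run
rr replay                  # replay it under gdb, repeatably
# then, inside gdb:
#   watch -l *(long *)0x7f1234567890   # hw watchpoint on the corrupted word
#   reverse-continue                   # run backwards to the corrupting write
```

Because the recording is deterministic, the watchpoint fires on the exact same write every replay, which is what turns days of debugging into minutes.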
It's an interesting article because tech articles rarely revisit the past to examine what decisions were made and why. Thanks! Also always cool to see a Wingo article, because I get exposed to a field I know very little about (how garbage collection works).
What does FTE stand for?
> From what I can tell, there have been about 4 FTE from Google over this period
It stands for "Full Time Equivalent".
It's a measure of time spent working on something, to standardise comparisons of work capacity and acknowledge that it's not always full time, especially when aggregating the time from different people. One full time person = 1 FTE.
For example if you work 20 hours a week on project A and 20 hours on project B, then project A will count your contribution as 0.5 FTE while you're assigned to that project.
If you also have two other people working on it full time, and a project manager working 1 day a week on it, then project A will count the contribution from all four of you as 2.7 FTE (2.7 = 0.5 + 2 + 0.2).
In the Google context, “FTE” actually stands for “Full-Time Employee”, as opposed to “TVC” = “Temp/Vendor/Contractor”.
This example assumes 1 FTE = 40 hours, which is not necessarily the case in all countries or under all collective agreements. 1 FTE can be 36, 38, or even 48 hours.
Full Time Employee
Is this a codeword for "not contractor"? I heard that at google contractors are second class citizens.
I think FTE is mostly used as a 'unit'. E.g. if two people work on something 50% of the time, you get one "FTE-equivalent", as there is roughly one full-time employee of effort put in.
Though in this context it just seems to be the number of people working on the code on a consistent basis.
FTE can mean either:
* “Full Time Employee” (which can itself mean “not a part-timer” in a place that employs both, or “not a temp/contractor” [in which case the “full-time” really means “regular/permanent”]) or
* “Full Time Equivalent” (a budgeting unit equal to either a full time worker or a combination of part time workers with the same aggregate [usually weekly] hours as constitute the standard for full-time in the system being used.)
Yeah, 1 FTE just equals 40 work-hours.
> at google contractors are second class citizens
This is the case at many companies to avoid contractors being considered employees.
FTE is a TLA.