How are JavaScript builtin functions implemented in the engine?
Background
Time for a change. This marks the beginning of me learning about browsers!
The only time I’ve touched browsers before was in 2021, when I just regurgitated someone’s writeup for a linear OOBRW CTF challenge in the v8 engine. That was pretty lame and not real research. I now prefer to explore like a developer rather than go straight in from the security angle.
The first chapter explores JS Builtins in the v8 engine. It’s my first day doing browser research, so bear with me.
JS Builtins
JS Builtins in v8 are the built-in JS functions you can invoke at runtime, either as an object method or as a standalone function.
For the purpose of this discussion, we will focus on the Array.prototype.splice() method as an example.
const months = ["Jan", "March", "April", "June"];
months.splice(1, 0, "Feb"); // months is now ["Jan", "Feb", "March", "April", "June"]
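Before looking at the engine, it helps to pin down what splice() actually has to do. Below is a rough C++ model of the semantics, written against a std::vector of strings. The names Splice and SpliceDemo are made up for this sketch and have nothing to do with v8's real implementation; the clamping of start and the delete count follows my reading of the ECMAScript algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper modelling Array.prototype.splice() on a std::vector.
// Removes `delete_count` elements starting at `start`, inserts `items` in
// their place, and returns the removed elements.
std::vector<std::string> Splice(std::vector<std::string>& array, long start,
                                long delete_count,
                                const std::vector<std::string>& items) {
  const long len = static_cast<long>(array.size());
  // A negative start counts back from the end; either way clamp to [0, len].
  const long actual_start =
      start < 0 ? std::max(len + start, 0L) : std::min(start, len);
  // Never delete past the end of the array.
  const long actual_delete =
      std::min(std::max(delete_count, 0L), len - actual_start);

  auto first = array.begin() + actual_start;
  std::vector<std::string> removed(first, first + actual_delete);
  first = array.erase(first, first + actual_delete);
  array.insert(first, items.begin(), items.end());
  return removed;
}

// Uses the same months array as above: insert "Feb" at index 1, delete nothing.
void SpliceDemo() {
  std::vector<std::string> months = {"Jan", "March", "April", "June"};
  std::vector<std::string> removed = Splice(months, 1, 0, {"Feb"});
  assert(removed.empty());
  assert(months.size() == 5 && months[1] == "Feb");
}
```

The real builtin also has to deal with things this model ignores, such as holes, non-array receivers and different element kinds.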
Torque
Torque is a language created for v8, intended for developers to express features described by the ECMAScript specification while still allowing low-level control over v8-specific optimizations. v8 encourages the use of Torque to implement builtin functions, as a replacement for CodeStubAssembler (CSA).
Some of the JS Builtin APIs have been re-implemented in Torque, including Array.prototype.splice().
The source code can be found in src\builtins\array-splice.tq:
// https://tc39.github.io/ecma262/#sec-array.prototype.splice
Torque is compiled by the Torque compiler (at build time) into C++ code.
For Array.prototype.splice(), the output is gen\torque-generated\src\builtins\array-splice-tq-csa.cc.
The generated code is expressed using the CSA macros.
CodeStubAssembler
CSA is a set of macros for describing control flow in TurboFan's graph format. It was the preferred way to write builtins until Torque came out. Since Torque generates the CSA code, it is no longer necessary to write it by hand.
In this generated CSA code:
compiler::CodeAssemblerState* state_ = state();
We can see that each CSA function has an internal state that keeps track of how to describe the function in TurboFan graph format.
The TurboFan graph format is TurboFan’s IL (intermediate language), which can be easily compiled into machine code.
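The key idea is that a generator does not run the builtin; it only records a description of it into the state. Here is a toy sketch of that idea. ToyAssemblerState and GenerateToyAdd are invented for illustration and are far simpler than compiler::CodeAssemblerState or a real TurboFan graph; the operation names are just labels.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for compiler::CodeAssemblerState: it records the
// operations a generator asks for instead of executing them.
struct ToyAssemblerState {
  struct Node {
    std::string op;
    std::vector<int> inputs;  // ids of the nodes this node consumes
  };
  std::vector<Node> nodes;

  // Appends a node to the graph and returns its id.
  int Add(const std::string& op, std::vector<int> inputs = {}) {
    for (int id : inputs) {
      // Inputs must refer to nodes that already exist.
      assert(id >= 0 && id < static_cast<int>(nodes.size()));
    }
    nodes.push_back({op, std::move(inputs)});
    return static_cast<int>(nodes.size()) - 1;
  }
};

// Hypothetical generator: describes "return a + b" as four graph nodes.
void GenerateToyAdd(ToyAssemblerState* state) {
  int a = state->Add("Parameter0");
  int b = state->Add("Parameter1");
  int sum = state->Add("Int32Add", {a, b});
  state->Add("Return", {sum});
}
```

After GenerateToyAdd() returns, nothing has been added yet; the state simply holds a small graph that a backend could later turn into machine code.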
The TF_BUILTIN() macro expands to:
// ----------------------------------------------------------------------------
The important part is:
void Builtins::Generate_##Name(compiler::CodeAssemblerState* state) { \
This defines a static member function of the class v8::internal::Builtins.
In our example it would be:
void Builtins::Generate_ArrayPrototypeSplice(compiler::CodeAssemblerState* state);
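The name construction relies on the preprocessor's ## token-pasting operator. Below is a much reduced sketch of the same trick. TOY_TF_BUILTIN, ToyState and ToyBuiltins are hypothetical names; the real TF_BUILTIN() macro does more than this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins; not the real v8 types.
struct ToyState {
  std::vector<std::string> emitted;
};

struct ToyBuiltins {
  static void Generate_ArrayPrototypeSplice(ToyState* state);
};

// `##` pastes the builtin name onto the Generate_ prefix, so the body that
// follows the macro becomes the definition of ToyBuiltins::Generate_<Name>.
#define TOY_TF_BUILTIN(Name) \
  void ToyBuiltins::Generate_##Name(ToyState* state)

TOY_TF_BUILTIN(ArrayPrototypeSplice) {
  assert(state != nullptr);
  state->emitted.push_back("graph for ArrayPrototypeSplice");
}
```

Writing TOY_TF_BUILTIN(ArrayPrototypeSplice) { ... } therefore defines ToyBuiltins::Generate_ArrayPrototypeSplice(), which is the same naming pattern we see in v8.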
Checkpoint
At this stage, we have described an implementation of Array.prototype.splice() in Torque, which will be compiled into C++ code that uses the CSA macros.
The C++ code defines a generator function that can build a TurboFan graph describing our implementation.
From here, we can invoke TurboFan on demand with the generator to get the machine code for Array.prototype.splice().
Emit Machine Code
The part that converts all the CSA expressions into machine code is SetupIsolateDelegate::SetupBuiltinsInternal() in src\builtins\setup-builtins-internal.cc:
// static
This function first defines many macros to generate machine code from CSA.
The one relevant to us goes through CompileJSLinkageCodeStubBuiltin(), which eventually calls into CodeAssemblerTurboshaftCompilationJob::ExecuteJobImpl():
PipelineCompilationJob::Status
Here, turboshaft_pipeline.GenerateCode() compiles the TurboFan graph into machine code.
But how is this macro invoked?
It is invoked through another macro, BUILTIN_LIST():
// Build all builtins without jobs first. When concurrent builtin generation
BUILTIN_LIST() takes as arguments the macros to call (the ones we just defined) and applies them to each group of builtins.
One of the lists it pulls in is BUILTIN_LIST_FROM_TORQUE(), where our CSA generator is listed.
These definitions are automatically generated by the Torque compiler.
The chain of macros will take the name ArrayPrototypeSplice and construct the generator function name Generate_ArrayPrototypeSplice(), call it to generate the TurboFan graph, and finally compile to machine code from there.
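This "list macro that takes other macros as arguments" pattern is often called an X-macro. Here is a small sketch of how one list can both declare an ID per builtin and call every generator. All names here (TOY_BUILTIN_LIST, ToyBuiltin, BuildAll, and the string-returning generators) are made up for illustration and only mimic the shape of the v8 macros.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generators standing in for the Torque-generated ones.
std::string Generate_ArrayPrototypeSplice() { return "splice graph"; }
std::string Generate_ArrayPrototypePush() { return "push graph"; }

// X-macro: the list names every builtin once and applies whatever macro
// the caller passes in as V.
#define TOY_BUILTIN_LIST(V) \
  V(ArrayPrototypeSplice)   \
  V(ArrayPrototypePush)

// First use of the list: one enum constant per builtin.
enum class ToyBuiltin {
#define DECLARE_ID(Name) k##Name,
  TOY_BUILTIN_LIST(DECLARE_ID)
#undef DECLARE_ID
  kCount
};

// Second use: call each generator by pasting its name onto Generate_.
std::vector<std::string> BuildAll() {
  std::vector<std::string> code;
#define BUILD(Name) code.push_back(Generate_##Name());
  TOY_BUILTIN_LIST(BUILD)
#undef BUILD
  assert(code.size() == static_cast<std::size_t>(ToyBuiltin::kCount));
  return code;
}
```

Because both uses expand the same list in the same order, the enum value of a builtin doubles as its index into the vector returned by BuildAll().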
Bind to JS
After compilation, the compilation job calls the installer callback to handle the generated Code object, which points to the emitted machine code.
// Generated builtins are temporarily stored in this array to avoid data races
v8 first stores it in a temporary array.
Finally, after every function is compiled, these code objects are bound to their individual global constants.
In our case, the code object is bound to the constant Builtin::kArrayPrototypeSplice:
// Add the generated builtins to the isolate.
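The two-step shape (stage first, publish once everything is done) can be sketched as follows. ToyBuiltin, StageCode and PublishAll are hypothetical names, and a std::string stands in for a Code object; the real bookkeeping in v8 is more involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical builtin IDs and code representation.
enum class ToyBuiltin { kArrayPrototypeSplice, kArrayPrototypePush, kCount };
constexpr std::size_t kToyBuiltinCount =
    static_cast<std::size_t>(ToyBuiltin::kCount);
using ToyCode = std::string;
using ToyCodeTable = std::array<ToyCode, kToyBuiltinCount>;

// Installer callback: it only touches the staging array, never the
// table that the rest of the engine reads from.
void StageCode(ToyCodeTable& staging, ToyBuiltin id, ToyCode code) {
  staging[static_cast<std::size_t>(id)] = std::move(code);
}

// Runs once after all builtins are compiled: publish the staged code
// into the isolate's table, indexed by builtin ID.
void PublishAll(const ToyCodeTable& staging, ToyCodeTable& isolate_table) {
  for (std::size_t i = 0; i < kToyBuiltinCount; ++i) {
    assert(!staging[i].empty());  // every builtin must have been compiled
    isolate_table[i] = staging[i];
  }
}
```

The order in which builtins finish compiling does not matter here, since each one lands in the slot given by its ID.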
Now in Genesis::InitializeGlobal() in src\init\bootstrapper.cc:
// Set up %ArrayPrototype%.
The base array prototype is initialized, and the splice() function we just compiled is added as a method using SimpleInstallFunction().
Handle<JSFunction> SimpleInstallFunction(Isolate* isolate,
This function creates a JSFunction object and sets its code pointer to the actual machine code.
At this point, we have successfully bound the builtin function Array.prototype.splice() and exposed it to JavaScript callers.
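Conceptually, installing a builtin boils down to creating a function object that carries a code pointer and storing it as a named property. Here is a toy model of that. ToyJSFunction, ToyObject, ToyInstallFunction and ToySpliceCode are invented for this sketch, and a plain C++ function pointer stands in for the compiled machine code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins, far simpler than the real v8 objects.
using ToyCodeEntry = int (*)(int);

struct ToyJSFunction {
  std::string name;
  ToyCodeEntry code;  // points at the code that runs when the function is called
};

struct ToyObject {
  std::map<std::string, ToyJSFunction> properties;
};

// Placeholder for the compiled builtin; it just echoes its argument.
int ToySpliceCode(int argc) { return argc; }

// Creates a function object, points it at `code`, and installs it on
// `target` under `name`.
ToyJSFunction& ToyInstallFunction(ToyObject& target, const std::string& name,
                                  ToyCodeEntry code) {
  assert(code != nullptr);
  ToyJSFunction& fn = target.properties[name];
  fn.name = name;
  fn.code = code;
  return fn;
}
```

Calling ToyInstallFunction(array_prototype, "splice", &ToySpliceCode) on a ToyObject plays the role of the install step: looking up "splice" afterwards yields a function object whose code pointer leads straight to the builtin.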
Cool!