This specification defines asm.js, a strict subset of JavaScript that can be used as a low-level, efficient target language for compilers. This sublanguage effectively describes a safe virtual machine for memory-unsafe languages like C or C++. A combination of static and dynamic validation allows JavaScript engines to employ an ahead-of-time (AOT) optimizing compilation strategy for valid asm.js code.
This specification is working towards a candidate draft for asm.js version 1. A prototype implementation of an optimizing backend for asm.js is in progress for Mozilla's SpiderMonkey engine.
The asm.js language provides an abstraction similar to the C/C++ virtual machine: a large binary heap with efficient loads and stores, integer and floating-point arithmetic, first-order function definitions, and function pointers.
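The asm.js programming model is built around integer and floating-point arithmetic and a virtual heap represented as a typed array. While JavaScript does not directly provide constructs for dealing with integers, they can be emulated using two tricks:
integer loads and stores can be performed using the typed arrays API; and
integer arithmetic is equivalent to the composition of JavaScript floating-point operators with the integer coercions performed by JavaScript's bitwise operators.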
As an example of the former, if we have an Int32Array view of the heap called HEAP32, then we can load the 32-bit integer at byte offset p:
HEAP32[p >> 2]|0
The shift converts the byte offset to a 32-bit element offset, and the bitwise coercion ensures that an out-of-bounds access is coerced from undefined back to an integer.
As an example of integer arithmetic, addition can be performed by taking two integer values, adding them with the built-in addition operator, and coercing the result back to an integer via the bitwise or operator:
(x+y)|0
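Both tricks can be tried in ordinary JavaScript. The following is a minimal sketch rather than anything prescribed by this specification: the 64 KiB buffer and the helper names load32 and add32 are assumptions chosen for illustration.

```javascript
// Hypothetical heap: a 64 KiB buffer viewed as 32-bit integers.
var heapBuffer = new ArrayBuffer(0x10000);
var HEAP32 = new Int32Array(heapBuffer);

// Load the 32-bit integer stored at byte offset p.
function load32(p) {
    return HEAP32[p >> 2] | 0;
}

// Add two 32-bit integers with C-like wraparound.
function add32(x, y) {
    return (x + y) | 0;
}

HEAP32[8 >> 2] = 42;                 // store at byte offset 8
var inBounds = load32(8);            // 42
var outOfBounds = load32(0x20000);   // past the end: undefined|0 is 0
var wrapped = add32(0x7fffffff, 1);  // -2147483648
```

The out-of-bounds load yields 0 rather than undefined, and the overflowing addition wraps exactly as 32-bit integer addition would in C.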
This programming model is directly inspired by the techniques pioneered by the Emscripten and Mandreel compilers.
The asm.js sub-language is defined by a static type system that can be checked at JavaScript parse time. Validation of asm.js code is designed to be "pay-as-you-go" in that it is never performed on code that does not request it. An asm.js module requests validation by means of a special prologue directive, similar to that of ECMAScript Edition 5's strict mode:
function MyAsmModule() {
    "use asm";
    // module body
}
This explicit directive allows JavaScript engines to avoid performing pointless and potentially costly validation on other JavaScript code, and to report validation errors in developer consoles only where relevant.
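Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic; the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:
unboxed representations of integers and floating-point numbers;
absence of runtime type checks;
absence of garbage collection; and
efficient heap loads and stores (with implementation strategies varying by platform).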
Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.
Using an asm.js module requires calling its function to obtain an object containing the module's exports; this is known as linking. An asm.js module can also be given access to standard libraries and custom JavaScript functions through linking. An AOT implementation must perform certain dynamic checks to check compile-time assumptions about the linked libraries in order to make use of the compiled code.
This figure depicts a simple architecture of an AOT implementation that otherwise employs a simple interpreter. If either dynamic or static validation fails, the implementation must fall back to the interpreter. But if both validations succeed, calling the module exports executes the binary executable code generated by AOT compilation.
Within an asm.js module, all code is fully statically typed and limited to the very restrictive asm.js dialect. However, it is possible to interact with recognized standard JavaScript libraries and even custom dynamic JavaScript functions.
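An asm.js module can take up to three optional parameters, providing access to external JavaScript code and data:
a standard library object, providing access to a limited subset of the JavaScript standard libraries;
a foreign function interface (FFI), providing access to custom external JavaScript functions; and
a heap buffer, providing a single ArrayBuffer to act as the asm.js heap.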
These objects allow asm.js to call into external JavaScript (and to share its heap buffer with external JavaScript). Conversely, the exports object returned from the module allows external JavaScript to call into asm.js.
So in the general case, an asm.js module declaration looks like:
function MyAsmModule(stdlib, foreign, heap) {
    "use asm";
    // module body...
    return {
        export1: f1,
        export2: f2,
        // ...
    };
}
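To make the general module shape concrete, here is a hedged sketch of a module that uses all three parameters and is then linked from ordinary JavaScript. The names CounterModule, report, and bump are hypothetical, the 64 KiB heap size is an assumption, and the code has not been run through an asm.js validator; it only illustrates how the three objects are supplied and how an export is called.

```javascript
// Hypothetical module: increments the 32-bit counter stored at byte
// offset p and reports the new value through a foreign function.
function CounterModule(stdlib, foreign, heap) {
    "use asm";
    var HEAP32 = new stdlib.Int32Array(heap);   // heap view
    var report = foreign.report;                // foreign function import

    function bump(p) {
        p = p | 0;
        var v = 0;
        v = ((HEAP32[p >> 2] | 0) + 1) | 0;
        HEAP32[p >> 2] = v;
        report(v | 0);
        return v | 0;
    }

    return { bump: bump };
}

// Linking: supply the standard library, the foreign functions, and the heap.
var reported = [];
var counterHeap = new ArrayBuffer(0x10000);
var counter = CounterModule(
    globalThis,
    { report: function (v) { reported.push(v); } },
    counterHeap
);

var first = counter.bump(0);    // 1
var second = counter.bump(0);   // 2
```

Because the heap buffer is shared, external JavaScript can also read the counter directly through its own view of counterHeap.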
Function parameters in asm.js are provided a type annotation by means of an explicit coercion on function entry:
function diag(x, y) {
    x = +x; // x has type double
    y = +y; // y has type double
    return +sqrt(square(x) + square(y));
}
These annotations serve two purposes: first, to provide the function's type signature so that the validator can enforce that all calls to the function are well-typed; second, to ensure that even if the function is exported and called by external JavaScript, its arguments are dynamically coerced to the expected type. This ensures that an AOT implementation can use unboxed value representations, knowing that once the dynamic coercions have completed, the function body never needs any runtime type checks.
The following is a simple but complete example of an asm.js module.
function DiagModule(stdlib) {
    "use asm";
    var sqrt = stdlib.Math.sqrt;
    function square(x) {
        x = +x;
        return +(x*x);
    }
    function diag(x, y) {
        x = +x;
        y = +y;
        return +sqrt(square(x) + square(y));
    }
    return { diag: diag };
}
In a JavaScript engine that supports AOT compilation of asm.js, calling the module on a true global object would produce a fully compiled exports object:
var fast = DiagModule(window); // produces AOT-compiled version
console.log(fast.diag(3, 4)); // 5
By contrast, calling the module on a standard library object containing something other than the true Math.sqrt would fail to produce compiled code:
var bogusGlobal = {
    Math: {
        sqrt: function(x) { return x * 2; }
    }
};
var slow = DiagModule(bogusGlobal); // produces purely-interpreted version
console.log(slow.diag(3, 4)); // 50
Validation of an asm.js module relies on a static type system that classifies and constrains the syntax. This section defines the types used by the validation logic.
Validation in asm.js limits JavaScript programs to only use operations that can be mapped closely to efficient data representations and machine operations of modern architectures, such as 32-bit integers and integer arithmetic.
The types of asm.js values are inter-related by a subtyping relation, which can be represented pictorially:
The light boxes represent arbitrary JavaScript values that may flow freely between asm.js code and external JavaScript code.
The dark boxes represent types that are disallowed from escaping into external (i.e., non-asm.js) JavaScript code. (These values can be given efficient, unboxed representations in optimized asm.js implementations that would be unsound if they were allowed to escape.)
The meta-variables σ and τ are used to stand for value types.
The void type is the type of functions that
are not supposed to return any useful value. As JavaScript functions,
they produce the undefined value, but asm.js code is not
allowed to make use of this value; functions with return
type void can only be called for effect.
The double type is the type of ordinary
JavaScript double-precision floating-point numbers.
The signed type is the type of signed
32-bit integers. While there is no direct concept of integers in
JavaScript, 32-bit integers can be represented as doubles, and integer
operations can be performed with JavaScript arithmetic, relational,
and bitwise operators.
The unsigned type is the type of unsigned
32-bit integers. Again, these are not a first-class concept in
JavaScript, but can be represented as floating-point numbers.
The int type is the type of 32-bit integers where the signedness is not known. In asm.js, the type of a variable never has a known signedness. This allows variables to be compiled as 32-bit integer registers and memory words. However, this representation creates an overlap between signed and unsigned numbers that causes an ambiguity in determining which JavaScript number they represent. For example, the bit pattern 0xffffffff could represent 4294967295 or -1, depending on the signedness. For this reason, values of the int type are disallowed from escaping into external (non-asm.js) JavaScript code.
The fixnum type is the type of integers in the range [0, 2^31), that is, the range of integers such that an unboxed 32-bit representation has the same value whether it is interpreted as signed or unsigned.
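The signedness ambiguity, and the way fixnum avoids it, can be observed with JavaScript's own coercion operators. This is a small illustrative sketch; the variable names are arbitrary.

```javascript
// The same 32 bits read back under each interpretation.
var bits = 0xffffffff;
var asSigned = bits | 0;       // -1
var asUnsigned = bits >>> 0;   // 4294967295

// A fixnum lies in [0, 2^31), so both interpretations agree.
var small = 123456;
var fixnumAgrees = (small | 0) === (small >>> 0);   // true
```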
Even though JavaScript only supports floating-point arithmetic, most operations can simulate integer arithmetic by coercing their result to an integer. For example, adding two integers may overflow beyond the 32-bit range, but coercing the result back to an integer produces the same 32-bit integer as integer addition in, say, C.
The intish type represents the result of a JavaScript integer operation that must be coerced back to an integer with an explicit coercion (ToInt32 for signed integers and ToUint32 for unsigned integers). Validation requires all intish values to be immediately passed to an operator or standard library function that performs the appropriate coercion or else dropped via an expression statement. This way, each integer operation can be compiled directly to machine operations.
The one operator that does not support this approach is multiplication. (Multiplying two large integers can result in a large enough double that some lower bits of precision are lost.) So asm.js does not support applying the multiplication operator to integer operands. Instead, the proposed Math.imul function is recommended as the proper means of implementing integer multiplication.
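The precision loss can be seen with a product whose exact value needs more than 53 bits. The sketch below assumes an engine that provides Math.imul; the operand values are chosen only to make the effect visible.

```javascript
var a = 0x7fffffff;
var b = 0x7fffffff;

// The exact product is 2^62 - 2^32 + 1, which a double cannot represent;
// the trailing +1 is rounded away before the coercion is applied.
var viaDouble = (a * b) | 0;      // 0

// Math.imul computes the low 32 bits of the exact integer product.
var viaImul = Math.imul(a, b);    // 1
```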
Similar to intish, the doublish type represents operations that are expected to produce a double but may produce additional junk that must be coerced back to a number via ToNumber. In particular, reading out of bounds from a typed array produces undefined, and calling FFI functions may produce arbitrary JavaScript values.
The unknown type represents a value returned from an FFI call. Since asm.js does not allow general JavaScript values, the result must be immediately coerced to an integer or double.
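The coercions required for doublish and unknown values behave as follows in plain JavaScript. This is an illustrative sketch: the 4 KiB buffer and the foreign function readSetting are hypothetical.

```javascript
var HEAPF64 = new Float64Array(new ArrayBuffer(4096));
HEAPF64[0] = 1.5;

var stored = +HEAPF64[0];         // 1.5
var pastEnd = +HEAPF64[100000];   // out of bounds: +undefined is NaN

// Hypothetical foreign function that returns a non-number.
var foreign = { readSetting: function () { return "7"; } };
var asInt = foreign.readSetting() | 0;   // coerced to the integer 7
var asDouble = +foreign.readSetting();   // coerced to the double 7
```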
The extern type represents the root of all types that can escape back into external JavaScript; in other words, the light boxes in the above diagram.
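Variables and functions defined at the top-level scope of an asm.js module can have additional types beyond the value types. These include:
ArrayBufferView types IntnArray, UintnArray, and FloatnArray;
function types ((σ,…) → τ) ∧ … ∧ ((σ′,…) → τ′);
function table types ((σ,…) → τ)[n]; and
the FFI function type Function.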
The "∧" notation for function types serves to represent
overloaded functions and operators. For example,
the Math.abs function is
overloaded to accept either integers or floating-point numbers, and
returns a different type in each case. Similarly, many of
the operators have overloaded types.
The meta-variable γ is used to stand for global types.
Validating an asm.js module depends on tracking contextual information about the set of definitions and variables in scope. This section defines the environments used by the validation logic.
An asm.js module is validated in the context of a global environment. The global environment maps each global variable to its type as well as indicating whether the variable is mutable:
{ x : (mut|imm) γ, … }
The meta-variable Δ is used to stand for a global environment.
In addition to the global environment, each function body in an asm.js module is validated in the context of a variable environment. The variable environment maps each function parameter and local variable to its value type:
{ x : τ, … }
The meta-variable Γ is used to stand for a variable environment.
Looking up a variable's type Lookup(Δ, Γ, x) is defined by:
τ if x : τ occurs in Γ;
γ if x does not occur in Γ and x : mut γ or x : imm γ occurs in Δ.
If x does not occur in either environment then the Lookup function has no result.
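The lookup rule, in which a local binding shadows a global one, can be sketched as a small function. The object encodings of Δ and Γ used here are assumptions made for illustration; the specification does not prescribe a data structure.

```javascript
// Hypothetical encodings: gamma maps names to value types; delta maps
// names to { mutable, type } records.
function lookup(delta, gamma, x) {
    var has = Object.prototype.hasOwnProperty;
    if (has.call(gamma, x)) return gamma[x];        // local binding wins
    if (has.call(delta, x)) return delta[x].type;   // otherwise the global
    return undefined;                               // no result
}

var delta = {
    sqrt: { mutable: false, type: "(doublish) → double" },
    counter: { mutable: true, type: "int" }
};
var gamma = { x: "double", counter: "double" };

var localType = lookup(delta, gamma, "x");          // "double"
var shadowedType = lookup(delta, gamma, "counter"); // "double", not "int"
var globalType = lookup(delta, gamma, "sqrt");      // "(doublish) → double"
var missingType = lookup(delta, gamma, "nope");     // undefined
```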
Validation of an asm.js module is specified by reference to the ECMAScript grammar, but conceptually operates at the level of abstract syntax. In particular, an asm.js validator must obey the following rules:
Empty statements (;) are always ignored, whether in the top level of a module or inside an asm.js function body.
No variable bound anywhere in an asm.js module may have the name eval or arguments.
These rules are otherwise left implicit in the rest of the specification.
All variables in asm.js are explicitly annotated with type information so that their type can be statically enforced by validation.
Every parameter in an asm.js function is provided with an explicit
type annotation in the form of a coercion. This coercion serves two
purposes: the first is to make the parameter type statically apparent
for validation; the second is to ensure that if the function is
exported, the arguments dynamically provided by external JavaScript
callers are coerced to the expected type. For example, a bitwise OR
coercion annotates a parameter as having type int:
function add1(x) {
    x = x|0; // x : int
    return (x+1)|0;
}
In an AOT implementation, the body of the function can be
implemented fully optimized, and the function can be given two entry
points: an internal entry point for asm.js callers, which are
statically known to provide the proper type, and an external dynamic
entry point for JavaScript callers, which must perform the full
coercions (which might involve arbitrary JavaScript computation, e.g.,
via implicit calls to valueOf).
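The effect of the entry coercion on external callers can be observed by calling add1 from ordinary JavaScript with arguments that are not integers. The calls are illustrative only; the function is repeated so that the sketch is self-contained.

```javascript
function add1(x) {
    x = x | 0;   // x : int
    return (x + 1) | 0;
}

var fromString = add1("41");     // 42: the string is converted to a number
var fromFraction = add1(2.9);    // 3: the fraction is truncated
var fromObject = add1({ valueOf: function () { return 9; } });   // 10
var fromNothing = add1();        // 1: undefined coerces to 0
```

The object argument shows why the external entry point may run arbitrary JavaScript: the coercion invokes the caller's valueOf method.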
There are two recognized parameter type annotations:
x:Identifier = x:Identifier|0;
x:Identifier = +x:Identifier;
The first form annotates a parameter as type int, and
the second as type double.
An asm.js function's return type is determined by the last statement in the function body, which for non-void functions is required to be a ReturnStatement. This distinguished return statement may take one of four forms:
return +e:Expression;
return e:Expression|0;
return n:NumericLiteral;
return;
The first form has return type double. The second has type signed. The third has return type double if n is a floating-point literal, i.e., a numeric literal with the character . in its source; alternatively, if n is an integer literal in the range [-2^31, 2^31), the return statement has return type signed. The fourth form has return type void.
If the last statement in the function body is not
a ReturnStatement, or if the function body has no non-empty
statements (other than the initial declarations and
coercions—see Function
Declarations), the function's return type is void.
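The return forms can be illustrated with one small function per case. The function names are hypothetical, and the functions are shown outside a module for brevity, so this is a sketch of the annotation patterns rather than a validated module.

```javascript
function half(x) {     // return +e: return type double
    x = +x;
    return +(x / 2.0);
}

function next(i) {     // return e|0: return type signed
    i = i | 0;
    return (i + 1) | 0;
}

function zero() {      // integer literal: return type signed
    return 0;
}

function one() {       // literal containing a '.': return type double
    return 1.0;
}

function touch(i) {    // bare return: return type void
    i = i | 0;
    return;
}
```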
The type of a function declaration
function f:Identifier(x:Identifier,…) {
    x:Identifier = AssignmentExpression;…
    var y:Identifier = n:NumericLiteral,…;…
    body:Statement…
}
is (σ,…) → τ where σ,… are the
types of the parameters, as provided by
the parameter type
annotations, and τ is the return type, as provided by
the return type annotation. The
variable f is stored in
the global environment with
type imm (σ,…) → τ.
The types of variable declarations are determined by their initializer. A variable initializer may be a floating-point literal, which is any numeric literal with the character . in its source, and has type double. Alternatively, an initializer may be an integer literal in the range [-2^31, 2^32), which has type int.
A global variable declaration is a VariableStatement node in one of several allowed forms.
A global program variable is initialized to a literal:
var x:Identifier = n:NumericLiteral;
The global variable x is stored in
the global environment with
type mut τ, where τ is determined in the same way
as local variable type
annotations.
A standard library import is of one of the following two forms:
var x:Identifier = stdlib:Identifier.y:Identifier;
var x:Identifier = stdlib:Identifier.Math.y:Identifier;
The variable stdlib must match the first parameter of
the module declaration. The global
variable x is stored in
the global environment with
type imm γ, where γ is the type of
library y or Math.y as specified by
the standard library types.
A foreign import is of one of the following three forms:
var x:Identifier = foreign:Identifier.y:Identifier;
var x:Identifier = foreign:Identifier.y:Identifier|0;
var x:Identifier = +foreign:Identifier.y:Identifier;
The variable foreign must match the second parameter of
the module declaration. The global
variable x is stored in
the global environment with
type imm Function for the first form, imm
int for the second, and imm double for the third.
A global heap view is of the following form:
var x:Identifier = new stdlib:Identifier.view:Identifier(heap:Identifier);
The variable stdlib must match the first parameter of
the module declaration and the
variable heap must match the third. The
identifier view must be one of the
standard ArrayBufferView
type names. The global variable x is stored in
the global environment with
type imm
view.
A function table is a VariableStatement of the form:
var x:Identifier = [f:Identifier,…];
The length of the array literal must be a power of two and all the
identifiers f must map to the same type imm
(σ,…) → τ in
the global environment. The function
table x is stored in the global environment with
type imm ((σ,…) → τ)[n]
where n is the length of the array literal.
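A function table and a call through it might look like the following sketch. The module and function names are hypothetical and the code has not been checked by a validator. The table has length 2, a power of two, and the index is masked with 1 so that it always falls inside the table.

```javascript
function TableModule(stdlib) {
    "use asm";

    function inc(x) {
        x = x | 0;
        return (x + 1) | 0;
    }

    function dec(x) {
        x = x | 0;
        return (x - 1) | 0;
    }

    function apply(i, x) {
        i = i | 0;
        x = x | 0;
        return ops[i & 1](x) | 0;   // masked index into the table
    }

    var ops = [inc, dec];           // both entries have the same type

    return { apply: apply };
}

var table = TableModule(globalThis);
var viaInc = table.apply(0, 5);     // 6
var viaDec = table.apply(1, 5);     // 4
var masked = table.apply(3, 5);     // 4: index 3 is masked to 1
```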
To ensure that a JavaScript function is a proper asm.js module, it must first be statically validated. This section specifies the validation rules. The rules operate on JavaScript abstract syntax, i.e., the output of a JavaScript parser. The non-terminals refer to parse nodes defined by productions in the ECMAScript grammar, but note that the asm.js validator only accepts a subset of legal JavaScript programs.
An asm.js module is a FunctionDeclaration or FunctionExpression node with the following form:
function f:Identifier_opt([stdlib:Identifier[, foreign:Identifier[, heap:Identifier]_opt]_opt]_opt) {
    "use asm";
    var:VariableStatement…
    fun:FunctionDeclaration…
    table:VariableStatement…
    exports:ReturnStatement
}
A module is valid if:
An asm.js module's export declaration is a ReturnStatement returning either a single asm.js function or an object literal exporting multiple asm.js functions.
An export declaration node
return { x:Identifier : f:Identifier,… };
is valid if for each f, Δ(f) = imm
γ where γ is a function type (σ,…) →
τ.
An export declaration node
return f:Identifier;
is valid if Δ(f) = imm γ where γ is
a function type (σ,…) → τ.
An asm.js function declaration is a FunctionDeclaration node
function f:Identifier(x:Identifier,…) {
    x:Identifier = AssignmentExpression;…
    var y:Identifier = n:NumericLiteral,…;…
    body:Statement…
}
A function declaration is valid if:
Δ(f) = imm (σ,…) → τ;
each body statement is valid in Δ and Γ with expected return type τ.
Each statement is validated in the context of a global environment Δ, a variable environment Γ, and an expected return type τ. Unless otherwise explicitly stated, a recursive validation of a subterm uses the same context as its containing term.
A Block statement node
{ stmt:Statement… }
is valid if each stmt is valid.
An ExpressionStatement node
expr:Expression ;
is valid if expr is valid.
An EmptyStatement node is always valid.
An IfStatement node
if ( expr:Expression ) stmt1:Statement else stmt2:Statement
is valid if expr validates as a subtype
of int and stmt1 and stmt2 are both
valid.
An IfStatement node
if ( expr:Expression ) stmt:Statement
is valid if expr validates as a subtype
of int and stmt is valid.
A ReturnStatement node
return expr:Expression ;
is valid if expr validates as a subtype of the expected return type τ.
A ReturnStatement node
return ;
is valid if the expected return type τ is void.
An IterationStatement node
while ( expr:Expression ) stmt:Statement
is valid if expr validates as a subtype
of int and stmt is valid.
An IterationStatement node
do stmt:Statement while ( expr:Expression ) ;
is valid if stmt is valid and expr validates as a
subtype of int.
An IterationStatement node
for ( init:ExpressionNoIn_opt ; test:Expression_opt ; update:Expression_opt ) body:Statement
is valid if init validates (if present),
test validates as a subtype of int (if
present), update validates (if present), and body is
valid.
A BreakStatement node
break Identifier_opt ;
is always valid.
A ContinueStatement node
continue Identifier_opt ;
is always valid.
A LabelledStatement node
Identifier : body:Statement
is valid if body is valid.
A SwitchStatement node
switch ( test:Expression ) { case:CaseClause… default:DefaultClauseopt }
is valid if
test validates as a subtype of signed;
each case is valid with expected case type signed;
A switch statement in asm.js is intended to be compiled unconditionally (i.e., regardless of its contents) to a jump table. Branch instructions can instead be expressed via if statements.
Note that a switch statement with a sparse set of case values can result in a large jump table. Programmers and code generators are therefore expected to be responsible for breaking up sparse value sets into multiple switch statements separated by if-guards if they wish to reduce the size of jump tables.
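As a hedged illustration of that advice, the sketch below dispatches on two clusters of case values (0 to 2 and 100000 to 100001). Instead of one switch spanning the whole range, each cluster gets its own dense switch behind an if-guard. The function name and the values are hypothetical, and the function is written in asm.js style without having been validated.

```javascript
function dispatch(x) {
    x = x | 0;
    if ((x | 0) < 3) {
        // Dense cluster: cases 0, 1, 2.
        switch (x | 0) {
            case 0: return 10;
            case 1: return 11;
            case 2: return 12;
        }
    } else if ((x | 0) >= 100000) {
        // Second cluster, rebased so that its cases start at 0.
        switch ((x - 100000) | 0) {
            case 0: return 20;
            case 1: return 21;
        }
    }
    return 0;
}
```

Each switch now needs a jump table of at most three entries, rather than a single table covering roughly 100000 slots.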
Cases in a switch block are validated in the context
of a global environment Δ, a variable
environment Γ, an expected return type τ, and an
expected case type σ. Unless otherwise explicitly stated, a
recursive validation of a subterm uses the same context as its
containing term.
A CaseClause node
case n:NumericLiteral : stmt:Statement…
is valid if
the source of n does not contain a . character;
A DefaultClause node
default : stmt:Statement…
is valid if each stmt is valid.
Each expression is validated in the context of a global environment Δ and a variable environment Γ, and validation determines the type of the expression. Unless otherwise explicitly stated, a recursive validation of a subterm uses the same context as its containing term.
An Expression node
expr1:Expression , expr2:AssignmentExpression
validates at type τ if expr1 validates at some type σ and expr2 validates at type τ.
For a NumericLiteral node:
if the source contains a . character, the expression validates as type double;
if the source does not contain a . character and its numeric value is in the range [-2^31, 0), the expression validates as type signed;
if the source does not contain a . character and its numeric value is in the range [0, 2^31), the expression validates as type fixnum;
if the source does not contain a . character and its numeric value is in the range [2^31, 2^32), the expression validates as type unsigned.
Note that integer literals outside the range [-2^31, 2^32) are invalid, i.e., fail to validate.
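The literal rules can be summarized as a small classifier. This is a sketch under assumptions: the function name literalType is hypothetical, the literal is passed as its source text (optionally with a leading minus sign), and null stands for a literal that fails to validate.

```javascript
function literalType(source) {
    // Any literal whose source contains '.' is a double.
    if (source.indexOf(".") !== -1) return "double";

    var n = Number(source);
    if (n >= -2147483648 && n < 0) return "signed";            // [-2^31, 0)
    if (n >= 0 && n < 2147483648) return "fixnum";             // [0, 2^31)
    if (n >= 2147483648 && n < 4294967296) return "unsigned";  // [2^31, 2^32)
    return null;                                               // out of range
}

var kinds = [
    literalType("1.0"),          // "double"
    literalType("-1"),           // "signed"
    literalType("0"),            // "fixnum"
    literalType("2147483648"),   // "unsigned"
    literalType("4294967296")    // null
];
```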
An Identifier node
x:Identifier
validates as Lookup(Δ, Γ, x).
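A MemberExpression node
x:Identifier[n:NumericLiteral]
validates at type τ if:
x has an ArrayBufferView type;
the source of n does not contain a . character;
A MemberExpression node
x:Identifier[expr:Expression]
validates at type intish if:
x has an ArrayBufferView type;
the element type of the view is intish;
expr validates as a subtype of int.
A MemberExpression node
x:Identifier[expr:Expression >> n:NumericLiteral]
validates at type τ if:
x has an ArrayBufferView type;
expr validates as a subtype of intish;
the source of n does not contain a . character;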
An AssignmentExpression node
x:Identifier = expr:AssignmentExpression
validates as type τ if the nested AssignmentExpression validates as type τ and one of the following two conditions holds:
An AssignmentExpression node
lhs:MemberExpression = rhs:AssignmentExpression
validates as type τ if lhs validates as type τ, rhs validates as type σ, and σ is a subtype of τ.
A CallExpression node
f:Identifier(arg:Expression,…)
validates as type τ if Lookup(Δ, Γ, f) = … ∧ (σ,…) → τ ∧ … and each arg validates as a subtype of its corresponding σ.
Alternatively, the CallExpression node validates as
type unknown if Lookup(Δ,
Γ, f) = Function and each arg
validates as a subtype of extern.
A CallExpression node
x:Identifier[index:Expression & n:NumericLiteral](arg:Expression,…)
validates as type τ if:
the source of n does not contain a . character;
index validates as a subtype of intish;
A UnaryExpression node
op:(+|-|~) arg:UnaryExpression
validates as type τ if the type of op is … ∧ (σ) → τ ∧ … and arg validates as a subtype of σ.
A UnaryExpression node of the form
~~arg:UnaryExpression
validates as type signed if arg validates as
a subtype of double.
A MultiplicativeExpression node
lhs:MultiplicativeExpression op:(*|/|%) rhs:UnaryExpression
validates as type τ if the type of op is … ∧ (σ1, σ2) → τ ∧ … and lhs validates as a subtype of σ1 and rhs validates as a subtype of σ2.
A MultiplicativeExpression node
expr:MultiplicativeExpression * n:NumericLiteral
n:NumericLiteral * expr:UnaryExpression
validates as type intish if the source of n does not contain a . character and -2^20 < n < 2^20 and expr validates as a subtype of int.
An AdditiveExpression node
expr_1 (+|-) … (+|-) expr_n
validates as type intish if:
each expr_i validates as a subtype of int;
Otherwise, an AdditiveExpression node
lhs:AdditiveExpression op:(+|-) rhs:MultiplicativeExpression
validates as type double if the type of op is
(σ1, σ2) → double
and lhs validates as a subtype of σ1
and rhs validates as a subtype of σ2.
A ShiftExpression node
lhs:ShiftExpression op:(<<|>>|>>>) rhs:AdditiveExpression
validates as type τ if the type of op is … ∧ (σ1, σ2) → τ ∧ … and lhs validates as a subtype of σ1 and rhs validates as a subtype of σ2.
A RelationalExpression node
lhs:RelationalExpression op:(<|>|<=|>=) rhs:ShiftExpression
validates as type τ if the type of op is … ∧ (σ1, σ2) → τ ∧ … and lhs validates as a subtype of σ1 and rhs validates as a subtype of σ2.
An EqualityExpression node
lhs:EqualityExpression op:(==|!=) rhs:RelationalExpression
validates as type τ if the type of op is … ∧ (σ1, σ2) → τ ∧ … and lhs validates as a subtype of σ1 and rhs validates as a subtype of σ2.
A BitwiseANDExpression node
lhs:BitwiseANDExpression & rhs:EqualityExpression
validates as type signed if lhs and rhs validate as type intish.
A BitwiseXORExpression node
lhs:BitwiseXORExpression ^ rhs:BitwiseANDExpression
validates as type signed if lhs and rhs validate as type intish.
A BitwiseORExpression node
lhs:BitwiseORExpression | rhs:BitwiseXORExpression
validates as type signed if lhs and rhs validate as type intish.
A ConditionalExpression node
? cons:AssignmentExpression : alt:AssignmentExpression
validates as type τ if:
A parenthesized expression node
( expr:Expression )
validates as type τ if expr validates as type τ.
An AOT implementation of asm.js must perform some internal dynamic checks at link time to be able to safely generate AOT-compiled exports. If any of the dynamic checks fails, the result of linking cannot be an AOT-compiled module. The dynamically checked invariants are:
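control must reach the module's return statement without throwing;
the heap object (if provided) must be an instance of ArrayBuffer;
the heap object's byteLength must be a multiple of 4096;
the heap object's byteLength must be a power of 2;
the heap object's byteLength must be no greater than 2^31 (assuming the engine even allows ArrayBuffers that large);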
If any of these conditions is not met, it is generally unsafe to produce an AOT-compiled module object, and the engine should fall back to an interpreted or JIT-compiled implementation.
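The heap-related checks can be sketched as a predicate that an embedder might run before linking. The function name heapPassesLinkChecks is hypothetical; it covers only the ArrayBuffer and byteLength conditions listed above, not every link-time check.

```javascript
function heapPassesLinkChecks(heap) {
    if (!(heap instanceof ArrayBuffer)) return false;   // must be an ArrayBuffer
    var n = heap.byteLength;
    if (n > 2147483648) return false;                   // no greater than 2^31
    if (n % 4096 !== 0) return false;                   // multiple of 4096
    return n > 0 && (n & (n - 1)) === 0;                // power of 2
}

var okSmall = heapPassesLinkChecks(new ArrayBuffer(4096));       // true
var okLarge = heapPassesLinkChecks(new ArrayBuffer(65536));      // true
var notPower = heapPassesLinkChecks(new ArrayBuffer(12288));     // false
var tooSmall = heapPassesLinkChecks(new ArrayBuffer(1024));      // false
var notBuffer = heapPassesLinkChecks({ byteLength: 4096 });      // false
```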
| Unary Operator | Type |
|---|---|
| + | (signed) → double ∧ (unsigned) → double ∧ (doublish) → double |
| - | (int) → intish ∧ (doublish) → double |
| ~ | (intish) → signed |
| ! | (int) → int |
| Binary Operator | Type |
|---|---|
| + | (double, double) → double |
| - | (doublish, doublish) → double |
| * | (doublish, doublish) → double |
| / | (signed, signed) → intish ∧ (unsigned, unsigned) → intish ∧ (doublish, doublish) → double |
| % | (signed, signed) → int ∧ (unsigned, unsigned) → int ∧ (doublish, doublish) → double |
| \|, &, ^, <<, >> | (intish, intish) → signed |
| >>> | (intish, intish) → unsigned |
| <, <=, >, >=, ==, != | (signed, signed) → int ∧ (unsigned, unsigned) → int ∧ (double, double) → int |
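The overloads of / on signed and unsigned operands correspond to two different coercion patterns. The helpers below are a plain JavaScript sketch with hypothetical names; divUnsigned in particular returns an unsigned value and is not meant as a validated asm.js function.

```javascript
// Signed division: coerce both operands with |0, divide, then truncate.
function divSigned(x, y) {
    x = x | 0;
    y = y | 0;
    return ((x | 0) / (y | 0)) | 0;
}

// Unsigned division: reinterpret both operands with >>>0 first.
function divUnsigned(x, y) {
    x = x | 0;
    y = y | 0;
    return ((x >>> 0) / (y >>> 0)) >>> 0;
}

var signedQuotient = divSigned(-8, 2);       // -4
var unsignedQuotient = divUnsigned(-8, 2);   // 2147483644
var truncated = divSigned(7, 2);             // 3
```

The same bit pattern for -8 yields different quotients because the unsigned reading treats it as 4294967288.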
| Standard Library | Type |
|---|---|
| Infinity, NaN | double |
| Math.acos, Math.asin, Math.atan, Math.cos, Math.sin, Math.tan, Math.ceil, Math.floor, Math.exp, Math.log, Math.sqrt | (doublish) → double |
| Math.abs | (signed) → unsigned ∧ (doublish) → double |
| Math.atan2, Math.pow | (doublish, doublish) → double |
| Math.imul | (int, int) → signed |
| Math.E, Math.LN10, Math.LN2, Math.LOG2E, Math.LOG10E, Math.PI, Math.SQRT1_2, Math.SQRT2 | double |
| View Type | Element Size (Bytes) | Element Type |
|---|---|---|
| Uint8Array | 1 | intish |
| Int8Array | 1 | intish |
| Uint16Array | 2 | intish |
| Int16Array | 2 | intish |
| Uint32Array | 4 | intish |
| Int32Array | 4 | intish |
| Float32Array | 4 | doublish |
| Float64Array | 8 | doublish |
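The element sizes determine the shift applied to a byte offset for each view. The sketch below assumes a 4 KiB buffer shared by four views; the view names follow the HEAP32 convention used earlier and are otherwise arbitrary.

```javascript
var sharedHeap = new ArrayBuffer(4096);
var HEAPU8 = new Uint8Array(sharedHeap);      // 1-byte elements: no shift
var HEAP16 = new Int16Array(sharedHeap);      // 2-byte elements: p >> 1
var HEAP32 = new Int32Array(sharedHeap);      // 4-byte elements: p >> 2
var HEAPF64 = new Float64Array(sharedHeap);   // 8-byte elements: p >> 3

var p = 16;                                   // a byte offset
HEAP32[p >> 2] = -1;                          // sets all four bytes to 0xff

var byteAtP = HEAPU8[p] | 0;                  // 255
var shortAtP = HEAP16[p >> 1] | 0;            // -1

HEAPF64[(p + 8) >> 3] = 2.5;
var doubleAtP8 = +HEAPF64[(p + 8) >> 3];      // 2.5
```

The value -1 was chosen because every byte of it is 0xff, so the results do not depend on the platform's byte order.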
Thanks to Martin Best, Brendan Eich, Andrew McCreight, and Vlad Vukićević for feedback and encouragement.
Thanks to Michael Bebenita for improved diagrams.