Quadratic Micropass Type Inference

For the proof of concept implementation, see this repository
Introduction
Languages that allow plentiful of type inference can often produce confusing error messages. Type inference makes assumptions about what the user intends types to be, and if there’s a type error, there’s a high risk one of those assumptions were wrong. When the compiler then creates an error message it’ll use those types that may have been inferred from false assumptions.
I propose a new kind of type inference algorithm that always prioritises the type unifications the end-user is most likely to care about, independent from the types order in source code.
The goal being to not have the end-user need to reverse engineer what causes the compiler to infer an incorrect type, and instead work with the users pre-existing assumptions.
Background
Languages with simpler inference such as Go can infer inside-out. Meaning we can figure out the type of an expression/statement by looking at its kind plus the types of its subexpressions.
func main() {
x := someFunction() // the type of the right-hand side is always known
}
func someFunction() string {...}
But this approach quickly falls apart.
fn main() {
let f = |x| {
// we don't know the type of `y` as the type of `x` is not yet known
let y = (x, x);
return y;
};
// we don't know the type of `f` either.
f("Hello!"); // it's not until here we know the type of `x`, `y` and f`.
}
A more powerful form of inference is type variable unification. With such a system we can leave types partially uninferred, and have later expressions/statements fill in the gaps.
Lets take the same example but use type variable unification this time. We start inferring types inside-out like in the first example. But; for the types that are not yet known we put a type variable. (annotated as ’N)
fn main() {
let f: fn(ˈ1) -> ˈ1 = |x| {
let y: (ˈ1, ˈ1) = (x, x);
return y;
};
...
}
As the name “type variable” suggests, they are variables and thus can be re-assigned.
...
f("Hello!"); // the literal unifies '1 with string, assigning '1 = string
}
After that type variable assignment the final types are what we’d expect.
fn main() {
let f: fn(string) -> string = |x| {
let y: (string, string) = (x, x);
return y;
};
f("Hello!");
}
However this approach creates a new problem. Since we still unify types inside-out, we can end up making a type variable assignment which later on turns out to be much more clearly described as being intended to be assigned to something else.
If we look at this example.
fn example(x) -> Point[int] {
log(["origin: ", x]);
return Point(x, x);
^ ^ error: expected int, got string
}
By following the type variable unification steps from before x would be inferred as a string since it’s being used in a list with a string eventually creating the error message shown. But when a developer looks at this function they would be much more likely to expect x to be inferred as int since it’s more clearly being used as an int in the return statement.
If we could instead unify types in the order the end-user deems most important, we could create error messages that more strongly correlate to the developers thought process.
Quadratic Micropass Type Inference
Core Concepts
Instead of one large unification pass traversing the code top to bottom and inside-out, inference is split into the following ordered passes.
known_applicationsknown_assignmentsknown_return_typesknown_same_as_unificationsknown_record_fieldsresolve_recordsless_known_functionsdefault_numbersdefault_unknown_to_unit_or_lift
With this ordering the higher up the list you go the closer to the users thought process you are. This way types are inferred by what the user is likely to consider the most important rather than by their position in source code.
Each pass performs its minimal task leaving most types untouched. After a pass is completed, it re-runs all previous passes before it. Hoping the changes it made will facilitate earlier passes to perform their role. With this design it’s logical that the lower down the list you go the more destructive passes are allowed to be. As its implied that less type information is available.
Since each type variable may be unified many times by the same pass, error generation is not reasonably able to work during inference. Therefore passes ignore errors and only touch the types it knows it can and should work with. Type checking is instead deligated to its own step which acts upon entirely static types. This has the additional benefit of avoiding additional complexity during unification.
Examples
Let’s see what happens if we now try the snippet that generated the misleading error message again.
fn example(x) -> Point[int] {
log(["origin: ", x]);
^ error: expected string, got int
return Point(x, x);
// ^ ^ `known_applications` pass runs first,
// assigning the type variable of `x` to `int`
}
Much better!
Let’s walk through a more feature-complete example step-by-step to see how this algorithm reaches its result.
fn main(a, b) -> _ {
let point = Point(a, 200);
let list = [point.x, 300];
let record = { x = 100, y = 200 };
return point.y == record.y;
}
struct Point[T] { x: T, y: T }In this imaginary language number literals can infer into any int size and are not assumed to be int. That’s why the first change isn’t the 200 literal being applied to Point.
- (known application)
Point(a, 200)transforms fromPoint[_]toPoint[{numeric}] - (known assignment)
pointtransforms from_toPoint[_] - (known assignment)
listtransforms from_to[_] - (known assignment)
x = 100transforms from_to{numeric} - (known assignment)
y = 100transforms from_to{numeric} - (known assignment)
mainreturn type transforms from_tobool - (known applications)
atransforms from_to{numeric} - (known application)
point.ytransforms from_to{numeric} - (known application)
record.ytransforms from_to{numeric} - (known same as)
point.xtransforms from_to{numeric} - (resolve record)
{ x = 100, y = 200 }transforms from_toPoint[_] - (known assignment)
recordtransforms from_toPoint[_] - (known record field)
{ x = 100, y = 100 }.xtransforms from_to{numeric} - (default numbers) all
{numeric}defaults toint - (default unit or lift)
maindeclaresTandbdefaults toT
Here’s a visualisation of the inference passes

Passes
known_applications
Infer the type of a parameter given to a function if the parameters type isn’t known but the expected type is.
known_assignments
Infer the type for the target of an assignment if the assigned value has a known type but the target does not.
This is also used when returning from a function, with the expected return type of the function being used as target.
known_return_types
If the same function is called in multiple places, and the return type of the called function is known, then for each call if the returned type is not known infer it to be the called functions known return type.
known_same_as_unifications
Unify all types that are used in the same list or branching expression with each other. Such as [x, y] and if true { x } else { y }.
known_record_fields
If the type of a record is known, then for each instance where a field from that record is accessed, unify the type of the accessed field with the known field type from the record.
resolve_records
If the type of a record is not known, use its field names to figure out its type.
less_known_functions
If a type which has not been inferred is called as a function, infer the type to be a function with the signature matching its first call.
default_numbers
If a type is not known but must be numeric, default it to be the default integer size.
default_unknown_to_unit_or_lift
If a type is not known, infer it to either the unit type if its created from an expression or an implicitly declared generic if its from the functions type signature.
Benchmarks
TODO