Quadratic Micropass Type Inference

Showcase example animation

Introduction

Languages that allow plentiful of type inference can often produce confusing error messages. Type inference makes assumptions about what the user intends types to be, and if there’s a type error, there’s a high risk one of those assumptions were wrong. When the compiler then creates an error message it’ll use those types that may have been inferred from false assumptions.

I propose a new kind of type inference algorithm that always prioritises the type unifications the end-user is most likely to care about, independent from the types order in source code.

The goal being to not have the end-user need to reverse engineer what causes the compiler to infer an incorrect type, and instead work with the users pre-existing assumptions.

Background

Languages with simpler inference such as Go can infer inside-out. Meaning we can figure out the type of an expression/statement by looking at its kind plus the types of its subexpressions.

func main() {
    x := someFunction() // the type of the right-hand side is always known
}
 
func someFunction() string {...}

But this approach quickly falls apart.

fn main() {
    let f = |x| {
        // we don't know the type of `y` as the type of `x` is not yet known
        let y = (x, x);
        return y;
    };
    // we don't know the type of `f` either.

 
    f("Hello!"); // it's not until here we know the type of `x`, `y` and f`.
}

A more powerful form of inference is type variable unification. With such a system we can leave types partially uninferred, and have later expressions/statements fill in the gaps.

Lets take the same example but use type variable unification this time. We start inferring types inside-out like in the first example. But; for the types that are not yet known we put a type variable. (annotated as ’N)

fn main() {
    let f: fn(ˈ1) -> ˈ1 = |x| {
        let y: (ˈ1, ˈ1) = (x, x); 
        return y;
    };

 
    ...
}

As the name “type variable” suggests, they are variables and thus can be re-assigned.

    ...
 
    f("Hello!"); // the literal unifies '1 with string, assigning '1 = string
}

After that type variable assignment the final types are what we’d expect.

fn main() {
    let f: fn(string) -> string = |x| {
        let y: (string, string) = (x, x);
        return y;
    };

 
    f("Hello!");
}

However this approach creates a new problem. Since we still unify types inside-out, we can end up making a type variable assignment which later on turns out to be much more clearly described as being intended to be assigned to something else.

If we look at this example.

fn example(x) -> Point[int] {
    log(["origin: ", x]);
    return Point(x, x);
                 ^  ^   error: expected int, got string
}

By following the type variable unification steps from before x would be inferred as a string since it’s being used in a list with a string eventually creating the error message shown. But when a developer looks at this function they would be much more likely to expect x to be inferred as int since it’s more clearly being used as an int in the return statement.

If we could instead unify types in the order the end-user deems most important, we could create error messages that more strongly correlate to the developers thought process.

Quadratic Micropass Type Inference

Core Concepts

Instead of one large unification pass traversing the code top to bottom and inside-out, inference is split into the following ordered passes.

With this ordering the higher up the list you go the closer to the users thought process you are. This way types are inferred by what the user is likely to consider the most important rather than by their position in source code.

Each pass performs its minimal task leaving most types untouched. After a pass is completed, it re-runs all previous passes before it. Hoping the changes it made will facilitate earlier passes to perform their role. With this design it’s logical that the lower down the list you go the more destructive passes are allowed to be. As its implied that less type information is available.

Since each type variable may be unified many times by the same pass, error generation is not reasonably able to work during inference. Therefore passes ignore errors and only touch the types it knows it can and should work with. Type checking is instead deligated to its own step which acts upon entirely static types. This has the additional benefit of avoiding additional complexity during unification.

Examples

Let’s see what happens if we now try the snippet that generated the misleading error message again.

fn example(x) -> Point[int] {
    log(["origin: ", x]);
                     ^ error: expected string, got int
    return Point(x, x);
    //           ^  ^ `known_applications` pass runs first,
    //                 assigning the type variable of `x` to `int`
}

Much better!

Let’s walk through a more feature-complete example step-by-step to see how this algorithm reaches its result.

fn main(a, b) -> _ {
    let point = Point(a, 200);
    let list = [point.x, 300];
    let record = { x = 100, y = 200 };
    return point.y == record.y;
}

struct Point[T] { x: T, y: T }
  1. (known application) Point(a, 200) transforms from Point[_] to Point[{numeric}]
  2. (known assignment) point transforms from _ to Point[_]
  3. (known assignment) list transforms from _ to [_]
  4. (known assignment) x = 100 transforms from _ to {numeric}
  5. (known assignment) y = 100 transforms from _ to {numeric}
  6. (known assignment) main return type transforms from _ to bool
  7. (known applications) a transforms from _ to {numeric}
  8. (known application) point.y transforms from _ to {numeric}
  9. (known application) record.y transforms from _ to {numeric}
  10. (known same as) point.x transforms from _ to {numeric}
  11. (resolve record) { x = 100, y = 200 } transforms from _ to Point[_]
  12. (known assignment) record transforms from _ to Point[_]
  13. (known record field) { x = 100, y = 100 }.x transforms from _ to {numeric}
  14. (default numbers) all {numeric} defaults to int
  15. (default unit or lift) main declares T and b defaults to T

Here’s a visualisation of the inference passes

TODO: Create another visualisation like the one at the top

Passes

Infer the type of a parameter given to a function if the parameters type isn’t known but the expected type is.

Infer the type for the target of an assignment if the assigned value has a known type but the target does not.

This is also used when returning from a function, with the expected return type of the function being used as target.

If the same function is called in multiple places, and the return type of the called function is known, then for each call if the returned type is not known infer it to be the called functions known return type.

Unify all types that are used in the same list or branching expression with each other. Such as [x, y] and if true { x } else { y }.

If the type of a record is known, then for each instance where a field from that record is accessed, unify the type of the accessed field with the known field type from the record.

If the type of a record is not known, use its field names to figure out its type.

If a type which has not been inferred is called as a function, infer the type to be a function with the signature matching its first call.

If a type is not known but must be numeric, default it to be the default integer size.

If a type is not known, infer it to either the unit type if its created from an expression or an implicitly declared generic if its from the functions type signature.

Benchmarks

TODO