Software Engineering
compiler type-systems dynamic-typing static-typing static-analysis
Updated Tue, 28 Jun 2022 09:16:16 GMT

Where is the boundary between things which can be statically typechecked, and those which must be typechecked dynamically?


I am brain storming on how to create a type system for a programming language, and what the compiler will do with the typing information. Here is what I have found, followed by the main question, which is if I can represent more complicated "validations" as types somehow.

First, it seems there is a difference between the physical record structure of an object stored in memory, and a "type". An object for example can be "typed" such that it is either an odd fibonacci number, or an email formatted string, let's say. That is its "type" in some theoretical type system. But the physical record is either an integer or a string, which have actual ramifications on the physical memory. It's like there is a "data type", and a "validation type".

Given that, a traditional "type" ("validation type") is really a set of constraints over the underlying data type(s). Since it can be over many data types (integer or string in the last example), it is really over an abstract data type which encompasses all data types, the "base" data type. Then on top of this base data type, which is some sort of in-memory record, you have a set of constraints. These constraints can be over individual record fields, or over several fields in the record. The constraints are boolean-resolving functions.

But this poses a problem for "inspecting the type". If we want to see the type for our "odd fibonacci number or email string" object, and it is stored as a set of constraints (isOdd, isFibonacci, or isEmail), then you have to inspect these constraint functions to determine the type. But these functions would be useful for a type-checker, because the type-checker could check all the constraints of the type somehow and use them to infer other types or do typechecking (haven't gotten this far to know how it's implemented yet). Ideally, too, they would be useful for a compile-time type checker, a static type checker.

You could take the boolean constraint functions further so they are extremely complex async processes which run to determine if the data structure is the correct type. For example, say we had a type "if it's a full moon then set the time to midnight, otherwise set the time to regular time". Something crazy. Then it would make an HTTP request (a non-boolean function) to determine the full-moon status, and then check the value based on that status. There can be all kinds of async non-boolean functions embedded into the ultimate "boolean" constraint function, that it is hard to tease them apart and build a DSL that is "only boolean logic functions". When you get into practice, the line between a pure boolean logic function and just an imperative function or set of assembly instructions starts to blur.

So a "type" is really a set of constraints on some underlying data type in memory so to speak.

With this in mind, can all of these sorts of things be done at compile time? I don't think the moon example could be done, it seems some types will have to be checked at runtime dynamically. So then, what are the types of type-checking that can be done statically?

Are languages like Haskell or Coq able to somehow solve this problem of resolving these constraints at compile time? Or is there a fundamental limit to how far type systems can take you? I would like to do as much static compile-time type checking in a custom programming language as possible. But I keep thinking of these complex async function examples of constraints that might pop up in a "type", and it throws the whole system off. I feel like types must be severely restricted and limited in what they can do, and everything else must be done at runtime. But I'm not sure.




Solution

Yes, types are just constraints. Any general type system has top which means this value can be anything and bot (bottom) which is so constrained that no value can satisfy it.

Thats literally chapter one of any type theory book.

What can be determined at compile time depends on your operations. isEmail is trivial if you only allow constants in your language. It gets a little more complex when you allow concatenation. Once you allow casting (or I/O) then your compile time typechecking can no longer prevent all runtime errors.

isFib is pretty simple until you allow addition, while isOdd remains pretty simple.

Haskell and Coq especially can do more at compile time because theyre specifically crafted to limit the things that are only provable at runtime.