Build Systems a la carte
- Goal of the paper is to develop a generic build system which can
be later instantiated.
- It should be minimal
- It should be correct
- Minimalism is achieved with a Scheduler. Correctness with
Rebuilder.
The Scheduler: Respecting the Dependency Order
- Topological (but doesn't handle dynamic dependencies)
- Restarting (abort on encountering fetch and restart once it's fetched)
- Suspending (wait for fetch to resolve the value. Like promises,
OCaml's Lwt etc)
Rebuilder: Determining Out-of-date Keys
Initial thoughts: timestamp of the files.
But this paper is developing a generic framework
A Dirty bit
:CUSTOMID: 4
An additional bit set to true if the key changes. And set to false
after every successful build. Any key in the transitive closure with
bit to true will trigger the build for that key
Examples: Excel, Make
Verifying traces
:CUSTOMID: 5
Recording hashes/values of dependencies and determining if they changed
TODO Constructive traces
:CUSTOMID: 6
example: bazel
TODO Deep constructive traces
:CUSTOMID: 7
example: nix
TODO Concrete implementations of build systems
(very dense. Mostly code explanations)
Partial Stores and Exceptions
v
-> Maybe v
because we often run into file not found etc. Task
is polymorphic enough to handle this change
Parallelism
Purely a scheduler's problem.
- Topological: simplest case. Everything is statically available.
- Restarting: Queue is dequeued with
n
readers
- Suspending: static are resolved parallely. Dynamics ones as in
Marlow et al
Impure Computations
- C compiler version not accounted for. Different version may produce
different errrors.
- Non-determinism from parallelism: different but semantically
identical results because tasks executed in different order
- Volatility: PHONY rules in Make. Random() in Excel
- Mitigation
- Bazel uses sandboxes to makes sure missing dependencies are caught
- Untracked tasks are marked volatile to ensure correctness
- Non-determinism could be handled by tweaking the definition of
correctness from exactly equal to one of the possible values.
- TODO What is MonadPlus. It seems it can be used to address non-determinism
Cloud implementation details overlooked
- Communication protocol details (What is TBs of data transfer is needed)
- Offloading: remotely run a task instead of local machine
- Eviction of old cache entries
Shallow builds: discarding intermediate package builds and only
retaining end products. Then correctness definition will need a
tweak.
We have to define a new store - shallow
which only contains the
subset of key/values we care about and update the equality
condition for this subset
Deep Constructive traces with non-determinism is problematic: eg:
Frankenbuild
profilers can emit different outputs for same executable. Simply
hydrating the local store with a remote one will lead to incorrect
results by all our definitions of correctness (main,
non-determinism, shallow)
Self tracking
When users edit the build rules, a simple approach is to recompute
everything
Self-tracked systems only to compute only the changed rules.
Ex: Excel
sprsh4 "B1" = Just $ Task $ \fetch -> do
formula <- fetch "B1-formula"
evalFormula fetch formula
Loops for computing tasks
Ex: excel and Latex
Polymorphic kv stores
It seems some build systems (Shake) provide ways to specify keys (and
values) with more than one types.
Ex: gcc --version
as a volatile key to pin gcc dependency version
Ex: GHC producing Foo.hi
and Foo.o
could be represented as a
tuple. I dont fully understand this. (Feel free to enlighten!)
Other build systems
- Dune. Uses arrows
- Ninja topological scheduler with verifying traces
- Nix: coarse grained. Deals with packages. Not files.
- Pluto allow cycles in build graph
- Redo: polymorphic dependencies
- TUP: improved dirty-bit checking avoiding entire re-checking of
graph
- Fabricate. Never heard of it. Nor do I understand it
Self adjusting computation (Incr)
- Often a graph of computations that automatically adjust to change
in inputs.
- No concrete implementation explored in the paper yet. Future Work.
Memoization
- Nearly the same as Dynamic programming.
- Ends up being a self-adjusting computation system. Are the
abstractions similar then? Question for further exploration.
Conclusion
- Investigated multiple build systems
- Properties of build systems are a function of two choices: order of
tasks and scheduler
- One of the combinations that hasn't been implemented: Cloud based
suspending monadic build. A discovery!