- recall:
  - 1 function = 1 CFG
  - 1 CFG = 1 or more BBs in a graph
  - **local** analysis rewrites the IR instructions *within* a BB

## Flow Analysis

- in this code, can we ever run the else?
  `fn uhoh() { if true { println_s("then!"); } else { println_s("else!"); } }`
- is this bad code?
  - not necessarily… constant conditions do come up, e.g.
    `const FEATURE_ENABLED = false; // ... if FEATURE_ENABLED { ... } else { ... }`
- can we detect this in the AST?
  - maybe, but more complex conditions would require implementing constant folding on the AST, which seems like a duplication of effort (since we talked about doing it to the IR last time)

- this is a problem best solved with **flow analysis** - statically analyzing what parts of the CFG are *reachable* (another reachability! oh boy!)
- so let’s start with a simple algorithm:
  - mark all BBs “unvisited.”
  - do a depth-first traversal of the CFG, starting at the entry bb (bb0):
    - mark it as visited.
    - for each successor, if it hasn’t been visited, recursively traverse it.
- at the end, any unvisited BBs left are **unreachable.**
  - if this feels like reachability from GC… good, cause it basically is.
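The traversal fits in a few lines of Python. The CFG representation here (a dict from block id to successor list) is an assumption for illustration, not the course's actual data structure:

```python
# a minimal reachability sketch; the CFG encoding is a made-up stand-in
def reachable_blocks(cfg, entry="bb0"):
    visited = set()

    def visit(bb):
        if bb in visited:
            return            # already seen: don't loop forever
        visited.add(bb)       # mark as visited
        for succ in cfg[bb]:  # depth-first into each successor
            visit(succ)

    visit(entry)
    return visited

# bb4 has no path from the entry, so it's unreachable
cfg = {"bb0": ["bb1", "bb2"], "bb1": ["bb3"], "bb2": ["bb3"],
       "bb3": [], "bb4": ["bb3"]}
print(set(cfg) - reachable_blocks(cfg))  # {'bb4'}
```

whatever is left over after the traversal is exactly the unreachable set.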

- this algorithm has an important property: it **terminates** - that is, it can never get stuck in an infinite loop, no matter how complex the CFG is
- how do we know this? the intuitive proof is something like this:
  - every node can only be in one of two sets (visited and unvisited).
  - we only look at unvisited nodes.
  - a node can only move from unvisited to visited.
  - therefore, the set of nodes we look at **monotonically** shrinks.
  - so, the maximum number of steps (node-visits) in this algorithm is **bounded above** by the **number of nodes in the CFG.**

- okay. so does this solve the original problem?
  - no, but it wouldn’t be hard to modify this algorithm so it does.
  - on the step that says “for each successor,” we change that to “for each successor that we can prove will be run”
  - for `if true {} else {}`, we know the else will never be run, so we only visit the “then” side.
- boom! solved. right?
  - `if 2 + 2 == 4 {} else {}`
  - this is *logically* the same, but the IR looks something like:
    - `$t0 = 2 + 2`
    - `if $t0 == 4 then goto ... else goto ...`

- but we talked about this last time!
  - we can do **constant folding** to get `$t0 = 4`
  - and then **copy propagation** to get `if 4 == 4`
  - and that’s pretty obviously true.
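A sketch of the pruned traversal, assuming conditions have already been folded. The terminator encoding here (`goto`/`branch`/`return` tuples, with `cond` being `True`, `False`, or `None` for unknown) is hypothetical:

```python
# reachability that only follows branches we can prove may run
def reachable_with_pruning(blocks, entry="bb0"):
    visited = set()

    def successors(term):
        kind = term[0]
        if kind == "return":
            return []
        if kind == "goto":
            return [term[1]]
        _, cond, then_bb, else_bb = term      # ("branch", cond, then, else)
        if cond is True:
            return [then_bb]                  # else side is provably dead
        if cond is False:
            return [else_bb]                  # then side is provably dead
        return [then_bb, else_bb]             # unknown: assume both can run

    def visit(bb):
        if bb in visited:
            return
        visited.add(bb)
        for succ in successors(blocks[bb]):
            visit(succ)

    visit(entry)
    return visited

# the `if true { ... } else { ... }` example: bb_else is never visited
blocks = {"bb0": ("branch", True, "bb_then", "bb_else"),
          "bb_then": ("goto", "bb_end"),
          "bb_else": ("goto", "bb_end"),
          "bb_end": ("return",)}
print(sorted(reachable_with_pruning(blocks)))  # ['bb0', 'bb_end', 'bb_then']
```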

## From local to global

- **global** optimization goes beyond single BBs and works on the CFG as a whole
- the local optimizations we talked about last time can **assist global optimization**
- what’s more, those local optimizations can be **extended across BB boundaries**
  `fn foo1(a: bool): int { let ret = 0; if a { println_s("whee"); } return ret; }`

- looking at the CFG for this function, the `ret = 0` and `return ret` occur in different BBs
- but intuitively, `ret` is acting as a constant here
  - we should be able to copy-propagate to get `return 0`
  - and dead-code-eliminate to delete the `ret = 0` instruction
- can we always do this? well:
  `fn foo2(a: bool): int { let ret = 0; if a { ret = 10; } return ret; }`

- uh oh. one path has `ret == 0`, and another has `ret == 10`.
  - we probably don’t know what value `a` will be until runtime (`foo(user_input())`)
- so if there are **multiple paths** from definition to use, we can’t copy-propagate.
- …or can we??
  `fn foo3(a: bool): int { let ret = 0; if a { ret = 10; } else { ret = 10; } return ret; }`

- multiple paths, but each path assigns it the **same value.**
  - aaaAAA
- when you start to get a bunch of special cases like this, it’s a good idea to step back and **think about the algorithm a different way.**

## Trying to formalize global constant copy propagation

- a **use** of a variable means we get its value
  - these all count as uses of `x`: `y = x`, `y = x + 5`, `if x < 10...`, `f(x)`
- global constant copy propagation works like this:
  - on **every path** to a use of `x`, the last assignment to `x` is `x = C`, for some constant value `C`.
- this condition captures the idea of all the examples above:
  - in `foo1`, there is only one assignment to `ret`, so the condition is **true.**
  - in `foo2`, there are two assignments to `ret`, but the constant is *different* in each, so the condition is **false.**
  - in `foo3`, again there are two assignments, *but* the constant is the *same* in both, so the condition is **true.**
- okay, cool, but **how do we implement this?**
  - let’s look at it at a micro level first
- let’s say we **don’t know** what value `x` could be. then we see `x = 50`.
  - well, now we know that `x` is 50. duh.
- okay. the next instruction says `x = x * y`.
  - oh, shoot. what’s `y`? dunno.
  - so we’re back to *not knowing* what `x` is.
- remember: programs are proofs.
  - and just like proofs, each step can *prove* or *disprove* some proposition.
- so we’ll just do it **one step at a time.**

- remember predecessors and successors?
  - they’ll be very helpful when talking about this kind of algorithm…

## Actually formalizing global constant copy propagation

- we’ll define these rules on the level of **instructions** within a block…
  - but you can generalize it to a BB by repeatedly applying the rules to its instructions in order.
- each instruction can *change* our knowledge of what `x` contains.
  - `x = C` for some constant `C` means that *next* instruction, `x` is a constant.
    - doesn’t matter what was in `x` before, it’s definitely a constant now.
  - `x = y` or `x = y op z` means that *next* instruction, we *don’t know* what’s in `x`.
    - doesn’t matter if it was a constant before, now we can’t tell what it is.
  - any instruction that **doesn’t modify x** has no effect on that knowledge.
    - so for `y = 10`, if `x` was `C` before, it’ll still be `C` after.

- so let’s think about this example:

  ```
  // x = ANY (that is, it could be any value)
  x = 10  // x = C(10)
  x = 11  // x = C(11)
  y = 20  // x = C(11)
  x = f() // x = ANY
  ```

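these instruction-level rules form a *transfer function*: knowledge in, knowledge out. a Python sketch, tracking a single variable `x`, using a made-up `(kind, dest, ...)` instruction encoding:

```python
# knowledge of x is either the string "ANY" or a tuple ("C", value)
def transfer(fact, instr):
    kind, dest = instr[0], instr[1]
    if dest != "x":
        return fact              # doesn't modify x: knowledge unchanged
    if kind == "const":          # x = C
        return ("C", instr[2])
    return "ANY"                 # x = y, x = y op z, x = f(), ...

# replay the example above, one instruction at a time
fact = "ANY"
for instr in [("const", "x", 10), ("const", "x", 11),
              ("const", "y", 20), ("call", "x", "f")]:
    fact = transfer(fact, instr)
    print(fact)
# steps through C(10), C(11), C(11), ANY -- matching the comments above
```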
- that’s fine for straight-line code within a block. what about when you get **edges** involved?
- to determine the knowledge of `x` for the **first instruction** of a block…
  - you have to look at that block’s **predecessors.**
- consider this code:
  `if cond { x = 10; } else { x = 20; } return x;`

- we have a diamond-shaped CFG with two edges pointing at the `return x;` block.
  - on one edge, we know that `x = C(10)`.
  - on the other, we know that `x = C(20)`.
  - so, before the `return x;`, we (sadly) have to say `x = ANY` once again.
- *however*, if both the “then” and “else” sides did `x = 10`…
  - then we’d have `x = C(10)` coming from both paths.
  - in that case, we **would** be able to say `x = C(10)` before the `return`!
- so these intuitions lead us to these rules:
  - if **any** predecessor says `x = ANY`, then we have to assume `x = ANY`.
  - if **all** predecessors say `x = C(..)`…
    - if all of them say it’s the SAME constant, then we can assume it’s that constant.
    - otherwise, we have to assume `x = ANY`.
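the predecessor rules are a *merge*. a sketch, using a hypothetical encoding where knowledge of `x` is either the string `"ANY"` or a tuple `("C", value)`:

```python
# combine the knowledge of x arriving along each predecessor edge
def merge(pred_facts):
    if any(f == "ANY" for f in pred_facts):
        return "ANY"                      # any unknown path poisons the result
    constants = {f[1] for f in pred_facts}
    if len(constants) == 1:
        return ("C", constants.pop())     # every path agrees on one constant
    return "ANY"                          # different constants on different paths

print(merge([("C", 10), ("C", 20)]))  # ANY        (the diamond example)
print(merge([("C", 10), ("C", 10)]))  # ('C', 10)  (both sides assign 10)
```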
- now we have everything we need to implement a **naive** algorithm:
  - assume `x = ANY` for the input of **all instructions in all blocks.**
  - then, do a **breadth-first** traversal starting at bb0:
    - apply the block-level rules to determine the value of `x` at the start of that block.
      - (the entry block bb0 is a special case.)
    - apply the instruction-level rules to determine the value of `x` at the end of that block.
- boom. let’s try it on the above code first.
- then, let’s try it on this:
  `fn evil() { let ret = 10; for i in 0, 10 { print_s("ha"); } return ret }`
- what’s wrong?
  - the `if i < 10` block has **two predecessors,** but when we visit it, **we don’t know what value** `ret` is from one of them.
  - worse, if we follow the predecessors backwards, we **get back to this same block.**
    - **I TOLD YOU ABOUT CYCLIC GRAPHS, BRO**
- this is (fortunately) not too hard to solve.

## The improved algorithm

- our knowledge of `x` can now be one of **three** options:
  - `x = ANY` and `x = C(..)` like before, and
  - `x = UNVISITED` as an “initial” value.

- our instruction-level rules now look like this:
  - for the instruction `x = C`, the output will be `x = C` regardless of input.
  - for the instruction `x = anything else`, the output will be `x = ANY` regardless of input.
  - for all other instructions, the output will be the input.
- and our block-level rules look like this:
  - if **any** predecessor says `x = ANY`, then we have to assume `x = ANY`.
  - if **all** predecessors say `x = UNVISITED`, then assume `x = UNVISITED`.
  - if **all** predecessors say `x = UNVISITED` *or* `x = C(..)`…
    - we can ignore the `UNVISITED` predecessors, and apply the constant rule as before (if the constants are all the same, then the output will be `x = C(that constant)`, otherwise the output will be `x = ANY`).
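putting the three-state lattice, the instruction rules, and the block rules together, here is a sketch of the whole analysis for one variable `x`. the block encoding (a list of `(kind, dest, ...)` instruction tuples plus a successor list) is invented for illustration:

```python
# knowledge of x: "UNVISITED", ("C", value), or "ANY"
def transfer(fact, instr):
    kind, dest = instr[0], instr[1]
    if dest != "x":
        return fact                       # doesn't touch x
    return ("C", instr[2]) if kind == "const" else "ANY"

def merge(pred_facts):
    if any(f == "ANY" for f in pred_facts):
        return "ANY"
    consts = {f[1] for f in pred_facts if f != "UNVISITED"}
    if not consts:
        return "UNVISITED"                # every predecessor still unvisited
    if len(consts) == 1:
        return ("C", consts.pop())        # all visited paths agree
    return "ANY"

def analyze(blocks, entry="bb0"):
    out_facts = {bb: "UNVISITED" for bb in blocks}
    in_facts = {bb: "UNVISITED" for bb in blocks}
    preds = {bb: [] for bb in blocks}
    for bb, (_, succs) in blocks.items():
        for s in succs:
            preds[s].append(bb)
    work = [entry]
    while work:
        bb = work.pop(0)                  # breadth-first worklist
        instrs, succs = blocks[bb]
        # the entry block has no predecessors; assume nothing is known there
        fact = "ANY" if bb == entry else merge([out_facts[p] for p in preds[bb]])
        in_facts[bb] = fact
        for instr in instrs:
            fact = transfer(fact, instr)
        if fact != out_facts[bb]:         # only re-queue successors on change
            out_facts[bb] = fact
            work.extend(succs)
    return in_facts

# a loop like `evil`: x = 10 before the loop and never reassigned inside it,
# so the exit block still knows x = C(10) despite the cycle
blocks = {
    "bb0": ([("const", "x", 10)], ["bb_head"]),
    "bb_head": ([], ["bb_body", "bb_exit"]),   # the loop condition check
    "bb_body": ([("const", "y", 1)], ["bb_head"]),
    "bb_exit": ([], []),                       # the `return x`
}
print(analyze(blocks)["bb_exit"])  # ('C', 10)
```

note the worklist only re-queues a block's successors when its output fact actually changes; that check is what makes the cyclic case terminate.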
- now when we run this algorithm on that loop code…
  - we actually **visit some blocks/instructions more than once!!**
  - yep! that’s fine.
- we can still prove this algorithm terminates.
  - there are still a finite number of states our knowledge of `x` can be in.
  - those states change monotonically (`UNVISITED -> const`, `const -> ANY`, `UNVISITED -> ANY`)
  - so eventually, everything will “settle” on a final value.
- what about this code:
  `fn good(): int { let ret = 10; for i in 0, 10 { ret = 20; } return ret; }`
- let’s look at a harder problem:
  `let some_global = true; fn foo(): int { let ret = 0; if some_global { ret = 10; } return ret; }`
  - *maybe* `some_global` is “a constant” if nobody ever assigns into it… but how would I prove that?
    - I’d have to look at the **whole program**
    - which is a *whole other layer* of optimization on top of global opt