This is an automated archive made by the Lemmit Bot.
The original was posted on /r/programminglanguages by /u/Teln0 on 2024-11-09 21:22:51+00:00.
Source code here :
The README.md file contains more resources about the topic.
The idea is to iterate through the code in reverse execution order, and instead of assigning registers to values when they’re written to, we assign registers to values where we expect them to end up. If we run out of registers and need to use one from a previous value, we insert a restore instead of a spill after the current instruction and remove the value from the set of active values. Then, when we’re about to write to that value, we insert a spill to make sure the value ends up in memory, where we expect it to be at that point.
If we see that we need to read a value again that’s currently not active, we find a register for it, then add a spill of that register to the memory slot for that value. That way the value ends up in memory, where we expect it to be at that point.
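To make the two paragraphs above concrete, here is a minimal Python sketch of the idea for a single straight-line block. It is not the code from the linked repository. The instruction format `(dst, op, srcs)`, the names `allocate_reverse` and `take_reg`, and the eviction choice (simply the first active value that the current instruction doesn't read) are all assumptions made for illustration. It also assumes every value is defined exactly once inside the block.

```python
def allocate_reverse(instrs, num_regs):
    """instrs: list of (dst, op, srcs) in execution order (hypothetical format).
    Returns (kind, reg, arg) tuples in execution order."""
    free = [f"r{i}" for i in range(num_regs)]
    active = {}        # value -> register we expect it to be in
    in_memory = set()  # evicted values that an earlier spill must still store
    out = []           # built back to front, reversed at the end

    def take_reg(keep):
        # Prefer a free register, otherwise evict an active value not in `keep`.
        if free:
            return free.pop()
        victim = next(v for v in active if v not in keep)  # arbitrary heuristic
        reg = active.pop(victim)
        # Later code expects `victim` in `reg`: reload it after this instruction.
        out.append(("restore", reg, victim))
        in_memory.add(victim)
        return reg

    for dst, op, srcs in reversed(instrs):
        # Walking backwards, reaching the definition ends the value's life.
        if dst in active:
            dst_reg = active.pop(dst)
        else:
            dst_reg = take_reg(keep=())
            if dst in in_memory:
                # The value was evicted further down: store it right after its write.
                out.append(("spill", dst_reg, dst))
        in_memory.discard(dst)
        free.append(dst_reg)

        # Operands must be in registers when the instruction executes.
        reactivated = []
        for s in srcs:
            if s not in active:
                active[s] = take_reg(keep=srcs)
                if s in in_memory:
                    # Read of a non-active value: spill it just before this instruction.
                    in_memory.discard(s)
                    reactivated.append(("spill", active[s], s))
        out.append((op, dst_reg, [active[s] for s in srcs]))
        out.extend(reactivated)
    return out[::-1]


program = [
    ("a", "one", []),
    ("b", "one", []),
    ("c", "add", ["a", "b"]),
    ("d", "add", ["c", "a"]),
    ("e", "add", ["d", "b"]),
    ("f", "add", ["e", "c"]),
]
for step in allocate_reverse(program, num_regs=2):
    print(step)
```

In this sketch the spill for a re-read value is placed just before the reading instruction rather than after it, because the destination of that instruction may reuse the same register.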
This post in particular explained it very well :
Here are, in my opinion, some pros and cons compared to regular LSRA. I might be wrong, or not have considered some parts that would solve some issues with RLSRA, so feedback is very much welcome.
Note : in the following, I am making a distinction between active and live values. A value is live as long as it can still be read from / used. A value is *active* when it’s currently in a register. In the case of RLSRA, to use a live value that’s not active, we need to find a register for it and insert appropriate spills / restores.
PROS :
- It’s a lot easier to see when a value shouldn’t be live anymore. Values can be read zero or more times, but written to only once, so when walking backwards we can consider a value live until we reach its definition and dead as soon as we get there. This simplifies live range analysis to some extent, especially for pure linear SSA code, but the benefit isn’t that big when using a tree-based IR: we already know that each value a tree generates will only be used once, and that this will be when we reach the parent node of the tree (subtrees come before parent trees in execution order, as we need all the operands before we do the operation). So most of the time, with regular LSRA on a tree-based IR, we also know exactly how long values live.
- Handling merges at block boundaries is easier. Since we process code in reverse, we start out knowing the set of values that are active at the end of the block, and after processing, we can just propagate the set of currently active values to the predecessor blocks, where it becomes the set they start from (in execution order, the set of active values at their end).
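As a small illustration of the two points above, here is a Python sketch of the reverse walk over one block. It tracks live values rather than register assignments, and the block format `(dst, srcs)` and the name `live_in` are assumptions made for this example only.

```python
def live_in(block, live_out):
    """block: list of (dst, srcs) in execution order (hypothetical format).
    Returns the set of values live at the start of the block."""
    live = set(live_out)
    for dst, srcs in reversed(block):
        live.discard(dst)   # walking backwards, the definition ends the value's life
        live.update(srcs)   # a read makes the value live above this instruction
    return live


block = [
    ("t0", ["x"]),
    ("t1", ["t0", "y"]),
    (None, ["t1"]),         # an instruction that produces no value
]
entry = live_in(block, live_out={"y"})
print(sorted(entry))        # ['x', 'y']
```

The returned set is what would be handed to the predecessor blocks as their set at the end of the block.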
CONS :
- handling branches gets more difficult, and from what I see, some sort of live range analysis is still required (defeating the promise of RLSRA to avoid having to compute live ranges).
Suppose we have two blocks, A and B, that both use the local variable 0 in the register r0. Both blocks have the predecessor C.
We process the block A, in which there is a write to the local variable 0 before all its uses, so from A’s point of view the variable is dead at the start of the block.
We then process the block C, and we select A as the successor to inherit active variables from. The register r0 will contain the value of the local variable 0 at the beginning of block C, and we’d like to know if we can overwrite r0 without having to spill its contents into the memory slot for the local variable 0, since the value of the local variable 0 will be overwritten in A anyway. We could think that it’s the case, but there’s actually no way to know before also processing the block B. Here are two things that could happen later on when we process B:
- In the block B, there are no writes to the local variable 0, so at the beginning of block B, the local variable 0 is expected to be in the register r0. Therefore, the block C should add spills and restores appropriately so that the value of the local variable 0 ends up in r0 before a jump to B.
- The block B writes to the local variable 0 before its uses, so the block B doesn’t need it to be present in r0 at the beginning of it.
To know whether or not to generate spills and restores for the local variable 0, the block C therefore needs to have all its successors processed first. But this is not always possible (in the case of a loop, for example), so unless we do live range analysis in a separate pass beforehand, it seems like we will always end up in a situation where needless spills and restores occur, just in case a successor block we haven’t processed yet needs a certain value.
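For reference, the separate pass mentioned above could look like the classic backward liveness fixed point, sketched here in Python on the A / B / C situation. The block format `(dst, srcs)`, the names `summarize` and `liveness`, and the variable name `v0` for the local variable 0 are assumptions for illustration.

```python
def summarize(body):
    """Return (use, defs): values read before any write in the block, and values written."""
    use, defs = set(), set()
    for dst, srcs in body:
        use.update(s for s in srcs if s not in defs)
        if dst is not None:
            defs.add(dst)
    return use, defs


def liveness(blocks, succs):
    """blocks: name -> list of (dst, srcs); succs: name -> successor names."""
    summary = {name: summarize(body) for name, body in blocks.items()}
    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:  # iterate to a fixed point so that loops are handled too
        changed = False
        for name in blocks:
            out = set()
            for s in succs.get(name, ()):
                out |= live_in[s]
            use, defs = summary[name]
            new_in = use | (out - defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


# C branches to A and B, as in the example above.
blocks = {
    "C": [("t", []), (None, ["t"])],    # does not touch v0
    "A": [("v0", []), (None, ["v0"])],  # writes v0 before reading it
    "B": [(None, ["v0"])],              # reads v0 without writing it
}
succs = {"C": ["A", "B"]}
live_in, live_out = liveness(blocks, succs)
print(live_out["C"])  # {'v0'}
```

Here `v0` is live at the end of C only because of B. If B also wrote `v0` before reading it, `live_out["C"]` would be empty and C could overwrite r0 freely.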
I wonder if I’m missing something here, and if this problem can be solved using phi nodes and making my IR pure SSA. So far it’s “SSA for everything but local variables”, which might not be the best choice. I’m still very much a novice at all this and I’m wondering if I’m about to “discover” the point of phi nodes. But even though I have ideas, no obvious solution comes to mind that would allow me to avoid doing live range analysis.
Feedback appreciated, sorry if this is incomprehensible.