Hazards
This section explains the hazards that can occur in our baseline CPU.
Hazards fall into three classes: structural, data, and control.
Structural Hazards
A structural hazard occurs when multiple pipeline stages need to access the same hardware resource, such as memory, the register file, or the arithmetic logic unit (ALU), in the same clock cycle. It is resolved by stalling the stage that tries to access the resource later.
Solution: Stall
I1: lw x2, 0(x1) # write to x2
I2: sub x4, x3, x2 # read from x2; requires several cycles in the decode stage
I3: add x4, x3, x2 # needs to wait in the fetch stage
In the example above, I2 must wait several cycles in the decode stage to read the correct value of x2 from I1 (how I1 and I2 communicate is explained later; for now, you can ignore I1 in this example).
As a result, a structural hazard occurs at cycle T+2, because I3 is ready to move to the decode stage while I2 still needs to occupy it.
To resolve this hazard, the decode stage asks I3 to stall at cycle T+2.
It is usually done by turning off the ready bit in the valid-ready protocol.
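The stall can be illustrated with a small cycle-by-cycle model. The sketch below is a simplification and not the baseline CPU's implementation: it models only the fetch and decode stages, the names `simulate` and `decode_cycles` are hypothetical, and the choice of two decode cycles for I2 is an assumption standing in for "several cycles". Decode reports `ready` only when it is empty or finishes its current instruction in this cycle, so an instruction waiting in fetch advances only on a successful handshake.

```python
def simulate(program, decode_cycles):
    """Trace fetch/decode occupancy, one (fetch, decode, ready) tuple per cycle.

    program: instruction names in program order.
    decode_cycles: hypothetical map from instruction name to the number of
        cycles it needs in decode (instructions not listed need 1).
    """
    trace = []
    pending = list(program)
    fetch = pending.pop(0) if pending else None
    decode, remaining = None, 0
    while fetch is not None or decode is not None:
        # Decode is ready when it is empty or finishes its instruction this cycle.
        ready = decode is None or remaining == 1
        trace.append((fetch, decode, ready))
        if ready:
            # Handshake: the instruction in fetch (if any) moves into decode.
            decode = fetch
            remaining = decode_cycles.get(decode, 1) if decode is not None else 0
            fetch = pending.pop(0) if pending else None
        else:
            # Stall: fetch keeps its instruction while decode keeps working.
            remaining -= 1
    return trace


if __name__ == "__main__":
    # Index 0 corresponds to cycle T in the example above.
    for cycle, row in enumerate(simulate(["I1", "I2", "I3"], {"I2": 2})):
        print(f"T+{cycle}: fetch={row[0]} decode={row[1]} ready={row[2]}")
```

Under these assumptions, the entry for cycle T+2 shows I3 held in fetch with `ready` deasserted, and I3 enters decode only after I2 has finished.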
Data Hazards
A data hazard occurs when the processing in a pipeline stage depends on the result of a later stage. It is resolved either by stalling the stage until the data it depends on becomes available, or by bypassing the necessary data from the later stage in the same clock cycle.
For instance, a data hazard caused by a read-after-write (RAW) dependency in the CPU core is resolved either by stalling the read instruction in the decode stage or by bypassing the result of the write instruction from a later stage to the read instruction in the decode stage.
Solution (1/2): Bypassing
In the decode stage, we need to read the values of the source registers. When a RAW dependency occurs, we can bypass the values from the later stages to the decode stage.
From the execute stage:
I1: add x3, x2, x1 # write to x3
I2: sub x5, x4, x3 # read from x3
At cycle T+2, I1 can bypass the value of x3 to I2.
From the memory stage:
I1: add x3, x2, x1 # write to x3
I2: nop
I3: sub x5, x4, x3 # read from x3
At cycle T+3, I1 can bypass the value of x3 to I3.
From the writeback stage:
I1: add x3, x2, x1 # write to x3
I2: nop
I3: nop
I4: sub x5, x4, x3 # read from x3
At cycle T+4, I1 can bypass the value of x3 to I4.
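The three bypass paths above can be summarized as a single lookup performed by the decode stage. The following is a minimal sketch under stated assumptions, not the baseline CPU's code: the function name `read_source` and the `(dest_reg, value)` representation of each later stage are hypothetical. It also assumes that when several in-flight instructions write the same register, the youngest one (closest to decode) must win, which is why execute is checked before memory and writeback.

```python
def read_source(reg, regfile, execute, memory, writeback):
    """Return the value of `reg` as seen by the instruction in decode.

    Each of `execute`, `memory`, and `writeback` is either None or a
    (dest_reg, value) pair for the instruction currently in that stage.
    """
    # x0 is hardwired to zero in RISC-V, so it is never bypassed.
    if reg == "x0":
        return 0
    # Check the youngest producer first: execute, then memory, then writeback.
    for stage in (execute, memory, writeback):
        if stage is not None and stage[0] == reg:
            return stage[1]
    # No in-flight producer: read the architectural register file.
    return regfile[reg]


if __name__ == "__main__":
    regfile = {"x3": 10, "x4": 7}
    # I1 (add x3, ...) is in execute while the reader is in decode.
    print(read_source("x3", regfile, ("x3", 42), None, None))
```

Checking the stages in this order matters only when more than one in-flight instruction targets the same register; for the single-producer examples above, any order would return the same value.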
Solution (2/2): Stall
When data from a later stage is not yet ready, bypassing is not possible and a stall becomes necessary. In our baseline CPU, there are two sources of stalls: load instructions and CSR instructions.
From the load instruction:
I1: lw x2, 0(x1) # write to x2
I2: sub x4, x3, x2 # read from x2
- At cycle T+2, I2 needs to be stalled at the decode stage because I1 has not yet reached the memory.
- At cycle T+3, I1 gets the value of x2 from the memory and can now bypass it to I2.
From the CSR instruction:
I1: csrr x2, mcause # write to x2
I2: bgez x2, 8000003c # read from x2
In our baseline CPU, the CSR is located in the memory stage.
- At cycle T+2, I2 needs to be stalled at the decode stage because I1 has not yet reached the CSR.
- At cycle T+3, I1 gets the value of x2 from the CSR and can now bypass it to I2.
For more details on CSR instructions, refer to Chapter 2.8 of the RISC-V Specifications.
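Both stall cases share one condition: the producer is still in the execute stage, its result only becomes available in the memory stage, and the instruction in decode reads its destination register. A hedged sketch of that check follows; the `Instr` type, its `kind` labels, and the function `must_stall` are hypothetical names introduced for illustration and are not taken from the baseline CPU.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    kind: str                    # hypothetical labels: "alu", "load", "csr", "branch"
    rd: Optional[str]            # destination register, or None if there is none
    srcs: Tuple[str, ...] = ()   # source registers read in decode


def must_stall(decode: Instr, execute: Optional[Instr]) -> bool:
    """Return True if the instruction in decode has to wait this cycle."""
    # Nothing in execute, or it writes no register (or x0): no dependency.
    if execute is None or execute.rd is None or execute.rd == "x0":
        return False
    # Loads and CSR reads produce their result in the memory stage,
    # so their value cannot be bypassed from execute.
    result_ready_in_memory_stage = execute.kind in ("load", "csr")
    return result_ready_in_memory_stage and execute.rd in decode.srcs


if __name__ == "__main__":
    lw = Instr("load", "x2", ("x1",))           # lw x2, 0(x1)
    sub = Instr("alu", "x4", ("x3", "x2"))      # sub x4, x3, x2
    print(must_stall(sub, lw))
```

One cycle later the producer has moved to the memory stage, the check no longer fires, and the value can be bypassed as in the previous section.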
Control Hazards
A control hazard occurs when a pipeline stage mispredicts which instruction to execute in the next clock cycle. It is resolved by discarding the mispredicted instructions and restarting from the correct next instruction; for example, when the execute stage detects a misprediction, the instructions in the fetch and decode stages are discarded.
Solution: Discarding and Restarting
In our baseline CPU, there are two sources of control hazards: branch mispredictions and exceptions.
Branch Misprediction
I1: beq x2, x1, target # mispredicted to not taken
I2: add x5, x4, x3 # should be killed
I3: lw x5, 0(x4) # should be killed
target:
I4: sub x5, x4, x3 # correct target address
At cycle T+1, the fetch stage speculates that I1's next instruction is I2, so I2 is fetched from the instruction memory.
But at cycle T+2, the execute stage determines that I1's next instruction is in fact I4.
As such, the mispredicted instructions I2 and I3 are discarded at cycle T+2, and the fetch stage is restarted with the correct next instruction I4 at cycle T+3.
Exception
I1: unimp # illegal instruction, redirect to trap vector
I2: add x4, x3, x2 # should be killed
I3: sub x4, x3, x2 # should be killed
I4: lw x4, 0(x3) # should be killed
trap_vector:
I5: csrr x5, mcause # trap handling logic
The illegal instruction I1 generates an exception, so execution should be redirected to the trap vector to handle it.
At cycle T+3, I1 reaches the CSR in the memory stage, which returns the trap vector address.
As such, the mispredicted instructions I2, I3, and I4 are discarded at cycle T+3, and the fetch stage is restarted with the correct next instruction I5 at cycle T+4.
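The branch and exception cases follow the same pattern: the stage that discovers the correct next instruction kills everything younger than itself and hands the fetch stage a new target. The sketch below illustrates this under stated assumptions. The five stage names come from this section, but the `redirect` function and the dictionary representation of the pipeline are hypothetical, not the baseline CPU's interface.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def redirect(pipeline, resolving_stage, target):
    """Kill every instruction younger than the one in `resolving_stage`.

    pipeline: dict mapping each stage name to the instruction it holds (or None).
    Returns (new_pipeline, killed, next_fetch), where `killed` lists the
    discarded instructions in program order and `next_fetch` is what the
    fetch stage restarts from in the following cycle.
    """
    idx = STAGES.index(resolving_stage)
    younger = STAGES[:idx]  # stages that hold instructions fetched later
    # Walk from the stage nearest the resolver back to fetch: program order.
    killed = [pipeline[s] for s in reversed(younger) if pipeline[s] is not None]
    new_pipeline = {s: (None if s in younger else pipeline[s]) for s in STAGES}
    return new_pipeline, killed, target


if __name__ == "__main__":
    # Branch misprediction: I1 is in execute at cycle T+2.
    branch = {"fetch": "I3", "decode": "I2", "execute": "I1",
              "memory": None, "writeback": None}
    print(redirect(branch, "execute", "I4")[1:])

    # Exception: I1 is in memory at cycle T+3.
    trap = {"fetch": "I4", "decode": "I3", "execute": "I2",
            "memory": "I1", "writeback": None}
    print(redirect(trap, "memory", "I5")[1:])
```

The later the resolving stage, the more instructions are discarded: two for a branch resolved in execute, three for an exception resolved in memory.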