Hazards
In this section, we will explain the hazards that can occur in our baseline CPU.
Hazards can be classified into three types: structural, data, and control.
Structural Hazards
A structural hazard occurs when multiple pipeline stages need to access the same hardware resource, such as memory, the register file, or the arithmetic logic unit (ALU), in the same clock cycle. It is resolved by stalling the instruction that tries to access the resource later:
Solution: Stall
I1: lw x2, 0(x1) # write to x2
I2: sub x4, x3, x2 # reads x2; must stay in the decode stage for several cycles
I3: add x4, x3, x2 # must wait in the fetch stage
In the example above, I2 must wait several cycles in the decode stage to read the correct value of x2 produced by I1
(how I1 and I2 communicate is explained later, in the Data Hazards section; for now, you can ignore I1 in this example).
As a result, a structural hazard occurs at cycle T+2: I3 is ready to move into the decode stage, but I2 still occupies it.
To resolve this hazard, the decode stage asks I3 to stall at cycle T+2.
This is usually done by deasserting the ready signal of the valid-ready protocol.
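The stall can be pictured with a small cycle-level model of the valid-ready handshake between fetch and decode. This is a sketch under stated assumptions, not the baseline CPU's implementation: `DecodeStage`, `simulate`, and the `extra_decode` table (how many additional cycles an instruction stays in decode) are hypothetical names, and I1 is left out as suggested above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DecodeStage:
    inst: Optional[str] = None  # instruction currently held in decode
    extra_cycles: int = 0       # cycles it still needs after the current one

    @property
    def ready(self) -> bool:
        # Ready only if decode is empty or its instruction leaves this cycle.
        return self.inst is None or self.extra_cycles == 0


def simulate(program: List[str], extra_decode: Dict[str, int],
             cycles: int) -> List[Tuple[Optional[str], Optional[str], bool]]:
    """Per cycle: (inst in fetch, inst in decode, decode's ready signal).

    Hypothetical model: extra_decode gives the additional cycles an
    instruction must spend in decode (e.g. I2 waiting for its operand).
    """
    queue = list(program)
    fetch: Optional[str] = queue.pop(0) if queue else None
    decode = DecodeStage()
    trace = []
    for _ in range(cycles):
        ready = decode.ready
        trace.append((fetch, decode.inst, ready))
        if ready:
            # valid && ready: the handshake fires, fetch hands over its
            # instruction and decode's old instruction moves on to execute.
            decode.inst = fetch
            decode.extra_cycles = extra_decode.get(fetch, 0) if fetch else 0
            fetch = queue.pop(0) if queue else None
        else:
            # ready is deasserted: fetch keeps its instruction (stall).
            decode.extra_cycles -= 1
    return trace
```

For example, `simulate(["I2", "I3"], {"I2": 1}, 4)` shows I3 sitting in fetch while decode reports `ready == False`, and moving into decode only once I2 has left.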
Data Hazards
A data hazard occurs when the processing in a pipeline stage depends on a result produced by a later stage. It is resolved either by stalling the stage until the data it depends on becomes available, or by bypassing the necessary data from later stages in the same clock cycle.
For instance, a data hazard caused by a read-after-write (RAW) dependency in a CPU core is resolved either by stalling the reading instruction in the decode stage or by bypassing the result of the writing instruction from a later stage to the reading instruction in the decode stage.
Solution (1/2): Bypassing
The decode stage reads the values of the source registers. When a RAW dependency exists, the values can be bypassed from the later stages to the decode stage.
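The operand selection in decode can be sketched as a priority multiplexer. This assumes each later stage already holds its final result (the load and CSR cases, where it does not, are handled by stalling) and that the youngest in-flight writer of a register wins, since it holds the most recent value. `read_operand` and `BYPASS_ORDER` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Later stages, youngest first: if several of them write the same register,
# the youngest one holds the most recent value, so it takes priority.
BYPASS_ORDER = ("execute", "memory", "writeback")


def read_operand(rs: int, regfile: Dict[int, int],
                 in_flight: Dict[str, Optional[Tuple[int, int]]]) -> int:
    """Value of source register rs as seen by the decode stage.

    in_flight maps a stage name to (rd, result) of the instruction it holds,
    or None if the stage is empty or does not write a register.
    """
    if rs == 0:
        return 0  # x0 is hard-wired to zero, so it is never bypassed
    for stage in BYPASS_ORDER:
        entry = in_flight.get(stage)
        if entry is not None and entry[0] == rs:
            return entry[1]  # bypass from this stage
    return regfile.get(rs, 0)  # no RAW dependency: use the register file
```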
From the execute stage:
I1: add x3, x2, x1 # write to x3
I2: sub x5, x4, x3 # read from x3
At cycle T+2, I1 (in the execute stage) can bypass the value of x3 to I2 (in the decode stage).
From the memory stage:
I1: add x3, x2, x1 # write to x3
I2: nop
I3: sub x5, x4, x3 # read from x3
At cycle T+3, I1 (in the memory stage) can bypass the value of x3 to I3 (in the decode stage).
From the writeback stage:
I1: add x3, x2, x1 # write to x3
I2: nop
I3: nop
I4: sub x5, x4, x3 # read from x3
At cycle T+4, I1 (in the writeback stage) can bypass the value of x3 to I4 (in the decode stage).
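The three cases follow one pattern: the distance between producer and consumer decides which stage supplies the value, and the bypass happens one cycle after the consumer is fetched. The sketch below assumes a stall-free 5-stage pipeline with the producer fetched at T, and assumes that for a distance of 4 or more the value is read from the register file; `bypass_source` and `bypass_cycle` are hypothetical helpers.

```python
# Stages after decode, in the order a producer passes through them.
LATER_STAGES = ("execute", "memory", "writeback")


def bypass_source(distance: int) -> str:
    """Where a consumer `distance` instructions after the producer gets its value."""
    if distance < 1:
        raise ValueError("the producer must precede the consumer")
    if distance <= len(LATER_STAGES):
        return LATER_STAGES[distance - 1]
    return "register file"  # assumption: writeback has already updated it


def bypass_cycle(distance: int) -> int:
    """Offset from T at which the bypass happens (producer fetched at T)."""
    # The consumer is fetched at T + distance and decoded one cycle later.
    return distance + 1
```

With distances 1, 2, and 3 this reproduces the execute/T+2, memory/T+3, and writeback/T+4 cases listed above.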
Solution (2/2): Stall
When the data in a later stage is not yet ready, bypassing is not possible and a stall becomes necessary. In our baseline CPU, there are two sources of such stalls: load instructions and CSR instructions.
From the load instruction:
I1: lw x2, 0(x1) # write to x2
I2: sub x4, x3, x2 # read from x2
- At cycle T+2, I2 must be stalled at the decode stage because I1 has not yet reached the memory stage.
- At cycle T+3, I1 obtains the value of x2 from memory and can now bypass it to I2.
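The timing above can be derived from where the producer's result first exists. This sketch assumes a 5-stage pipeline (F, D, E, M, W) with no other stalls, the producer fetched at T and the consumer at T+1; `RESULT_STAGE` and `consumer_schedule` are hypothetical. A load yields its value only in M, so the consumer repeats decode once; an ALU result exists in E, so no stall is needed.

```python
STAGES = ("F", "D", "E", "M", "W")

# Assumed: the stage in which each kind of producer first holds its result.
RESULT_STAGE = {"alu": "E", "load": "M"}


def consumer_schedule(producer_kind: str) -> dict:
    """Stage of the dependent instruction at each cycle offset from T.

    The producer is fetched at T, so it sits in STAGES[i] at offset i.
    The consumer is fetched right after it, at T+1.
    """
    ready_at = STAGES.index(RESULT_STAGE[producer_kind])
    # Without a hazard the consumer is in decode at T+2; it must stay there
    # until the producer reaches the stage where its result exists.
    stall = max(0, ready_at - 2)
    schedule, cycle = {}, 1
    for stage in STAGES:
        for _ in range(1 + (stall if stage == "D" else 0)):
            schedule[cycle] = stage
            cycle += 1
    return schedule
```

For `"load"` the consumer is in D at both T+2 and T+3 and reaches E at T+4, matching the two bullet points above.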
From the CSR instruction:
I1: csrr x2, mcause # write to x2
I2: bgez x2, 8000003c # read from x2
In our baseline CPU, the CSR is located in the memory stage.
- At T+2, I2 must be stalled at the decode stage because I1 has not yet reached the CSR.
- At T+3, I1 obtains the value of x2 from the CSR and can now bypass it to I2.
For more details on CSR instructions, please refer to Chapter 2.8 of the RISC-V specification.
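Both stall sources share one detection rule: stall decode when the instruction in the execute stage produces its value late (a load or a CSR read, both of which obtain it in the memory stage) and writes a register that decode wants to read. The sketch below is a hypothetical model; `ExInst` and `must_stall` are illustrative names, not the baseline CPU's signals.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ExInst:
    """Hypothetical summary of the instruction in the execute stage."""
    rd: int               # destination register (0 if none)
    is_load: bool = False
    is_csr: bool = False


def must_stall(decode_srcs: Sequence[int], ex: Optional[ExInst]) -> bool:
    """True if the decode stage must stall this cycle."""
    if ex is None or ex.rd == 0:
        return False  # nothing in execute, or it writes x0 (never read back)
    # Loads and CSR reads produce their value only in the memory stage, so a
    # consumer in decode cannot be bypassed from execute yet.
    late_result = ex.is_load or ex.is_csr
    return late_result and ex.rd in decode_srcs
```

For `sub x4, x3, x2` in decode, `decode_srcs` would be `(3, 2)`; with `lw x2, 0(x1)` or `csrr x2, mcause` in execute, the function returns `True`.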
Control Hazards
A control hazard occurs when a pipeline stage mispredicts which instruction to execute in the next clock cycle. It is resolved by discarding the mispredicted instructions and restarting from the correct next instruction; for example, when the execute stage detects a misprediction, the instructions in the fetch and decode stages are discarded.
Solution: Discarding and Restarting
In our baseline CPU, there are two sources of control hazards: branch mispredictions and exceptions.
Branch Misprediction
I1: beq x2, x1, target # mispredicted to not taken
I2: add x5, x4, x3 # should be killed
I3: lw x5, 0(x4) # should be killed
target:
I4: sub x5, x4, x3 # correct target address
At cycle T+1, the fetch stage speculates that I1's next instruction is I2, so I2 is fetched from the instruction memory.
But at cycle T+2, the execute stage deduces that I1's next instruction is in fact I4.
As such, the mispredicted instructions I2 and I3 are discarded at cycle T+2, and the fetch stage is restarted with the correct next instruction I4 at cycle T+3.
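Discarding can be modeled as replacing every instruction younger than the redirecting one with a bubble. In this sketch, the pipeline snapshot is a dict from stage name to instruction label, and `squash_younger` is a hypothetical helper; setting the fetch PC to the correct target is assumed to happen separately.

```python
from typing import Dict, List, Optional, Tuple

STAGE_ORDER = ("fetch", "decode", "execute", "memory", "writeback")


def squash_younger(pipeline: Dict[str, Optional[str]],
                   redirect_stage: str) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Discard every instruction younger than the one in redirect_stage.

    Younger instructions sit in earlier stages, so every stage before
    redirect_stage is cleared. Returns (new snapshot, discarded instructions);
    the input snapshot is not modified.
    """
    cut = STAGE_ORDER.index(redirect_stage)
    new_pipeline = dict(pipeline)
    killed = []
    for stage in STAGE_ORDER[:cut]:
        if new_pipeline.get(stage) is not None:
            killed.append(new_pipeline[stage])
        new_pipeline[stage] = None  # replaced by a bubble
    return new_pipeline, killed
```

Applied to the snapshot at cycle T+2 (I3 in fetch, I2 in decode, I1 in execute) with `redirect_stage="execute"`, it discards I3 and I2 and keeps I1.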
Exception
I1: unimp # illegal instruction, redirect to trap vector
I2: add x4, x3, x2 # should be killed
I3: sub x4, x3, x2 # should be killed
I4: lw x4, 0(x3) # should be killed
trap_vector:
I5: csrr x5, mcause # trap handling logic
The illegal instruction I1 raises an exception, so execution must be redirected to the trap vector to handle it.
At cycle T+3, the illegal instruction I1 reaches the CSR in the memory stage, which returns the trap vector address.
As such, the mispredicted instructions I2, I3, and I4 are discarded at cycle T+3, and the fetch stage is restarted with the correct next instruction I5 at cycle T+4.
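Both examples follow the same arithmetic: the later the stage that detects the redirect, the more wrong-path instructions are discarded. Assuming the redirecting instruction is fetched at T, the pipeline is full, and there are no other stalls, a redirect detected in stage i happens at T+i, discards i instructions, and fetch restarts at T+i+1. `redirect_cost` is a hypothetical helper illustrating this.

```python
STAGE_ORDER = ("fetch", "decode", "execute", "memory", "writeback")


def redirect_cost(redirect_stage: str) -> tuple:
    """(instructions discarded, cycle offset at which fetch restarts).

    Offsets are relative to T, the cycle in which the redirecting
    instruction was fetched; assumes a full pipeline and no stalls.
    """
    detect_cycle = STAGE_ORDER.index(redirect_stage)  # reaches stage i at T+i
    discarded = detect_cycle  # one wrong-path instruction per earlier stage
    restart = detect_cycle + 1  # correct instruction is fetched next cycle
    return discarded, restart
```

`redirect_cost("execute")` gives the branch case (2 discarded, restart at T+3) and `redirect_cost("memory")` the exception case (3 discarded, restart at T+4).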