Hazards

In this section, we will explain the hazards that can occur in our baseline CPU.

Hazards are classified into three kinds: structural, data, and control.

Structural Hazards

A structural hazard occurs when multiple pipeline stages need to access the same hardware resource, such as memory, the register file, or the arithmetic logic unit (ALU), in the same clock cycle. It is resolved by stalling the stage that tries to access the resource later:

Solution: Stall

I1:  lw   x2, 0(x1)   # write to x2
I2:  sub  x4, x3, x2  # reading x2 takes several cycles in the decode stage
I3:  add  x4, x3, x2  # must wait in the fetch stage

In the example above, I2 must wait several cycles in the decode stage to read the correct value of x2 from I1 (how I1 and I2 communicate is explained later in this section; for now, you can ignore I1 in this example).

As a result, a structural hazard occurs at cycle T+2 (taking T as the cycle in which I1 is fetched) because I3 is ready to move to the decode stage while I2 still occupies it. To resolve this hazard, the decode stage asks I3 to stall at cycle T+2. This is usually done by deasserting the ready bit in the valid-ready protocol.
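The stall at cycle T+2 can be sketched with a minimal valid-ready handshake. This is an illustrative model only, not the baseline CPU's implementation: the function names `transfer` and `decode_ready` and the rule that the decode stage is ready exactly when it is not stalled are assumptions made for the example.

```python
def transfer(valid: bool, ready: bool) -> bool:
    """A payload moves to the next stage only when the sender is valid
    and the receiver is ready in the same cycle."""
    return valid and ready


def decode_ready(decode_stalled: bool) -> bool:
    """Hypothetical rule: the decode stage accepts a new instruction
    only when the instruction it currently holds is able to leave."""
    return not decode_stalled


# Cycle T+2: the fetch stage holds I3 (valid), but the decode stage is
# still occupied by the stalled I2, so ready is turned off.
print(transfer(valid=True, ready=decode_ready(decode_stalled=True)))   # False

# Once I2 can leave the decode stage, I3 is allowed to move in.
print(transfer(valid=True, ready=decode_ready(decode_stalled=False)))  # True
```

Because the fetch stage keeps its payload whenever `transfer` is false, I3 is simply presented again in the next cycle and no instruction is lost.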

Data Hazards

A data hazard occurs when the processing in a pipeline stage depends on a result that is still being produced in a later stage. It is resolved either by stalling the stage until the data it depends on becomes available, or by bypassing the necessary data from the later stage in the same clock cycle.

For instance, a data hazard caused by a read-after-write (RAW) dependency in the CPU core is resolved either by stalling the reading instruction in the decode stage or by bypassing the result of the writing instruction from a later stage to the reading instruction in the decode stage.

Solution (1/2): Bypassing

In the decode stage, we need to read the values of the source registers. When a RAW dependency occurs, we can bypass the values from the later stages to the decode stage.

From the execute stage:

I1:  add  x3, x2, x1  # write to x3
I2:  sub  x5, x4, x3  # read from x3

At cycle T+2, I1 can bypass the value of x3 to I2.

From the memory stage:

I1:  add  x3, x2, x1  # write to x3
I2:  nop
I3:  sub  x5, x4, x3  # read from x3

At cycle T+3, I1 can bypass the value of x3 to I3.

From the writeback stage:

I1:  add  x3, x2, x1  # write to x3
I2:  nop
I3:  nop
I4:  sub  x5, x4, x3  # read from x3

At cycle T+4, I1 can bypass the value of x3 to I4.
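The three bypass paths above can be combined into one source-register read in the decode stage. The sketch below is a simplified model under stated assumptions: the name `read_source` and the `(rd, value)` tuple representation are hypothetical, and the priority order (execute first, then memory, then writeback) is not stated in the text but follows from the fact that the youngest producer holds the most recent value of a register.

```python
from typing import List, Optional, Tuple

# (destination register, value) held in a later stage,
# or None when the stage holds a bubble or an instruction without a result.
StageResult = Optional[Tuple[int, int]]


def read_source(rs: int, regfile: List[int],
                exe: StageResult, mem: StageResult, wb: StageResult) -> int:
    """Read source register rs in the decode stage, preferring bypassed values."""
    if rs == 0:
        return 0  # x0 is hardwired to zero in RISC-V
    # Check the youngest producer first so the most recent write wins.
    for stage in (exe, mem, wb):
        if stage is not None and stage[0] == rs:
            return stage[1]
    return regfile[rs]  # no RAW dependency: use the register file


regfile = [0] * 32
regfile[3] = 1  # stale value of x3 in the register file

# I1 (add x3, ...) is in the execute stage with result 10.
print(read_source(3, regfile, exe=(3, 10), mem=None, wb=None))  # 10
# I1 has advanced to the writeback stage.
print(read_source(3, regfile, exe=None, mem=None, wb=(3, 10)))  # 10
```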

Solution (2/2): Stall

When the data in a later stage is not yet ready, bypassing is not possible and a stall becomes necessary. In our baseline CPU, there are two sources of stalls: load instructions and CSR instructions.

From the load instruction:

I1:  lw   x2, 0(x1)   # write to x2
I2:  sub  x4, x3, x2  # read from x2

  • At cycle T+2, I2 must be stalled in the decode stage because I1 has not yet reached the memory stage.
  • At cycle T+3, I1 gets the value of x2 from memory and can now bypass it to I2.

From the CSR instruction:

I1:  csrr  x2, mcause    # write to x2
I2:  bgez  x2, 8000003c  # read from x2

In our baseline CPU, the CSR is located in the memory stage.

  • At cycle T+2, I2 must be stalled in the decode stage because I1 has not yet reached the CSR.
  • At cycle T+3, I1 gets the value of x2 from the CSR and can now bypass it to I2.

For more details on CSR instructions, please refer to Chapter 2.8 of the RISC-V Specifications.
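Both stall cases share one condition: the instruction in the execute stage produces its result only in the memory stage, and the instruction in the decode stage reads that result. A minimal sketch of this check follows. The `Instr` record, the `LATE_RESULT_OPS` set, and the `must_stall` function are hypothetical names introduced for illustration, and only the `lw` and `csrr` mnemonics from the examples are modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    op: str                      # mnemonic, e.g. "lw"
    rd: Optional[int] = None     # destination register, if any
    srcs: Tuple[int, ...] = ()   # source registers


# Assumption: instructions whose result only becomes available in the memory stage.
LATE_RESULT_OPS = {"lw", "csrr"}


def must_stall(in_decode: Instr, in_execute: Optional[Instr]) -> bool:
    """True if the decode stage must hold its instruction for this cycle."""
    if in_execute is None or in_execute.op not in LATE_RESULT_OPS:
        return False  # nothing ahead, or the result can be bypassed from execute
    if in_execute.rd is None or in_execute.rd == 0:
        return False  # no destination, or x0 (writes to x0 are discarded)
    return in_execute.rd in in_decode.srcs


# Cycle T+2 of the load example: lw x2 is in execute, sub reads x2 in decode.
lw = Instr("lw", rd=2, srcs=(1,))
sub = Instr("sub", rd=4, srcs=(3, 2))
print(must_stall(sub, lw))  # True
```

One cycle later the producer has moved to the memory stage, the check no longer fires, and the value is delivered through the bypass path from the memory stage.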

Control Hazards

A control hazard occurs when a pipeline stage mispredicts which instruction to execute in the next clock cycle. It is resolved by discarding the mispredicted instructions and restarting from the correct next instruction; for example, when the execute stage detects a misprediction, the instructions in the fetch and decode stages are discarded.

Solution: Discarding and Restarting

In our baseline CPU, there are two sources of control hazards: branch mispredictions and exceptions.

Branch Misprediction

I1:  beq  x2, x1, target  # mispredicted as not taken
I2:  add  x5, x4, x3      # should be killed
I3:  lw   x5, 0(x4)       # should be killed

target:
I4:  sub  x5, x4, x3      # correct target address

At cycle T+1, the fetch stage speculates that I1's next instruction is I2 and fetches it from the instruction memory. At cycle T+2, however, the execute stage determines that I1's next instruction is in fact I4. As such, the mispredicted instructions I2 and I3 are discarded at cycle T+2, and the fetch stage is restarted with the correct next instruction I4 at cycle T+3.
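The check performed by the execute stage can be sketched as a comparison between the predicted and the actual next PC. This is a hedged illustration: the function name `resolve_branch` and the addresses are hypothetical, and the fall-through address `pc + 4` assumes 32-bit instructions without the compressed extension.

```python
from typing import Optional


def resolve_branch(pc: int, taken: bool, target: int,
                   predicted_next: int) -> Optional[int]:
    """Return the PC to restart fetching from if the prediction was wrong,
    or None if the prediction was correct."""
    # Assumption: every instruction is 4 bytes long.
    actual_next = target if taken else pc + 4
    return actual_next if actual_next != predicted_next else None


# Hypothetical layout: I1 at 0x100, I2 at 0x104, I3 at 0x108, I4 (target) at 0x10c.
# The fetch stage predicted not taken (0x104), but the branch is taken.
redirect = resolve_branch(pc=0x100, taken=True, target=0x10C, predicted_next=0x104)
print(hex(redirect))  # 0x10c

# A correct prediction needs no redirect.
print(resolve_branch(pc=0x100, taken=False, target=0x10C, predicted_next=0x104))  # None
```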

Exception

I1:  unimp            # illegal instruction, redirect to trap vector
I2:  add  x4, x3, x2  # should be killed
I3:  sub  x4, x3, x2  # should be killed
I4:  lw   x4, 0(x3)   # should be killed

trap_vector:
I5:  csrr x5, mcause  # trap handling logic

The illegal instruction I1 raises an exception, so execution must be redirected to the trap vector to handle it. At cycle T+3, I1 reaches the CSR in the memory stage, which returns the trap vector address. As such, the mispredicted instructions I2, I3, and I4 are discarded at cycle T+3, and the fetch stage is restarted with the correct next instruction I5 at cycle T+4.
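The two control hazards in this section differ only in the stage that detects the redirect: the execute stage for a branch misprediction and the memory stage for an exception. The sketch below models the discard step under that view. The stage list, the dictionary representation of the pipeline, and the name `flush_younger` are assumptions made for illustration.

```python
from typing import Dict, List, Optional

STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def flush_younger(pipeline: Dict[str, Optional[str]], detect_stage: str) -> List[str]:
    """Turn every instruction in a stage before detect_stage into a bubble
    (None) and return the names of the killed instructions."""
    killed = []
    for stage in STAGES[:STAGES.index(detect_stage)]:
        if pipeline[stage] is not None:
            killed.append(pipeline[stage])
            pipeline[stage] = None
    return killed


# Branch misprediction, cycle T+2: I1 is in the execute stage.
branch = {"fetch": "I3", "decode": "I2", "execute": "I1",
          "memory": None, "writeback": None}
print(flush_younger(branch, "execute"))  # ['I3', 'I2']

# Exception, cycle T+3: I1 is in the memory stage.
trap = {"fetch": "I4", "decode": "I3", "execute": "I2",
        "memory": "I1", "writeback": None}
print(flush_younger(trap, "memory"))  # ['I4', 'I3', 'I2']
```

The later the redirect is detected, the more instructions are discarded: two for a redirect from the execute stage, three for a redirect from the memory stage.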