Getting Started
HazardFlow HDL allows users to modularly describe pipelined circuits with hazards, achieving cycle-level accuracy. It provides hazard interfaces and combinators to handle backward dependencies and simplify decomposition. By the end of this guide, users will be able to design hardware modules and compile them into Verilog.
Installation
Install rustup with the command below. After the installation is complete, restart the terminal.
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Clone the HazardFlow repository:
$ git clone https://github.com/kaist-cp/hazardflow.git
$ cd hazardflow
Compiling HazardFlow Modules to Verilog
First, build the Rust macros:
$ cargo build -p hazardflow-macro
To generate the Verilog code for the 5-stage pipelined RISC-V CPU core:
# Generate a separate Verilog file for each submodule.
$ cargo run --release -- --target cpu --deadcode --wire-cache --system-task
# Generate an integrated Verilog file combining all submodules.
$ cargo run --release -- --target cpu --deadcode --wire-cache --merge --system-task
To generate the Verilog code for the systolic-array-based NPU core:
# Generate a separate Verilog file for each submodule.
$ cargo run --release -- --target gemmini --deadcode --wire-cache --system-task
# Generate an integrated Verilog file combining all submodules.
$ cargo run --release -- --target gemmini --deadcode --wire-cache --merge --system-task
The generated code is located in the build directory, with each top-level module that has a #[synthesize] attribute placed in a separate directory.
Tutorial
We will introduce some basic concepts in the HazardFlow HDL programming model and use HazardFlow HDL to describe an FIR (finite impulse response) filter and a custom FIFO which does not allow duplicated entries.
HazardFlow HDL's implementation is based on the concept of hazard interface. Here, we will provide a brief overview of the interface and combinator to help you understand how to use them for implementing hardware modules. For more details, please refer to the Concepts section.
Hazard Interface
In HazardFlow HDL, a "hazard protocol" is defined as a communication protocol that includes a payload, a resolver, and a ready condition.
For a given hazard protocol H, its payload, resolver, and ready condition are represented as H::P, H::R, and H::ready, respectively.
The hazard interface for a given hazard protocol H is defined as I<H>.
It will be used as an ingress or egress type for the module.
// Hazard protocol.
trait Hazard {
    type P: Copy;
    type R: Copy;

    fn ready(p: Self::P, r: Self::R) -> bool;
}

// Hazard interface.
struct I<H: Hazard>;
Forward signals: the payload of the interface and its validity.

- The payload signal (H::P) is data sent from the sender to the receiver.
- The valid signal (bool) represents the validity of the payload signal.

Backward signal: the resolver of the interface.

- The resolver signal (H::R) is data sent from the receiver to the sender. It includes information used to control transferability (e.g., the receiver cannot accept the incoming payload for now).

Ready condition: the indicator of whether the receiver is ready to receive the payload.

- The ready condition (H::ready) returns whether the receiver is ready to receive the current payload and resolver pair.
- Logically, we say a "transfer occurs" when the valid signal is true and the ready condition is true, which means the receiver is ready to receive the valid payload from the sender.
Example: valid-ready protocol
The valid-ready protocol, which is the most commonly used protocol for communication between modules in hardware, can be represented as follows.
struct ExampleVrH;

impl Hazard for ExampleVrH {
    type P = u32;
    type R = bool;

    fn ready(p: u32, r: bool) -> bool {
        r
    }
}
Module
A module has a state of type S and ingress and egress hazard interface types, I<IH> and I<EH>.
Conceptually, the module's behavior can be represented as a function of the following signature:
comb: impl Fn(Option<IH::P>, EH::R, S) -> (Option<EH::P>, IH::R, S)
It takes three input signals:
- Ingress interface's forward signal (Option<IH::P>)
- Egress interface's backward signal (EH::R)
- Current state (S)

and returns three output signals:

- Egress interface's forward signal (Option<EH::P>)
- Ingress interface's backward signal (IH::R)
- Next state (S)
Here, the forward signals are abstracted as Option<H::P>, where Some(p) indicates that the valid signal is true and the payload signal is p, while None indicates that the valid signal is false.
NOTE: A module can have zero or multiple hazard interfaces for its ingress or egress. For more details, please refer to the Interfaces section.
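For concreteness, the following is a minimal sketch of such a comb function, assuming the ExampleVrH protocol above (payload u32, resolver bool) for both the ingress and egress interfaces and a stateless pass-through behavior; the function name and shape are illustrative, not part of the library.

// Hypothetical, stateless pass-through module described as a `comb` function:
// ingress hazard = egress hazard = ExampleVrH (payload u32, resolver bool).
fn pass_through_comb(
    ip: Option<u32>, // ingress forward signal: payload and its validity
    er: bool,        // egress backward signal: the receiver's readiness
    s: (),           // state: this module is stateless
) -> (Option<u32>, bool, ()) {
    let ep = ip; // egress payload: forward the ingress payload unchanged
    let ir = er; // ingress resolver: propagate the readiness back to the sender
    (ep, ir, s)
}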
Combinator
A combinator is the idiomatic mechanism for attaching a module to an interface.
We define the generic combinator as a method fsm on each hazard interface in HazardFlow HDL; it computes the combinational logic each cycle:
impl<IH: Hazard> I<IH> {
    fn fsm<S: Copy, EH: Hazard>(
        self,
        init_state: S,
        comb: impl Fn(Option<IH::P>, EH::R, S) -> (Option<EH::P>, IH::R, S),
    ) -> I<EH> {
        ..
    }
}
It specifies the state type and the ingress and egress interface types:

- State type (S)
- Ingress hazard interface type (I<IH>)
- Egress hazard interface type (I<EH>)

It also accepts two arguments representing the module's behavior:

- Initial state when the circuit is reset (init_state)
- Combinational logic representing the module's behavior (comb, as illustrated above)
For reusability, HazardFlow HDL provides a standard combinator library for developers.
It defines many useful combinators (e.g., filter_map, mux, etc.), which are implemented as wrapper functions around the generic FSM combinator.
For more details, please refer to the Interface Combinators section.
FIR Filter
In this tutorial, we will use HazardFlow HDL to describe an FIR (finite impulse response) filter, which is very commonly used in digital signal processing applications.
Specification
The FIR filter of order N performs the following operation:
It receives input signals from the stream x and outputs the weighted sum of the most recent N+1 input signals to the stream y. It can be expressed with the following formula:
\[ y[n] = b_0 x[n] + b_1 x[n-1] + ... + b_N x[n-N] = \sum^{N}_{i=0} b_i x[n-i] \]
where \( x[n] \) and \( y[n] \) represent the input and output signals, \( N \) represents the filter order, and \( b_i \) represents the i-th filter coefficient.
For example, the IO signals of a FIR filter of order 2 with coefficients [4, 2, 3] are as follows:
n | x[n] | y[n] |
---|---|---|
0 | 1 | 4·1 + 2·0 + 3·0 = 4 |
1 | 4 | 4·4 + 2·1 + 3·0 = 18 |
2 | 3 | 4·3 + 2·4 + 3·1 = 23 |
3 | 2 | 4·2 + 2·3 + 3·4 = 26 |
4 | 7 | 4·7 + 2·2 + 3·3 = 41 |
5 | 0 | 4·0 + 2·7 + 3·2 = 20 |
For more details, please refer to Wikipedia.
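To see where the numbers in the table come from, here is a plain-Rust sketch (ordinary software, not HazardFlow code) that reproduces the order-2 filter with coefficients [4, 2, 3]:

fn main() {
    let b = [4u32, 2, 3];          // filter coefficients b0, b1, b2
    let x = [1u32, 4, 3, 2, 7, 0]; // input signal x[n]
    let mut window = [0u32; 3];    // [x[n], x[n-1], x[n-2]]
    for (n, &xn) in x.iter().enumerate() {
        window = [xn, window[0], window[1]];
        let y: u32 = b.iter().zip(window.iter()).map(|(bi, xi)| bi * xi).sum();
        println!("y[{n}] = {y}"); // 4, 18, 23, 26, 41, 20
    }
}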
Modular Design
We could represent the FIR filter of order 2 in a modular way as follows:

As in the above figure, it can be divided into 3 submodules: window, weight, and sum.
window submodule:

- It serves as a sliding window, always returning the latest 3 valid input signals as an array.
- For example, if 1, 4, 3 are given as input signals, [1, 0, 0], [4, 1, 0], [3, 4, 1] will be returned as output signals.

weight submodule:

- It keeps the weight vector [b0, b1, b2] persistent throughout the program.
- It takes the input vector [v0, v1, v2] and returns the output vector [b0·v0, b1·v1, b2·v2].
- This submodule can be simply represented by a map combinator.

sum submodule:

- It takes the input vector and returns the sum of its elements.
Implementation
Based on the above submodules, we can implement the FIR filter in a concise and modular way:
fn fir_filter(input: Valid<u32>) -> Valid<u32> {
    let weight = Array::<u32, 3>::from([4, 2, 3]);

    input
        .window::<3>()
        .map(|ip| ip.zip(weight).map(|(e, wt)| e * wt))
        .sum()
}
We can describe the FIR filter with the window, map, and sum combinators in HazardFlow HDL, assuming the input interface Valid<u32> is provided.
Valid<u32>, which is an alias of I<ValidH<u32, ()>>, is a valid interface whose payload is u32, whose resolver is empty (()), and whose ready function always returns true.
In other words, as long as the input interface's forward signal is Some(u32) at a specific clock cycle, the receiver receives a valid payload.
We can interpret this input interface as a stream of signal values flowing through the wires.
window combinator:

The window combinator is defined as follows:
impl<P: Copy + Default> Valid<P> {
    fn window<const N: usize>(self) -> Valid<Array<P, N>> {
        self.fsm_map(P::default().repeat::<{ N - 1 }>(), |ip, s| {
            let ep = ip.repeat::<1>().append(s).resize::<N>();
            let s_next = ep.clip_const::<{ N - 1 }>(0);
            (ep, s_next)
        })
    }
}
It takes a Valid<P> and returns a Valid<Array<P, N>>.
It tracks the latest N valid input signals.
The fsm_map interface combinator is provided by the HazardFlow HDL standard library.
It computes the egress payload and the next state based on the ingress payload and the current state, and updates the state when the ingress transfer happens.
The initial state is defined as P::default().repeat::<{ N - 1 }>() in our example.
The anonymous function is where we specify the FSM logic from the (ingress payload, current state) to the (egress payload, next state).
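To make the state update concrete, here is a plain-Rust sketch (not HazardFlow code) of how window::<3> evolves for the inputs 1, 4, 3:

fn main() {
    let mut s = [0u32; 2]; // state: the previous N - 1 = 2 payloads, initialized to P::default()
    for ip in [1u32, 4, 3] {
        let ep = [ip, s[0], s[1]]; // egress payload: the latest 3 inputs
        s = [ep[0], ep[1]];        // next state: the first N - 1 elements of ep
        println!("{ep:?}");        // [1, 0, 0], [4, 1, 0], [3, 4, 1]
    }
}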
map combinator:

The map combinator is used to represent the weight submodule.

map(|ip| ip.zip(weight).map(|(e, wt)| e * wt))
It takes a Valid<Array<u32, N>> and returns an egress hazard interface Valid<Array<u32, N>>.
It transforms the i-th element of the ingress payload ip[i] into ip[i] * weight[i], and leaves the resolver untouched.
The map interface combinator is provided by the HazardFlow HDL standard library.
We can interpret it as a stateless version of fsm_map.
In the application-specific logic of the map interface combinator, we use the zip and map methods for manipulating the ingress payload signal.
sum combinator:

The sum combinator is defined as follows:
impl<const N: usize> Valid<Array<u32, N>> {
    fn sum(self) -> Valid<u32> {
        self.map(|ip| ip.fold(0, |acc, e| acc + e))
    }
}
It takes a Valid<Array<u32, N>> and returns a Valid<u32>.
It transforms the ingress payload into the sum of its elements.
In the application-specific logic of the map interface combinator, we use the fold method, which aggregates the data within the array signal.
You can find the implementation in fir_filter.rs.
You can generate the Verilog codes with the following commands:
# Generate a separate Verilog file for each submodule.
$ cargo run --release -- --target fir_filter --deadcode --wire-cache
# Generate an integrated Verilog file combining all submodules.
$ cargo run --release -- --target fir_filter --deadcode --wire-cache --merge
FIFO without duplication
In this tutorial, we will implement a custom FIFO which can be used to provide balanced transfers for multiple input streams, thereby offering balanced latency for each input stream.
Specification
The custom FIFO with N input streams and M entries performs the following operation:
It has M entries and receives payloads from N input streams. The key point is that it does not accept payloads from an input stream that already has an entry in the FIFO.
For example, in the figure above, the FIFO contains P1 and P2, which are payloads received from in_1 and in_N, respectively.
Therefore, the next payload will be accepted from an input stream other than in_1 and in_N.
in_1 and in_N become selectable again after P1 and P2 are removed from the FIFO via output (the circles in front of the FIFO indicate whether each input stream can be received; a red circle means it cannot, a green circle means it can).
Modular Design
We could represent the custom FIFO in a modular way as follows (U<M> is a bitvector of width M):
masked_merge submodule:

- It takes N valid-ready interfaces (Vr<P> = I<VrH<P, ()>>, where VrH is defined here).
- It returns a valid-ready hazard interface (I<VrH<(P, U<{ clog2(N) }>), U<N>>>).
  - The U<{ clog2(N) }> in the payload indicates the index of the selected ingress interface.
  - The U<N> in the resolver indicates the mask bits for the selection.
- It will select an ingress interface which has a valid payload and whose mask bit is false.
  - For example, if N is 4 and the mask bits are [true, true, false, true], then it will try to select from the third ingress interface.
  - If multiple ingress interfaces are selectable, the one with the smallest index is chosen.
  - In this example, the mask bits indicate the origin of the payloads currently in the FIFO.
map_resolver submodule:

- It transforms the inner resolver signal from the FIFO state (FifoS) to the mask bits (U<N>).
  - FifoS indicates the current state of the FIFO, including its elements and head/tail indices.
  - The i-th element of the mask bits (U<N>) indicates that one of the payloads currently in the FIFO was selected from the i-th ingress interface of M0.
- It does not touch the forward signals and the ready signal.
transparent_fifo submodule:

- It takes a payload from the ingress interface when the FIFO is not full.
- It returns the oldest payload in the FIFO to the egress interface.
- It sends the FIFO's fullness (Ready) and the current FIFO state (FifoS) as the ingress resolver.
map submodule:

- It drops the unnecessary index information in the payload.
Implementation
Based on the above submodules, we can implement the custom FIFO in a concise and modular way:
fn custom_fifo(ingress: [Vr<u32>; N]) -> Vr<u32> {
    ingress
        .masked_merge()
        .map_resolver_inner::<FifoS<(u32, U<{ clog2(N) }>), M>>(|fifo_s| {
            fifo_s.inner_with_valid().fold(false.repeat::<N>(), |acc, i| {
                if let Some((_, idx)) = i {
                    acc.set(idx, true)
                } else {
                    acc
                }
            })
        })
        .transparent_fifo()
        .map(|(ip, _idx)| ip)
}
- We used u32 for the payload type, N for the number of ingress interfaces, and M for the FIFO size.
- fifo_s.inner_with_valid() represents the inner elements of the FIFO.
  - It has type Array<HOption<(u32, U<{ clog2(N) }>)>, M>, where HOption represents the validity of each element.
- We fold the inner elements of the FIFO in the map_resolver_inner combinator (a plain-Rust sketch of this computation follows this list):
  - The initializer is a boolean array of size N with all elements set to false.
  - The index of the initializer array indicates the index of the ingress interfaces.
  - We iterate through all the elements within the FIFO and set the accumulator's value at the index stored in each FIFO element to true.
  - The final result indicates which ingress interfaces are present in the current FIFO.
- We send back this resolver to the masked_merge combinator so it can decide which ingress interface to choose next.
- We filter out the unnecessary index information in the last map combinator.
- The implementation of the masked_merge() combinator will be explained in the Implementing Combinators section.
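As referenced in the list above, here is a plain-Rust sketch of the mask computation, assuming N = 4 ingress interfaces, M = 4 entries, and a FIFO that currently holds payloads selected from ingress interfaces 0 and 2 (the concrete values are made up for illustration):

fn main() {
    // Inner FIFO elements: (payload, ingress index) pairs, with `None` for empty slots.
    let inner: [Option<(u32, usize)>; 4] = [Some((7, 0)), Some((9, 2)), None, None];
    let mask = inner.iter().fold([false; 4], |mut acc, e| {
        if let Some((_, idx)) = e {
            acc[*idx] = true; // mark the originating ingress interface
        }
        acc
    });
    println!("{mask:?}"); // [true, false, true, false]
}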
You can find the full implementation in custom_fifo.rs.
You can generate the Verilog codes with the following commands:
# Generate a separate Verilog file for each submodule.
$ cargo run --release -- --target custom_fifo --deadcode --wire-cache
# Generate an integrated Verilog file combining all submodules.
$ cargo run --release -- --target custom_fifo --deadcode --wire-cache --merge
Congratulations! You finished the tutorial!
Concepts
- Signal: The data types that can be transferred through wires.
- Interface: Communication protocol between modules. It contains forward signal, backward signal and its transfer condition.
- Modules: Hardware module. It serves as the building block for a digital circuit.
- Interface Combinators: The mechanism of chaining two interfaces together.
Signal
Signal is a collection of types that can be transferred through wires in HazardFlow HDL. We divide these types into two categories: scalar and compound.
HazardFlow HDL reuses some data types from the Rust programming language.
Normally, we can consider a data type that implements the Copy trait in Rust as a signal type in HazardFlow HDL.
NOTE: The Copy trait in Rust and the signal type in HazardFlow are not in a perfect 1:1 relationship. While the signal type in HazardFlow must implement Rust's Copy trait, there are some types like function pointers (e.g., fn() -> i32) that implement the Copy trait but are not treated as signal types in HazardFlow.
For more information on Rust data types, please refer to The Rust Programming Language Book.
Scalar Types
A scalar type represents a single value.
Boolean
We interpret the Boolean type the same as the bool type in the Rust programming language. In HazardFlow HDL, a Boolean value is interpreted as 0 (false) or 1 (true) and can be sent through the wires.
Unsigned Integer
In the HazardFlow HDL, we support arbitrary bit-width unsigned integers. We define it as U<N>, where N is the bit-width. We provide a handful of primitive functions for supporting unsigned integers' arithmetic operations, type conversion, and also ordering between different unsigned integers.
- Arithmetic operations
  - The basic arithmetic operations +, -, and * are supported, as well as logical left shift and logical right shift.
  - Some of the arithmetic operations might lead to bit-width changes in the final result.
- Type conversion
  - We support converting Rust i32, u32, u8, usize, u128, and bool into U<N>.
  - We can convert U<N> into u32, u8, and also bool.
- Ordering
  - We provide ordering functions for developers to easily compare two unsigned integers.
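A small hedged sketch of typical U<N> usage; into_u and resize appear in the combinator examples later in this document, while the exact arithmetic and comparison helpers may differ slightly:

// Assumed helpers: `into_u` (Rust integer -> U<N>) and `resize` (bit-width change).
let a: U<8> = 3.into_u();
let b: U<8> = 5.into_u();
let sum: U<8> = (a + b).resize(); // addition may widen the result, so resize it back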
Compound Types
Compound types can group multiple values into one type.
Enum
We interpret the enum type in HazardFlow HDL the same as in Rust, and the pattern matching feature is supported. The enum type gives you a way of saying a value is one of a possible set of values.
Example: HOption
Similar to the Option type in Rust, the HOption type in HazardFlow HDL is also an enum type. We define the HOption type as:
#[derive(Debug, Clone, Copy, HEq)]
enum HOption<T: Copy> {
    /// No value.
    None,
    /// Some value of type `T`.
    Some(T),
}
We provide almost the same handy functions as Rust when we operate on the HOption type.
For example, the map function on the HOption type applies a function f to the inner value if it is Some(T), and otherwise returns None.
The and_then function is often used to chain fallible operations that may return None; it flattens the result, avoiding nested HOption.
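A small hedged sketch of map and and_then on HOption, mirroring their Rust counterparts; checked_half is a hypothetical helper, and we assume the Some/None variants are in scope as in the examples throughout this document:

// Hypothetical helper: halves even numbers, returns `None` for odd ones.
fn checked_half(x: u32) -> HOption<u32> {
    if x % 2 == 0 { Some(x / 2) } else { None }
}

let a: HOption<u32> = Some(8);
let b = a.map(|x| x + 1);         // Some(9)
let c = a.and_then(checked_half); // Some(4), with no nested HOption
let d = c.and_then(checked_half); // Some(2)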
Tuple
The tuple type is the same as the tuple type in Rust, and as long as the types within a tuple are signals, then we can treat the tuple type as a signal type that can be sent through the wires too. For example, (U<32>, bool, HOption<U<8>>)
can be considered as a signal type.
Struct
Similar to the tuple type, if every field of a struct
is a signal type, we can consider the struct
as a signal type.
For example, the AXI stream can be constructed as follows:
#[derive(Debug, Clone, Copy)]
struct Axis {
    data: U<512>, // TDATA signal
    keep: U<64>,  // TKEEP signal
    last: bool,   // TLAST signal
    user: bool,   // TUSER signal
}
Array
The Array type is primitive in the HazardFlow HDL. We can define an N-sized sequence of V-typed data as Array<V, N>. The Array type comes with a handful of handy functions, including indexing, clipping an array, zipping two arrays together, etc.
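A hedged sketch of a few Array helpers used throughout this document (from, zip, map, and fold); the exact signatures may differ slightly:

// Element-wise product followed by a sum (a dot product), using the same
// `zip`, `map`, and `fold` methods as the FIR filter example above.
let a = Array::<u32, 3>::from([1, 2, 3]);
let b = Array::<u32, 3>::from([4, 5, 6]);
let dot = a.zip(b).map(|(x, y)| x * y).fold(0, |acc, e| acc + e); // 4 + 10 + 18 = 32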
Example: Unsigned Integer
In HazardFlow HDL, we represent an unsigned integer as an Array of bool with bit-width N. When a bool is true, we interpret it as a bit with value 1, and when it is false, as 0.
type U<const N: usize> = Array<bool, N>;
Interface
An interface is a communication protocol between modules. HazardFlow HDL provides the Hazard trait, which abstracts communication protocols with arbitrary transfer conditions (e.g., the valid-ready protocol). Like signals, interfaces also support compound types such as tuples and structs.
Hazard
Motivation: Handshake
In hardware semantics, a communication protocol is described as a handshake mechanism.
As an example, the most commonly used valid-ready protocol is described as below:
- The sender generates a 1-bit valid signal and a payload signal every clock cycle.
- The receiver generates a 1-bit ready signal each clock cycle.
- A transfer occurs when both the valid and ready signals are asserted (i.e., both are true).
- It's important to note that the payload is continuously present on the wire, regardless of the valid or ready signals.
Example waveform:
- At cycle 1, the sender turns on the valid bit with payload
0x42
.- Transfer does not happen because the receiver is not ready.
- At cycle 2, the receiver turns on the ready bit.
- Transfer happens because both the valid and ready signals are true.
Specification
In HazardFlow HDL, we abstract any arbitrary communication protocol into the Hazard trait.
It describes the necessary information for communication: payload, resolver, and ready function.
trait Hazard {
type P: Copy;
type R: Copy;
fn ready(p: Self::P, r: Self::R) -> bool;
}
For any hazard type H, its member types and functions have the following meanings:

- H::P: Payload signal type.
- H::R: Resolver signal type.
- H::ready: Returns whether the receiver is ready to receive the current payload and resolver pair.
Examples
We provide a few handy primitive hazard interfaces for developers.
ValidH
ValidH represents a communication without backpressure (always ready to receive). It has the following specification:
struct ValidH<P: Copy, R: Copy>;
impl<P: Copy, R: Copy> Hazard for ValidH<P, R> {
type P = P;
type R = R;
fn ready(p: P, r: R) -> bool {
true
}
}
For reusability, we added additional resolver signals that simply flow from the receiver to the sender.
AndH
For a given hazard specification H, the conjunctive AndH<H> specification adds an availability bit flag to H's resolver signal.
The receiver is then ready if it is available and, at the same time, ready according to the internal specification H.
struct AndH<H: Hazard>;
struct Ready<R: Copy> {
ready: bool,
inner: R,
}
impl<H: Hazard> Hazard for AndH<H> {
type P = H::P;
type R = Ready<H::R>;
fn ready(p: H::P, r: Ready<H::R>) -> bool {
r.ready && H::ready(p, r.inner)
}
}
The ready field of the Ready struct represents the availability of the receiver.
VrH
We defined the valid-ready hazard VrH<P, R> as AndH<ValidH<P, R>>.
type VrH<P: Copy, R: Copy> = AndH<ValidH<P, R>>;
For reusability, we added additional resolver signals that simply flow from the receiver to the sender.
Interface
An interface is an abstraction that represents the IO of a hardware module.
Typically, a single interface is composed of zero, one, or multiple hazard interfaces.
Specification
trait Interface {
type Fwd: Copy;
type Bwd: Copy;
...
}
For any interface type I, its member types have the following meanings:

- I::Fwd: Forward signal type.
- I::Bwd: Backward signal type.

It contains other methods related to modules and combinators; please refer to those sections for further reading.
Hazard Interface
For an arbitrary hazard specification H, we define the hazard interface I<H, D>, where D is the dependency type. (For more information on dependencies, please refer to the dependency section.)
struct I<H: Hazard, const D: Dep>;

impl<H: Hazard, const D: Dep> Interface for I<H, D> {
    type Fwd = HOption<H::P>;
    type Bwd = H::R;
}
- The interface's forward signal is an HOption type of the hazard payload.
- The backward signal is the hazard's resolver.
- When the forward signal is Some(p), the sender is sending a valid payload; otherwise, it is sending an invalid payload signal.
- When payload.is_some_and(|p| H::ready(p, r)) holds, the transfer happens.
We define Valid and Vr as the hazard interface types for ValidH and VrH, respectively.
type Valid<P> = I<ValidH<P, ()>, { Dep::Helpful }>;
type Vr<P> = I<VrH<P, ()>, { Dep::Helpful }>;
Compound Interface
Compound types such as tuple, struct, and array also implement the Interface
trait.
These interfaces are commonly used for IO of 1-to-N or N-to-1 combinators.
Tuple
A tuple of interfaces (If1, If2) implements the Interface trait as follows:
// In practice, it is implemented as a macro.
impl<If1: Interface, If2: Interface> Interface for (If1, If2) {
type Fwd = (If1::Fwd, If2::Fwd);
type Bwd = (If1::Bwd, If2::Bwd);
}
- The forward signal of the tuple interface is the tuple of the interfaces' forward signals.
- The backward signal of the tuple interface is the tuple of the interfaces' backward signals.
Struct
#[derive(Debug, Interface)]
struct StructIf<If1: Interface, If2: Interface> {
    i1: If1,
    i2: If2,
}
By applying the Interface derive macro to a struct in which all fields are interface types, the struct itself can also become an interface type.
Array
An array of interfaces [If; N] also implements the Interface trait as follows:
impl<If: Interface, const N: usize> Interface for [If; N] {
type Fwd = Array<If::Fwd, N>;
type Bwd = Array<If::Bwd, N>;
}
- The forward signal of the array interface is the array of the interfaces' forward signals.
- The backward signal of the array interface is the array of the interfaces' backward signals.
Modules
In Rust, the FnOnce trait is automatically implemented for types that can be called once. For example, the primitive function type fn(i: Ingress) -> Egress and closures that might consume captured variables (|i: Ingress| -> Egress { .. }) implement the FnOnce(Ingress) -> Egress trait.
In HazardFlow, we treat a function m that implements the FnOnce(Ingress) -> Egress trait, where Ingress and Egress implement the Interface trait, as a module with Ingress as its ingress interface and Egress as its egress interface.
The common way to construct a module is chaining interface combinators. Please refer to the Interface Combinators for more information.
Example: FIR Filter
In the tutorial, the FIR filter module was implemented as follows:
fn fir_filter(input: Valid<u32>) -> Valid<u32> {
    let weight = Array::<u32, 3>::from([4, 2, 3]);

    input
        .window::<3>()
        .map(|ip| ip.zip(weight).map(|(e, wt)| e * wt))
        .sum()
}
The fir_filter function implements FnOnce(Valid<u32>) -> Valid<u32>, so we can treat it as a module with Valid<u32> as both its ingress and egress interface.
Module Combinators
We provide some convenient module combinators that take a module, modify it, and return a new module.
seq
Generates a 1D systolic array from an array of modules.
fn seq<I: Interface, O: Interface, J: Interface, const N: usize>(
    ms: [fn(I, J) -> (O, J); N],
) -> impl FnOnce([I; N], J) -> ([O; N], J)
You can construct an array of modules explicitly from elements, or if all the modules have the same behavior, you can use the from_fn
API.
flip
Flips a module's input and output.
fn flip<I1: Interface, I2: Interface, O1: Interface, O2: Interface, T>(
    f: T,
) -> impl FnOnce(I2, I1) -> (O2, O1)
where
    T: FnOnce(I1, I2) -> (O1, O2),
Interface Combinators
We can chain two interfaces by using combinators. We define a combinator as a method within each hazard interface. The combinator specifies the combinational logic of calculating the egress interface payload, ingress interface resolver, and the next state of the combinator.
Note that in this section, you will see the dependency type D in the combinators' signatures. For more information about the dependency, please refer to the dependency section.
Combine Blackbox Module to Interface
The comb method within the Interface trait is used to combine a blackbox module with the given interface and return the egress interface.
trait Interface {
..
fn comb<E: Interface>(self, m: impl FnOnce(Self) -> E) -> E {
m(self)
}
}
It is useful when we want to combine multiple modules together.
For example, we can combine multiple stage modules with comb in the CPU core.
fn core(
imem: impl FnOnce(Vr<MemReq>) -> Vr<MemRespWithAddr>,
dmem: impl FnOnce(Vr<MemReq>) -> Vr<MemRespWithAddr>,
) {
fetch::<START_ADDR>(imem)
.comb(decode)
.comb(exe)
.comb(move |ingress| mem(ingress, dmem))
.comb(wb)
}
- imem and dmem are modules for the instruction memory and data memory, respectively.
- We chain the 5 sub-modules fetch, decode, exe, mem, and wb by using the comb method.
Generic FSM Combinator
The essence of the interface combinators is the generic fsm combinator.
We provide the fsm combinator, which transforms an ingress interface into an egress interface with a finite state machine.
With this combinator, you can represent an arbitrary FSM.
trait Interface {
type Fwd: Copy;
type Bwd: Copy;
fn fsm<E: Interface, S: Copy>(
self,
init_state: S,
f: impl Fn(Self::Fwd, E::Bwd, S) -> (E::Fwd, Self::Bwd, S),
) -> E {
.. // compiler magic
}
..
}
It accepts two arguments which are the FSM's initial state and the combinational logic.
We represent the combinational logic as a function (f). It takes three input signals:

- Ingress interface's forward signal (Self::Fwd)
- Egress interface's backward signal (E::Bwd)
- Current state (S)

and returns three output signals:

- Egress interface's forward signal (E::Fwd)
- Ingress interface's backward signal (Self::Bwd)
- Next state (S)
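For illustration, here is a hedged sketch of calling fsm directly: a hypothetical count_payloads combinator that attaches a running count to each payload of a Valid<u32> interface (in the actual library, calling fsm directly may require an unsafe block, as in the masked_merge example later in this document).

// Hypothetical combinator (not in the standard library): pair each payload
// with the number of valid payloads seen so far. The state `S` is the count.
fn count_payloads(input: Valid<u32>) -> Valid<(u32, u32)> {
    input.fsm::<Valid<(u32, u32)>, u32>(0, |ip, _er, count| {
        let ep = ip.map(|p| (p, count));                                // egress forward signal
        let count_next = if ip.is_some() { count + 1 } else { count }; // next state
        (ep, (), count_next)                                            // `Valid`'s backward signal is `()`
    })
}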
Standard Combinator Library
We provide a standard combinator library for developers to facilitate their work. We can roughly categorize the library into the following categories.
Category | Description |
---|---|
Mapping | Maintains the 1-to-1 relation between the ingress/egress interface. |
1-to-N | Splits the ingress interface into multiple egress interfaces. |
N-to-1 | Merges multiple ingress interfaces into one egress interface. |
Register | Stores the state into registers and could delay for one or multiple cycles. |
Source/sink | Generates or consumes the interface. |
FSM | Runs a finite state machine with an internal state. |
Conversion | Converts ingress hazard interface into egress hazard interface. |
For more details about all combinators, please refer to the rustdoc.
Mapping
These combinators either transform the payload (ingress to egress) or transform the resolver (egress to ingress).
We demonstrate the two most representative combinators: filter_map and map_resolver.
filter_map
As the name suggests, the filter_map combinator has the functionality of both filter and map.
It filters out payloads that do not satisfy a certain condition and transforms the ingress payload P into the egress payload EP.
We demonstrate the filter_map which takes Vr<P> and returns Vr<EP>.
It can be implemented using the fsm combinator like this:
impl<P: Copy> Vr<P> {
    fn filter_map<EP: Copy>(self, f: impl Fn(P) -> HOption<EP>) -> Vr<EP> {
        self.fsm::<Vr<EP>, ()>((), |ip, er, _| {
            let (ep, ir) = (ip.and_then(f), er);
            (ep, ir, ())
        })
    }
}
It is stateless, and an egress transfer happens when (1) an ingress transfer happens, and (2) f returns Some for the given ingress payload.
For example, let's consider the following f:
fn f(i: u32) -> HOption<bool> {
if i == 0 {
// If `i` is zero, filters out the payload.
None
} else if i & 1 == 0 {
// If `i` is even number, returns `true`.
Some(true)
} else {
// If `i` is odd number, returns `false`.
Some(false)
}
}
Then the cycle-level behavior of filter_map is as follows (T and F mean true and false, respectively):

- Cycle 0: transfer happens at both the ingress and egress sides.
- Cycle 2: transfer happens at the ingress side, but the payload is filtered out by f.
- Cycle 5: transfer does not happen because the egress is not ready to receive.
Note that filter_map can work with the ingress interfaces I<VrH<P, R>, D>, I<ValidH<P, R>, D>, and I<H, D>, and there are variants of filter_map such as filter_map_drop_with_r.
map_resolver
This combinator transforms the egress resolver to the ingress resolver and leaves the payload untouched.
We demonstrate map_resolver with the ingress interface I<VrH<P, R>, D>. Similar to filter_map, map_resolver has other variants and can also work with other ingress interfaces.
impl<P: Copy, R: Copy, const D: Dep> I<VrH<P, R>, D> {
    fn map_resolver<ER: Copy>(self, f: impl Fn(Ready<ER>) -> R) -> I<VrH<P, ER>, D> {
        self.fsm::<I<VrH<P, ER>, D>, ()>((), |ip, er, _| {
            let ep = ip;
            let ir = Ready::new(er.ready, f(er));
            (ep, ir, ())
        })
    }
}
- It is stateless.
- It transforms the egress resolver into an ingress resolver with a given f within the same clock cycle.
- The egress transfer condition is always the same as the ingress transfer condition.
- It leaves the ingress payload untouched.
- This combinator is usually used as a connector between two other combinators whose resolver types are not compatible.
For example, let's consider the following f:
fn f(i: u32) -> bool {
// Returns the parity of `i`.
i & 1 == 0
}
Then the cycle-level behavior of map_resolver is as follows:

- It does not touch the forward signals and the backward ready signal.
- It transforms the backward inner signal into its parity.
- Transfer happens at both sides at cycles 1 and 5.
1-to-N
These combinators can either duplicate a single ingress interface into multiple egress interfaces or select one from the numerous egress interfaces to transfer the payload.
We demonstrate the two most representative combinators: lfork and branch.
lfork
This combinator delivers the ingress payload to all the egress interfaces when all of them are ready to receive it, and combines all the egress interfaces' resolvers into the ingress resolver.
We demonstrate lfork with the ingress interface Vr<P, D>, whose resolver is ().
Note that lfork can work with other ingress interfaces such as I<VrH<P, (R1, R2)>, D>, I<VrH<P, Array<R, N>>, D>, etc.
impl<P: Copy, const D: Dep> Vr<P, D> {
    fn lfork(self) -> (Vr<P, D>, Vr<P, D>) {
        self.fsm::<(Vr<P, D>, Vr<P, D>), ()>((), |ip, (er1, er2), _| {
            let ep1 = if er2.ready { ip } else { None };
            let ep2 = if er1.ready { ip } else { None };
            let ir = Ready::new(er1.ready && er2.ready, ());
            ((ep1, ep2), ir, ())
        })
    }
}
- It is stateless.
- Ingress, first egress, and second egress transfers happen at the same cycle.
The example cycle-level behavior of lfork
is as follows:
- Cycle 1, 4, 5: transfer happens at ingress, first egress, and second egress sides.
branch
This combinator splits a single ingress interface into multiple egress interfaces, selects only one of the egress interfaces to transfer the payload, and combines all the egress interfaces' resolvers into the ingress resolver.
We demonstrate branch with the ingress interface Vr<(P, BoundedU<N>)>.
BoundedU<N> can be considered a bounded unsigned integer: if N is 3, the possible values are 0, 1, and 2.
For more information on BoundedU<N>, please refer to the docs.
impl<P: Copy, const N: usize> Vr<(P, BoundedU<N>)> {
    fn branch(self) -> [Vr<P>; N] {
        self.fsm::<[Vr<P>; N], ()>((), |ip, ers, _| {
            let Some((ip, sel)) = ip else {
                // Ingress ready signal is true when the valid signal is false.
                return (None.repeat::<N>(), Ready::new(true, ers.map(|r| r.inner)), ());
            };

            let ep = None.repeat::<N>().set(sel.value(), Some(ip));
            let ir = Ready::new(ers[sel.value()].ready, ers.map(|r| r.inner));
            (ep, ir, ())
        })
    }
}
- self is the ingress interface Vr<(P, BoundedU<N>)>.
- We can interpret the BoundedU<N> as the selector that chooses the egress interface for transferring the ingress payload.
- When the selected egress interface's ready signal is true and the ingress payload is valid, both the ingress and egress transfers happen; otherwise, neither happens.
- The ingress payload will only be transferred to the selected egress interface when both the ingress and egress transfers happen.
- The ingress resolver and all the egress resolvers are ().
Let's assume the ingress payload is HOption<(u32, BoundedU<2>)>: the payload is u32, and we split the ingress interface into 2 egress interfaces.
The cycle-level behavior of branch is as follows:
- Cycle 2: transfer happens at ingress and first egress side.
- Cycle 5: transfer happens at ingress and second egress side.
N-to-1
These combinators merge multiple ingress interfaces into a single egress interface. The egress interface could contain all the ingress interfaces' payload and resolver or select one of the ingress interfaces.
We demonstrate the two most representative combinators: join and merge.
join
This combinator merges the ingress interfaces' payloads and resolvers.
We demonstrate this combinator with the ingress interfaces (Vr<P1>, Vr<P2>).
impl<P1: Copy, P2: Copy> JoinExt for (Vr<P1>, Vr<P2>) {
    type E = Vr<(P1, P2)>;

    fn join(self) -> Vr<(P1, P2)> {
        self.fsm::<Vr<(P1, P2)>, ()>((), |(ip1, ip2), er, _| {
            let ep = ip1.zip(ip2);
            let ir1 = if ip2.is_some() { er } else { Ready::invalid(()) };
            let ir2 = if ip1.is_some() { er } else { Ready::invalid(()) };
            (ep, (ir1, ir2), ())
        })
    }
}
- The egress payload is a tuple of all the ingress payloads.
- The egress payload will be valid only when all the ingress payloads are valid.
- The ingress transfer happens only when all the ingress payloads are valid and the egress interface is ready to receive the payload.
The example cycle-level behavior of join
is as follows:
- Cycle 1, 4, 5: transfer happens.
merge
This combinator selects one of the ingress interfaces to deliver its payload to the egress payload, and passes the inner part of the egress resolver untouched to the ingress interfaces.
We will demonstrate the cmerge combinator with 2 ingress interfaces [Vr<u32>, Vr<u32>].
Note that in our code base, we implement cmerge with the ingress interfaces [I<AndH<H>, D>; N]. We can consider H as ValidH<P> and N as 2.
impl<P: Copy> MergeExt for (Vr<P>, Vr<P>) {
    type E = Vr<P>;

    fn merge(self) -> Vr<P> {
        self.fsm::<Vr<P>, ()>((), |(ip1, ip2), er, _| {
            let ep = ip1.or(ip2);
            let ir1 = er;
            let ir2 = if ip1.is_none() { er } else { Ready::invalid(()) };
            (ep, (ir1, ir2), ())
        })
    }
}
- This combinator selects, among the ingress interfaces, the first one whose payload is valid, when the egress interface is ready to receive the payload.
The example cycle-level behavior of merge
is as follows:
- Cycle 1: first ingress and egress transfer happens.
- Cycle 3: first ingress and egress transfer happens.
- Cycle 5: second ingress and egress transfer happens.
Register slices
These combinators maintain their states in registers and can delay the payload by one or multiple cycles.
We demonstrate the two most representative combinators: reg_fwd and fifo.
reg_fwd
We can use reg_fwd to make a pipeline and reduce the critical path.
Let's assume we have a circuit with the ingress interface Vr<P>:
If transforming the ingress payload from P to EP2 has to happen within a single cycle, the critical path becomes cpath(f1) + cpath(f2).
To resolve this, we add a reg_fwd combinator between those two map combinators.
Then the critical path is reduced to max(cpath(f1), cpath(f2)).
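A minimal sketch of this, assuming a u32 payload and that f1 and f2 are the combinational functions from the description above:

// Hypothetical pipeline: `f1` and `f2` are assumed `u32 -> u32` combinational
// functions; `reg_fwd` cuts the combinational path between the two maps.
fn pipelined(input: Vr<u32>) -> Vr<u32> {
    input
        .map(f1)    // critical path: cpath(f1)
        .reg_fwd()  // register slice inserted here
        .map(f2)    // critical path: cpath(f2)
}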
impl<P: Copy> Vr<P> {
fn reg_fwd(self) -> Vr<P> {
self.fsm::<Vr<P>, HOption<P>>(None, |ip, er, s| {
let ep = s;
let et = ep.is_some_and(|p| er.ready);
let ir = Ready::new(s.is_none() || et, (er.inner, s));
let it = ip.is_some_and(|p| ir.ready);
let s_next = if it {
ip
} else if et {
None
} else {
s
};
(ep, ir, s_next)
})
}
}
- The current state is used directly as the egress payload.
- The ingress interface is ready to receive a valid payload whenever the current register is empty or the egress transfer happens.
- If the ingress transfer happens, then the register stores the new valid payload as the next state.
- If the egress transfer happens but the ingress transfer does not, then the register will be empty in the next cycle.
- If neither the ingress transfer nor the egress transfer happens, then the next state stays the same as the current state.
- The only difference when pipe is false rather than true is that the ingress transfer happens only when the current register is empty, delaying one cycle compared to the pipelined version shown above.
Let's assume the ingress interface type is Vr<u32>. The example cycle-level behavior of reg_fwd is as follows:

- Cycle 0, 1, 3, 5: Ingress transfer happens.
- Cycle 1, 2, 5: Egress transfer happens.
fifo
This combinator is a pipelined FIFO queue; it can accept a new element every cycle if the queue is not full.
impl<P: Copy> Vr<P> {
fn fifo<const N: usize>(self) -> Vr<P> {
self.fsm::<Vr<P>, FifoS<P, N>>(FifoS::default(), |ip, er, s| {
let FifoS { inner, raddr, waddr, len } = s;
let empty = len == U::from(0);
let full = len == U::from(N);
let enq = ip.is_some() && !full;
let deq = er.ready && !empty;
let ep = if s.len == 0.into_u() { None } else { Some(s.inner[s.raddr]) };
let ir = Ready::new(!full, ());
let inner_next = if enq { inner.set(waddr, ip.unwrap()) } else { inner };
let len_next = (len + U::from(enq).resize() - U::from(deq).resize()).resize();
let raddr_next = if deq { wrapping_inc::<{ clog2(N) }>(raddr, N.into_u()) } else { raddr };
let waddr_next = if enq { wrapping_inc::<{ clog2(N) }>(waddr, N.into_u()) } else { waddr };
let s_next = FifoS { inner: inner_next, raddr: raddr_next, waddr: waddr_next, len: len_next };
(ep, ir, s_next)
})
}
}
Let's assume the ingress interface type is Vr<u32> with a capacity of 3. The example cycle-level behavior of fifo is as follows:

- The ingress ready signal is true as long as the queue is not full.
- It does not give maximum throughput because the ingress transfer and the egress transfer cannot happen simultaneously when the FIFO is full.
- Cycle 0, 1, 2, 3, 6: Ingress transfer happens.
- Cycle 1, 5, 6, 7: Egress transfer happens.
Source and sink
These combinators have only an ingress interface or only an egress interface.
We demonstrate the two most representative combinators: source and sink.
source
This combinator immediately returns the data coming in through the resolver as the payload.
The source combinator only has an egress interface.
impl<P: Copy> I<VrH<P, P>, { Dep::Demanding }> {
fn source() -> I<VrH<P, P>, { Dep::Demanding }> {
().fsm::<I<VrH<P, P>, { Dep::Demanding }>, ()>((), |_, er, _| {
let ep = if er.ready { Some(er.inner) } else { None };
(ep, (), ())
})
}
}
- The egress resolver type is the same as the egress payload type P.
- The egress transfer happens as long as the egress resolver ready signal is true.
- It transfers the resolver to the payload within the same clock cycle in which the egress transfer happens.
When P is u32, the example cycle-level behavior of source is as follows:

- Cycle 0, 1, 2, 4, 5: Egress transfer happens.
- Cycle 3: Egress transfer does not happen because the egress ready signal is false.
sink
This combinator maintains a state and calculates the ingress resolver with f, which takes the current state and ingress payload as inputs.
The sink combinator only has the ingress interface.
impl<P: Copy> I<VrH<P, HOption<P>>, { Dep::Helpful }> {
fn sink(self) {
self.fsm::<(), ()>((), |ip, _, _| {
let ir = Ready::valid(ip);
((), ir, ())
})
}
}
When P is u32, the example cycle-level behavior of sink is as follows:

- The ingress resolver is calculated every clock cycle.
FSM
FSM combinators run a finite state machine described by the closure given to the combinator. They have an internal state for the FSM, and the closure describes the state transitions and how the combinator should behave in each state.
The "FSM mapping" combinators such as fsm_map or fsm_filter_map are very similar to their regular mapping combinator counterparts. The difference is that they have an internal state that can be controlled by the given closure.
The other two remaining FSM combinators are fsm_ingress and fsm_egress. Since they have more complex behavior, we demonstrate their usage here.
fsm_ingress
It allows you to accumulate successive ingress payloads into an internal FSM state, then output the resulting state once it is ready. After the resulting state is transferred, the FSM is reset and starts accumulating again.
impl<P: Copy> Vr<P> {
fn fsm_ingress<S: Copy>(self, init: S, f: impl Fn(P, S) -> (S, bool)) -> Vr<S> {
self.fsm::<Vr<S>, (S, bool)>((init, false), |ip, er, (s, done)| {
let ir = Ready::new(!done, er.inner);
let it = ip.is_some() && !done;
let et = er.ready && done;
let ep = if done { Some(s) } else { None };
let s_next = if it {
f(ip.unwrap(), s)
} else if et {
(init, false)
} else {
(s, done)
};
(ep, ir, s_next)
})
}
}
It takes the initial state and the combinational logic, which calculates the next state and whether the FSM is done.
For example, let's consider the following sum_until_10 function.
It accumulates the input numbers until the sum becomes greater than or equal to 10, and then outputs the result.
fn sum_until_10(input: Vr<u32>) -> Vr<u32> {
    input.fsm_ingress(0, |ip, sum| {
        let sum_next = sum + ip;
        let done = sum_next >= 10;
        (sum_next, done)
    })
}
The example cycle-level behavior of fsm_ingress
is as follows:
- At cycle 0-1 and cycle 3-4, ingress transfer happens (accumulates the input).
- At cycle 2 and cycle 5, egress transfer happens (outputs the result).
fsm_egress
It runs an FSM for each transferred ingress payload, allowing you to process an ingress payload using multiple FSM states. Only after the FSM is finished, the combinator can accept a new ingress payload.
impl<P: Copy> Vr<P> {
    fn fsm_egress<EP: Copy, S: Copy>(self, init: S, flow: bool, f: impl Fn(P, S) -> (EP, S, bool)) -> Vr<EP> {
        self.fsm::<Vr<EP>, (HOption<P>, S)>((None, init), |ip, er, (sp, s)| {
            let (ep, s_next, is_last) = if let Some(p) = sp {
                let (ep, s_next, is_last) = f(p, s);
                (Some(ep), s_next, is_last)
            } else if flow && ip.is_some() && sp.is_none() {
                let (ep, s_next, is_last) = f(ip.unwrap(), s);
                (Some(ep), s_next, is_last)
            } else {
                (None, s, false)
            };

            let et = ep.is_some() && er.ready;
            let ir = Ready::new(sp.is_none() || (et && is_last), ());
            let it = ip.is_some() && ir.ready;

            let (sp_next, s_next) = if flow && it && et && sp.is_none() {
                if is_last {
                    (None, init)
                } else {
                    (ip, s_next)
                }
            } else if it {
                (ip, init)
            } else if et && is_last {
                (None, init)
            } else if et {
                (sp, s_next)
            } else {
                (sp, s)
            };

            (ep, ir, (sp_next, s_next))
        })
    }
}
- self is the ingress interface I<VrH<P, R>, D>.
- init is the initial state for the FSM.
- flow determines whether the FSM starts immediately or from the next cycle of an ingress transfer.
- f takes the current saved ingress payload and the current FSM state. It returns an egress payload, the next state, and whether this is the last state for the FSM.
For example, let's consider the following consecutive_3 example.
It outputs 3 consecutive numbers starting from each input number.
fn consecutive_3(input: Vr<u32>) -> Vr<u32> {
input.fsm_egress(0, true, |p, count| {
let ep = p + count;
let count_next = count + 1;
let is_last = count == 2;
(ep, count_next, is_last)
})
}
The example cycle-level behavior of fsm_egress is as follows:

- At T0, an ingress transfer happens and an FSM is started immediately (since flow is true).
  - The ingress payload Some(0) is saved to sp, and will be available from the next cycle.
- From T0 to T2, the FSM is running with the saved payload Some(0), and it outputs 0, 1, 2.
- At T2, f returns true for is_last, signaling that this is the last state for this FSM. This sets the ingress ready signal to true to accept a new ingress payload for the next FSM.
  - Since the ingress payload Some(1) is already available, an ingress transfer happens and it is saved.
- At T3, the FSM state is reset and a new FSM is started.
- From T3 to T5, the FSM is running with the saved payload Some(1), and it outputs 1, 2, 3.
Implementing Your Own Combinators
The HazardFlow HDL standard library provides some primitive combinators; however, we sometimes need to implement custom combinators for specific logic. In fact, the HazardFlow HDL standard library is implemented in the same way as what we will introduce in this section.
We introduced the masked_merge combinator specification in the Tutorial section, but we did not give its concrete implementation.
The masked_merge combinator is a custom combinator which is not provided by the HazardFlow HDL standard library, thus we need to implement its logic by ourselves.
Implementation
To define a custom combinator for the interface type [Vr<P>; N], we first need to define a custom trait.
/// Masked merge trait
trait MaskedMergeExt<P: Copy + Default, const N: usize>: Interface
where [(); clog2(N)]:
{
/// Hazard type
type EH: Hazard;
/// Fair Mux
fn masked_merge(self) -> I<Self::EH, { Dep::Demanding }>;
}
- The custom trait's name is MaskedMergeExt.
  - It specifies that the generic payload type P needs to be constrained by the Copy and Default traits.
  - const N: usize specifies the number of ingress interfaces.
  - MaskedMergeExt needs to extend the Interface trait, since we will impl this trait later to implement our custom combinator.
  - where [(); clog2(N)]: tells the HazardFlow HDL compiler that clog2(N) is a constant.
- We define the egress hazard as a Hazard type, which is a hazard protocol with a given payload, resolver, and ready function.
- fn masked_merge(self) -> I<Self::EH, { Dep::Demanding }> defines the combinator's name, masked_merge, and specifies that the egress hazard is EH.
We can define the combinational logic now.
impl<P: Copy + Default, const N: usize> MaskedMergeExt<P, N> for [Vr<P>; N]
where [(); clog2(N)]:
{
    type EH = VrH<(P, U<{ clog2(N) }>), Array<bool, N>>;

    fn masked_merge(self) -> I<Self::EH, { Dep::Demanding }> {
        unsafe {
            self.fsm::<I<VrH<(P, U<{ clog2(N) }>), Array<bool, N>>, { Dep::Demanding }>, ()>((), |ip, er, s| {
                if !er.ready {
                    let ir = Ready::new(false, ()).repeat();
                    let ep = None;
                    return (ep, ir, s);
                }

                let ep_idx = ip.zip(er.inner).find_idx(|(p, selected)| p.is_some() && !selected);
                let ep = if let Some(idx) = ep_idx { Some((ip[idx].unwrap(), idx)) } else { None };
                let ir = Ready::invalid().repeat::<N>().set_cond(ep.is_some(), ep_idx.unwrap(), Ready::valid(()));
                (ep, ir, s)
            })
        }
    }
}
- We impl the custom trait MaskedMergeExt for the compound interface type [Vr<P>; N].
- We define the egress hazard as VrH<(P, U<{ clog2(N) }>), Array<bool, N>>.
  - The egress interface is a valid-ready interface with a valid-ready hazard.
  - U<{ clog2(N) }> is the bit representation of the index of the ingress interfaces.
  - The inner resolver is Array<bool, N>, which indicates which ingress interfaces are present in the current queue.
- The unsafe code block is necessary for implementing your own custom combinator, since the fsm function needs to be called inside an unsafe code block.
- The fsm function specifies that the egress interface type is I<VrH<(P, U<{ clog2(N) }>), Array<bool, N>>, { Dep::Demanding }>.
- The fsm function takes two inputs and returns the egress interface:
  - The initial state of the combinator.
  - An anonymous function that takes three inputs:
    - The ingress payload ip. The type is Array<HOption<P>, N>, which is all the ingress interfaces' payloads.
    - The egress resolver er. The type is Ready<Array<bool, N>>, which is a ready signal with an array that indicates which ingress interfaces are present in the current queue.
    - The state s. The type is simply (), since we do not have a state in this combinator.
  - The anonymous function returns a tuple including:
    - The egress payload ep. The type is HOption<(P, Array<bool, _>)>, which is the actual data we want to transfer together with the index of the selected ingress interface. Note that the bit representation of an unsigned integer is an array of bool.
    - The ingress resolver ir. The type is Array<Ready<()>, N>, which is an array of ready signals with size N. It is a collection of the ready signals for each ingress interface.
- If the egress resolver ready signal is false, which means the egress interface is not ready to receive the payload, we do not transfer any payload.
  - We set the ingress resolvers' ready signals to false.
  - We set the egress payload to None.
- If the egress resolver ready signal is true, which means the egress interface is ready to receive the payload:
  - We find the first index among the ingress interfaces whose payload is Some(P) and which is not already in the current queue.
    - The zip function zips two arrays and returns a new array whose elements are tuples of the corresponding elements from both arrays.
    - find_idx finds the index of the first element that satisfies the given condition.
  - We set the egress payload to a tuple containing the selected ingress interface's actual payload and its index.
  - We set the selected ingress resolver's ready signal to true and the rest of the ingress interfaces' ready signals to false.
Dependency Types
We defined the "dependency type" based on the dependency from the backward signal to the forward signal. The concept was inspired from the BaseJump STL.
Motivation: Combinational Loop
The following circuit is constructed by connecting the source, map_resolver, and sink combinators.
fn m() {
I::<VrH<Opt<u32>, Opt<u32>>>::source()
.map_resolver::<Opt<u32>>(|p: Ready<Opt<u32>>| p.inner.unwrap_or(0))
.sink()
}
In this circuit, we can consider the dependency between the f2 and b2 signals from two perspectives:
- Since the source combinator forwards the backward signals to the forward signals and the map_resolver combinator preserves the dependencies between its forward and backward signals, there is a dependency b2 -> b1 -> f1 -> f2.
- Since the sink combinator forwards the forward signals to the backward signals, there is a dependency f2 -> b2.
Thus, a cyclic dependency exists between f2 and b2.
This is commonly referred to as a combinational loop, which makes the circuit never stabilize and causes the simulator to hang.
A combinational loop occurs when the output of some combinational logic is fed back into the input of that same combinational logic without any intervening register.
Dependency Types
To prevent combinational loops in circuit design, we use a type system called "Dependency Type".
We categorize interfaces into two kinds of types: Helpful and Demanding.

- Helpful: the forward signals do not depend on the backward signals.
- Demanding: the forward signals depend on the backward signals, and they satisfy the condition that if the payload is Some, the ready condition is true.
Based on the above definition, in the motivational circuit, both the interface between source and map_resolver and the interface between map_resolver and sink are of the Demanding type.
NOTE: In our type system, we do not consider the case where the forward signals depend on the backward signals, the payload is Some, but the ready condition is false. This case is guaranteed not to occur when the circuit is designed only with the combinators defined in our standard combinator library.
In HazardFlow, we defined the dependency type as an enum Dep:
/// Dependency type of a hazard interface.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, ConstParamTy)]
enum Dep {
/// The payload (`Fwd`) does not depend on the resolver (`Bwd`).
Helpful = 0,
/// The payload (`Fwd`) depends on the resolver (`Bwd`), and they satisfy the condition that if the payload is
/// `Some`, `Hazard::ready(p, r)` is true.
///
/// It is a bug to make the payload depend on the resolver but break the condition.
Demanding = 1,
}
and we annotate the hazard interface with its dependency type.
/// Hazard interface.
#[derive(Debug)]
struct I<H: Hazard, const D: Dep>;
The benefit of using dependency types is that we can restrict the usage of combinators that introduce a combinational loop, which leads to more productive circuit design. We will explain more with the example of interface combinators.
Examples: Types for Interface Combinators
Let's look into the IO type for the interface combinators in the motivational circuit.
source
impl<P: Copy> I<VrH<P, P>, { Dep::Demanding }> {
fn source() -> I<VrH<P, P>, { Dep::Demanding }> {
().fsm::<I<VrH<P, P>, { Dep::Demanding }>, ()>((), |_, er, _| {
let ep = if er.ready { Some(er.inner) } else { None };
(ep, (), ())
})
}
}
Since the forward signals (ep) depend on the backward signals (er), its egress interface has a Demanding type.
Therefore, it returns an I<VrH<P, P>, { Dep::Demanding }>.
map_resolver
impl<P: Copy, R: Copy, const D: Dep> I<VrH<P, R>, D> {
    fn map_resolver<ER: Copy>(self, f: impl Fn(Ready<ER>) -> R) -> I<VrH<P, ER>, D> {
        self.fsm::<I<VrH<P, ER>, D>, ()>((), |ip, er, _| {
            let ep = ip;
            let ir = Ready::new(er.ready, f(er));
            (ep, ir, ())
        })
    }
}
The egress forward signals (ep) depend on the ingress forward signals (ip), while the ingress backward signals (ir) depend on the egress backward signals (er).
Therefore, if the ingress interface has a Helpful type (ir -/-> ip), then the egress interface will also have a Helpful type (er -> ir -/-> ip -> ep).
Similarly, if the ingress interface has a Demanding type, then the egress interface will also have a Demanding type.
Consequently, it takes an I<VrH<P, R>, D> and returns an I<VrH<P, ER>, D>, where D can be any type.
sink
impl<P: Copy> I<VrH<P, Opt<P>>, { Dep::Helpful }> {
fn sink(self) {
self.fsm::<(), ()>((), |ip, _, _| {
let ir = Ready::valid(ip);
((), ir, ())
})
}
}
Since the backward signals (ir) depend on the forward signals (ip), a combinational loop occurs when the ingress interface has a Demanding type (ir -> ip -> ir).
To prevent this, we only allow the ingress interface to have a Helpful type.
Therefore, it takes only I<VrH<P, Opt<P>>, { Dep::Helpful }>.
Now let's apply the combinator types introduced above to the motivational circuit.
The egress interface of source
combinator is I<VrH<u32, u32>, { Dep::Demanding }>
,
and the egress interface of map_resolver
combinator is I<VrH<u32, Opt<u32>>, { Dep::Demanding }>
.
However, since the sink
combinator only takes ingress interface of Helpful
type,
we cannot apply the sink
combinator to the egress interface of map_resolver
.
This prevents the occurrence of combinational loops.
To apply the sink
combinator, we need to change the type to Helpful
by using combinators like reg_fwd
.
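To make this concrete, here is a schematic sketch of the two compositions. The `map_resolver` closure and the `reg_fwd` arguments are elided, so the pipelines are written as comments; only the interface types matter here.

```rust
// Rejected by the type checker: `map_resolver` preserves the `Demanding` type coming out
// of `source`, but `sink` is only defined for a `Helpful` ingress interface.
//
//     source()                  // I<VrH<u32, u32>, { Dep::Demanding }>
//         .map_resolver(..)     // I<VrH<u32, Opt<u32>>, { Dep::Demanding }>
//         .sink();              // error: `sink` is not available on a `Demanding` interface
//
// Accepted: a register stage cuts the forward-to-backward dependency, so the interface
// becomes `Helpful` again before it reaches `sink`.
//
//     source()                  // I<VrH<u32, u32>, { Dep::Demanding }>
//         .map_resolver(..)     // I<VrH<u32, Opt<u32>>, { Dep::Demanding }>
//         .reg_fwd(..)          // I<VrH<u32, Opt<u32>>, { Dep::Helpful }>
//         .sink();
```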
Limitations
Currently, dependency types have a limitation: they cannot capture all kinds of combinational loops.
This is because dependency types only check intra-interface dependencies and do not consider inter-interface dependencies.
The most common example of this is the fork-join
pattern.
In the circuit described above, a combinational loop exists with f1 -> b2 -> f1
and f2 -> b1 -> f2
.
However, the current type system cannot detect such loops.
For the lfork
combinator, since there is no dependency from b1 -> f1
or b2 -> f2
, if the ingress interface type is Helpful
, then both egress interface types will also be Helpful
.
And for the join
combinator, it is designed to take two Helpful
ingress interfaces (let's call them i1
and i2
) and return a Helpful
egress interface.
However, a combinational loop occurs if there is a dependency from the backward signals of i1
to the forward signals of i2
, or from the backward signals of i2
to the forward signals of i1
.
Since the current type system cannot represent inter-interface dependencies, this remains a limitation. For now, we expect such combinational loops to be detected by synthesis tools.
CPU Core (5-Stage Pipelined)
Our baseline CPU implements the RV32I+Zicsr ISA using a 5-stage pipelined design. It is based on the riscv-sodor processor developed by Berkeley, with slight modifications.
Overview: Pipelined Design
The 5-stage pipelined CPU core would ideally be decomposed as follows:
At a high level, each stage performs the following tasks:
- Fetch: Computes the next PC and fetches the instruction bytecode from memory.
- Decode: Decodes the bytecode and reads the value of source registers.
- Execute: Performs arithmetic and bitwise operations based on the instruction.
- Memory: Accesses memory for load/store instructions.
- Writeback: Writes the result back to the destination register.
Example: `lw` instruction
Let's consider the following program, where the next PC was computed as `0x8000012c` in the fetch stage.
By the previous `auipc` and `addi` instructions, the value of `x3` becomes `0x80002000`, which is the start address of the data section (we omit the details of the `auipc` and `addi` instructions here).
Disassembly of section .text:
...
80000124: 00002197 auipc x3, 0x2 # x3 <- 80002124
80000128: edc18193 addi x3, x3, -292 # x3 <- 80002000
8000012c: 0040a703 lw x4, 4(x3) # x4 <- mem[x3+4]
...
Disassembly of section .data:
...
80002004: deadbeef
...
Then the lw
instruction will be processed in each stage as follows:
- Fetch: Accesses memory with the next PC `0x8000012c` and fetches the bytecode `0x0040a703`.
- Decode:
  - Decodes the bytecode `0x0040a703`. It contains the operations for the later stages:
    - In the execute stage, computes the memory access address `x3 + 4`.
    - In the memory stage, loads data from the memory with the computed address.
    - In the writeback stage, writes the loaded data to the destination register `x4`.
  - Reads the value of the source register `x3` (= `0x80002000`).
- Execute: Computes the memory access address `x3 + 4` (= `0x80002004`).
- Memory: Accesses the memory with address `0x80002004` and gets the data `0xdeadbeef`.
- Writeback: Writes the data `0xdeadbeef` to `x4`.
It is quite intuitive, but it becomes more complex due to hazards between pipeline stages.
We will look deeper into hazards and their handling in the following subsections.
Hazards
In this section, we will explain the hazards that can occur in our baseline CPU.
Hazards can be classified into three categories: structural, data, and control.
Structural Hazards
A structural hazard occurs when multiple pipeline stages need to access the same hardware resource, such as memory, the register file, or the arithmetic logic unit (ALU), in the same clock cycle. It is resolved by stalling the stage that tries to access the resource later:
Solution: Stall
I1: lw x2, 0(x1) # write to x2
I2: sub x4, x3, x2 # read from x2 requires several cycle in the decode stage
I3: add x4, x3, x2 # need to wait in the fetch stage
In the example above, `I2` must wait several cycles in the decode stage to read the correct value of `x2` from `I1` (details on how `I1` and `I2` communicate will be explained later; for now, you can ignore `I1` in this example).
As a result, a structural hazard occurs at cycle T+2 because `I3` is ready to move to the decode stage while `I2` still needs to occupy it.
To resolve this hazard, the decode stage asks `I3` to stall at cycle T+2.
This is usually done by turning off the ready bit in the valid-ready protocol.
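As a rough illustration of this, the ready bit can be gated with the `fsm` primitive shown earlier. The `stall_while` helper below is hypothetical (it is not part of the standard combinator library) and only sketches the idea:

```rust
// Hypothetical helper: keep the ingress ready bit low while `cond` holds on the incoming
// payload, so the sender (e.g., an instruction waiting in the fetch stage) must hold its
// value and retry in a later cycle.
impl<P: Copy, R: Copy> I<VrH<P, R>, { Dep::Helpful }> {
    fn stall_while(self, cond: impl Fn(P) -> bool) -> I<VrH<P, R>, { Dep::Helpful }> {
        self.fsm::<I<VrH<P, R>, { Dep::Helpful }>, ()>((), |ip, er, _| {
            // Stall only when there is a valid payload that triggers the condition.
            let stall = match ip {
                Some(p) => cond(p),
                None => false,
            };
            // While stalling, do not forward the payload and turn off the ready bit
            // toward the sender; otherwise behave transparently.
            let ep = if stall { None } else { ip };
            let ir = Ready::new(er.ready && !stall, er.inner);
            (ep, ir, ())
        })
    }
}
```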
Data Hazards
A data hazard occurs when the processing of a pipeline stage depends on the result of later stages. It is resolved either by stalling the stage until its data dependency becomes available, or by bypassing the necessary data from later stages in the same clock cycle.
For instance, a data hazard due to a read-after-write (RAW) dependency in the CPU core is resolved either by stalling the read instruction in the decode stage or by bypassing the result of the write instruction in the later stages to the read instruction in the decode stage.
Solution (1/2): Bypassing
In the decode stage, we need to read the values of the source registers. When a RAW dependency happens, we can bypass the values from the later stages to the decode stage.
From the execute stage:
I1: add x3, x2, x1 # write to x3
I2: sub x5, x4, x3 # read from x3
At cycle T+2, `I1` can bypass the value of `x3` to `I2`.
From the memory stage:
I1: add x3, x2, x1 # write to x3
I2: nop
I3: sub x5, x4, x3 # read from x3
At cycle T+3, `I1` can bypass the value of `x3` to `I3`.
From the writeback stage:
I1: add x3, x2, x1 # write to x3
I2: nop
I3: nop
I4: sub x5, x4, x3 # read from x3
At cycle T+4, `I1` can bypass the value of `x3` to `I4`.
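Putting the three cases together, the decode stage checks the youngest in-flight producer first when reading a source register. The function below is only an illustrative sketch (its argument shapes are hypothetical); in the actual design the bypassed values arrive through the `bypass_from_exe`, `bypass_from_mem`, and `bypass_from_wb` resolver fields described later.

```rust
// Illustrative bypass priority: each bypass is an optional (destination register, value)
// pair from the execute, memory, and writeback stages; fall back to the register file
// otherwise. Handling of `x0` is elided for brevity.
fn read_source(
    addr: u32,
    bypass_from_exe: Option<(u32, u32)>,
    bypass_from_mem: Option<(u32, u32)>,
    bypass_from_wb: Option<(u32, u32)>,
    rf: &[u32; 32],
) -> u32 {
    let hit = |b: Option<(u32, u32)>| b.filter(|(rd, _)| *rd == addr).map(|(_, v)| v);
    // Prefer the youngest producer: execute, then memory, then writeback.
    hit(bypass_from_exe)
        .or_else(|| hit(bypass_from_mem))
        .or_else(|| hit(bypass_from_wb))
        .unwrap_or(rf[addr as usize])
}
```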
Solution (2/2): Stall
When data from a later stage is not yet ready, bypassing is not possible and a stall becomes necessary. In our baseline CPU, there are two sources of stalls: load instructions and CSR instructions.
From the load instruction:
I1: lw x2, 0(x1) # write to x2
I2: sub x4, x3, x2 # read from x2
- At cycle T+2, `I2` needs to be stalled at the decode stage because `I1` has not yet reached the memory stage.
- At cycle T+3, `I1` gets the value of `x2` from the memory and can now bypass it to `I2`.
From the CSR instruction:
I1: csrr x2, mcause # write to x2
I2: bgez x2, 8000003c # read from x2
In our baseline CPU, the CSR is located in the memory stage.
- At cycle T+2, `I2` needs to be stalled at the decode stage because `I1` has not yet reached the CSR.
- At cycle T+3, `I1` gets the value of `x2` from the CSR and can now bypass it to `I2`.
For more details on CSR instructions, please refer to Chapter 2.8 of the RISC-V Specifications.
Control Hazards
A control hazard occurs when a pipeline stage makes a wrong prediction about which instructions to execute in the next clock cycle. It is resolved, e.g., when the execute stage detects a misprediction, by discarding the mispredicted instructions at the fetch and decode stages and restarting from the correct next instruction.
Solution: Discarding and Restarting
In our baseline CPU, there are two sources of control hazard: branch misprediction and exception.
Branch Misprediction
I1: beq x2, x1, target # mispredicted to not taken
I2: add x5, x4, x3 # should be killed
I3: lw x5, 0(x4) # should be killed
target:
I4: sub x5, x4, x3 # correct target address
At cycle T+1, the fetch stage speculates that I1
's next instruction is I2
so that it is fetched from the instruction memory.
But at cycle T+2, the execute stage deduces that I1
's next instruction is in fact I4
.
As such, the mispredicted instructions I2
and I3
are discarded at cycle T+2, and the fetch stage is restarted with the correct next instruction I4
at cycle T+3.
Exception
I1: unimp # illegal instruction, redirect to trap vector
I2: add x4, x3, x2 # should be killed
I3: sub x4, x3, x2 # should be killed
I4: lw x4, 0(x3) # should be killed
trap_vector:
I5: csrr x5, mcause # trap handling logic
The illegal instruction `I1` raises an exception, so execution must be redirected to the trap vector to handle it.
At cycle T+3, the illegal instruction I1
reaches the CSR in the memory stage, and it returns the trap vector address.
As such, the mispredicted instructions I2
, I3
, and I4
are discarded at cycle T+3, and the fetch stage is restarted with the correct next instruction I5
at cycle T+4.
Implementation
With the existence of hazards, the 5-stage pipelined CPU core can be decomposed as follows:
with the following implementation (riscv32_5stage.rs):
const START_ADDR: u32 = 0x80000000;
fn core(
imem: impl FnOnce(Vr<MemReq>) -> Vr<MemRespWithAddr>,
dmem: impl FnOnce(Vr<MemReq>) -> Vr<MemRespWithAddr>,
) {
fetch::<START_ADDR>(imem)
.comb(decode)
.comb(exe)
.comb(move |i| mem(i, dmem))
.comb(wb)
}
- `imem` and `dmem` are modules for the instruction memory and the data memory, respectively.
- We chain the 5 sub-modules `fetch`, `decode`, `exe`, `mem`, and `wb` by using the `comb` method (a sketch of `comb` follows below).
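For reference, `comb` is essentially module application in method-call form, which is what makes this chaining style possible. Roughly (a simplified sketch, not the exact definition in the library):

```rust
// Sketch: a module is a function from an ingress interface to an egress interface, and
// `comb` applies it to `self`, so modules can be chained in pipeline order.
trait Interface: Sized {
    fn comb<E>(self, m: impl FnOnce(Self) -> E) -> E {
        m(self)
    }
}
```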
In the following subsections, we will explain the implementation details for each stage.
Fetch stage
The fetch stage mainly does the following things:
- Calculates the next PC for the upcoming instruction.
- Accesses IMEM to retrieve the instruction bytecode.
It can be decomposed into combinators as follows (code):
Input and Output
The IO interface type of the fetch stage is as follows:
Ingress
Since this is the first stage, it does not take any ingress interface.
Egress
It returns an egress interface with type I<VrH<FetEP, DecR>, { Dep::Demanding }>
.
Each of FetEP
and DecR
is defined as a struct with the following fields:
FetEP (in fetch.rs):
- `imem_resp`: IMEM response which contains the address (PC) and the data (inst bytecode).
DecR (in decode.rs):
- `redirect`: Represents the redirection PC when the control hazard occurs.
Behavior
Each combinator does the following things:
Computes the next PC (M0-M2)
M0 (source_drop
):
- Receives the IMEM response and redirection PC as resolvers from the later modules:
  - The IMEM response comes from M2. It contains the current PC and instruction bytecode.
  - The redirection PC comes from the execute stage and the memory stage.
- Forwards the IMEM response and redirection PC to the payload to compute the next PC.
M1 (filter_map
):
- Computes the next PC based on the incoming payload (see the sketch below):
  - If a redirection PC is provided, jump to it.
  - Otherwise, proceed to the next sequential address (current PC + 4).
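A rough sketch of this next-PC selection (illustrative; the actual closure in fetch.rs operates on the payload/resolver types described above):

```rust
// If a later stage requested a redirection, restart from that PC; otherwise continue
// with the next sequential instruction.
fn next_pc(redirect: Option<u32>, current_pc: u32) -> u32 {
    match redirect {
        Some(target) => target,
        None => current_pc + 4,
    }
}
```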
M2 (reg_fwd_with_init
):
- Creates a pipelined stage before accessing IMEM by storing the next PC in a register.
- When the circuit is reset, it is initialized with the designated start address (
START_ADDR
).
Accesses IMEM (M3)
- Constructs the IMEM request with the `map` combinator.
- Accesses the external IMEM module to fetch the instruction bytecode with the `comb` combinator.
  - We use an asynchronous memory, so it provides the response in the same cycle.
  - We use the `attach_resolver` module combinator to attach an additional resolver to the IMEM.
- Deconstructs the IMEM response with the `map` combinator.
Discards on misprediction (M4-M5)
M4 (map_resolver_drop_with_p
):
- Attaches the IMEM response to the resolver signal for the next PC calculation.
- Turns on the ready signal when a control hazard occurs to extract the payload from M2.
  - This allows discarding the invalid PC stored in M2.
M5 (filter_map_drop_with_r_inner
):
- Filters out the payload when the redirection happens.
Decode stage
The decode stage mainly does the following things:
- Decodes the instruction.
- Reads the value of source registers.
It can be decomposed into combinators as follows (code):
Input and Output
The IO interface type of the decode stage is as follows:
Ingress
It takes an ingress interface with type I<VrH<FetEP, DecR>, { Dep::Demanding }>
.
You can check the explanation of FetEP
and DecR
in here.
Egress
It returns an egress interface with type I<VrH<DecEP, ExeR>, { Dep::Demanding }>
.
Each of DecEP
and ExeR
is defined as a struct with the following fields:
DecEP (in decode.rs):
- `wb_info`: Writeback information which contains the writeback address and selector.
- `br_info`: Branch information which contains the branch type, target address' base and offset.
- `alu_input`: ALU input.
- `mem_info`: Memory information.
- `csr_info`: CSR information.
- `is_illegal`: Indicates whether the instruction is illegal or not.
- `pc`: PC.
- `debug_inst`: Instruction (for debugging purposes).
ExeR (in exe.rs):
- `bypass_from_exe`: Bypassed data from the execute stage.
- `bypass_from_mem`: Bypassed data from the memory stage.
- `bypass_from_wb`: Bypassed data from the writeback stage.
- `stall`: Destination register address of load or CSR instruction in the execute stage.
- `redirect`: Redirection PC.
- `rf`: Register file.
Behavior
Each combinator does the following things:
M0 (map_resolver_inner
):
- Constructs the ingress resolver of the decode stage.
M1 (reg_fwd
):
- Creates a pipelined stage before decoding the instruction.
- Sends a ready signal which indicates it will be free in the next cycle.
M2 (map
):
- Decodes the instruction.
M3 (map_resolver_block
):
- Stalls until the values of the source registers are visible.
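A rough sketch of this stall condition (illustrative; the `stall` field of `ExeR` above carries the destination register of an in-flight load/CSR instruction):

```rust
// Block the decode stage while one of the current instruction's source registers is the
// destination of a load/CSR instruction that has not produced its value yet.
fn should_stall(rs1: Option<u32>, rs2: Option<u32>, stall: Option<u32>) -> bool {
    match stall {
        Some(rd) => rs1 == Some(rd) || rs2 == Some(rd),
        None => false,
    }
}
```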
M4 (filter_map_drop_with_r
):
- Reads the values of the source registers and attaches them to the payload.
- Filters out the payload when the redirection happens.
Execute stage
The execute stage mainly does the following things:
- Executes the ALU.
- Resolves the branch misprediction.
It can be decomposed into combinators as follows (code):
Input and Output
The IO interface type of the execute stage is as follows:
Ingress
It takes an ingress interface with type I<VrH<DecEP, ExeR>, { Dep::Demanding }>
.
You can check the explanation of DecEP
and ExeR
in here.
Egress
It returns an egress interface with type I<VrH<ExeEP, MemR>, { Dep::Demanding }>
.
Each of ExeEP
and MemR
is defined as a struct with the following fields:
ExeEP (in exe.rs):
- `wb_info`: Writeback information which contains the writeback address and selector.
- `alu_out`: ALU output.
- `mem_info`: Memory information.
- `csr_info`: CSR information.
- `is_illegal`: Indicates whether the instruction is illegal or not.
- `pc`: PC.
- `debug_inst`: Instruction (for debugging purposes).
MemR (in mem.rs):
- `bypass_from_mem`: Bypassed data from the memory stage.
- `bypass_from_wb`: Bypassed data from the writeback stage.
- `redirect`: Redirection PC.
- `rf`: Register file.
Behavior
Each combinator does the following things:
M0 (map_resolver_inner
):
- Resolves the branch misprediction based on the branch type and ALU output.
- Constructs the ingress resolver of the execute stage.
- Attaches the bypassed data, stall, and redirection PC for resolving data hazards.
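A rough sketch of the misprediction check in M0 (illustrative; since the fetch stage always predicts the sequential next PC, any taken branch whose resolved target differs from PC + 4 requires a redirect):

```rust
// The fetch stage predicts the sequential next PC (PC + 4), so a taken branch whose
// target differs from that prediction must redirect the front-end.
fn redirect_pc(branch_taken: bool, target: u32, pc: u32) -> Option<u32> {
    if branch_taken && target != pc + 4 {
        Some(target)
    } else {
        None
    }
}
```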
M1 (reg_fwd
):
- Creates a pipelined stage before executing the ALU.
- Sends a ready signal which indicates it will be free in the next cycle.
M2 (map
):
- Executes the ALU.
M3 (map_resolver_block_with_p
):
- Attaches the ALU output to the resolver signal for the redirection PC calculation.
- Stalls until the data hazards have been resolved.
M4 (filter_map_drop_with_r_inner
):
- Attaches the ALU output to the payload.
- Filters out the payload when the redirection happens.
Memory stage
The memory stage mainly does the following things:
- Accesses memory for load/store instructions.
- Accesses the CSR for illegal instructions and CSR instructions.
It can be decomposed into combinators as follows (code):
Input and Output
The IO interface type of the memory stage is as follows:
Ingress
It takes an ingress interface with type I<VrH<ExeEP, MemR>, { Dep::Demanding }>
.
You can check the explanation of ExeEP
and MemR
in here.
Egress
It returns an egress interface with type I<VrH<MemEP, WbR>, { Dep::Demanding }>
.
Each of MemEP
and WbR
is defined as a struct with the following fields:
MemEP (in mem.rs):
- `wb_info`: Writeback information which contains the writeback address and data.
- `debug_pc`: PC (for debugging purposes).
- `debug_inst`: Instruction (for debugging purposes).
WbR (in wb.rs):
- `bypass_from_wb`: Bypassed data from the writeback stage.
- `rf`: Register file.
Behavior
Each combinator does the following things:
M0 (map_resolver_inner
):
- Constructs the ingress resolver of the memory stage.
- Attaches the bypassed data and redirection PC for resolving data hazards.
M1 (reg_fwd
):
- Creates a pipelined stage before accessing DMEM or CSR.
- Sends a ready signal which indicates it will be free in the next cycle.
- Computes the branch selector with the `map` combinator.
- Branches the interface into three for accessing different modules (DMEM / CSR / None).
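A rough sketch of this three-way selection (illustrative; in mem.rs the selector is derived from the `mem_info`, `csr_info`, and `is_illegal` fields of `ExeEP`):

```rust
// Route load/store instructions to DMEM, CSR and illegal instructions to the CSR module,
// and everything else to the pass-through path.
enum MemSel {
    Dmem,
    Csr,
    None,
}

fn mem_selector(has_mem_info: bool, has_csr_info: bool, is_illegal: bool) -> MemSel {
    if has_mem_info {
        MemSel::Dmem
    } else if has_csr_info || is_illegal {
        MemSel::Csr
    } else {
        MemSel::None
    }
}
```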
- Constructs the DMEM request with the `map` combinator.
- Accesses the external DMEM module with the `comb` combinator.
  - We use an asynchronous memory, so it provides the response in the same cycle.
  - We use `attach_resolver` and `attach_payload` to attach an additional resolver/payload to the DMEM.
M4 (map_resolver_with_p
+ map
):
- Attaches the DMEM response to the resolver signal for the bypassing data calculation.
- Constructs the memory stage egress payload with
map
combinator.
- Constructs the CSR request with the `map` combinator.
- Accesses the CSR module with the `comb` combinator.
  - It provides the response in the same cycle.
M6 (map_resolver_with_p
+ map
):
- Attaches the CSR response to the resolver signal for the bypassing data calculation.
  - It contains the redirection PC when an exception happens.
- Constructs the memory stage egress payload with
map
combinator.
M7 (map_resolver_with_p
+ map
):
- Directly attaches the payload to the resolver signal for the bypassing data calculation.
- Constructs the memory stage egress payload with
map
combinator.
M8 (merge
):
- Selects one of the transferable egress interfaces of M4 (DMEM), M6 (CSR), and M7 (None).
- Processing is guaranteed to be in order because at most one instruction can be in the memory stage at a time.
Writeback stage
The writeback stage mainly does the following things:
- Writes the result back to the destination register.
It can be decomposed into combinators as follows (code):
Input and Output
The IO interface type of the writeback stage is as follows:
Ingress
It takes an ingress interface with type I<VrH<MemEP, WbR>, { Dep::Demanding }>
.
You can check the explanation of MemEP
and WbR
in here.
Egress
Since this is the last stage, it does not return any egress interface.
Behavior
Each combinator does the following things:
M0 (map_resolver_inner
):
- Constructs the ingress resolver of the writeback stage.
- Attaches the bypassed data and register file for resolving data hazards.
M1 (reg_fwd
):
- Creates a pipelined stage before accessing regfile.
- Sends a ready signal which indicates it will be free in the next cycle.
M2 (sink_fsm_map
):
- Updates the register file.
- Attaches the register file to the resolver for reading the values of the source registers.
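A rough sketch of the register-file update itself (illustrative; in wb.rs this happens inside the `sink_fsm_map` state transition):

```rust
// Write the writeback data into the destination register; `x0` stays hard-wired to zero
// as required by RV32I.
fn update_rf(mut rf: [u32; 32], wb_addr: usize, wb_data: u32) -> [u32; 32] {
    if wb_addr != 0 {
        rf[wb_addr] = wb_data;
    }
    rf
}
```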
NPU Core (Based on Systolic Array)
TODO @woojin @gieun
Contributors
Here is a list of all the people who have worked on HazardFlow:
Current Contributors
Previous Contributors
- Seunghyeon Jeong
- Tanapoom Sermchaiwong