Migrating from the MCU world to FPGA - Part 2

Introduction

In Part 1, we explained what FPGAs are and how they differ from microcontrollers. In this second part, we’re going to dig deeper into the world of FPGAs and discuss some of the most important concepts you need to understand in order to get started.

What RTL is and how it works

RTL is one of the most important terms you’ll hear in the world of FPGAs: it stands for Register Transfer Level. Describing a design at the register transfer level is the most efficient way of building high-speed FPGA architectures. Actually, most of the time, it’s the only practical way of getting things done in an FPGA!

Before getting into specifics, let’s first talk about clocks. In an FPGA, the clock is paramount. In a microcontroller, the clock is not something you worry much about; in an FPGA, clocking is something you’ll need to monitor constantly along the way. The clock is what gets information moving in your FPGA: from the input pins, to the logic blocks that “process” those inputs, to the output pins.

The clock in an FPGA is used to clock “registers”. A register is simply a bunch of flip-flops in parallel, sharing the same clock source. Now let’s look at the diagram below. This is an RTL representation of the kind you’ll find in many articles out there, and it’s important to be able to read it. You’ll notice that there are two 8-bit registers, R1 and R2. You’ll also notice that the clock signals are not drawn. This is not a mistake: we usually assume that all registers in a design share the same clock (sharing the same clock makes things drastically simpler in the long run).

Now, let’s analyze this diagram a little further. Here is what it does with each clock cycle (the table below is oversimplified and not completely correct, but we’ll fix that a few paragraphs later):

Clock cycle | Register 1            | Register 2
------------|-----------------------|----------------------
1           | A is transferred to B |
2           |                       | C is transferred to D

On the first clock cycle, the logic level on the input pins of the FPGA is transferred from point A to point B, and nothing more will happen at point B until the next clock cycle. The “bunch of logic gates” will process the data that appeared at point B to create the result at point C. What the bunch of logic gates does is of no importance for the moment. What’s important is that the result “C” needs to be ready before the next clock cycle. The time it takes for the result “C” to be constructed from the input “B” depends on the number of logic gates the signals need to go through, and on the propagation delay of each gate. On the second clock cycle, the data at “C” is transferred to D and appears at the output pins (after some propagation delay).
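The requirement that “C” be ready before the next clock edge can be turned into a rough timing budget. The sketch below is a plain Python calculation, not HDL. All delay figures in it are made-up assumptions for illustration, and it assumes every gate has the same delay. The clock-to-output and setup terms are also additions not discussed above: they account for the time R1 needs to present B after the edge and the time R2 needs C to be stable before the next edge.

```python
# Hypothetical delay figures in nanoseconds. Real values come from the
# device datasheet and from the timing report of your tools.
CLK_TO_Q_NS = 0.5    # clock edge -> R1 output (point B) valid
GATE_DELAY_NS = 0.4  # propagation delay of one logic gate (assumed identical)
SETUP_NS = 0.3       # time C must be stable before the next edge at R2


def min_clock_period_ns(gates_in_path: int) -> float:
    """Shortest clock period for which C is ready before the next edge."""
    return CLK_TO_Q_NS + gates_in_path * GATE_DELAY_NS + SETUP_NS


def max_clock_mhz(gates_in_path: int) -> float:
    """Highest clock frequency (in MHz) this register-to-register path allows."""
    return 1000.0 / min_clock_period_ns(gates_in_path)


if __name__ == "__main__":
    for gates in (3, 8, 20):
        print(f"{gates:2d} gates: period >= {min_clock_period_ns(gates):.1f} ns, "
              f"f_max = {max_clock_mhz(gates):.0f} MHz")
```

The point to take away is the trend, not the numbers: the more gates there are between two registers, the slower the clock has to be.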

But wait! We talked about what happens to register R2 on the second clock cycle, but what about the register R1? Well, on the second clock cycle, brand new signals were “clocked-in”, so the actual table showing the transfer of signals would look like this:

Clock cycle | Register 1              | Register 2
------------|-------------------------|------------------------
1           | A1 is transferred to B1 | undefined operation
2           | A2 is transferred to B2 | C1 is transferred to D1
3           | A3 is transferred to B3 | C2 is transferred to D2

If you understood what happened here, well, congratulations: you’ve understood one of the most important concepts in modern digital systems, “pipelining”.
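If you’d like to replay the table above on your computer, here is a minimal behavioral model in Python (a simulation, not synthesizable HDL). The `logic` function is a hypothetical stand-in for the “bunch of logic gates”, chosen arbitrarily as “add 1”. `None` is used to represent the undefined state of a register before it has captured valid data.

```python
def logic(b: int) -> int:
    """Stand-in for the "bunch of logic gates": add 1, wrapping at 8 bits."""
    return (b + 1) & 0xFF


def simulate(inputs):
    """Clock R1 and R2 once per input value; return what D holds after each edge."""
    r1 = None  # output of R1 (point B); None means "not yet defined"
    r2 = None  # output of R2 (point D)
    d_history = []
    for a in inputs:
        # Point C is computed from the value R1 held *before* this clock edge.
        c = logic(r1) if r1 is not None else None
        # Both registers capture their inputs on the same clock edge.
        r1, r2 = a, c
        d_history.append(r2)
    return d_history


if __name__ == "__main__":
    print(simulate([10, 20, 30]))  # prints [None, 11, 21]
```

On the first cycle D is still undefined, on the second cycle D holds the result for the first input, and on the third cycle it holds the result for the second input, exactly as in the table.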

FPGA pipelining

I won’t focus too much on this, but it’s worth explaining in a few words what FPGA pipelining is. You’ll notice that to get data from the input all the way to the output, you needed 2 clock cycles, right? But once those two clock cycles have elapsed, you get a new processed data byte on each new clock cycle. That’s a pipelined system with a latency of “2”. Pipelining is very powerful, because it allows a design to “break” a huge logic circuit into smaller “bunches of logic gates”, each short enough for signals to propagate through within one clock period. In the end, you may get a design that can run at very high clock rates. Even if you need to wait for 10 or 20 clock cycles until the very first result appears at the output, after this initial latency period you’ll get a new valid output on each new clock cycle. This is what makes it possible to build FPGA designs that process information at clock rates beyond 200 MHz. When you feel ready to dig further into this, I advise you to search the internet for “FPGA pipelined RTL”: there are many tutorials out there.
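To put numbers on the latency versus throughput trade-off, here is a small Python calculation. It assumes an ideal pipeline that accepts one new input on every clock cycle and never stalls. The function names are made up for this illustration.

```python
def cycles_until_result(n: int, latency: int) -> int:
    """Clock cycles needed before the n-th result is available at the output."""
    if n < 1 or latency < 1:
        raise ValueError("n and latency must be at least 1")
    # The first result takes `latency` cycles, each later one takes 1 more.
    return latency + (n - 1)


def time_for_results_us(n: int, latency: int, clock_mhz: float) -> float:
    """Time in microseconds to obtain n results (cycles divided by MHz)."""
    return cycles_until_result(n, latency) / clock_mhz


if __name__ == "__main__":
    # Two-stage pipeline from the table: 2 cycles for D1, 3 cycles for D2.
    print(cycles_until_result(1, 2), cycles_until_result(2, 2))
    # A deeper pipeline: 1000 results, latency of 20 cycles, 200 MHz clock.
    print(time_for_results_us(1000, 20, 200.0), "us")
```

For a long stream of data, the initial latency quickly becomes negligible: what matters is that one result comes out on every cycle.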

Notes about clocks

Before ending this section about RTL, I’d like to talk about clocks again, and about two things in particular: clock edges and clock domains.

In VHDL - which is later translated into logic circuits - you can’t have registers that are clocked on both edges of the clock. You can synchronize your system on either rising edges or falling edges, but not both. (This is not completely true: there are some exceptions, like DDR - double data rate - interfaces, but they are quite sophisticated.) A piece of advice: stick to a single clock edge polarity for your whole design. Mixing rising and falling edges is not for the faint of heart, and actually goes against almost every rule that exists around FPGA design!

The next topic is “clock domains”. If a single clock drives your whole system, then you have 1 clock domain. If you have two clocks that are not in sync, then you have two clock domains. I won’t say that it’s bad to have different clock domains, because this is common practice, but you need to know that extra care should be taken when mixing two different clocks in the same design. Wanna know what may go wrong when different clocks mix? Search the internet for “clock metastability”. There are plenty of nice articles about it, but for now, you just need to remember that it’s bad and you want to avoid it!

Two methods exist to avoid the problems that may arise from mixing clocks. The first one is ensuring that all clocks are synchronized, that is, ensuring that the active edges of the clocks (rising or falling) happen at the same time.

Below is an example of 3 clocks that don’t have the same frequency, but that have their active edges perfectly aligned. In that situation, no metastability problems should occur.
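One way to picture “aligned active edges” is to list the instants at which each clock rises and check that the slower clocks only ever rise when the fastest one does. The Python sketch below does just that. It rests on assumptions of mine: each clock is described by a hypothetical (period, phase) pair in arbitrary integer time units, and “aligned” is taken to mean that every rising edge of every clock coincides with a rising edge of the fastest clock, as you would get from clocks divided down from a common source.

```python
def rising_edges(period: int, phase: int, until: int) -> set:
    """Instants of the rising edges in [0, until), in integer time units."""
    return set(range(phase, until, period))


def edges_aligned(clocks, until: int) -> bool:
    """True if every rising edge of every clock falls on an edge of the fastest one."""
    fastest_period, fastest_phase = min(clocks, key=lambda c: c[0])
    reference = rising_edges(fastest_period, fastest_phase, until)
    return all(rising_edges(p, ph, until) <= reference for p, ph in clocks)


if __name__ == "__main__":
    # Three clocks at f, f/2 and f/4, all starting together: aligned.
    print(edges_aligned([(10, 0), (20, 0), (40, 0)], until=200))
    # Same frequencies, but the second clock is shifted: not aligned.
    print(edges_aligned([(10, 0), (20, 5), (40, 0)], until=200))
```

In a real design this check is the job of the timing analysis tools. The model is only meant to make the idea of aligned edges concrete.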

Now, if you can’t guarantee edge alignment, which is often the case, the magic trick is to use FIFO (first in, first out) elements. A FIFO is a memory architecture whose role is often to glue two clock domains together.

FIFO memory is a very limited and precious resource in an FPGA. In the case of Xilinx devices and development tools, a wizard is used to generate a FIFO module that can be instantiated in your design. It’s actually called an IP core, but from a usage point of view, it’s just like a VHDL module, except that you can’t see the source: you just know that it works and does the job.
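To get a feel for what a FIFO does between a writer and a reader that run at different rates, here is a purely behavioral Python model. It is not how the Xilinx IP core is implemented: a real dual-clock FIFO also has to pass its internal pointers safely between the two clock domains, which this model ignores completely. The class name, the `full`/`empty` flags and the tick periods are all assumptions made for this illustration.

```python
from collections import deque


class Fifo:
    """Bounded first-in, first-out buffer with full/empty flags."""

    def __init__(self, depth: int):
        self.depth = depth
        self._data = deque()

    @property
    def empty(self) -> bool:
        return len(self._data) == 0

    @property
    def full(self) -> bool:
        return len(self._data) >= self.depth

    def write(self, value) -> bool:
        """Store a value; return False (nothing stored) if the FIFO is full."""
        if self.full:
            return False
        self._data.append(value)
        return True

    def read(self):
        """Return the oldest stored value, or None if the FIFO is empty."""
        return None if self.empty else self._data.popleft()


def run(depth=4, write_period=2, read_period=3, ticks=24):
    """Writer and reader act at different rates; return what the reader received."""
    fifo = Fifo(depth)
    received, next_value = [], 0
    for t in range(ticks):
        # Writer side: only move on to the next value if the write succeeded.
        if t % write_period == 0 and fifo.write(next_value):
            next_value += 1
        # Reader side: runs on its own, slower schedule.
        if t % read_period == 1:
            value = fifo.read()
            if value is not None:
                received.append(value)
    return received


if __name__ == "__main__":
    print(run())
```

Whatever the two rates are, the reader gets the values in the order they were written. The writer simply has to hold off when the FIFO is full, and the reader has to wait when it is empty.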

What’s next?

In the next part of this article, we’re going to show you how to set up the software tools and create your first FPGA project.