Migrating from the MCU world to FPGA - PART 3

Setting up Xilinx tools

We’ll be focusing on Spartan FPGA devices, so we’ll use one of the tools that are compatible with this device family. It’s not the most recent family of FPGAs from Xilinx, but it’s a good starting point because it’s very well documented (through both official and unofficial channels) and is by far one of the best-known FPGA families out there. It’s also one of the least expensive.

I am not gonna go through the step-by-step installation process, but here is a checklist to guide you through your setup:

  • Go to the Xilinx website and create an account.
  • Download the latest version of the ISE software (it should be v14.7). Careful: we’re talking about several gigabytes.
  • Install ISE on your computer. If you can, avoid Windows 8, which was never really supported by Xilinx (I was able to run ISE on Windows 8, but the process was tedious).
  • During setup, you’ll be asked for a license: follow the steps to create a WebPACK license. It’s a very capable license that will take you quite far before you start hitting its limits.
  • Make sure that ISE runs on your computer and that the license is correctly detected.

When you get a window like the one below, you’re ready for the next step!

Creating a new project

OK, now let’s get into more practical stuff. Just like any IDE, ISE works with “projects”. Go ahead and create a new project (File > New Project). After choosing a name and a location for your project, you’ll be asked tougher questions, such as the exact product family, device name, package, speed grade, etc. Honestly, this is not very critical at this stage: if you select everything as in the picture below, you’ll be fine. But if you already know which chip you’ll end up using, go ahead and select it.

I’ll let you wander around the interface and add a new VHDL module that we’ll call “hello_leds.vhd”. This module will have one clock input, 8 data inputs and 8 data outputs. We’re gonna build a (useless) module that takes an 8-bit number, adds 50 to it, inverts the result (bitwise NOT), subtracts 20, inverts the result again, and finally adds 10 before outputting the result as an 8-bit binary code on some LEDs. So, to sum up, the output is equal to:

OUTPUT = NOT(NOT(INPUT + 50) - 20) + 10

This example is neither more nor less useful than the classic “Hello World” example of computer programming; the only aim here is to build a “playground” to test different approaches.
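Since every intermediate result in this chain is 8 bits wide, all of the arithmetic wraps around modulo 256, and inverting all the bits of an 8-bit value x gives 255 - x. Working through the chain by hand (writing I for the value of INPUT read as an unsigned number) shows that the whole function boils down to a single addition:

```latex
\begin{aligned}
\mathrm{NOT}(x) &\equiv 255 - x \pmod{256} \\
\mathrm{NOT}(I + 50) &\equiv 255 - (I + 50) = 205 - I \\
(205 - I) - 20 &= 185 - I \\
\mathrm{NOT}(185 - I) &\equiv 255 - (185 - I) = I + 70 \\
\mathrm{OUTPUT} &\equiv (I + 70) + 10 \equiv I + 80 \pmod{256}
\end{aligned}
```

For example, INPUT = 200 gives 250, then 5, then 241 (5 - 20 wraps around), then 14, and finally 24, which is indeed (200 + 80) mod 256. These are handy reference values when you check the simulation later.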

Your new VHDL module wizard should look like the picture above. In the next step, a *.vhd file is created: this is your module. It will automatically be your “top” module, since there is only one module in your design. You can edit the code with any text editor; Notepad++, for example, does VHDL syntax highlighting (but there are many other solutions).

So you should end up with a file that looks like this:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
--use IEEE.NUMERIC_STD.ALL;

-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity hello_leds is
    Port ( CLK : in  STD_LOGIC;
           INPUT : in  STD_LOGIC_VECTOR (7 downto 0);
           OUTPUT : out  STD_LOGIC_VECTOR (7 downto 0));
end hello_leds;

architecture Behavioral of hello_leds is

begin


end Behavioral;

As stated before, this is not an exhaustive VHDL tutorial, but rather a quick and minimalistic migration guide, so here is a quick heads-up. You’ll notice that comment lines start with “--”. The first block of code includes the IEEE library and specifies which packages to use from it. Then comes a part (the entity) that describes the input and output ports of your design. “STD_LOGIC_VECTOR” means a bus of bits, and “(7 downto 0)” means it’s an 8-bit bus whose leftmost bit, bit 7, is the most significant one when the bus is read as a number.

If you didn’t notice, this is still a rather inert piece of code: the architecture is empty. Let’s write some more lines to perform the actual transfer function of the module, which is the following:

OUTPUT = NOT(NOT(INPUT + 50) - 20) + 10

Here is the new code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity hello_leds is
    Port ( CLK : in  STD_LOGIC;
           INPUT : in  STD_LOGIC_VECTOR (7 downto 0);
           OUTPUT : out  STD_LOGIC_VECTOR (7 downto 0));
end hello_leds;

architecture Behavioral of hello_leds is

begin

	my_process : process(CLK) 

	begin		
		if (CLK'event and CLK = '1') then
			OUTPUT <= std_logic_vector(NOT(NOT(unsigned(INPUT) + 50) - 20) + 10);
		end if;	
	end process;
end Behavioral;

Before going any further, please note that this is the worst possible implementation of the required function (although it is functionally correct). Nonetheless, this particular implementation will help illustrate what you should not do, and why this design is flawed. But for now, let’s try to understand the lines that were added.

The first added line:

use IEEE.NUMERIC_STD.ALL;

is just another package, from the same IEEE library, that we need in order to work with signed or unsigned numbers. The second block of code is a process: a piece of code (or rather, a part of a circuit) that is clocked by the signal CLK. In other words, everything described between…

my_process : process(CLK)

…and

end process;

…will only “happen” when the signal CLK changes. Writing CLK in parentheses after the word process makes it part of the sensitivity list of that process. There can be several signals in a sensitivity list, but most often we only find a clock signal and a reset signal. Now, if you look at what’s inside that block, you’ll find this line:

if (CLK'event and CLK = '1') then

This is an if block, just like the ones you regularly use in C programming. The condition here can be read as: “if CLK has just changed, and its value after that change is 1”, or, more simply, “if a rising edge occurred on CLK”. So, to recap, this line of code (which actually performs the transfer function of our module):

OUTPUT <= std_logic_vector(NOT(NOT(unsigned(INPUT) + 50) - 20) + 10);

will only happen on a rising edge of the CLK signal. I am using the word “happen” instead of “execute” because I don’t want you to forget that, in the end, we’re describing a logic circuit: this code will never actually be “executed” in the FPGA.
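For comparison, here is a sketch of the same process written in two styles you will often meet in other people’s code. The edge test uses rising_edge(CLK), a function from IEEE.STD_LOGIC_1164 that behaves like (CLK'event and CLK = '1') for a clean '0' to '1' transition (in simulation it additionally ignores transitions that start from 'U' or 'X'). The process also has a reset signal in its sensitivity list, as mentioned above. The RST port and the entity name hello_leds_rst are hypothetical additions for illustration; the tutorial’s module has no reset.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical variant of hello_leds with an asynchronous, active-high reset.
-- RST is an assumption for illustration; the tutorial's module has no reset.
entity hello_leds_rst is
    Port ( CLK    : in  STD_LOGIC;
           RST    : in  STD_LOGIC;
           INPUT  : in  STD_LOGIC_VECTOR (7 downto 0);
           OUTPUT : out STD_LOGIC_VECTOR (7 downto 0));
end hello_leds_rst;

architecture Behavioral of hello_leds_rst is
begin

    -- Both CLK and RST are in the sensitivity list: the process reacts to either.
    my_process : process(CLK, RST)
    begin
        if RST = '1' then
            -- Reset is checked first and does not wait for a clock edge.
            OUTPUT <= (others => '0');
        elsif rising_edge(CLK) then
            -- rising_edge(CLK) plays the role of (CLK'event and CLK = '1').
            OUTPUT <= std_logic_vector(NOT(NOT(unsigned(INPUT) + 50) - 20) + 10);
        end if;
    end process;

end Behavioral;
```

Because RST is tested before the clock edge, this is an asynchronous reset: OUTPUT is cleared as soon as RST goes high, without waiting for CLK.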

Two new constructs are used in the OUTPUT assignment. They look like function calls, but they are really type conversions: unsigned() converts a std_logic_vector to the unsigned type, and std_logic_vector() converts another type (here, unsigned) back to std_logic_vector. It’s quite like a type cast in C programming: we can’t perform additions and subtractions on a std_logic_vector unless we explicitly tell the synthesis tool how its bits should be interpreted (i.e. signed or unsigned). So, in our case, we convert the input vector to unsigned, do the arithmetic, and then convert the result back to std_logic_vector, which is the type of OUTPUT.
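The same conversions, plus a few related numeric_std functions (resize, to_unsigned and to_integer), are gathered in the small hypothetical module below; the entity and port names are made up for illustration. It also shows why hello_leds wraps around: in numeric_std, the sum of an 8-bit unsigned value and an integer is itself 8 bits wide, so to keep the carry you have to widen the operand first.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical module showing the usual std_logic_vector <-> unsigned conversions.
entity conversions_demo is
    Port ( INPUT  : in  STD_LOGIC_VECTOR (7 downto 0);
           SUM9   : out STD_LOGIC_VECTOR (8 downto 0);  -- 9 bits: keeps the carry
           IS_BIG : out STD_LOGIC);
end conversions_demo;

architecture Behavioral of conversions_demo is
    -- The same bits, reinterpreted as an unsigned number (no logic is generated).
    signal value : unsigned(7 downto 0);
begin
    value <= unsigned(INPUT);

    -- resize() widens to 9 bits before adding, so 200 + 80 = 280 does not wrap.
    -- Writing value + 80 instead would give an 8-bit result (280 mod 256 = 24).
    SUM9 <= std_logic_vector(resize(value, 9) + to_unsigned(80, 9));

    -- to_integer() turns the unsigned value into an integer for comparisons.
    IS_BIG <= '1' when to_integer(value) > 127 else '0';
end Behavioral;
```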

You may now go ahead and synthesize this circuit by double-clicking on the synthesize line as in the following image:

After that, you can double-click on “View RTL Schematic”. I encourage you to wander around the RTL diagram and explore how your design was actually converted into a logic circuit. You should see something like this:

You can even go to the simulation view and double-click on the hello_leds module to run a simulation. I’ll let you discover the ISim tool on your own (you can right-click on signals to quickly apply test values). You should get a result similar to this:

This all looks pretty good, you might say. And indeed it does: the input is correctly converted to the required output. But we’re not done yet. We’re not even close. You’ll discover why very shortly, but for now, let’s assume that everything is fine. (You’re in for a big surprise!)
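If you’d rather not click through all 256 input values in ISim, a self-checking testbench can do it for you. The sketch below is hypothetical (it is not the template ISE generates, and the names tb_hello_leds, CLK_PERIOD and done are made up). It drives every input value, waits for a rising edge, and compares OUTPUT with (INPUT + 80) mod 256, which is what the formula reduces to once the 8-bit wraparound is taken into account (for instance, 0 gives 80 and 200 gives 24).

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical self-checking testbench for the single-cycle hello_leds.
entity tb_hello_leds is
end tb_hello_leds;

architecture sim of tb_hello_leds is
    component hello_leds
        Port ( CLK    : in  STD_LOGIC;
               INPUT  : in  STD_LOGIC_VECTOR (7 downto 0);
               OUTPUT : out STD_LOGIC_VECTOR (7 downto 0));
    end component;

    constant CLK_PERIOD : time := 10 ns;   -- 100 MHz
    signal CLK    : std_logic := '0';
    signal INPUT  : std_logic_vector(7 downto 0) := (others => '0');
    signal OUTPUT : std_logic_vector(7 downto 0);
    signal done   : boolean := false;
begin
    uut : hello_leds port map (CLK => CLK, INPUT => INPUT, OUTPUT => OUTPUT);

    -- Free-running clock, stopped once all values have been checked.
    CLK <= not CLK after CLK_PERIOD / 2 when not done else '0';

    stimulus : process
    begin
        for i in 0 to 255 loop
            INPUT <= std_logic_vector(to_unsigned(i, 8));
            wait until rising_edge(CLK);   -- hello_leds registers its result here
            wait for 1 ns;                 -- small margin after the edge
            -- Stop at the first mismatch.
            assert unsigned(OUTPUT) = to_unsigned((i + 80) mod 256, 8)
                report "Mismatch for INPUT = " & integer'image(i)
                severity failure;
        end loop;
        report "All 256 input values match (INPUT + 80) mod 256";
        done <= true;
        wait;
    end process;
end sim;
```

The 1 ns wait after each edge gives the registered OUTPUT time to update before it is checked, and it also means each new INPUT value is applied well away from the next clock edge.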

Let’s move one step further in our design and add some constraints: timing constraints to specify the frequency of the input clock, and location constraints to fix the physical locations of the I/O pins on the FPGA chip.

Go ahead and switch back to the “Implementation” view if you’re still in the “Simulation” view. Now let’s add some constraints. There are many ways to do that, and I encourage you to explore different possibilities. A very effective way is to manually add an “Implementation Constraints File” to the project, call it “hello_leds.ucf”, then copy and paste this content into it:

NET "CLK" TNM_NET = "CLK";
TIMESPEC TS_CLK = PERIOD "CLK" 10 ns HIGH 50 %;
INST "INPUT[0]" TNM = "IN_time_group";
INST "INPUT[1]" TNM = "IN_time_group";
INST "INPUT[2]" TNM = "IN_time_group";
INST "INPUT[3]" TNM = "IN_time_group";
INST "INPUT[4]" TNM = "IN_time_group";
INST "INPUT[5]" TNM = "IN_time_group";
INST "INPUT[6]" TNM = "IN_time_group";
INST "INPUT[7]" TNM = "IN_time_group";
TIMEGRP "IN_time_group" OFFSET = IN 10 ns VALID 10 ns BEFORE "CLK" RISING;
INST "OUTPUT[0]" TNM = "OUT_time_group";
INST "OUTPUT[1]" TNM = "OUT_time_group";
INST "OUTPUT[2]" TNM = "OUT_time_group";
INST "OUTPUT[3]" TNM = "OUT_time_group";
INST "OUTPUT[4]" TNM = "OUT_time_group";
INST "OUTPUT[5]" TNM = "OUT_time_group";
INST "OUTPUT[6]" TNM = "OUT_time_group";
INST "OUTPUT[7]" TNM = "OUT_time_group";
TIMEGRP "OUT_time_group" OFFSET = OUT 10 ns AFTER "CLK";

NET "INPUT[5]" LOC = N3;
NET "INPUT[4]" LOC = R1;
NET "INPUT[3]" LOC = P2;
NET "INPUT[2]" LOC = N1;
NET "INPUT[1]" LOC = M1;
NET "INPUT[0]" LOC = R2;
NET "INPUT[7]" LOC = P1;
NET "INPUT[6]" LOC = M2;
NET "OUTPUT[1]" LOC = K1;
NET "OUTPUT[0]" LOC = K2;
NET "OUTPUT[7]" LOC = G1;
NET "OUTPUT[6]" LOC = G3;
NET "OUTPUT[5]" LOC = J1;
NET "OUTPUT[4]" LOC = H1;
NET "OUTPUT[3]" LOC = J3;
NET "OUTPUT[2]" LOC = H2;
NET "CLK" LOC = F1;

Don’t worry if you don’t understand every line of that file for now: ISE includes several tools that can generate these constraints for you. You can find them under “User Constraints”, as in the image below:

What’s important to note for now is that the *.ucf file you’ve just added tells ISE which physical pin of the FPGA package each port is assigned to (the LOC lines, which depend on your package and board, so adapt them to yours), and at what frequency the input clock is running (the TIMESPEC line). In our example, we used 100 MHz (a 10 ns period), which is something an FPGA should be able to handle easily. We have also placed the clock input on a dedicated clock pin, to achieve the best possible performance.
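Put another way, the 10 ns period is a budget. For every path that starts at one register and ends at the next, the signal must leave the first flip-flop, ripple through the logic and routing in between, and settle before the setup time of the second flip-flop (this simplified view ignores clock skew and jitter):

```latex
T_{\mathrm{clk}} = \frac{1}{f_{\mathrm{clk}}} = \frac{1}{100\,\mathrm{MHz}} = 10\,\mathrm{ns}
\;\geq\; t_{\mathrm{clk}\to Q} + t_{\mathrm{logic}} + t_{\mathrm{routing}} + t_{\mathrm{setup}}
```

The OFFSET IN and OFFSET OUT lines set similar budgets for the paths between the I/O pins and the first or last register. The longer the chain of logic between two registers, the larger t_logic and t_routing become, which is worth keeping in mind for what follows.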

Now, let’s move to the next step in the design flow and double-click on “Implement Design”. Depending on the performance of your machine, this can take a few minutes, as the design goes through Translate, Map, and Place & Route.

The Place & Route step is the most critical one. As the name implies, that’s where the design tool (ISE) allocates actual slices of the FPGA to build the circuit you described, and then routes all the connections between them, just like you would when routing a circuit board.

Place & Route is done? OK, now check the project summary. You should see this:

There is 1 failing constraint, that is, one that ISE couldn’t meet. You can click on the link to see exactly what the root cause is, but I can tell you without a doubt that it’s caused by the poorly written VHDL code. No offense: I dragged you into this on purpose, to show exactly how you should not write VHDL code. Now, let’s see how to solve this timing problem.

One solution would be to reduce the clock frequency until all constraints are met. But that’s barely a solution: it’s a last resort when everything else fails. A real solution is to “pipeline” the design: instead of performing the whole operation in 1 clock cycle, spread it over 2 or 3 clock cycles, storing the intermediate results in registers along the way.
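Note that what creates a pipeline is storing intermediate results in registers, not merely splitting the expression over several lines. Readers coming from C may be tempted to do the split with process variables, as in the hypothetical sketch below (the entity name hello_leds_vars is made up). This version is still a single-cycle design: a variable takes its new value immediately, so all three steps chain together combinationally in front of the OUTPUT register, exactly like the original one-liner.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical counter-example: the computation is split into steps with
-- variables, but it is still NOT pipelined.
entity hello_leds_vars is
    Port ( CLK    : in  STD_LOGIC;
           INPUT  : in  STD_LOGIC_VECTOR (7 downto 0);
           OUTPUT : out STD_LOGIC_VECTOR (7 downto 0));
end hello_leds_vars;

architecture Behavioral of hello_leds_vars is
begin
    my_process : process(CLK)
        variable v1, v2 : unsigned(7 downto 0);
    begin
        if rising_edge(CLK) then
            -- Variables are updated immediately (":="), so v2 uses the v1
            -- computed on the previous line, during the same clock edge.
            v1 := NOT(unsigned(INPUT) + 50);
            v2 := NOT(v1 - 20);
            -- Only OUTPUT is a register: all three steps still sit between
            -- INPUT and this single flip-flop stage, as in the one-liner.
            OUTPUT <= std_logic_vector(v2 + 10);
        end if;
    end process;
end Behavioral;
```

Signals behave differently: an assignment to a signal only takes effect once the process has finished reacting to the current clock edge, which is what turns each intermediate result into a register.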

Let’s see how the pipelined version could be coded in VHDL:

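library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity hello_leds is
    Port ( CLK : in  STD_LOGIC;
           INPUT : in  STD_LOGIC_VECTOR (7 downto 0);
           OUTPUT : out  STD_LOGIC_VECTOR (7 downto 0));
end hello_leds;

architecture Behavioral of hello_leds is
-- Pipeline registers holding the intermediate results between clock edges
signal temp1 : unsigned(7 downto 0);
signal temp2 : unsigned(7 downto 0);
begin

	my_process : process(CLK)

	begin

		if (CLK'event and CLK = '1') then
			-- Stage 1: add 50, then invert
			temp1 <= NOT(unsigned(INPUT) + 50);
			-- Stage 2: subtract 20, then invert (uses temp1 from the previous edge)
			temp2 <= NOT(temp1 - 20);
			-- Stage 3: add 10 and drive the outputs (uses temp2 from the previous edge)
			OUTPUT <= std_logic_vector(temp2 + 10);
		end if;

	end process;

end Behavioral;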

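You’ll notice that we have added two signals (buses) called temp1 and temp2, and that the function performed by the module has been broken into several steps: three, actually. Because they are assigned inside the clocked process, temp1 and temp2 become registers, so on each rising edge every step works on the value stored by the previous step on the edge before. The code above can be drawn as the following RTL diagram, where the three steps can be clearly seen.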

This architecture is much more efficient than our first, flawed design:

Running the “Implement Design” step again, you’ll notice a few things. First, Place & Route completes successfully, as seen in the image below. You’ll also notice that the process is much, much faster. That’s normal: “well-designed” VHDL code is easy to implement. On the other hand, a bad design takes a lot of effort and processing power, as the tool tries countless placement and routing combinations in search of one that meets the constraints you defined.

Finally, you can run the simulation again and check that we get exactly the expected outputs. You can also see that, as expected, the new code introduces some latency: it takes 3 clock cycles for the result to appear on the OUTPUT port.
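To check both the results and the latency without eyeballing waveforms, the testbench idea can be extended to the pipelined version. In this hypothetical sketch (names such as tb_hello_leds_pipe, LATENCY and LAST are made up), a new value is applied on every clock cycle, and each result is expected on the third rising edge counted from the one that first sampled its input. Getting one correct result per cycle also illustrates that pipelining adds latency without reducing throughput.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical self-checking testbench for the pipelined hello_leds.
entity tb_hello_leds_pipe is
end tb_hello_leds_pipe;

architecture sim of tb_hello_leds_pipe is
    component hello_leds
        Port ( CLK    : in  STD_LOGIC;
               INPUT  : in  STD_LOGIC_VECTOR (7 downto 0);
               OUTPUT : out STD_LOGIC_VECTOR (7 downto 0));
    end component;

    constant CLK_PERIOD : time    := 10 ns;
    constant LATENCY    : natural := 3;    -- temp1, temp2, then OUTPUT
    constant LAST       : natural := 255 + LATENCY - 1;
    signal CLK    : std_logic := '0';
    signal INPUT  : std_logic_vector(7 downto 0) := (others => '0');
    signal OUTPUT : std_logic_vector(7 downto 0);
    signal done   : boolean := false;
begin
    uut : hello_leds port map (CLK => CLK, INPUT => INPUT, OUTPUT => OUTPUT);

    CLK <= not CLK after CLK_PERIOD / 2 when not done else '0';

    stimulus : process
        variable expected_in : integer;
    begin
        for n in 0 to LAST loop
            -- A new value enters the pipeline on every clock cycle.
            if n <= 255 then
                INPUT <= std_logic_vector(to_unsigned(n, 8));
            end if;
            wait until rising_edge(CLK);
            wait for 1 ns;
            -- After edge number n, OUTPUT holds the result for the value
            -- that was sampled LATENCY - 1 edges earlier.
            expected_in := n - (LATENCY - 1);
            if expected_in >= 0 then
                assert unsigned(OUTPUT) = to_unsigned((expected_in + 80) mod 256, 8)
                    report "Mismatch for INPUT = " & integer'image(expected_in)
                    severity failure;
            end if;
        end loop;
        report "Pipeline checked: one correct result per clock cycle";
        done <= true;
        wait;
    end process;
end sim;
```

During the first two cycles temp1 and temp2 still hold uninitialized values, so the simulator may print a few numeric_std metavalue warnings before the checks start; they are expected.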

Conclusion

This concludes this three-part tutorial about migrating from the MCU world to the FPGA world. I hope you enjoyed reading it and that it helped you understand the different mindset it takes to approach a VHDL design.

FPGAs and MCUs are totally different beasts. Amazing things can be accomplished by making an FPGA and an MCU coexist in a system, letting the FPGA handle very fast signals or highly parallel processing, and letting the MCU run slower, sequential tasks. It’s even possible to embed an MCU into your FPGA, but that’s a totally different story!