[ Content | Sidebar ]

Archives for vhdl

VHDL Compiler Updates

May 5th, 2017

It’s been a long time since I wrote anything about the VHDL compiler and simulator I’m working on, nvc, but I am still developing it, albeit at a slightly slower pace than before.

Since the last update I’ve started making stable releases, and yesterday I released version 1.2.1. Actually I think most people are better off using the master branch: one thing I’m really proud of is the coverage of the regression tests, and they run on every commit using Travis so it should always be stable enough for everyday usage. The “stable” versioning is really to satisfy the requirements of packagers. For instance it is now part of Homebrew for OS X.

The big change in the last year has been the removal of the old hacky constant folding pass and replacing it with one based on the “vcode” intermediate code layer that I wrote about before. The old version used to walk over the AST nodes and attempt to collapse them into a simpler form by evaluating expressions and even evaluating function calls. Unfortunately this was often buggy and sometimes didn’t match the run-time result the code generator would have produced. Now constant folding is done using the same AST-to-vcode lowering pass the code generator uses, to generate vcode “thunks” for compile-time constant expression. These are then evaluated using a new vcode interpreter. This now means almost all side-effect free functions with constant arguments will be folded at compile time. Even those with complex looping, recursion, memory allocation, etc.

Unfortunately this rewrite took way longer than expected, partly due to not having much time to work on it, and partly because all the dependencies into the rest of the code got very complex.

Apart from that I’ve fixed a lot of bugs. And this means you can simulate some quite complex designs in it, for instance the J-Core open source SuperH CPU clone.

Filed in vhdl - Comments closed

New Code Generator

April 6th, 2015

I haven’t written much about my VHDL compiler recently and there has been little public GitHub activity. This is because I’ve spent the last six months completely rewriting the way the back-end code generation works and I finally merged the result back into the master branch last week. This rather lengthy post attempts to explain what I’ve done.

There were numerous problems with the old code generator. Before the rewrite cgen.c was a 7000 line behemoth that took VHDL syntax trees at one end and produced a complete LLVM module at the other. It dealt with everything from the minutiae of memory layout and branch prediction hints, to how to implement displays for nested sub-programs, to high level VHDL rules for things like bounds checking.

It was also bordering unmaintainable as it had started life as a small toy project to learn LLVM and slowly accreted features and optimisations as I implemented more and more of VHDL. There were several bugs in it that were just too painful to fix.

It was also very difficult to test: most of the rest of the modules are tested using the check C unit test framework but I couldn’t find any way to do this with the LLVM output. Instead the code generator is tested indirectly by the regression tests that run through the full compiler and simulator: this makes it hard to test for small specific features, like optimisations.

I also had several other plans for VHDL-specific optimisations and unifying the elaboration-time evaluator with the code generator that just weren’t tractable with the current system. For example the old code generator threw away most VHDL type information early on and reduced variables and signals to LLVM’s 8, 16, 32, or 64-bit integers which misses many opportunities to eliminate bounds checks.

So I finally decided to rewrite it using a scheme I’d been toying with for a while. The back end is now based around a new intermediate form called “V-code” after the P-code of Pascal. A new “lowering” phase in lower.c translates VHDL syntax trees into V-code and the code generator in cgen.c translates V-code into LLVM.

The intention of the lowering phase is to translate VHDL into something executable without getting bogged down with implementation concerns. The code generation phase focusses on generating efficient LLVM bitcode without worrying about high-level features.

I wanted V-code to be higher-level than LLVM – so that I could express VHDL features like signals, processes suspending, bounds checking etc. – but lower level than VHDL so I could hide complexities such as array directions, aggregates, non-zero-based indices, and so on.

I think the V-code language I ended up with fits this pretty well. The model is infinite register single static assignment just like LLVM but simplified in many ways – for example there are no phi nodes. V-code is strongly typed having both “normal” types such as integers, pointers, records, and arrays and VHDL-specific types such as signals, accesses, and files.

Unlike many similar intermediate languages where integer types are a certain bit width, in V-code integer registers have two ranges associated with them: a static “type” range and a dynamic “bound” range. For example, consider the following snippet:

subtype small_int is integer range 1 to 10;
 
function get_bit(x : bit_vector(1 to 20);
                 n : small_int) return bit is
    variable i : integer := n;
begin
    return x(i + 1);
end function;

If we compile this with the --dump-vcode argument the compiler will output the V-code:

Name       WORK.PACK.GET_BIT$JQuWORK.PACK.SMALL_INT;
Kind       function
Context    WORK.PACK-body
Blocks     1
Registers  12
Types      9
Variables  1
  I                                     // -2^31..2^31-1 
Result     0..1
Parameters 2
  r0    X                               // @<0..1> => 0..1 
  r1    N                               // -2^31..2^31-1 => 1..10 
Begin
   0: // Elided bounds check for r1 
      I := store r1
      r3 := const 1                     // -2^31..2^31-1 => 1 
      r4 := add r1 + r3                 // -2^31..2^31-1 => 2..11 
      // Elided bounds check for r4 
      r8 := sub r4 - r3                 // -2^31..2^31-1 => 1..10 
      r9 := cast r8                     // # => 1..10 
      r10 := add r0 + r9                // @<0..1> => 0..1 
      r11 := load indirect r10          // 0..1 
      // Elided bounds check for r11 
      return r11

The comments after each register definition give the type range and the bound range after a “=>” if different. The parameter N represented by register r1 has a type range of -2^31..2^31-1 as it’s a subtype of INTEGER but a bound range of 1..10 which means the compiler can assume its value is always within this range.

Here the old code generator would have inserted three bounds checks: one on the array access and two trivial subtype checks on I‘s initial value and the returned value. The new code generator can eliminate all three as indicated by the “Elided bounds check for …” comments.

The interesting case is the array access: the expression I + 1 ends up in register r4. This has type range -2^31..2^31-1 again so it’ll be represented by a 32-bit integer but the bound range is 2..11 which is calculated from the range of the LHS (1..10) and the RHS (1..1). When we compare this against the array range 1..20 we know this check can never fail and so skip it.

If we change the return statement to this:

return x(i + 11);

Then r4 has range 11..20 and it’s possible for this array access to be out-of-bounds so the lower phase inserts a runtime check:

   0: // Elided bounds check for r1 
      I := store r1
      r3 := const 11                    // -2^31..2^31-1 => 11 
      r4 := add r1 + r3                 // -2^31..2^31-1 => 12..21 
      r5 := const 1                     // -2^31..2^31-1 => 1 
      bounds r4 in 1 .. 20
      r9 := sub r4 - r5                 // -2^31..2^31-1 => 11..20 
      r10 := cast r9                    // # => 11..20 
      r11 := add r0 + r10               // @<0..1> => 0..1 
      r12 := load indirect r11          // 0..1 
      // Elided bounds check for r12 
      return r12

As an aside, I spent a lot of time making the debug output from V-code human readable. The output is colour-coded and all register definitions are annotated with their type and bounds. I’ve wasted too much time trying to parse the LLVM bitcode textual format!

At the moment bounds propagation only occurs for certain arithmetic operations and only across a single basic block. However this seems to be effective for a large number of practical examples. Another neat side-effect of this scheme is that V-code has constant folding almost for free: a “constant” is any register with a bound range containing a single value. The emit_* functions which generate op-codes typically short-circuit and return a constant register when all their inputs are constants.

The following example shows how V-code represents VHDL-specific features like signals and process control flow.

architecture rtl of example is
    signal x : bit;
begin
    process is
    begin
        x <= '1';
        wait for 1 ps;
        x <= '0';
        wait for 2 ps;
    end process;
end architecture;

The wait-for statement here causes the process to suspend for the given amount of simulation time and then resume at the following statement. In NVC this is implemented as a jump table at the start of a function generated for a process. A wait statement stores the index of the next block in the process’s global state object and returns from the function. This works well but trying to debug the control flow of the generated LLVM can be very difficult. Instead V-code has “waiting” as a first-class control flow op-code: it looks just like a normal jump and the implementation details are hidden in the code generator.

Name       :example:line_7
Kind       process
Context    WORK.EXAMPLE.elab
Blocks     4
Registers  14
Types      8
Variables  0
Begin
   0: r0 := nets :example:x             // $<0..1> => 0..1 
      r1 := const 1                     // # => 1 
      alloc driver nets r0+r1 driven r0+r1
      return 
   1: r2 := const 0                     // -2^63..2^63-1 => 0 
      r3 := nets :example:x             // $<0..1> => 0..1 
      r4 := const 1                     // 0..1 => 1 
      // Elided bounds check for r4 
      r5 := const 0                     // -2^63..2^63-1 => 0 
      r6 := const 1                     // # => 1 
      sched waveform r3 count r6 values r4 reject r2 after r5
      r7 := const 1000                  // -2^63..2^63-1 => 1000 
      wait 2 for r7
   2: r8 := const 0                     // -2^63..2^63-1 => 0 
      r9 := nets :example:x             // $<0..1> => 0..1 
      r10 := const 0                    // 0..1 => 0 
      // Elided bounds check for r10 
      r11 := const 0                    // -2^63..2^63-1 => 0 
      r12 := const 1                    // # => 1 
      sched waveform r9 count r12 values r10 reject r8 after r11
      r13 := const 2000                 // -2^63..2^63-1 => 2000 
      wait 3 for r13
   3: jump 1

Operations on signals are also first-class: the type annotation $<0..1> means a signal of values in the range 0..1 (i.e. the bit type). The op-codes “alloc driver” and “sched waveform” handle common operations on signals without confusing the logic with implementation details (both of these are implemented as function calls into the runtime kernel).

The new structure allows me to write much more targeted unit tests. For example the new test_lower.c test suite checks the generated op-codes for a large number of small examples. This means regression tests can be written for micro-optimisations and similar tweaks that were impossible to test before.

Overall the performance is slightly better than with the previous code generator: the bigram.vhd micro-benchmark runs about 10% faster now. This test originally spent most of its time in the runtime kernel but after I optimised that it is now dominated by the to_unsigned library function so there’s room for code generation improvements there. However, I want to spend the next month or so fixing the large number of GitHub issues that have accumulated.

Filed in vhdl - Comments closed

Writes Your Makefiles For You

December 31st, 2013

I’ve been dogfooding my VHDL compiler for a project at work and now that it’s gotten to the point where it can simulate non-trivial designs, compile times are becoming significant. Especially when files use Xilinx primitives as the vcomponents library takes a while to load.

At the moment I’m re-analysing every source file for each change I make, so obviously an improvement would be to write a makefile and only re-analyse the files that have changed and those that depend on them (e.g. an architecture must be re-analysed if the corresponding entity changed). But figuring out and maintaining the dependencies by hand is tedious and error prone so I’ve written a makefile generator that recursively finds the dependencies for previously analysed or elaborated units in a library. Invoke it like this:

nvc --make my_top_level >Makefile

Then running make will do the minimum amount of work to rebuild all the out of date files. If the argument to --make is an elaborated design then two convenience targets run and wave will be added to run the simulation and run with waveform output respectively.

The code is fairly compact: only 400 or so lines of C.

This also turned out to be handy for solving a long-standing problem of not being able to bootstrap the standard and IEEE libraries with parallel make (make -j). Previously the dependencies in the automake input file were incomplete, but now these are generated automatically by a gen-deps target. The output (e.g. here) is then mangled with sed and committed into the git repository (this solves a chicken-and-egg problem where the gen-deps target can only be run in an already built tree).

VHDL Code Coverage

November 24th, 2013

I recently added a code coverage option to the VHDL compiler, nvc, I’m working on. I tend to find code coverage a really useful tool when I’m writing RTL, especially the sort of control-dominated designs I do in my day job. I find Modelsim’s HTML coverage reports a bit frustrating so I’m trying to do something more user-friendly in my simulator.

If you elaborate your design with the --cover option the generated code will be annotated to gather the following kinds of coverage:

  • Statement – A counter is added for each executable statement in the design. A statement must be executed at least once to be “covered”.
  • Branch – A branch is covered if it is both taken and not-taken at least once during execution
  • Condition – A condition here is a Boolean sub-expression of branch test and it is covered by evaluating to both TRUE and FALSE at least once. For example if A and B then contains one branch but two sub-conditions.

After a run with coverage enabled the statistics are automatically reported:

** Note: coverage report generated in /tmp/work/WORK.TEST.cover/
            282/289 statements covered
            71/94 branches covered
            85/108 conditions covered

A HTML report is then generated which contains a top-level summary and a detailed report for each source file:

coverage

You can mouse over a non-covered branch or condition to get a hint as to why it was not covered.

The implementation is currently a work-in-progress but functions well enough for light usage. The biggest limitation at the moment is that the report only contains aggregated statistics per-file rather than per-instance statistics.

Filed in vhdl - Comments closed

Improved Waveform Output

October 27th, 2013

I’ve spent a lot of time recently improving the waveform output of my VHDL compiler / simulator. Previously only the simple VCD format was supported: this only allows Verilog-style 4-value logic types to be dumped so doesn’t map very well onto VHDL types. The implementation was also very inefficient resulting in a 3-4x slowdown in simulation speed.

After experimenting with LXT for a while, NVC now uses GtkWave’s FST format by default. With some help from GtkWave’s author it can now dump full 9-value logic as well as most common VHDL types (enumerations, strings, integers, etc.).

I’ve put some work into improving the performance of waveform output and now dumping every signal to a FST incurs around 30% overhead. The VCD dumper has also been rewritten giving around 90% overhead. This format should generally be avoided unless you do not have access to GtkWave.

The screen shot below shows some of the new features:

gtkwave

VHDL compiler improvements

April 15th, 2012

A while ago I posted about a VHDL compiler I’d started writing. Well I’ve been working on it a bit during the evenings and weekends and it’s acquired several new features. Probably the most significant is that it can now compile the standard IEEE std_logic_1164 and numeric_std packages as well the Synopsys std_logic_arith and std_logic_unsigned packages. If you clone the latest version from GitHub these will be built and installed for you automatically. Note that the original IEEE sources cannot be redistributed due to copyright restrictions so you’ll have to faff about downloading them from the IEEE standards website first – see lib/ieee/README for details.

NVC also now supports a wider range of concurrent statements, including selected and conditional assignments.

This means we can rewrite the counter example from before in a more normal way:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
 
entity counter is
    generic ( WIDTH : integer );
    port (
        clk   : in std_logic;
        reset : in std_logic;
        count : out unsigned(WIDTH - 1 downto 0) );
end entity;
 
architecture rtl of counter is
    signal count_r : unsigned(WIDTH - 1 downto 0);
begin
    count <= count_r;
 
    process (clk) is
    begin
        if rising_edge(clk) then
            if reset = '1' then
                count_r <= (others => '0');
            else
                count_r <= count_r + 1;
            end if;
        end if;            
    end process;
end architecture;

And similarly for the top-level test bench:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
 
entity counter_tb is end entity;
 
architecture test of counter_tb is
    constant WIDTH : integer := 16;
    signal clk     : std_logic := '0';
    signal reset   : std_logic := '1';
    signal count   : unsigned(WIDTH - 1 downto 0);
begin
    clk   <= not clk after 5 ns;
    reset <= '0' after 10 ns;
 
    uut: entity work.counter
        generic map ( WIDTH )
        port map ( clk, reset, count );    
end architecture;

Next we have to analyse and elaborate the design:

$ nvc -a counter.vhd
$ nvc -e counter_tb
/usr/lib/llvm-3.0/bin/llvm-ld -r -b /home/nick/nvc/build/work/_WORK.COUNTER_TB.final.bc /home/nick/nvc/build/work/_WORK.COUNTER_TB.elab.bc /home/nick/share/nvc/ieee/_IEEE.NUMERIC_STD-body.bc /home/nick/share/nvc/ieee/_IEEE.STD_LOGIC_1164-body.bc

The long llvm-ld line at the end is a new stage that links together the LLVM bitcode for the elaborated design with the bitcode for any referenced packages – the IEEE standard libraries in this case. This allows LLVM’s link time optimisation to optimise across package boundaries. For example, inlining trivial functions like rising_edge directly into the process.

$ nvc -r --stop-time=1ms --stats counter_tb
** Note: setup:28ms run:104ms maxrss:17872kB

LLVM JIT compilation accounts for most of the memory usage and 28ms setup time. However this overhead should be insignificant for any long-running simulation.

Just running the above simulation is fairly boring so I’ve also started adding a basic VCD dumper. This only works for a small set of data types but includes std_logic and std_logic_vector so should hopefully be quite useful in practice.

$ nvc -r --stop-time=100ns --vcd=out.vcd counter_tb

The output can then be opened in a VCD viewer such as GTKWave.

Note that writing out a VCD will slow the simulation considerably. In the future I’d like to be able to selectively dump signals and support other formats such as GHW or LXT2.

Writing a VHDL compiler

November 5th, 2011

I haven’t posted much about any of my projects for a some time. This is because I’ve spent the past few months squirrelling away on something new: a while ago I decided a good way to learn more about VHDL would be to write a parser/checker for it. I had a vague plan of using it to write a linting tool or indexer. I’ve also wanted to learn more about LLVM so I started hacking together a code generator on the back end of the linting tool that generated LLVM IR for sequential VHDL processes. After that I thought I might as well write a simulation kernel too and it snowballed into something approximating an embryonic VHDL compiler/simulator.

The tool is currently called “nvc” which might stand for “Nick’s VHDL Compiler” or perhaps “New VHDL Compiler”. You can get it from GitHub here: github.com/nickg/nvc. See the README file for instructions on how to build and run it.

The eventual goal is full IEEE 1076-1993 support, or at least as much as possible until I get bored. Currently its capabilities are very limited. In particular it doesn’t implement enough of VHDL to compile the IEEE std_logic_1164 package which makes it almost useless for any real-world designs. The following should give an example of what it can do at the moment though.

entity counter is
    port (
        clk   : in bit;
        count : out integer );
end entity;
 
architecture behav of counter is
begin
    process (clk) is
        variable count_var : integer := 0;
    begin
        if clk'event and clk = '1' then
            count_var := count_var + 1;
            count <= count_var;
        end if;
    end process;
end architecture;

This is a very simple behavioural model of a counter that increments on rising edges of clk. We can also add a top-level entity to generate the clock signal:

entity top is end entity;
 
architecture test of top is
    signal clk   : bit := '0';
    signal count : integer := 0;
begin
    clkgen: process is
    begin
        wait for 5 ns;
        clk <= not clk;
    end process;
 
    uut: entity work.counter
        port map (
            clk   => clk,
            count => count );
 
    process (count) is
    begin
        report integer'image(count);
    end process;
end architecture;

The final process will print the value of the counter every time it changes. You may normally have written the clkgen process with a concurrent assignment:

clk <= not clk after 5 ns;

Unfortunately processes and instantiations and the only sort of concurrent statements implemented in nvc at the moment. This is not as restrictive as it sounds: apart from generate statements all others can be trivially rewritten as processes. This is in fact how they are specified in the standard.

First we need to run the analysis step which parses the code into an internal format, performs type checking, and runs some simplification passes such as constant folding. With the above code in a single file counter.vhd we can run:

$ nvc -a counter.vhd
[gc: freed 425 trees; 297 allocated]

The work library now contains the serialised abstract syntax tree for each analysed design unit:

$ ls -lh work
total 24K
-rw-r--r-- 1 nick nick    8 Nov  5 13:37 _NVC_LIB
-rw-r--r-- 1 nick nick  682 Nov  5 13:37 WORK.COUNTER
-rw-r--r-- 1 nick nick 1.6K Nov  5 13:37 WORK.COUNTER-BEHAV
-rw-r--r-- 1 nick nick   33 Nov  5 13:37 WORK.TOP
-rw-r--r-- 1 nick nick 7.6K Nov  5 13:37 WORK.TOP-TEST

Next we run the elaboration step which builds the full design hierarchy starting from a top level entity. It also runs the LLVM IR code generation step to produce a bitcode file.

$ nvc -e top

(This step produces a large amount of debug output which should be ignored.) If you look inside the work library again you should see the generated LLVM bitcode and a serialised AST for the elaborated design – this is used for debugging later.

$ ls -lhtr work | tail -2
-rw-r--r-- 1 nick nick 8.2K Nov  5 13:39 WORK.TOP.elab
-rw-r--r-- 1 nick nick 1.3K Nov  5 13:39 _WORK.TOP.elab.bc

The elaboration step runs a number of simple LLVM optimisation passes over the bitcode before writing it out but you can run more if you like using the standard LLVM opt tool.

Now we can finally simulate the model! The simplest way to do this is in batch mode as follows:

$ nvc -r --stop-time=50ns top
0ms+0: Report Note: 0
5ns+2: Report Note: 1
15ns+2: Report Note: 2
25ns+2: Report Note: 3
35ns+2: Report Note: 4
45ns+2: Report Note: 5

Normally nvc will run until there are no more events scheduled but as the clkgen process will schedule events forever we have to specify an explicit stop time. The bitcode file is transparently converted into native machine code using LLVM’s JIT engine when the simulation is loaded.

Commercial simulators usually provide an interactive debugging environment as well as a batch mode and nvc tries to emulate this. Use the -c switch to the run command to enter interactive mode:

$ nvc -r -c top
%

This mode uses TCL for command processing, which is the de-facto standard for scripting in the EDA industry. If you have readline or libedit installed you’ll also get fancy line-editing capabilities. The features here are currently pretty limited but we can see the state of all the signals in the design:

% show signals
:top(test):clk                STD.STANDARD.BIT         '0'
:top(test):count              STD.STANDARD.INTEGER     0

Note that nvc has collapsed the ports in counter with the signals in top as they are identical: this might be confusing if you’re searching for a particular name. Now we can run the simulation and verify the signal values change:

% run 45 ns
% 0ms+0: Report Note: 0
5ns+2: Report Note: 1
15ns+2: Report Note: 2
25ns+2: Report Note: 3
35ns+2: Report Note: 4
45ns+2: Report Note: 5
 
% show signals
:top(test):clk                STD.STANDARD.BIT         '1'
:top(test):count              STD.STANDARD.INTEGER     5

Notice that the run command returns straight away and the print outs happen in the background: this is because the interactive mode forks off another process to run the simulation.

A legacy of the project’s origins as a linter, the error messages try to be more helpful than most VHDL compilers:

counter.vhd:33: no suitable overload for operator 
"+"(STD.STANDARD.BIT, universal integer)
            clk <= not (clk + 1);
                        ^^^^^^^

This is about the limit of what nvc can do at the moment. I’m not sure how far I’ll develop it but it’s been good fun hacking thus far. So obviously don’t start using it for real designs or reporting bugs just yet ;-).