Improving timing for FIFO by adding registers

Every now and then we find that the design's FIFO signals in the critical paths. There are several reasons why a FIFO may be involved:
  • The FIFO's rd_en signal is used in combinatorical logic within the FIFO, which calculates the next read address, which may be time-consuming.
  • The driver of the FIFO's rd_en is commonly a not-so-trivial combinatoric function in our own design. Many times this function consists of state registers and several flags, coming from different (physical) parts of our design, resulting in a difficult timing problem.
  • There is an inherent delay between the rd_clk and the actual appearance of data at the FIFO's output, which (usually) results from the RAM's delay from clock to output. This creates an unfavorable start for the timing budget, which may turn into the critical path, if the FIFO's output is used further in combinatorical logic (such as determining the next state of some state machine).
  • Physical distance between the FIFO and its data sink: On large devices, the data source and data sink may be physically far away, causing a placement problem.
Problems may also occur in the write-related logic, but the solution is pretty trivial: Delay both data (typically "din") and the write enable signal (typically "wr_en") by the same number of write clocks ("wr_clk"?) with registers, and you're done. Ah, of course, you may want to modify your "FIFO is soon full" watermark ("almost_full" or "prog_full") to enlarge the safety margin, since you're going to keep on writing a few extra clock cylces after it goes high.

So, let's stick to the real problem: We want the rd_en signal to be sampled by a flip-flop in the FIFO as soon as possible, and we want a low clock-to-output on the data lines. It's clear that the answer is adding registers, but there's some logic to add as well...

The Verilog code for a FIFO wrapper is given below, and is released to the public domain. This means that I don't claim any copyrights for it.

How to use it: This is a functional drop-in wrapper for a typical FIFO (such as the one generated by Xilinx' Fifo Generator). This means that if you wrap your existing FIFO with this module, your design will functionally "do the same", with possibly better static timing. But you can not expect identical simulations, because the data appears slightly later on the FIFO's output. A non-buggy and fairly sensible design should not care about such a difference.

Assuming that you have a non-FWFT FIFO to begin with, with a pretty standard read interface, you should instantiate it in the module below as "orig_fifo", and set the "width" parameter to the number of data bits you have. If you need some other control signal, pass them though with care. For example, "almost empty" may ring the bell slightly too soon. Signals regarding the write path can be passed on as is, since the write path is not touched by this wrapper.

The term "FWFT" is explained below, in case you're not familiar with it.

So here's the module. It's explained in detail after the listing.


module reg_fifo(rst, 
                rd_clk, rd_en, dout, empty,                 
                wr_clk, wr_en, din, full, prog_full);
   
   // Eli Billauer, 12.4.06
   // This module is released to the public domain; Any use is allowed.

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output                prog_full;
   output [(width-1):0]  dout;

   reg                   fifo_valid, middle_valid;
   reg [(width-1):0]     dout, middle_dout;

   wire [(width-1):0]    fifo_dout;
   wire                  fifo_empty, fifo_rd_en;
   wire                  will_update_middle, will_update_dout;

   // orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(fifo_dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full),
       .prog_full(prog_full)
       );

   assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
   assign will_update_dout = (middle_valid || fifo_valid) && rd_en;
   assign fifo_rd_en = (!fifo_empty) && !(middle_valid && fifo_valid);
   assign empty = !(fifo_valid || middle_valid);

   always @(posedge rd_clk)
      if (rst)
         begin
            fifo_valid <= 0;
            middle_valid <= 0;
            dout <= 0;
            middle_dout <= 0;
         end
      else
         begin
            if (will_update_middle)
               middle_dout <= fifo_dout;
            
            if (will_update_dout)
               dout <= middle_valid ? middle_dout : fifo_dout;
            
            if (fifo_rd_en)
               fifo_valid <= 1;
            else if (will_update_middle || will_update_dout)
               fifo_valid <= 0;
            
            if (will_update_middle)
               middle_valid <= 1;
            else if (will_update_dout)
               middle_valid <= 0;
         end 
endmodule

OK, let's see how this thing works.

The first thing to realize, is that if we want that the FIFO's rd_en will be detached from the original rd_en, two registers are necessary. Consider the following drawing:

Register flow

The FIFO's output is marked as fifo_dout. This modules output is dout. Now suppose that we didn't have the middle register. We could activate the original FIFO's rd_en so that the first data entry appears on fifo_dout, and hold our "empty" signal low, telling the application to come and get data. But whenever the application came to get some, it would raise the rd_en. The wrapper would then sample fifo_dout into dout, and all is fine so far.

But now what? Since we don't want the FIFO's fifo_rd_en to combinatorically depend on the application's rd_en, there is no possibility to update fifo_dout during the same clock cycle, so we've run out of data. A direct response to rd_en is therefore to raise the "empty" flag, raise the FIFO's fifo_rd_en, and wait a clock. Only then can we clear the "empty" flag again.

This means that the wrapped FIFO works in half rate. And if we're ready for a FIFO to work in half rate, there are simpler solutions.

So the solution is to have another register, middle_dout. The FIFO wrapper always tries to keep both fifo_dout and middle_dout with data. Just to get an idea of how this works, let's consider the case of a well-filled FIFO and that we're after a long period with no reads. That means that fifo_dout and middle_dout have valid data. At this point the application's rd_en is raised. dout samples middle_dout. The "empty" flag is kept low, since data can still be sampled from fifo_dout. If rd_en is kept high, data will continously be sampled from fifo_dout to dout.

But all this requires the FIFO's fifo_rd_en to be correctly driven. We shall now see how it's all done.

Let's get acquainted with a few internal signals in the Verilog code above. We'll start with will_update_middle and will_update_dout:

  if (will_update_middle)
    middle_dout <= fifo_dout;
            
  if (will_update_dout)
    dout <= middle_valid ? middle_dout : fifo_dout;

Evidently from the Verilog code, the middle_dout and dout registers (respectively) sample something when these signals are high. middle_dout samples fifo_dout (it doesn't have much choice), but dout will sample middle_dout if middle_valid is high, otherwise it goes for fifo_dout.

The two *_valid flags tell us whether the respective register containts valid data.

 if (fifo_rd_en)
   fifo_valid <= 1;
 else if (will_update_middle || will_update_dout)
   fifo_valid <= 0;
            
 if (will_update_middle)
   middle_valid <= 1;
 else if (will_update_dout)
   middle_valid <= 0;

When fifo_rd_en is high during a clock rise, fifo_valid goes high as well: If we just pulled data from the FIFO, its current output is considered valid. But if fifo_rd_en was low, and data was sampled by one of middle_dout or dout, we don't consider fifo_dout valid anymore, since its data has been consumed.

middle_valid goes by the same logic: When will_update_middle is high, data is sampled into middle_dout, so middle_valid goes high as well. If not, it will go low when dout samples data from it.

Now let's turn to how the FIFO's fifo_rd_en signal is generated:

 assign fifo_rd_en = (!fifo_empty) && !(middle_valid && fifo_valid);

We will read from the FIFO if it has data (i.e. not fifo_empty) and if any of fifo_dout and/or middle_dout is not valid.

Note that fifo_rd_en driven by the state machine's urge to prepare data for reading, and not the application's reading. As a result, fifo_rd_en is derived from a three-input function, which depends on two flip-flops (fifo_valid and middle_valid) and the FIFO's own fifo_empty signal. On a Xilinx FPGA, this is done with a single LUT, which is as fast as you can get for any flow-controlled read from a FIFO.

The "empty" flag is generated from a function of these two flip-flops as well, so it's very fast as well.

 assign empty = !(fifo_valid || middle_valid);

We are now left with the generation of the will_update_* signals:

 assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
 assign will_update_dout = (middle_valid || fifo_valid) && rd_en;

will_update_dout essentially says (!empty && rd_en). Since we mimic a FIFO, it's pretty natural that the output is updated when it's not empty, and data is requested by rd_en.

will_update_middle is trickier: It requires fifo_valid (otherwise there is nothing to sample) and then we have this (middle_valid == will_update_dout). This is actually a shortcut for saying the following:

  • If middle_valid is low, then we need to update. But we'll do that only if will_update_dout is low as well, because if both dout and middle_dout "want" to update, dout has precedence.
  • If middle_valid is high, we don't need to update. But if will_update_dout is high and middle_valid is high, it means that dout is going to sample from middle_dout. Not updating now means becoming invalid, and since fifo_dout is waiting with data, why hold it?

The combinatoric functions which generate will_update_middle, will_update_dout and the next values of fifo_valid and middle_valid may seem complicated, but they eventually depend only fifo_empty, rd_en and two registers: fifo_valid and middle_valid. A fairly clever synthesizer should detect this, and create a single logic level on these important paths on a Xilinx FPGA architecture.

A FWFT (First Word Fall Through) FIFO conversion

In a "classic" FIFO, a deasserted "empty" flag means that valid data will be presented on the FIFO's outputs after rd_en is sampled high on clock. A FWFT FIFO presents the data on the outputs as soon as it's available, so a deasserted "empty" flag means that the data on the outputs is valid.

The meaning of rd_en could therefore be interpreted somewhat differently: For a "classic" FIFO it means "bring me data". On a FWFT it's something like "I just grabbed the data, bring me the next one if you have it".

We shall now see a module, which uses a "classic" FIFO to implement a FWFT. Note that it does not improve timing: The FIFO's dout is passed through, and the FIFO's rd_en depends on the one coming from the application, so its timing is slightly worse than a the bare FIFO it uses.

As a matter of fact, this wrapper merely fiddles with the rd_en and empty signals.

If timing improvement is necessary, see the end of this page.


module basic_fwft_fifo(rst, 
                       rd_clk, rd_en, dout, empty,                 
                       wr_clk, wr_en, din, full, prog_full);

   // Eli Billauer, 12.4.06
   // This module is released to the public domain; Any use is allowed.

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output                prog_full;
   output [(width-1):0]  dout;

   reg                   dout_valid;
   wire                  fifo_rd_en, fifo_empty;

   // orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full),
       .prog_full(prog_full)
       );

   assign fifo_rd_en = !fifo_empty && (!dout_valid || rd_en);
   assign empty = !dout_valid;

   always @(posedge rd_clk)
      if (rst)
         dout_valid <= 0;
      else
         begin
            if (fifo_rd_en)
               dout_valid <= 1;
            else if (rd_en)
               dout_valid <= 0;
         end 
endmodule

A FWFT with improved timing

We shall finish with a combination of the two: A FWFT module with extra registers for improved timing. It is based on the first module of this page, with some changes that makes it a FWFT FIFO. Only few changes are required: To make dout sample data not only on rd_en, but also when it's invalid. Also, the "empty" flag (which is a misnomer for a FWFT) should be active when dout is not valid, rather than depending on the validity of other registers.

The module for a conversion from a non-FWFT FIFO to a FWFT FIFO, including the timing advantages, is given below.


module fwft_fifo(rst, 
                 rd_clk, rd_en, dout, empty,                 
                 wr_clk, wr_en, din, full, prog_full);

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output                prog_full;
   output [(width-1):0]  dout;

   reg                   fifo_valid, middle_valid, dout_valid;
   reg [(width-1):0]     dout, middle_dout;

   wire [(width-1):0]    fifo_dout;
   wire                  fifo_empty, fifo_rd_en;
   wire                  will_update_middle, will_update_dout;

   // orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(fifo_dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full),
       .prog_full(prog_full)
       );

   assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
   assign will_update_dout = (middle_valid || fifo_valid) && (rd_en || !dout_valid);
   assign fifo_rd_en = (!fifo_empty) && !(middle_valid && dout_valid && fifo_valid);
   assign empty = !dout_valid;

   always @(posedge rd_clk)
      if (rst)
         begin
            fifo_valid <= 0;
            middle_valid <= 0;
            dout_valid <= 0;
            dout <= 0;
            middle_dout <= 0;
         end
      else
         begin
            if (will_update_middle)
               middle_dout <= fifo_dout;
            
            if (will_update_dout)
               dout <= middle_valid ? middle_dout : fifo_dout;
            
            if (fifo_rd_en)
               fifo_valid <= 1;
            else if (will_update_middle || will_update_dout)
               fifo_valid <= 0;
            
            if (will_update_middle)
               middle_valid <= 1;
            else if (will_update_dout)
               middle_valid <= 0;
            
            if (will_update_dout)
               dout_valid <= 1;
            else if (rd_en)
               dout_valid <= 0;
         end 
endmodule
Last modified on Thu May 17 17:30:00 2012. E-mail: