I’ve seen plenty of articles about this topic already, but I’d like to talk about it on my blog as well. It’s a pretty important topic in my opinion.

Let’s say you have a microcontroller peripheral that sends and/or receives a bunch of data. A good example of this type of microcontroller peripheral is a UART (universal asynchronous receiver/transmitter). That’s essentially a fancy name for a serial port–you know, the old 9-pin connectors you’d find on PCs. They were commonly used for connecting mice and modems back before PS/2, USB, and broadband took over. A UART basically does two things: receives data and transmits data. When you want to transmit data, you write a byte to a register in the UART peripheral. The UART converts the 8 bits into a serial data stream and spits it out over a single wire at the baud rate you have configured. Likewise, when it detects an incoming serial data stream on a wire, it converts the stream into a byte and gives you a chance to read it.

I’m actually not going to go into detail about UARTs yet because I feel it’s really important to understand a different concept that we will need to know when we learn about how to use UARTs. That concept is an interrupt-safe circular buffer. This lesson is dedicated to understanding this very important data structure. The next post I write will then go into more detail about UARTs, and that post will use a concept that you will learn about in this post. OK, let’s get started with circular buffers.

A common problem when you are receiving or transmitting a lot of data is that you need to buffer it up. For example, let’s say you want to transmit 50 bytes of data. You have to write the bytes one-by-one into the peripheral’s data register to send it out. Every time you write a byte, you have to wait until it has transmitted before writing the next byte. (Note: Some microcontroller peripherals have a hardware buffer so you can write up to 16 [for example] bytes before you have to wait, but that hardware buffer can still fill up, so this concept I’m describing still applies)

A typical busy loop sending the data would look like this:

char data[50]; // filled with data you're transmitting

for (int x = 0; x < 50; x++) {
    while (peripheral_is_busy());
    peripheral_send_byte(data[x]);
}

This loop is simple. For each of the 50 bytes, it first waits until the peripheral isn’t busy, then tells the peripheral to send it. You can imagine what implementations of the peripheral_is_busy() and peripheral_send_byte() functions might look like.

While you’re transmitting these 50 bytes, the rest of your program can’t run because you’re busy in this loop making sure all of the bytes are sent correctly. What a waste, especially if the data transmission rate is much slower than your microcontroller! (Typically, that will be the case.) There are so many more important tasks your microcontroller could be doing in the meantime than sitting in a loop waiting for a slow transmission to complete. The solution is to buffer the data and allow it to be sent in the background while the rest of your program does other things.

So how do you buffer the data? You create a buffer that will store data waiting to be transmitted. If the peripheral is busy, rather than waiting around for it to finish, you put your data into the buffer. When the peripheral finishes transmitting a byte, it fires an interrupt. Your interrupt handler takes the next byte from the buffer and sends it to the peripheral, then immediately returns back to your program. Your program can then continue to do other things while the peripheral is transmitting. You will periodically be interrupted to send another byte, but it will be a very short period of time — all the interrupt handler has to do is grab the next byte waiting to be transmitted and tell the peripheral to send it off. Then your program can get back to doing more important stuff. This is called interrupt-driven I/O, and it’s awesome. The original code I showed above is called polled I/O.

You can probably imagine how useful this idea is. It will make your program much more efficient. If you’re a computer science person, you’re probably thinking,  “Doug, the abstract data type you should use for your buffer is a queue!” You would be absolutely correct. A queue is a first-in, first-out (FIFO) data structure. Everything will come out of the queue in the same order it went in. There’s no cutting in line! (Well, unless you have a bug in your code, which I’m not afraid to admit has happened to me enough times that I finally bothered to write up this article so I’ll have a reference for myself.) Anyway, that’s exactly how we want it to be. If I queue up “BRITNEYSPEARS” I don’t want it to be transmitted as “PRESBYTERIANS”. (Yes, amazingly enough, that is an anagram.)

A really easy to way to implement a queue is by creating a ring buffer, also called a circular buffer or a circular queue. It’s a regular old array, but when you reach the end of the array, you wrap back around to the beginning. You keep two indexes: head and tail. The head is updated when an item is inserted into the queue, and it is the index of the next free location in the ring buffer. The tail is updated when an item is removed from the queue, and it is the index of the next item available for reading from the buffer. When the head and tail are the same, the buffer is empty. As you add things to the buffer, the head index increases. If the head wraps all the way back around to the point where it’s right behind the tail, the buffer is considered full and there is no room to add any more items until something is removed. As items are removed, the tail index increases until it reaches the head and it’s empty again. The head and tail endlessly follow this circular pattern–the tail is always trying to catch up with the head–and it will catch up, unless you’re constantly transmitting new data so quickly that the tail is always busy chasing the head.

Anyway, we’ve determined that you need three things:

  1. An array
  2. A head
  3. A tail

These will all be accessed by both the main loop and the interrupt handler, so they should all be declared as volatile. Also, updates to the head and updates to the tail each need to be an atomic operation, so they should be the native size of your architecture. For example, if you’re on an 8-bit processor like an AVR, it should be a uint8_t (which also means the maximum possible size of the queue is 256 items). On a 16-bit processor it can be a uint16_t, and so on. Let’s assume we’re on an 8-bit processor here, so ring_pos_t in the code below is defined to be a uint8_t.

#define RING_SIZE   64
typedef uint8_t ring_pos_t;
volatile ring_pos_t ring_head;
volatile ring_pos_t ring_tail;
volatile char ring_data[RING_SIZE];

One final thing before I give you code: it’s a really good idea to use a power of two for your ring size (16, 32, 64, 128, etc.). The reason for this is because the wrapping operation (where index 63 wraps back around to index 0, for example) is much quicker if it’s a power of two. I’ll explain why. Normally a programmer would use the modulo (%) operator to do the wrapping. For example:

ring_tail = (ring_tail + 1) % 64;

If your tail began at 60 and you repeated this line above multiple times, the tail would do the following:

61 -> 62 -> 63 -> 0 -> 1 -> …

That works perfectly, but the problem with this approach is that modulo is pretty slow because it’s a divide operation. Division is a pretty slow operation on computers. It turns out when you have a power of two, you can do the equivalent of a modulo by doing a bitwise AND, which is a much quicker operation. It works because if you take a power of two and subtract one, you get a number which can be represented in binary as a string of all 1 bits. In the case of a queue of size 64, bitwise ANDing the head or tail with 63 will always keep the index between 0 and 63. So you can do the wrap-around like so:

ring_tail = (ring_tail + 1) & 63;

A good compiler will automatically convert the modulo to the faster AND operation if it’s a power of two, so you can just use my first example with the “% 64” since it makes the intent of the code clearer. I’ve been told I have “a lot of faith in compilers” but it’s true–if you compile with optimizations enabled, GCC will correctly optimize my first example into the assembly equivalent of my second example. Looking at the assembly code that your compiler generates is a very valuable tool for you to have available.

OK. Now that I’ve explained everything, here are my add() and remove() functions for the queue:

int add(char c) {
    ring_pos_t next_head = (ring_head + 1) % RING_SIZE;
    if (next_head != ring_tail) {
        /* there is room */
        ring_data[ring_head] = c;
        ring_head = next_head;
        return 0;
    } else {
        /* no room left in the buffer */
        return -1;
    }
}

int remove(void) {
    if (ring_head != ring_tail) {
        int c = ring_data[ring_tail];
        ring_tail = (ring_tail + 1) % RING_SIZE;
        return c;
    } else {
        return -1;
    }
}

The add function calculates the next position for the head, ensures that there’s room in the buffer, writes the character to the end of the queue, and finally updates the head. The remove function ensures the buffer isn’t empty, grabs the first character waiting to be read out of the queue, and finally updates the tail. The code is pretty straightforward, but there are a couple of details you should be aware of:

  • I only modify the head index in the add function, and I only modify the tail index in the remove function.
  • I only modify the index after reading/writing the data in the buffer.

By doing both of the above, I have successfully ensured that this is an interrupt-safe, lock-free ring buffer (if there is a single consumer and a single producer). What I mean is you can add to the queue from an interrupt handler and remove from the queue in your main loop (or vice-versa), and you don’t have to worry about interrupt safety in your main loop. This is a really cool thing! It means that you don’t have to temporarily disable interrupts in order to update the buffer from your main loop. It’s all because of the two bullets above. If the interrupt only modifies the head, and the main loop only modifies the tail, there’s no conflict between the two of them that would require disabling interrupts. As for my second bullet point, updating the head or tail index only after the read or write is only necessary in whatever side of the queue is working in the main loop — it doesn’t matter in the interrupt handler. But if I follow that convention in both the add() and remove() functions, they can be swapped around as needed — in one program add() could be used in an interrupt handler and remove() could be used in the main loop, but in a different program, remove() would be in the interrupt handler and add() would be in the main loop.

OK, so back to the original example of transmitting 50 bytes. In this case, the main loop will use the add() function. You will add 50 bytes to the queue by calling add() 50 times. In the meantime, the microcontroller peripheral will interrupt every time it successfully sends a byte. The interrupt handler will call remove(), tell the peripheral to transmit the character it just removed, and give control back to the main loop. Later on, the transmission of that character will complete, the interrupt will occur again, it will send the next character, and so on until all 50 bytes have been sent, at which point the queue will be empty. It works like a charm!

The only problem remaining is that the first time you need to send a character, the transmitter isn’t running yet. So you have to “prime the pump” by telling the transmitter to start. Sometimes a peripheral will have a way to induce the first interrupt to get the ball rolling. Otherwise, you may have to add some extra logic that forces the first transmission to occur from the main loop. Since this technically breaks the “single consumer and single producer” rule, this one exception to the rule may still require some careful interrupt safety. I’ll talk more about that in a later post. The main purpose of this post is just to get you familiar with ring buffers and their inherent interrupt safety (aside from that one “gotcha”). They are very, very useful and very, very common. I’ve used them in drivers for UARTs and CANbus, and as I understand it, they are very common in audio applications as well.

I hope this made some sense. If it didn’t, I hope my next posting about UARTs will help clear things up.

Note: The information I have given may not be completely correct if you’re dealing with a system that has more than one processor or core, or other weird stuff like that. In that case you may need something extra such as a memory barrier between writing the queue data and updating the index. That’s beyond the scope of this article, which is targeted toward microcontrollers. This technique should work fine on microcontrollers with a single processor core.

These SIMMs (and programmers) are available for purchase here.

As you may have seen from my previous posts, I have developed a programmable ROM SIMM for older Macintosh computers and a programmer to go with it. Everything from the hardware design to the firmware is completely open source. The combination is pretty nifty and enables people to do all kinds of crazy things that previously weren’t possible.

I always figured that the 2 MB size of the SIMM was plenty — I never really envisioned using the custom ROM SIMM for anything other than customizing the startup chime. However, bbraun came up with an even better idea — a ROM disk driver! He has patched the IIsi ROM (which should be compatible with the SE/30, IIx, IIcx, IIci, IIfx, and of course the IIsi itself) to enable booting from a ROM disk that uses up the remaining space on the SIMM and also has some other tweaks like disabling the slow RAM test that the previously-listed machines perform on every cold boot. It works perfectly and allows you to create a 1.5 MB bootable ROM disk to append to the 512 KB IIsi ROM. It boots up extremely quickly!

After he came up with that brilliant idea, I figured I should maximize the potential size of the disk image. The 64-pin ROM SIMM slot has 23 address pins labeled A0 through A22. However, A0 and A1 are unused due to the fact that the data bus is four bytes wide, so there are really only 21 usable address pins. That provides for a total of 221 = 2,097,152 = 2*1024*1024 usable addresses. Each address points to 4 bytes of data, which provides for a total maximum ROM SIMM size of 8 MB. After taking into account the 512 KB ROM size, that would leave 7.5 MB of usable space for a ROM disk.

You can probably see where I’m going with this–I designed a bigger ROM SIMM!

First of all, I had to find NOR flash chips that ran on 5V and were big enough. It turns out that 5V chips are becoming harder and harder to find, especially in larger sizes. Everyone is moving toward 3.3V and lower. After a ton of searching, I settled on the Micron M29F160FB5AN6E2, a 5V 16-megabit chip that can be accessed in parallel using either a 16-bit data bus or an 8-bit data bus. It’s designed with automotive applications in mind, which probably explains why it still exists as a 5V chip. 16 megabits (Mb) is equal to 2 megabytes (MB). In order to reach the full 8 MB size, I need four of them per SIMM. I use them in 8-bit mode so each of the four chips provides one of the four bytes per address, just like how I designed the 2 MB version of the SIMM.

The chips are available in a fine-pitch TSOP package, which pretty much eliminates the possibility of socketing them. These chips really need to be soldered down, so this new SIMM requires my programmer board in order to put a disk image onto it. Here is the final product:

8MBSIMM

Quickly after I finished the new SIMM, bbraun updated his driver to work with a 7.5 MB ROM disk. So now it is possible to create a 7.5 MB ROM disk and boot from it. If you’re interested in buying one of the SIMMs and programmers (believe me, I have plenty to go around), you can purchase them here. As always, the 8 MB ROM SIMM is completely open source so you can make one yourself if you’d rather do it that way.

Some newer machines like Quadras have pads on their motherboards for these SIMMs–you just have to solder on your own socket. Unfortunately, it’s looking like they are hardwired to only address 1 MB of ROM space. Unless someone discovers a way to get around that, those machines will probably not be able to take advantage of the ROM disk. So for now, this ROM disk is restricted to the SE/30, IIx, IIcx, IIci, IIfx, and IIsi.